Looking Out for Number One: CS201 Programming Assignment 1
The natural world is full of hidden and beautiful mathematics. The
whorls of a conch shell hide the Fibonacci sequence and its Golden
Ratio, plants grow in fractal patterns, and comets trace hyperbolic
patterns through the solar system. All those beautiful patterns hide
in the grungy data of human observation.
So, what are the populations of every town in Illinois and the daily number of
people requesting a new web account at a particular popular web site
hiding from you?
Assignment
Due date: Wed, Jan 24th, 2007 10:00 PM.
Your task is to write a program that determines the distribution of a specified digit in a given set of data.
In other words, your program takes as input a number n followed by a list of numbers, and outputs a list of 10 values:
the values represent the frequency with which each digit, 0-9, appears as the nth digit of the numbers in the list.
Digit positions are numbered from 0, i.e., for the number 1234 the 0th digit is 1, the 1st digit is 2, etc. Also, print each
frequency as a percentage of the total number of numbers in the given list. This program has to be implemented in Java.
You are required to write a class NumberOne whose main method opens the input file.
The input to the program is taken from a file whose name is provided as
a command line argument. The output is written to the standard output. The output should appear in exactly the same
format as shown in the example.
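As a rough illustration of the tallying step only, here is a minimal sketch. The class name DigitTallySketch and the method tally are made up for this example, it reads digits from the decimal string rather than using arithmetic, and it assumes non-negative numbers. Skipping numbers that are too short to have an nth digit is also an assumption; the assignment does not say what to do with them.

```java
import java.util.Arrays;

class DigitTallySketch {
    /**
     * Counts how often each digit 0-9 occurs at position n (0 = leftmost).
     * Assumption: values are non-negative and n >= 0. Numbers that have no
     * digit at position n are skipped (a choice made for this sketch).
     */
    static int[] tally(int n, long[] values) {
        int[] counts = new int[10];
        for (long value : values) {
            String digits = Long.toString(value);
            if (n < digits.length()) {
                counts[digits.charAt(n) - '0']++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // The eleven numbers from the small example in this handout.
        long[] sample = {12176, 5476, 543, 3490, 24892, 28619, 2595, 603, 2527, 1465, 1236};
        System.out.println(Arrays.toString(tally(0, sample)));
        // prints [0, 3, 4, 1, 0, 2, 1, 0, 0, 0]
    }
}
```

The array index is the digit itself, so counts[1] holds the number of values whose nth digit is 1.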
Note:
- The last column must be printed exactly to one decimal place.
- The width of each column must be the same.
- You do NOT have to handle any errors in the format or content of the input file.
- However, you HAVE to handle a bad file name given in the command line argument. You can print a helpful error
message and exit the program in this case.
- Here is an example input file which you can use to test your program.
Its data is the number of accounts requested per day (on days over several years) at the web site LiveJournal.com.
Once you're done writing your program, check out the distribution of initial digits in that data.
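One possible way to open the input file and deal with a bad file name is sketched below. The class name InputReaderSketch, the helper readNumbers, and the wording of the messages are all made up for illustration; your NumberOne class may be organized quite differently. The sketch relies on java.util.Scanner throwing FileNotFoundException when the file cannot be opened.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class InputReaderSketch {
    /** Reads every whitespace-separated integer from the scanner, in order. */
    static List<Long> readNumbers(Scanner in) {
        List<Long> values = new ArrayList<>();
        while (in.hasNextLong()) {
            values.add(in.nextLong());
        }
        return values;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("Usage: java InputReaderSketch <input-file>");
            System.exit(1);
        }
        try (Scanner in = new Scanner(new File(args[0]))) {
            // First value is the digit position n; the rest are the data.
            List<Long> values = readNumbers(in);
            System.out.println("Read " + values.size() + " values from " + args[0]);
        } catch (FileNotFoundException e) {
            // Bad file name: print a helpful message and exit.
            System.err.println("Could not open input file '" + args[0] + "': " + e.getMessage());
            System.exit(1);
        }
    }
}
```

Keeping the parsing in a method that takes a Scanner makes it easy to try out on a string before pointing it at a real file.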
Example
The format of the input file and the output should be as shown in the small example below.
Input
0
12176
5476
543
3490
24892
28619
2595
603
2527
1465
1236
Output
0s: 0 0.0
1s: 3 27.3
2s: 4 36.4
3s: 1 9.1
4s: 0 0.0
5s: 2 18.2
6s: 1 9.1
7s: 0 0.0
8s: 0 0.0
9s: 0 0.0
Explanation: The first line of the input is the position n of the digit whose frequency is to be obtained. For the above example,
the frequency of digits in the 0th position of the numbers in the list is required. The 0th digit of the number 12176 is 1.
Line 2 onwards is the list of numbers to be used in your calculation (one number per line). The output consists of the frequency of each digit and that frequency as a percentage.
I.e., "1s: 3 27.3" indicates that 3 numbers have a 1 in the 0th position. This frequency is 27.3% of the total (3 out of 11 numbers in the input).
(Really the percentage data should line up perfectly right justified; I know how to do that in Java using printf
but I don't remember exactly how to do that in HTML.)
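For the right-justified, fixed-width columns mentioned above, a format string with explicit widths is one way to go. The following is only a sketch: the class name OutputFormatSketch, the method formatRow, and the widths 5 and 6 are assumptions chosen for illustration, not the required layout.

```java
import java.util.Locale;

class OutputFormatSketch {
    /**
     * Formats one output row: the digit label, the count right-justified in
     * 5 columns, and the percentage right-justified in 6 columns with exactly
     * one decimal place. The widths are arbitrary choices for this sketch.
     */
    static String formatRow(int digit, int count, int total) {
        double percent = (total == 0) ? 0.0 : 100.0 * count / total;
        return String.format(Locale.ROOT, "%ds: %5d %6.1f", digit, count, percent);
    }

    public static void main(String[] args) {
        // Counts for the small example: index = digit, 11 numbers in total.
        int[] counts = {0, 3, 4, 1, 0, 2, 1, 0, 0, 0};
        int total = 11;
        for (int digit = 0; digit < counts.length; digit++) {
            System.out.println(formatRow(digit, counts[digit], total));
        }
    }
}
```

With these widths, 3 out of 11 comes out as 27.3 and every row has the same length. System.out.printf accepts the same format string if you prefer to print directly.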
Tips to write the program
- An easy way to write the program is to divide it into three methods.
- countDigits: calculates the number of digits in the given integer.
countDigits() should evaluate to 1 for 0-9, 2 for 10-99, 3 for 100-999, etc.
- nthDigitBack: nthDigitBack(n, num) finds the nth lowest order digit in num, i.e., the nth digit from the right. The rightmost digit is taken to be the 0th digit.
- nthDigit: nthDigit(n, num) finds the nth highest order digit of num, i.e., the nth digit from the left. The leftmost digit is taken to be the 0th.
- The above decomposition is only a suggestion; you are not required to follow it.
- Please provide appropriate comments to your code.
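To make the three suggested methods concrete, here is one way they might fit together. This is a sketch under assumptions, not a reference solution: the class name DigitHelpersSketch is invented, inputs are assumed to be non-negative, and returning -1 when a number has no digit at the requested position is a convention chosen here because the assignment does not specify one.

```java
class DigitHelpersSketch {
    /** Number of decimal digits in num: 1 for 0-9, 2 for 10-99, and so on. */
    static int countDigits(long num) {
        int digits = 1;
        while (num >= 10) {
            num /= 10;
            digits++;
        }
        return digits;
    }

    /**
     * The nth digit from the right (rightmost is the 0th),
     * or -1 if num has no such digit (sketch convention).
     */
    static int nthDigitBack(int n, long num) {
        if (n < 0 || n >= countDigits(num)) {
            return -1;
        }
        for (int i = 0; i < n; i++) {
            num /= 10; // drop the n lowest-order digits
        }
        return (int) (num % 10);
    }

    /**
     * The nth digit from the left (leftmost is the 0th),
     * or -1 if num has no such digit (sketch convention).
     */
    static int nthDigit(int n, long num) {
        return nthDigitBack(countDigits(num) - 1 - n, num);
    }

    public static void main(String[] args) {
        System.out.println(nthDigit(0, 1234));     // prints 1
        System.out.println(nthDigit(1, 1234));     // prints 2
        System.out.println(nthDigitBack(0, 1234)); // prints 4
    }
}
```

Counting from the left reduces to counting from the right once the number of digits is known, which is why nthDigit simply delegates to nthDigitBack.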
What to submit?
Submit your Java program (NumberOne.java) along with any other files that are required to run your program using the turnin command before Jan 24th 10:00PM.
turnin -c cs201 -p program1 [your project directory]
If you have any problems submitting using the turnin command, please email your files to stata@cs.uic.edu. Please
name your files appropriately and please do not forget to include your name along with your submission.
Optional extra credit
Use your program to explore the distribution of 1st and 2nd digits in some real data set that you find and prepare as input.
Optional: If you want to find the patterns hidden in
the numbers around you, try the following three-part bonus problem:
- Find a data source on the web that no one else has used
(see next part) and transform it into a format suitable for input to your program.
The data must all be separate
measurements of a single type of phenomenon. For example:
measurements of university/college enrollments across different
institutions, or at the same institution across different
years; measurements of the flow rates of all the major rivers in Illinois; measurements of the number of sunspots per month;
measurements of the height of 10000 randomly chosen
Chicago residents; measurements of the number of hits per day on the
UIC computer science web site over three years; measurements of the
length in characters of each article in the Wikipedia; measurements of
the population of the 1000 largest cities and townships in the U.S.;
etc. Furthermore, there must be at least 250 measurements in
the list (but more would be better!).
- Post all of the following items to Blackboard (If they ever give me the site I'll create a course newsgroup there!) with the
title "Number One Data": the URL for your data source, a
description of the data source and an attachment with bare data suitable
for input to the program.
- Submit with your assignment the URL of your data, a description of
the data source, and digit tallies for digit 1 and digit 2 of your
data (using your program). Are there any oddities in
the tallies? What about in other students' data?
Acknowledgment
This assignment is based on a Nifty Assignment by Steve Wolfman.