Looking Out for Number One: CS201 Programming Assignment 1

The natural world is full of hidden and beautiful mathematics. The whorls of a conch shell hide the Fibonacci sequence and its Golden Ratio, plants grow in fractal patterns, and comets trace hyperbolic patterns through the solar system. All those beautiful patterns hide in the grungy data of human observation.

So, what are the populations of every town in Illinois and the daily number of people requesting a new web account at a particular popular web site hiding from you?

Assignment

Due date: Wed, Jan 24th, 2007 10:00 PM.

Your task is to write a program that determines the distribution of digits at a specified position in a given set of data. In other words, your program takes as input a number n followed by a list of numbers, and outputs a list of 10 values: the frequency with which each digit, 0-9, appears as the nth digit of the numbers in the list. Digit positions are numbered from 0, starting at the left; i.e., for the number 1234 the 0th digit is 1, the 1st digit is 2, etc. Also print each frequency as a percentage of the total number of numbers in the given list.

The program must be implemented in Java. You are required to write a class NumberOne whose main method opens the input file. The name of the input file is provided as a command line argument. The output is written to standard output and should appear in exactly the same format as shown in the example.
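To make the digit numbering concrete, here is a minimal sketch of one way to pick out the nth digit by treating each number as a string. The class name DigitAt, the method digitAt, and the choice to return -1 when a number has no digit at that position are all assumptions made for illustration; the assignment does not say how short numbers should be handled.

```java
class DigitAt {
    // Returns the digit at 0-based position n, counting from the left,
    // or -1 when the number has no digit there (an assumed convention,
    // not something the assignment specifies).
    static int digitAt(String number, int n) {
        if (n < 0 || n >= number.length()) {
            return -1;
        }
        char c = number.charAt(n);
        return (c >= '0' && c <= '9') ? c - '0' : -1;
    }

    public static void main(String[] args) {
        System.out.println(digitAt("1234", 0)); // prints 1
        System.out.println(digitAt("1234", 1)); // prints 2
        System.out.println(digitAt("543", 3));  // prints -1
    }
}
```

Working on the string form avoids repeated division and keeps the left-to-right numbering obvious.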

Note:

Its data is the number of accounts requested per day (on days over several years) at the web site LiveJournal.com. Once you're done writing your program, check out the distribution of initial digits in that data.

Example


The format of the input file and the output should be as shown in the small example below.

Input
0
12176
5476
543
3490
24892
28619
2595
603
2527
1465
1236

Output
0s:   0    0.0
1s:   3   27.3
2s:   4   36.4
3s:   1    9.1
4s:   0    0.0
5s:   2   18.2
6s:   1    9.1
7s:   0    0.0
8s:   0    0.0
9s:   0    0.0
Explanation: The first line of the input is the digit position whose frequencies are to be obtained. For the above example, the frequency of each digit in the 0th position of the numbers in the list is required. The 0th digit of the number 12176 is 1. Line 2 onwards is the list of numbers to be used in your calculation (one number per line). Each output line consists of a digit's frequency and that frequency as a percentage. I.e., "1s: 3 27.3" indicates that 3 numbers have a 1 in the 0th position. This frequency is 27.3% of the total (3 out of 11 numbers in the input).
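The input layout described above (digit position on the first line, one number per line after that) could be tallied roughly as follows. This is only a sketch under stated assumptions: the class TallySketch and its methods tally, total, and percent are hypothetical names, blank lines are skipped, and a number with no digit at the requested position is left out of the counts but still included in the total. Output formatting is deliberately left aside here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

class TallySketch {
    // lines.get(0) holds the digit position n; each later non-blank line is one number.
    // Returns counts[d] = how many numbers have digit d at position n (0-based, from the left).
    static int[] tally(List<String> lines) {
        int n = Integer.parseInt(lines.get(0).trim());
        int[] counts = new int[10];
        for (String line : lines.subList(1, lines.size())) {
            String number = line.trim();
            // Assumption: numbers with no digit at position n are simply not tallied.
            if (n >= 0 && n < number.length()) {
                char c = number.charAt(n);
                if (c >= '0' && c <= '9') {
                    counts[c - '0']++;
                }
            }
        }
        return counts;
    }

    // Number of non-blank data lines; used as the denominator for percentages.
    static int total(List<String> lines) {
        int total = 0;
        for (String line : lines.subList(1, lines.size())) {
            if (!line.trim().isEmpty()) {
                total++;
            }
        }
        return total;
    }

    static double percent(int count, int total) {
        return total == 0 ? 0.0 : 100.0 * count / total;
    }

    public static void main(String[] args) throws IOException {
        // Read the file named on the command line, or fall back to the example input.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList("0", "12176", "5476", "543", "3490", "24892",
                                "28619", "2595", "603", "2527", "1465", "1236");
        int[] counts = tally(lines);
        int total = total(lines);
        for (int d = 0; d < 10; d++) {
            System.out.println(d + "s: " + counts[d] + " of " + total
                    + " (" + percent(counts[d], total) + "%)");
        }
    }
}
```

On the example input this counts three 1s, four 2s, one 3, two 5s, and one 6 out of 11 numbers.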

(Really the percentage data should line up perfectly right justified; I know how to do that in Java using printf but I don't remember exactly how to do that in HTML.)
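For the right-justified layout, a format string with fixed field widths is one option. The sketch below assumes a width of 4 for the count and 7 for the percentage, which is read off the example output as it appears on this page and may not match the intended spacing exactly; FormatSketch and formatLine are hypothetical names.

```java
import java.util.Locale;

class FormatSketch {
    // Builds one output line such as "1s:   3   27.3".
    // Widths 4 and 7 are assumptions inferred from the example output.
    static String formatLine(int digit, int count, double percent) {
        // Locale.US keeps the decimal separator a '.' regardless of system settings.
        return String.format(Locale.US, "%ds:%4d%7.1f", digit, count, percent);
    }

    public static void main(String[] args) {
        System.out.println(formatLine(0, 0, 0.0));            // prints "0s:   0    0.0"
        System.out.println(formatLine(1, 3, 100.0 * 3 / 11)); // prints "1s:   3   27.3"
    }
}
```

The same format string works directly with System.out.printf if a trailing %n is added.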

Tips to write the program

What to submit?

Submit your Java program (NumberOne.java) along with any other files that are required to run your program using the turnin command before Jan 24th 10:00PM.

turnin -c cs201 -p program1 [your project directory]

If you have any problems submitting using the turnin command, please email your files to stata@cs.uic.edu. Please name your files appropriately and please do not forget to include your name along with your submission.

Optional extra credit

Use your program to explore the distribution of 1st and 2nd digits in some real data set that you find and prepare as input. Optional: If you want to find the patterns hidden in the numbers around you, try the following three-part bonus problem:
  1. Find a data source on the web that no one else has used (see next part) and transform it into a format suitable for input to your program. The data must all be separate measurements of a single type of phenomenon. For example: measurements of university/college enrollments across different institutions, or at the same institution across different years; measurements of the flow rates of all the major rivers in Illinois; measurements of the number of sunspots per month; measurements of the height of 10000 randomly chosen Chicago residents; measurements of the number of hits per day on the UIC computer science web site over three years; measurements of the length in characters of each article in Wikipedia; measurements of the population of the 1000 largest cities and townships in the U.S.; etc. Furthermore, there must be at least 250 measurements in the list (but more would be better!).

  2. Post all of the following items to Blackboard (If they ever give me the site I'll create a course newsgroup there!) with the title "Number One Data": the URL for your data source, a description of the data source and an attachment with bare data suitable for input to the program.

  3. Submit with your assignment the URL of your data, a description of the data source, and digit tallies for digit 1 and digit 2 of your data (using your program). Are there any oddities in the tallies? What about in other students' data?

Acknowledgment

This assignment is based on a Nifty Assignment by Steve Wolfman.