CS 340 - Software Design

Spring 2006


MP 5 - File Compression

Due: Monday April 10, 2006 at 11:59 pm

For this assignment, you are to use the Huffman Coding algorithm to create a file compression/decompression utility. We will use a static Huffman Tree, based on the HTML file const.html which is from the web site http://www.usconstitution.net/. While this file is large, it does not use all of the ASCII characters (for example: capital Z does not appear in the file). Since we are using a static Huffman Tree (we will encode all files using the same tree), we need to make sure that all possible characters are included. For more information check out the Wikipedia page on Huffman Coding or http://www.data-compression.com/lossless.shtml.

Your project is to perform three tasks:

  1. Create the Huffman Tree information
  2. Use the information from part 1 to compress a file
  3. Use the information from part 1 to uncompress a file.

The program to create the Huffman tree information should take the name of a file as a command line argument. The ASCII characters from this file will be used to determine the character frequencies needed when building the Huffman tree. This program is to write out to a file (or files) the information that will be used by the compress program and the uncompress program. The names of the output file(s) will be hardcoded in your program, and you may assume that their names will not conflict with other files.
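As a rough illustration of this first task, the sketch below counts character frequencies and derives a code for every 7-bit ASCII value. It is one possible approach, not a required design: the function name build_codes, the choice of Python, and the trick of starting every count at 1 (so that characters missing from the sample file, such as capital Z, still receive a code) are all assumptions made for this example. Writing the resulting table to the hardcoded output file is left out.

```python
import heapq
from collections import Counter


def build_codes(text: bytes) -> dict:
    """Map each 7-bit ASCII value to a Huffman code string of '0'/'1'."""
    counts = Counter(b for b in text if b < 128)
    # Assumption: start every ASCII value at 1 so that characters absent
    # from the sample file still end up in the static tree.
    freq = {c: counts.get(c, 0) + 1 for c in range(128)}

    # Heap entries are (weight, tie_breaker, tree). A tree is either an int
    # (a leaf) or a (left, right) tuple. The unique tie_breaker keeps Python
    # from ever comparing two trees directly.
    heap = [(w, c, c) for c, w in freq.items()]
    heapq.heapify(heap)
    next_id = 128
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next_id, (t1, t2)))
        next_id += 1

    # Walk the finished tree: left edges append '0', right edges append '1'.
    codes = {}
    stack = [(heap[0][2], "")]
    while stack:
        node, prefix = stack.pop()
        if isinstance(node, tuple):
            stack.append((node[0], prefix + "0"))
            stack.append((node[1], prefix + "1"))
        else:
            codes[node] = prefix
    return codes
```

Because the tree is static, the compress and uncompress programs must both read exactly the same table, so whatever file format you choose for it should be deterministic.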

The program to compress a file will take the name of the file that it is to compress as a command line argument. The program will create a file that contains the compressed information. The new file's name is formed by appending ".H" to the filename of the original file. The information needed to compress the file will be read from the file written by the first program. The filename of the file containing this information will be hardcoded in your program.
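The core of the compress step can be sketched as follows, assuming the code table has already been loaded into a dictionary. The names pack_bits, compress, and compressed_name are hypothetical, and the 4-byte symbol count written ahead of the bit stream is an assumption of this example (it is one simple way to tell the uncompress program where the padding in the final byte begins); the assignment does not prescribe a file layout.

```python
def pack_bits(bits: str) -> bytes:
    """Pack a string of '0'/'1' characters into bytes, zero-padding the end."""
    padded = bits + "0" * (-len(bits) % 8)
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))


def compress(data: bytes, codes: dict) -> bytes:
    """Encode data with the given code table (byte value -> bit string)."""
    bits = "".join(codes[b] for b in data)
    # Assumption: a 4-byte big-endian symbol count precedes the packed bits
    # so the decoder can ignore the padding bits in the last byte.
    return len(data).to_bytes(4, "big") + pack_bits(bits)


def compressed_name(filename: str) -> str:
    """Name of the output file: the original name with '.H' appended."""
    return filename + ".H"
```

For example, with the toy table {a: 0, b: 10, c: 11}, the input "abc" becomes the bits 01011, which are padded to the single byte 01011000.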

The program to uncompress a file will take the name of the file to be uncompressed as a command line argument. The name of the file is to end with ".H". The resulting file should use the name of the compressed file with the ".H" ending removed. The information needed to uncompress the file will be read from the file written by the first program. The filename of the file containing this information will be hardcoded in your program.
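A minimal sketch of the uncompress step is shown below. It assumes a compressed layout of a 4-byte big-endian symbol count followed by the packed code bits; that layout, like the names decompress and original_name, is an assumption of this example rather than part of the assignment. Since Huffman codes are prefix-free, the decoder can read one bit at a time and emit a character as soon as the accumulated bits match a code.

```python
def decompress(blob: bytes, codes: dict) -> bytes:
    """Decode blob using the code table (byte value -> bit string)."""
    # Assumption: the first 4 bytes hold the number of encoded symbols.
    count = int.from_bytes(blob[:4], "big")
    decode = {code: sym for sym, code in codes.items()}
    out = bytearray()
    current = ""
    for byte in blob[4:]:
        for shift in range(7, -1, -1):  # most significant bit first
            if len(out) == count:
                return bytes(out)  # remaining bits are padding
            current += "1" if (byte >> shift) & 1 else "0"
            if current in decode:
                out.append(decode[current])
                current = ""
    return bytes(out)


def original_name(compressed: str) -> str:
    """Strip the trailing '.H' to recover the output filename."""
    if not compressed.endswith(".H"):
        raise ValueError("expected a filename ending in .H")
    return compressed[:-2]
```

Looking up bit strings in a dictionary keeps the example short; walking the Huffman tree node by node is an equally valid approach and avoids building strings.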

Submission of the Program

Your program is to be submitted electronically via the turnin command on the Linux machines. The project name for this assignment is mp5. All programs are expected to be written in good programming style.

Turn in your program electronically using the "turnin" command from your CS account as follows:

turnin -c cs340 -p mp5  [your project directory]
where [your project directory] is the name of the directory that holds all of your files for this programming problem. The turnin command automatically compresses the data under your directory, so there is no need to compress it yourself.

Note that you can only invoke the turnin command on the Linux machines in the lab or after logging into the server machine oscar.cs.uic.edu.

If you want to verify that your project was turned in, look in the turnin directory for a file with your userid. For instance for this project, from your CS account you would type:

turnin -c cs340 -p mp5 -v

Note that you can execute turnin as many times as you would like, up until the program deadline when turnin will be disabled for this project. Each time you execute turnin for a project, you overwrite all of what you had turned in previously for that project. It does not work in an incremental way.

CS 340 - Fall 2005