Homework 0 - due Monday 8/30 at 2 pm.

The main objective of this homework is to get you familiarized with the versioning system we will be using for homework turn-in, called subversion (svn). You will also get a quick introduction to network programming in C, which will be our main language in the course.

Use the credentials you were given during lecture to check out your class directory:

svn checkout svn://bits.cs.uic.edu/cs450f10/students/mynamehere --username mynamehere

A directory with the same name as your username will appear. In this directory, create a directory for homework zero and call it hw0. Add this directory to the repository with the command:

svn add hw0

and then

svn commit -m "Added my first homework directory!"

to store it permanently in the central repository. Make sure you got it right by deleting the checked out directory (named as your username), and then running

svn checkout svn://bits.cs.uic.edu/cs450f10/students/mynamehere --username mynamehere

again. This time, the checked out directory should already contain your hw0 directory. Any file you want to turn in as your homework submission needs to be added to the correct directory using "svn add". Before the submission deadline, make sure that you have committed the most recent version of your submission directory, by running "svn commit" inside the directory. You can always double-check the status of all your files by running "svn status". You are encouraged to add files and commit new revisions as often as you like, before the submission deadline: only the most recent revision, committed to the repository before the submission deadline, will be used.

The Programming Part!

For this week's programming exercise, we will create a truly barebones web client. Based on the example TCP client code in svn://bits.cs.uic.edu/cs450f10/examples/sockets/, and the HTTP example sessions shown in class, write a command-line program called hw0 that takes a URL as its only parameter, retrieves the indicated file, and stores it in the local directory with the appropriate filename. If the URL does not end in a filename, use 'index.html'. Make sure it works for both text and images by opening the stored file in a web browser. You may assume that the URL is of the form http://host/path, where path may or may not be an empty string, may or may not contain multiple slashes (for subdirectories), and may or may not contain a file name. You may assume files to be no larger than one megabyte, and you are not expected to handle HTTP redirect (3xx) return codes other than to report them.
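One possible way to take the URL apart is sketched below. The function name split_url and its parameters are made up for this illustration and are not part of the course example code. The sketch assumes the URL has exactly the form described above (no port number, no query string) and falls back to index.html when the path is empty or ends in a slash.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the course examples).
 * Splits "http://host/path" into host, request path, and local file name.
 * Returns 0 on success, -1 if the URL is malformed or a buffer is too small. */
int split_url(const char *url, char *host, size_t host_len,
              char *path, size_t path_len,
              char *file, size_t file_len)
{
    static const char prefix[] = "http://";
    assert(url && host && path && file);

    if (strncmp(url, prefix, strlen(prefix)) != 0)
        return -1;
    const char *rest = url + strlen(prefix);

    /* The host ends at the first slash, or at the end of the string. */
    const char *slash = strchr(rest, '/');
    size_t hlen = slash ? (size_t)(slash - rest) : strlen(rest);
    if (hlen == 0 || hlen >= host_len)
        return -1;
    memcpy(host, rest, hlen);
    host[hlen] = '\0';

    /* An empty path is requested as "/". */
    const char *p = slash ? slash : "/";
    if (strlen(p) >= path_len)
        return -1;
    strcpy(path, p);

    /* The file name is whatever follows the last slash of the path. */
    const char *name = strrchr(p, '/') + 1;
    if (*name == '\0')
        name = "index.html";
    if (strlen(name) >= file_len)
        return -1;
    strcpy(file, name);
    return 0;
}
```

Called on http://www.google.com/intl/en_ALL/images/logo.gif, this sketch yields the host www.google.com, the path /intl/en_ALL/images/logo.gif, and the file name logo.gif.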

The hostname may be a name like www.google.com, but the example code requires an IP address (like 128.30.87.92). To look up the IP address of a given host name, use gethostbyname() or getaddrinfo(). "man 3 gethostbyname" on the command line will give you the details, or use this link, and see the gethostbyname.c example.
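As a rough illustration of the getaddrinfo() route, the sketch below turns a host name into a dotted IPv4 string that could be handed to code expecting an IP address. The helper name resolve_ipv4 is invented for this example, and it only looks at the first IPv4 address returned; the course's gethostbyname.c example may do this differently.

```c
#define _POSIX_C_SOURCE 200112L /* ask for getaddrinfo() under strict -std modes */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

/* Hypothetical helper (not from the course examples).
 * Writes the first IPv4 address of host into ip as a dotted string.
 * Returns 0 on success, -1 on failure (after printing a message). */
int resolve_ipv4(const char *host, char *ip, size_t ip_len)
{
    struct addrinfo hints, *res;
    assert(host && ip);

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;       /* IPv4 only, to match the example address */
    hints.ai_socktype = SOCK_STREAM; /* TCP */

    int rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0) {
        fprintf(stderr, "cannot resolve %s: %s\n", host, gai_strerror(rc));
        return -1;
    }

    /* With ai_family set to AF_INET, ai_addr points to a sockaddr_in. */
    struct sockaddr_in *addr = (struct sockaddr_in *)res->ai_addr;
    const char *ok = inet_ntop(AF_INET, &addr->sin_addr, ip, (socklen_t)ip_len);
    freeaddrinfo(res);
    return ok ? 0 : -1;
}
```

A failed lookup, as expected for http://www.thissitedoesnotexist1000.com, makes getaddrinfo() return a nonzero code, which gai_strerror() turns into a readable message.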

A few hints

Use HTTP version 1.0. Version 1.1 can get a lot more complicated. Good functions to use for handling filenames and text include:

sprintf, sscanf, strstr, strchr

Read more about these using the "man pages". For example, try "man sprintf" on the command line.

NOTE: Newlines in HTTP are represented as "\r\n", not just "\n".
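To make the line endings concrete, here is a minimal sketch of how a request could be assembled. The helper name build_request is an assumption made for this example. It also sends a Host header line, which HTTP/1.0 does not require but which servers hosting several sites under one address generally need. Note that the request ends with an empty line, that is, a final "\r\n" on its own.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from the course examples).
 * Writes an HTTP/1.0 GET request for path on host into buf.
 * Returns the request length in bytes, or -1 if it does not fit. */
int build_request(char *buf, size_t buf_len, const char *host, const char *path)
{
    assert(buf && host && path);

    /* Every line ends in "\r\n"; the bare "\r\n" at the end terminates the headers. */
    int n = snprintf(buf, buf_len,
                     "GET %s HTTP/1.0\r\n"
                     "Host: %s\r\n"
                     "\r\n",
                     path, host);
    if (n < 0 || (size_t)n >= buf_len)
        return -1;
    return n;
}
```

The returned length is the number of bytes to pass to send() or write() on the connected socket.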

Your program will be tested (at least) on these urls:

http://www.google.com/
http://www.google.com/intl/en_ALL/images/logo.gif
http://www.google.com/thispagedoesnotexist
http://www.thissitedoesnotexist1000.com
http://www.engadget.com/2010/08/27/amazon-kindle-review
http://www.engadget.com/2010/08/27/amazon-kindle-review/

Make sure you handle all these cases gracefully. The first should produce a file index.html. The second should produce a logo.gif (containing the picture). The third and fourth should produce nice error messages. For the engadget URLs, you need to supply an extra "Host:" header in the request. Beej's Guide to Network Programming is a great resource you may want to make use of.
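Telling these cases apart comes down to reading the status code on the first line of the response and locating where the headers end. The sketch below shows one way this might be done; the helper names status_code and find_body are invented for this illustration. The body search uses memcmp over a known length rather than strstr, since an image body can contain zero bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the course examples).
 * Extracts the numeric status from a line like "HTTP/1.0 404 Not Found".
 * response must be NUL-terminated. Returns the code, or -1 if unparsable. */
int status_code(const char *response)
{
    int code;
    assert(response);
    /* %*d reads and discards the major and minor version numbers. */
    if (sscanf(response, "HTTP/%*d.%*d %d", &code) != 1)
        return -1;
    return code;
}

/* Hypothetical helper (not from the course examples).
 * Returns a pointer to the first byte after the blank line that ends the
 * headers, or NULL if no "\r\n\r\n" occurs within the first len bytes. */
const char *find_body(const char *response, size_t len)
{
    assert(response);
    for (size_t i = 0; i + 4 <= len; i++)
        if (memcmp(response + i, "\r\n\r\n", 4) == 0)
            return response + i + 4;
    return NULL;
}
```

A client built this way would save the body only when the code is 200, and print the code with a short message for 3xx and 4xx responses. The number of body bytes is the total received length minus the offset of the pointer returned by find_body.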

If you're curious, try firing up Wireshark, and then fetching the URL with "wget" or "curl". You'll find the request they sent (which may have a lot of additional parameters in it) in one of the packets with destination port 80.

Spend some time thinking about how to do the string manipulation. It does not have to be complicated. The complete program, including comments, error handling etc. can be written in about 100 leisurely lines.

Turn-in instructions

For your turn-in, prepare a Makefile that compiles the hw0 target. To make sure your submission is complete, try the following in a temporary directory:

svn checkout svn://bits.cs.uic.edu/cs450f10/students/mynamehere/hw0 --username mynamehere 
cd hw0 
make 
./hw0 http://www.google.com/index.html 

This should produce a file called index.html, containing the source for the Google front page. Similarly,

./hw0 http://www.google.com/intl/en_ALL/images/logo.gif

should produce a file called logo.gif, containing the Google logo.

For this homework, there is a prepared skeleton directory that you may use. To import a copy of this skeleton into your directory, first delete any old hw0 directory you may have with:

svn rm hw0

svn commit -m "deleted old hw0 directory"

and then copy the skeleton directory from the repository like this:

svn cp svn://bits.cs.uic.edu/cs450f10/homeworks/hw0 .

This will automatically copy all the contents of the skeleton directory to your local directory, and schedule these new files for addition to your own repository. Run "svn commit" afterwards to store them permanently.


Topic revision: r9 - 2010-08-27 - 19:59:38 - Main.jakob