Homework 1: AWS, git, and a web client

Accessing your virtual machine

For this class, you will be using a virtual machine hosted by Amazon Web Services. This virtual machine is running Ubuntu 12.04. You have received instructions on how to access this virtual machine via email.

git, class repositories (repos)

The main objective of this homework is to familiarize you with git, the version control system we will use for homework turn-in. Git is a decentralized revision control system. You will also get a quick introduction to network programming in C, which will be our main language in the course.

Using your private key, check out the public course repository:

git clone git@git.uicbits.net:cs450-f13/public.git

Once you’ve cloned the public repo, you will need to add some packages and prepare your VM for connecting to our IPv6 VPN. To prepare your new Ubuntu install for class, run these commands from within the public repo you just cloned:

cd utils
tar xzf bitsvpn.tgz
cd bitsvpn
sudo ./install.sh

If all goes well, your virtual machine is now on the IPv6 Internet. You can verify this by trying the command:

$ ping6 ipv6.google.com

If that ping command works, you’re in business! Now you’re ready for the fun part.

Refer to the discussion site if you have further questions.

Your personal repository

You can also check out your personal repository:

git clone git@git.uicbits.net:cs450-f13/YOURUSERNAME.git

You will want to copy the skeleton code from the public directory to your new YOURUSERNAME directory, so that it looks like this:

ubuntu@ip-10-143-165-210:~/ckanich-student$ tree
.
└── hw1
    ├── hw1.c
    ├── Makefile
    └── SUBMISSION_COMMENTS.txt

1 directory, 3 files

Now, just to test things out, let’s add these files to the repository, commit them, and send the changes to the central server.

ubuntu@ip-10-143-165-210:~/ckanich-student$ git add hw1
ubuntu@ip-10-143-165-210:~/ckanich-student$ git commit -a -m'added hw1 skeleton'
[master e66c87f] added hw1 skeleton
 3 files changed, 77 insertions(+)
 create mode 100644 hw1/Makefile
 create mode 100644 hw1/SUBMISSION_COMMENTS.txt
 create mode 100644 hw1/hw1.c
ubuntu@ip-10-143-165-210:~/ckanich-student$ git push
Counting objects: 7, done.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (6/6), 1.13 KiB, done.
Total 6 (delta 0), reused 0 (delta 0)
To git@git.uicbits.net:cs450-f13/ckanich-student.git
   4781aea..e66c87f  master -> master
ubuntu@ip-10-143-165-210:~/ckanich-student$

Remember, if you don’t push, we don’t see it! We will be making copies of the repositories at each deadline (full credit, 10% off, etc.), so do not leave this for the last minute.

The Programming Part!

For this week’s programming exercise, we will create a truly barebones IPv6 web client. Based on the example TCP client code in examples/sockets in the public git repo, and the HTTP example sessions shown in class, you will write a command-line program called hw1 that takes a URL as its only parameter, retrieves the indicated file, and stores it in the local directory with the appropriate filename. If the URL does not end in a filename, use ‘index.html’. Make sure it works for both text and images by opening the stored file in a web browser. You may assume that the URL is of the form http://host/path, where path may or may not be an empty string, may or may not contain multiple slashes (for subdirectories), and may or may not contain a file name. You may assume files to be no larger than one megabyte, and you are not expected to handle HTTP redirect (3xx) return codes other than to report them.
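The URL handling described above can be sketched roughly as follows. This is only one possible approach, not the skeleton's or a reference solution: the function name split_url and its buffer-based signature are made up for illustration, and the sketch assumes the URL has no port number or query string that needs special treatment.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: split "http://host/path" into the host, the path to
 * request, and the local file name to save to. Returns 0 on success, -1 if
 * the URL is malformed or an output buffer is too small. */
int split_url(const char *url, char *host, size_t host_len,
              char *path, size_t path_len,
              char *file, size_t file_len)
{
    static const char prefix[] = "http://";
    const char *rest, *slash, *p, *name;
    size_t hlen;

    assert(url && host && path && file);

    if (strncmp(url, prefix, strlen(prefix)) != 0)
        return -1;
    rest = url + strlen(prefix);

    /* The host runs up to the first slash, or to the end of the string. */
    slash = strchr(rest, '/');
    hlen = slash ? (size_t)(slash - rest) : strlen(rest);
    if (hlen == 0 || hlen >= host_len)
        return -1;
    memcpy(host, rest, hlen);
    host[hlen] = '\0';

    /* An empty path is requested as "/". */
    p = slash ? slash : "/";
    if (strlen(p) >= path_len)
        return -1;
    strcpy(path, p);

    /* The file name is whatever follows the last slash of the path;
     * p always starts with '/', so strrchr cannot return NULL here. */
    name = strrchr(p, '/') + 1;
    if (*name == '\0')
        name = "index.html";
    if (strlen(name) >= file_len)
        return -1;
    strcpy(file, name);
    return 0;
}
```

Called on http://cs450.uicbits.net/cs450/, this sketch yields the host cs450.uicbits.net, the path /cs450/, and the file name index.html.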

The hostname may be a name like www.google.com, but the example code requires an IP address (like 64:ff9b::83c1:201d). To look up the IP address of a given host name, use getaddrinfo(). Running man 3 getaddrinfo on the command line will give you the details; see also the getaddrinfo.c example.
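As a rough illustration of the lookup step, here is a minimal sketch of calling getaddrinfo() for an IPv6 TCP address. It is not the course's getaddrinfo.c example: the helper name lookup_ipv6 is hypothetical, the port string "80" is assumed because that is the default HTTP port, and only the first result is used.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: look up the first IPv6 address for `host` and store
 * it (with port 80 filled in) in *out. If `text` is non-NULL, also write the
 * printable form of the address there. Returns 0 on success, -1 on failure. */
int lookup_ipv6(const char *host, struct sockaddr_in6 *out,
                char *text, size_t text_len)
{
    struct addrinfo hints, *res;
    int rc;

    assert(host != NULL && out != NULL);

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET6;      /* only IPv6 results */
    hints.ai_socktype = SOCK_STREAM; /* TCP */

    rc = getaddrinfo(host, "80", &hints, &res);
    if (rc != 0) {
        fprintf(stderr, "getaddrinfo(%s): %s\n", host, gai_strerror(rc));
        return -1;
    }

    /* Use the first result; a more careful client would try each in turn. */
    memcpy(out, res->ai_addr, sizeof *out);
    freeaddrinfo(res);

    if (text != NULL &&
        inet_ntop(AF_INET6, &out->sin6_addr, text, text_len) == NULL)
        return -1;
    return 0;
}
```

The filled-in struct can then be passed to connect() as (struct sockaddr *)&addr together with sizeof addr. A host that does not resolve makes the helper return -1, which is the case your program needs to report as an error.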

Template

For this homework, there is a prepared skeleton directory that you may use located in the public git repository.

A few hints

Use HTTP version 1.0. Version 1.1 can get a lot more complicated. Good functions to use for handling filenames and text include sprintf, sscanf, strstr, and strchr.

Read more about these using the man pages. For example, try man sprintf on the command line.

NOTE: Newlines in HTTP are represented as \r\n, not just \n.
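To make the \r\n point concrete, here is a small sketch of building an HTTP/1.0 GET request. The helper name build_request is hypothetical, and sending a Host header is an assumption on my part: HTTP/1.0 does not require it, but many servers that host several sites on one address need it to pick the right one. It uses snprintf rather than sprintf so that an oversized URL cannot overflow the buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write an HTTP/1.0 GET request for `path` on `host`
 * into buf. Every line ends in \r\n, and an empty line ends the request.
 * Returns the request length, or -1 if buf is too small. */
int build_request(char *buf, size_t len, const char *host, const char *path)
{
    int n;

    assert(buf != NULL && host != NULL && path != NULL);

    n = snprintf(buf, len,
                 "GET %s HTTP/1.0\r\n"
                 "Host: %s\r\n"   /* assumption: optional in HTTP/1.0 */
                 "\r\n",
                 path, host);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}
```

The returned length is the number of bytes to hand to send() or write() on the connected socket.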

Your program will be tested (at least) on these URLs:

http://www.google.com
http://c.skype.com/i/images/logos/skype_logo.png
http://images.google.com/intl/en_ALL/images/logos/images_logo_lg.gif
http://pdos.csail.mit.edu/papers/chord:sigcomm01/chord_sigcomm.pdf
http://cs450.uicbits.net/cs450/cs450.html
http://131.193.34.207/cs450/ipv4.html
http://www.google.com/thispagedoesnotexist
http://www.thissitedoesnotexist1000.com
http://www.skype.com
http://cs450.uicbits.net/cs450/
http://www.adorama.com/alc/0012691/article/Lenses-Product-Reviews-AdoramaTV

Make sure you handle all these cases gracefully. The first should produce a file index.html. The second should produce an error because it does not give you a 200 OK response; in that case your program should exit with a nonzero status, i.e. exit(1). The third and fourth should save that named file to the current directory. Beej’s Guide to Network Programming is a great resource you may want to make use of.
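One way to decide between saving the file and exiting with an error is to look at the status line and then locate the end of the headers. The sketch below assumes the whole response has already been read into a buffer (files are at most one megabyte) and that the buffer is NUL-terminated after the received bytes; the names http_status and find_body are hypothetical. The body search compares bytes with memcmp instead of using strstr on the body, because image data may contain zero bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract the status code from a response that starts
 * with a line like "HTTP/1.0 200 OK". Returns the code, or -1 if the
 * response does not start with a valid status line. */
int http_status(const char *response)
{
    int major, minor, code;

    assert(response != NULL);
    if (sscanf(response, "HTTP/%d.%d %d", &major, &minor, &code) != 3)
        return -1;
    return code;
}

/* Hypothetical helper: return a pointer to the first byte of the body,
 * i.e. just past the blank line ("\r\n\r\n") that ends the headers, or
 * NULL if no blank line is found within the first `len` bytes. */
const char *find_body(const char *response, size_t len)
{
    size_t i;

    assert(response != NULL);
    for (i = 0; i + 4 <= len; i++) {
        if (memcmp(response + i, "\r\n\r\n", 4) == 0)
            return response + i + 4;
    }
    return NULL;
}
```

With these, a client could write the body with fwrite() when http_status() returns 200, and otherwise print the code (including 3xx redirects) and call exit(1).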

If you’re curious, try firing up Wireshark, and then fetching the URL with wget or curl. You’ll find the request they sent (which may have a lot of additional parameters in it) in one of the packets with destination port 80.

Spend some time thinking about how to do the string manipulation. It does not have to be complicated. The complete program, including comments, error handling etc. can be written in about 100 leisurely lines.

Turn-in instructions

For your turn-in, prepare a Makefile that compiles the hw1 target. To make sure your submission is complete, try the following in a temporary directory:

git clone git@git.uicbits.net:cs450-f13/MYUSERNAME.git
cd MYUSERNAME/hw1
make
./hw1 http://www.google.com/index.html

This should produce a file called index.html in the current working directory, containing the source for the Google front page.

./hw1 http://www.google.com/intl/en_ALL/images/logo.gif

This should produce a file called logo.gif, containing the Google logo.

Grading

Grading will be done automatically using a script. We will publish this script after grading has completed; you are responsible for writing your own test cases. If you wish, you can share test cases you have written with the class.