Homework 1: VirtualBox, git, and a web client

Accessing your virtual machine

For this class, you will be using a virtual machine. The easiest approach is to host this virtual machine on your own computer using VirtualBox. You should run 32-bit Ubuntu 14.04.3 LTS, as that is what we will use for testing the assignments and grading your submissions. If you need help installing VirtualBox or Ubuntu, you can search for something sensible like “VirtualBox ubuntu install” (for instance, this link looks like it might be helpful for installing on a Windows PC) and follow those directions, or you can ask on Piazza, where other students, the TA, or the professor can help you.

git, class repositories (repos)

The main objective of this homework is to get you familiarized with the versioning system we will be using for homework turn-in, called git. Git is a decentralized revision control system. You will also get a quick introduction to network programming in C, which will be our main language in the course.

Using your private key, check out the public course repository:

git clone git@git.uicbits.net:cs450-f15/public.git

Once you’ve cloned the public repo, you will have a directory named public filled with useful files for all students. Some of those files will allow you to use IP version 6 on your virtual machine. You will first need to add some packages and prepare your VM for connecting to our IPv6 VPN. To prepare your new Ubuntu install for class, run these commands, starting from the directory that contains the public repo you just cloned:

cd public
cd utils
tar xzf bitsvpn.tgz
cd bitsvpn
sudo ./install.sh

If all goes well, your virtual machine is now on the IPv6 Internet. You can verify this by trying the command:

$ ping6 ipv6.google.com

If that ping command works, you’re in business! Now you’re ready for the fun part.

Refer to the discussion site if you have further questions.

Your personal repository

You can also check out your personal repository:

git clone git@git.uicbits.net:cs450-f15/YOURUSERNAME.git

When you first clone the repository, it will be empty. Your first task is to copy the hw1 skeleton code from the public directory to your new YOURUSERNAME directory. In the example below, my username is ckanich-student. If you’ve done that correctly, the tree program will print a graphical (ASCII art) representation of the directories and files within the current directory:

ubuntu@ip-10-143-165-210:~$ cd ckanich-student
ubuntu@ip-10-143-165-210:~/ckanich-student$ tree
.
└── hw1
    ├── hw1.c
    ├── Makefile
    └── SUBMISSION_COMMENTS.txt

1 directory, 3 files

Now, just to test things out, let’s add these files to the repository, commit them, and send the changes to the central server.

ubuntu@ip-10-143-165-210:~/ckanich-student$ git add hw1
ubuntu@ip-10-143-165-210:~/ckanich-student$ git commit -a -m'added hw1 skeleton'
[master e66c87f] added hw1 skeleton
 3 files changed, 77 insertions(+)
 create mode 100644 hw1/Makefile
 create mode 100644 hw1/SUBMISSION_COMMENTS.txt
 create mode 100644 hw1/hw1.c
ubuntu@ip-10-143-165-210:~/ckanich-student$ git push
Counting objects: 7, done.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (6/6), 1.13 KiB, done.
Total 6 (delta 0), reused 0 (delta 0)
To git@git.uicbits.net:cs450-f15/ckanich-student.git
   4781aea..e66c87f  master -> master
ubuntu@ip-10-143-165-210:~/ckanich-student$

Remember, if you don’t push, we don’t see it! We will be making copies of the repositories at each deadline (full credit, 10% off, etc.), so do not leave this for the last minute. If you submit one second too late, you’ll be in the next lateness bracket, no exceptions.

The Programming Part!

For this week’s programming exercise, we will create a bare-bones web client (think wget). Based on the example TCP client code in the hw1 directory of the public repository, and the HTTP example sessions shown in class, you will write a command-line program called hw1 that takes a URL as its only parameter, retrieves the indicated file, and stores it in the local directory with the appropriate filename. If the URL does not end in a filename, your program should automatically request the file ‘index.html’. Make sure it works for both text and images by opening the stored file in a web browser.

You may assume that the URL is of the form http://host/path, where path may or may not be an empty string, may or may not contain multiple slashes (for subdirectories), and may or may not contain a file name. You may assume files to be no larger than one megabyte. You are not expected to follow any HTTP redirect (3xx); in that case your code can simply exit without saving any file.
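The URL rules above can be handled with a handful of string calls. What follows is only a sketch, not the reference solution: the function name split_url, its parameters, and the buffer handling are made up for illustration, and it assumes that an empty path or a path ending in “/” counts as “no file name”.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: split "http://host/path" into the host, the path to
 * request, and the local file name. Returns 0 on success, -1 if the URL is
 * not of the expected form or one of the buffers is too small. */
int split_url(const char *url,
              char *host, size_t host_len,
              char *path, size_t path_len,
              char *file, size_t file_len)
{
    const char *prefix = "http://";
    size_t prefix_len = strlen(prefix);

    if (strncmp(url, prefix, prefix_len) != 0)
        return -1;

    /* The host runs up to the first '/' or, failing that, the end. */
    const char *host_start = url + prefix_len;
    const char *slash = strchr(host_start, '/');
    size_t n = slash ? (size_t)(slash - host_start) : strlen(host_start);

    if (n == 0 || n >= host_len)
        return -1;
    memcpy(host, host_start, n);
    host[n] = '\0';

    /* An empty path is requested as "/". */
    const char *p = slash ? slash : "/";

    /* Assumption: a path ending in '/' has no file name, so add index.html. */
    const char *suffix = (p[strlen(p) - 1] == '/') ? "index.html" : "";

    int written = snprintf(path, path_len, "%s%s", p, suffix);
    if (written < 0 || (size_t)written >= path_len)
        return -1;

    /* The local file name is whatever follows the last '/'. */
    const char *last = strrchr(path, '/');
    written = snprintf(file, file_len, "%s", last + 1);
    if (written < 0 || (size_t)written >= file_len)
        return -1;

    return 0;
}
```

With this sketch, http://www.google.com/intl/en_ALL/images/logo.gif yields the host www.google.com, the path /intl/en_ALL/images/logo.gif, and the file name logo.gif, while http://www.google.com/ yields the path /index.html and the file name index.html.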

The hostname may be a name like www.google.com, but the example code requires an IP address (like 64:ff9b::83c1:201d). To look up the IP address of a given host name, use getaddrinfo(). Running man 3 getaddrinfo on the command line will give you the details, or use this link, and see the getaddrinfo.c example.
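As a rough illustration of the lookup step, the sketch below resolves a host name with getaddrinfo() and converts the first result to a numeric address string with getnameinfo(). The helper name lookup_numeric is hypothetical, and the choice of AF_UNSPEC is an assumption: the skeleton code may expect IPv6 only, in which case AF_INET6 would be the family to ask for. The getaddrinfo.c example in the public repository remains the reference.

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: resolve `host` and write its first address, in
 * numeric form, into `out`. Returns 0 on success, -1 on failure. */
int lookup_numeric(const char *host, char *out, size_t out_len)
{
    struct addrinfo hints;
    struct addrinfo *results;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* assumption: accept IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* we want a TCP connection */

    int rc = getaddrinfo(host, "80", &hints, &results);
    if (rc != 0) {
        fprintf(stderr, "getaddrinfo(%s): %s\n", host, gai_strerror(rc));
        return -1;
    }

    /* NI_NUMERICHOST asks for the address itself, not a reverse lookup. */
    rc = getnameinfo(results->ai_addr, results->ai_addrlen,
                     out, (socklen_t)out_len, NULL, 0, NI_NUMERICHOST);

    freeaddrinfo(results);  /* the result list is heap-allocated */
    return rc == 0 ? 0 : -1;
}
```

Note that the results from getaddrinfo() already contain a ready-made sockaddr (ai_addr and ai_addrlen) that can be handed to connect() directly, so converting to a string is only needed if the surrounding code insists on one.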

Template

For this homework, a prepared skeleton directory that you may use is located in the public git repository.

A few hints

  • Use HTTP version 1.0. Version 1.1 can get a lot more complicated.
  • Good functions to use for handling filenames and text include sprintf, sscanf, strstr, strchr. Read more about these using the man pages. For example, try man sprintf on the command line.
  • Sections 2.2.2-2.2.3 in the book should also be helpful. Your book talks about the “request line” and “header lines” for an HTTP request. You will only need to use the request line and the host line of the header.

  • A “newline” in HTTP consists of two ASCII characters: \r\n, not just \n.

  • Your program will be tested (at least) on these urls:

http://www.google.com/
http://www.google.com/intl/en_ALL/images/logo.gif
http://www.google.com/thispagedoesnotexist
http://www.thissitedoesnotexist1776.com 
http://www.engadget.com/2010/08/27/amazon-kindle-review
http://www.engadget.com/2010/08/27/amazon-kindle-review/
  • Make sure you handle all these cases gracefully. If you don’t send a host header, the first should produce a file index.html, the second should save the Google logo into the file logo.gif, and the rest should produce an error because they do not give you a 200 OK response; in those cases you can quit your program by calling exit(1).

  • Beej’s Guide to Network Programming is a great resource you may want to make use of.

  • If you’re curious, try firing up Wireshark, and then fetching the URL with wget or curl. You’ll find the request they sent (which may have a lot of additional parameters in it) in one of the packets with destination port 80.

  • Spend some time thinking about how to do the string manipulation. It does not have to be complicated. The complete program, including comments, error handling etc. can be written in about 100 leisurely lines.
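Two of the hints above (an HTTP 1.0 request with a request line and a Host line, and \r\n line endings) can be sketched as follows, together with a check for the 200 OK status. This is a minimal illustration rather than the expected solution; the function names build_request and status_code are invented here, and the code assumes the start of the response is already in a NUL-terminated buffer.

```c
#include <stdio.h>

/* Hypothetical helper: format an HTTP/1.0 GET request for `path` on `host`
 * into `buf`. Every line ends in \r\n, and an empty line ends the request.
 * Returns the request length, or -1 if `buf` is too small. */
int build_request(char *buf, size_t buf_len, const char *host, const char *path)
{
    int n = snprintf(buf, buf_len,
                     "GET %s HTTP/1.0\r\n"
                     "Host: %s\r\n"
                     "\r\n",
                     path, host);
    if (n < 0 || (size_t)n >= buf_len)
        return -1;
    return n;
}

/* Hypothetical helper: extract the numeric status code from a status line
 * such as "HTTP/1.0 200 OK". Returns the code, or -1 if it does not parse. */
int status_code(const char *response)
{
    int major, minor, code;

    if (sscanf(response, "HTTP/%d.%d %d", &major, &minor, &code) != 3)
        return -1;
    return code;
}
```

A caller would send the formatted request over the connected socket, read the reply, and exit with an error unless status_code() returns 200. The file contents begin right after the first empty line (\r\n\r\n) of the response.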

Turn-in instructions

For your turn-in, prepare a Makefile that compiles the hw1 target. To make sure your submission is complete, try the following in a temporary directory, i.e. create and change into a temporary directory under the /tmp filesystem:

mkdir /tmp/hw1-temp
cd /tmp/hw1-temp
git clone git@git.uicbits.net:cs450-f15/MYUSERNAME.git
cd MYUSERNAME/hw1
make
./hw1 http://www.google.com/index.html 

This process should produce a file called index.html in the current working directory, containing the source for the Google front page.

./hw1 http://www.google.com/intl/en_ALL/images/logo.gif 

Running this line should produce a file called logo.gif, containing the Google logo.

Grading

Grading will be done automatically using a script. We will publish this script after grading has completed; you are responsible for writing your own test cases. If you wish, you can share test cases you have written with the class. Students who share test cases publicly will very likely receive extra participation credit.

Due Date

This assignment is due Wednesday, September 2nd, at 3pm. See the syllabus for the late turn-in policy. This assignment is worth just as much as every other homework, so getting as much credit on it as possible is important (don’t turn it in late!).