Homework 1: Virtualbox, git, and a web client
Accessing your virtual machine
For this class, you will be using a virtual machine. The easiest thing to do is to host this virtual machine on your own computer using VirtualBox. You should run 32-bit Ubuntu 14.04.3 LTS, as that is what we will be using to test the assignments and grade your submissions. If you need help installing VirtualBox or Ubuntu, you can search for something sensible like “VirtualBox ubuntu install” (for instance, this link looks like it might be helpful for installing on a Windows PC) and follow those directions, or you can ask on Piazza, where other students, the TA, or the professor can help you.
git, class repositories (repos)
The main objective of this homework is to familiarize you with git, the versioning system we will be using for homework turn-in. Git is a decentralized revision control system. You will also get a quick introduction to network programming in C, which will be our main language in the course.
Using your private key, clone the public course repository:
git clone git@git.uicbits.net:cs450-f15/public.git
Once you’ve cloned the public repo, you will have a directory public filled with useful files for all students. Some of those files will allow you to use IP version 6 on your virtual machine. You will first need to add some packages and prepare your VM for connecting to our IPv6 VPN. To prepare your new Ubuntu install for class, run these commands from the directory in which you just cloned the public repo:
cd public
cd utils
tar xzf bitsvpn.tgz
cd bitsvpn
sudo ./install.sh
If all goes well, your virtual machine is now on the IPv6 Internet. You can verify this by trying the command:
$ ping6 ipv6.google.com
If that ping command works, you’re in business! Now you’re ready for the fun part.
Refer to the discussion site if you have further questions.
your personal repository
You can also clone your personal repository:
git clone git@git.uicbits.net:cs450-f15/YOURUSERNAME.git
When you first clone your personal repository, it will be empty. Your first task is to copy the hw1 skeleton code from the public directory to your new YOURUSERNAME directory (here, my username is ckanich-student). If you’ve done that correctly, the tree program will print a graphical (ASCII art) representation of the directories and files within the current directory:
ubuntu@ip-10-143-165-210:~$ cd ckanich-student
ubuntu@ip-10-143-165-210:~/ckanich-student$ tree
.
└── hw1
    ├── hw1.c
    ├── Makefile
    └── SUBMISSION_COMMENTS.txt

1 directory, 3 files
Now, just to test things out, let’s add these files to the repository, commit them, and send the changes to the central server.
ubuntu@ip-10-143-165-210:~/ckanich-student$ git add hw1
ubuntu@ip-10-143-165-210:~/ckanich-student$ git commit -a -m'added hw1 skeleton'
[master e66c87f] added hw1 skeleton
 3 files changed, 77 insertions(+)
 create mode 100644 hw1/Makefile
 create mode 100644 hw1/SUBMISSION_COMMENTS.txt
 create mode 100644 hw1/hw1.c
ubuntu@ip-10-143-165-210:~/ckanich-student$ git push
Counting objects: 7, done.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (6/6), 1.13 KiB, done.
Total 6 (delta 0), reused 0 (delta 0)
To git@git.uicbits.net:cs450-f15/ckanich-student.git
   4781aea..e66c87f  master -> master
ubuntu@ip-10-143-165-210:~/ckanich-student$
Remember, if you don’t push, we don’t see it! We will be making copies of the repositories at each deadline (full credit, 10% off, etc.), so do not leave this until the last minute. If you submit one second too late, you’ll be in the next lateness bracket, no exceptions.
The Programming Part!
For this week’s programming exercise, we will create a barebones web client (think wget). Based on the example TCP client code in the hw1 directory of the public repository, and the HTTP example sessions shown in class, you will write a command-line program called hw1 that takes a URL as its only parameter, retrieves the indicated file, and stores it in the local directory with the appropriate filename. If the URL does not end in a filename, your program should automatically request the file ‘index.html’. Make sure it works for both text and images by opening the stored file in a web browser.
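As a sketch of the “store it with the appropriate filename” step, the helpers below take the path part of the URL, keep whatever follows the last slash, and fall back to index.html when that is empty. Both local_filename and save_file are hypothetical names, not part of the skeleton. save_file writes with fwrite and an explicit length, so NUL bytes inside an image are kept, which would not be the case with string functions such as fputs.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: choose the local file name for a URL path.
 * Uses the text after the last '/', or "index.html" when the path is
 * empty or ends in '/'. */
void local_filename(const char *path, char *out, size_t outlen)
{
    const char *slash = strrchr(path, '/');
    const char *name = slash ? slash + 1 : path;

    if (*name == '\0')
        name = "index.html";
    snprintf(out, outlen, "%s", name);   /* truncates safely if too long */
}

/* Hypothetical helper: write len bytes of data to the file name.
 * fwrite copies NUL bytes too, so binary bodies such as GIFs survive.
 * Returns 0 on success, -1 on error. */
int save_file(const char *name, const char *data, size_t len)
{
    FILE *fp = fopen(name, "wb");
    if (fp == NULL) {
        perror(name);
        return -1;
    }
    size_t written = fwrite(data, 1, len, fp);
    if (fclose(fp) != 0 || written != len)
        return -1;
    return 0;
}
```

For example, the path /intl/en_ALL/images/logo.gif gives logo.gif, while / and /2010/08/27/amazon-kindle-review/ both give index.html.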
You may assume that the URL is of the form http://host/path, where path may or may not be an empty string, may or may not contain multiple slashes (for subdirectories), and may or may not contain a file name. You may assume files are no larger than one megabyte. You are not expected to follow any HTTP redirect (3xx); your code can simply exit without saving any file.
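One possible way to split a URL of that form is sketched below, assuming only the plain http:// scheme with no port number, user info, or query handling. parse_url is a hypothetical name; it uses strncmp and strchr rather than sscanf, and substitutes / when the path is empty so the request line is never left without a path.

```c
#include <string.h>

/* Hypothetical helper: split "http://host/path" into host and path.
 * An empty path becomes "/". Returns 0 on success, or -1 if the scheme is
 * not "http://", the host is empty, or an output buffer is too small. */
int parse_url(const char *url, char *host, size_t hostlen,
              char *path, size_t pathlen)
{
    const char *prefix = "http://";
    size_t plen = strlen(prefix);

    if (strncmp(url, prefix, plen) != 0)
        return -1;

    const char *start = url + plen;           /* first character of host */
    const char *slash = strchr(start, '/');   /* start of path, if any */
    size_t hlen = slash ? (size_t)(slash - start) : strlen(start);

    if (hlen == 0 || hlen >= hostlen)
        return -1;
    memcpy(host, start, hlen);
    host[hlen] = '\0';

    const char *p = slash ? slash : "/";
    if (strlen(p) >= pathlen)
        return -1;
    strcpy(path, p);
    return 0;
}
```

With http://www.google.com/intl/en_ALL/images/logo.gif this yields the host www.google.com and the path /intl/en_ALL/images/logo.gif; with http://www.google.com it yields the path /.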
The hostname may be a name like www.google.com, but the example code requires an IP address (like 64:ff9b::83c1:201d). To look up the IP address of a given host name, use getaddrinfo(). Running man 3 getaddrinfo on the command line will give you the details; alternatively, use this link and see the getaddrinfo.c example.
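A minimal sketch of the usual getaddrinfo loop follows, assuming the caller passes a host name and a port string such as "80". connect_to_host is a hypothetical name; getaddrinfo, socket, connect, freeaddrinfo, and gai_strerror are the standard POSIX calls described in the man page. Setting ai_family to AF_UNSPEC is an assumption: it accepts both IPv6 and IPv4 results and tries each one until a connection succeeds. Use AF_INET6 instead if you want to insist on IPv6 through the class VPN.

```c
#define _POSIX_C_SOURCE 200112L  /* expose getaddrinfo under strict C modes */
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: resolve host (a name such as www.google.com or a
 * literal address) and open a TCP connection to port, trying every address
 * getaddrinfo returns. Returns a connected socket, or -1 on failure. */
int connect_to_host(const char *host, const char *port)
{
    struct addrinfo hints;
    struct addrinfo *res, *ai;
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* IPv6 or IPv4 (assumption) */
    hints.ai_socktype = SOCK_STREAM;  /* TCP */

    int err = getaddrinfo(host, port, &hints, &res);
    if (err != 0) {
        fprintf(stderr, "getaddrinfo(%s): %s\n", host, gai_strerror(err));
        return -1;
    }

    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                    /* connected: keep this socket */
        close(fd);
        fd = -1;
    }

    freeaddrinfo(res);
    return fd;
}
```

A call such as connect_to_host("www.google.com", "80") would then give you a socket on which to send the HTTP request. If the name does not resolve (as with www.thissitedoesnotexist1776.com), the function reports the gai_strerror message and returns -1, at which point the program can exit(1).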
Template
For this homework, a prepared skeleton directory that you may use is located in the public git repository.
A few hints
- Use HTTP version 1.0. Version 1.1 can get a lot more complicated.
- Good functions to use for handling filenames and text include sprintf, sscanf, strstr, and strchr. Read more about these using the man pages; for example, try man sprintf on the command line.
- Sections 2.2.2-2.2.3 in the book should also be helpful. Your book talks about the “request line” and “header lines” for an HTTP request. You will only need to use the request line and the Host line of the header.
- A “newline” in HTTP consists of two ASCII characters: \r\n, not just \n.
- Your program will be tested (at least) on these URLs:
    http://www.google.com/
    http://www.google.com/intl/en_ALL/images/logo.gif
    http://www.google.com/thispagedoesnotexist
    http://www.thissitedoesnotexist1776.com
    http://www.engadget.com/2010/08/27/amazon-kindle-review
    http://www.engadget.com/2010/08/27/amazon-kindle-review/
- Make sure you handle all these cases gracefully. If you don’t send a Host header, the first should produce a file index.html, the second the Google logo saved into the file logo.gif, and the rest should produce an error because they do not give you a 200 OK response; you can thus quit your program by calling exit(1).
- Beej’s Guide to Network Programming is a great resource you may want to make use of.
- If you’re curious, try firing up Wireshark and then fetching the URL with wget or curl. You’ll find the request they sent (which may have a lot of additional parameters in it) in one of the packets with destination port 80.
- Spend some time thinking about how to do the string manipulation. It does not have to be complicated. The complete program, including comments, error handling, etc., can be written in about 100 leisurely lines.
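Putting several of these hints together, here is a sketch of the request and response handling, assuming the whole response fits in one buffer (the one-megabyte file limit plus some room for headers). build_request, read_all, status_is_ok, and find_body are hypothetical names. The request contains only the request line and the Host header, each terminated by \r\n, with an empty line ending the header. Because an HTTP/1.0 server closes the connection after sending the response, reading until recv returns 0 collects everything.

```c
#define _POSIX_C_SOURCE 200112L  /* expose POSIX socket declarations */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: format an HTTP/1.0 GET with only the request line
 * and a Host header. Each line ends in "\r\n"; an empty line ends the
 * header. Returns the request length, or -1 if buf is too small. */
int build_request(char *buf, size_t len, const char *host, const char *path)
{
    int n = snprintf(buf, len, "GET %s HTTP/1.0\r\nHost: %s\r\n\r\n",
                     path, host);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Hypothetical helper: read until the server closes the connection or buf
 * is full. cap must be at least 1; a '\0' is stored after the data so the
 * headers can be searched with string functions. Returns bytes read or -1. */
ssize_t read_all(int fd, char *buf, size_t cap)
{
    size_t total = 0;

    while (total + 1 < cap) {
        ssize_t n = recv(fd, buf + total, cap - 1 - total, 0);
        if (n < 0)
            return -1;
        if (n == 0)
            break;                    /* peer closed the connection */
        total += (size_t)n;
    }
    buf[total] = '\0';
    return (ssize_t)total;
}

/* Hypothetical helper: 1 if the status line reports 200, otherwise 0. */
int status_is_ok(const char *resp)
{
    int code;
    return sscanf(resp, "HTTP/%*d.%*d %d", &code) == 1 && code == 200;
}

/* Hypothetical helper: the body starts after the first empty line
 * ("\r\n\r\n"). Returns NULL if the header never ends. */
const char *find_body(const char *resp)
{
    const char *end = strstr(resp, "\r\n\r\n");
    return end ? end + 4 : NULL;
}
```

Since an image body can contain NUL bytes, compute its length as the count returned by read_all minus (body - buf) rather than with strlen, and pass that length to whatever writes the file.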
Turn-in instructions
For your turn-in, prepare a Makefile that compiles the hw1 target. To make sure your submission is complete, try the following in a temporary directory, i.e., create and change into a temporary directory under the /tmp filesystem:
mkdir /tmp/hw1-temp
cd /tmp/hw1-temp
git clone git@git.uicbits.net:cs450-f15/MYUSERNAME.git
cd MYUSERNAME/hw1
make
./hw1 http://www.google.com/index.html
This process should produce a file called index.html in the current working directory, containing the source for the Google front page.
./hw1 http://www.google.com/intl/en_ALL/images/logo.gif
Running this line should produce a file called logo.gif, containing the Google logo.
Grading
Grading will be done automatically using a script. We will publish this script after grading has completed; you are responsible for writing your own test cases. If you wish, you can share test cases you have written with the class. Students who share test cases publicly will very likely receive extra participation credit.
Due Date
This assignment is due Wednesday, September 2nd, at 3pm. See the syllabus for the late turnin policy. This assignment is worth just as much as every other homework, so getting as much credit on it as possible is important (don’t turn in late!).