
Homework 7: A Basic Web Proxy

In this homework, we create an HTTP proxy. HTTP proxies sit between a browser and a web server - they receive the browser's request, and perform it on behalf of the browser. In the standard setup, the browser is configured to use a certain proxy - this is what we'll do here as well.

HTTP proxies have several uses, such as caching, compression, and anonymization. A caching web proxy stores previously loaded content, and sends the browser a cached copy rather than requesting the document again from the server. A compression proxy typically requests the document from the server, but makes some transparent modifications to the page to improve load times on bandwidth-limited devices like mobile phones, such as downsizing and/or re-compressing images.

Finally, an anonymizing proxy receives a request from a browser, performs the request on behalf of the browser after removing or replacing identifying information, and delivers the response back to the user. This is a particularly important service in countries where censorship blocks access to certain websites, but it has its uses elsewhere as well.

Setup

Most of our experiments will be done using wget on the command line. wget is a command-line HTTP client that simply fetches the document pointed to by a supplied URL. For example, wget http://www.google.com fetches Google's front page and stores it in a local file.

To configure wget to use a proxy, set the environment variable http_proxy to point to your proxy, for example: export http_proxy=http://localhost:8080. This redirects all wget requests to port 8080 on the local machine, which is then responsible for servicing the request however it sees fit. Try it out by starting an nc server listening on that port and then issuing a request with wget. You should see the HTTP request on the nc console. You can also type in a response on the nc console and hit Ctrl-D if you'd like.
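When a client talks to a proxy, the request line carries the full URL rather than just a path (for example, GET http://www.google.com/ HTTP/1.0), so the proxy has to extract the host and port before it can contact the server. The following is a minimal sketch of that step, not a required design: the function name parse_proxy_request and the buffer sizes are made up for illustration, and it only handles plain http:// URLs with hostnames or IPv4 addresses.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split "GET http://host[:port]/path HTTP/1.x"
 * into host, port, and path. Returns 0 on success, -1 on malformed input.
 * host must hold at least 256 bytes, path at least 1024 bytes. */
int parse_proxy_request(const char *line, char *host, int *port, char *path)
{
    char url[1280];

    /* Skip the method and grab the second token, which is the URL. */
    if (sscanf(line, "%*15s %1279s", url) != 1)
        return -1;
    if (strncmp(url, "http://", 7) != 0)
        return -1;

    const char *authority = url + 7;               /* "host[:port]..." */
    size_t auth_len = strcspn(authority, "/");
    const char *rest = authority + auth_len;       /* "" or "/path" */

    /* Look for an explicit port inside the authority part only. */
    const char *colon = memchr(authority, ':', auth_len);
    size_t host_len = colon ? (size_t)(colon - authority) : auth_len;
    if (host_len == 0 || host_len >= 256)
        return -1;
    memcpy(host, authority, host_len);
    host[host_len] = '\0';

    *port = 80;                                    /* default HTTP port */
    if (colon) {
        char *end;
        long p = strtol(colon + 1, &end, 10);
        if (end == colon + 1 || end != rest || p < 1 || p > 65535)
            return -1;
        *port = (int)p;
    }

    /* An empty path means the root document. */
    snprintf(path, 1024, "%s", *rest ? rest : "/");
    return 0;
}
```

Called on the request line that wget http://localhost:9090/ produces, this would yield host localhost, port 9090, and path /.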

For browsers, proxy settings differ between browsers and OSes. Look up the HTTP proxy settings, and point them at localhost port 8080 to get the same effect in a browser. In OS X, I use Firefox, which allows me to use a "manual" proxy setting - there's something wrong with the system proxy settings on my machine.

Note, however, that browsers have fairly specific and complex requirements on proxy operations. In particular, browsers expect persistent connections to their proxies, which complicates things. For this homework, we focus on evaluation with wget.

Assignment Details

Your job for this assignment is to create a basic HTTP proxy, with some rudimentary anonymization services. One of the more challenging tasks is that the proxy needs to work well even if the server on the other end is being difficult.

  1. Handle a basic HTTP request with wget, forwarding the request and returning the full server response to the client.
  2. Handle server failures: in cases where the server cannot be reached, return proper error pages rather than crashing, hanging, or otherwise misbehaving.
  3. Handle concurrent HTTP sessions. While one (slow) HTTP request is being served, promptly serve several faster requests.
  4. Anonymization: remove any Cookie headers, and replace the User-Agent with something non-identifying.
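Items 1 and 2 both hinge on opening a connection to the server named in the request and noticing when that fails. One possible way to do this is with getaddrinfo, as sketched below. The helper name connect_to_server is hypothetical, and the sketch uses a plain blocking connect, so a server that silently drops packets could still stall it for a long time.

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open a TCP connection to host:port.
 * Returns a connected socket, or -1 if the name does not resolve
 * or none of its addresses accepts the connection. */
int connect_to_server(const char *host, int port)
{
    struct addrinfo hints, *res, *ai;
    char service[16];
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;       /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;   /* TCP */
    snprintf(service, sizeof service, "%d", port);

    if (getaddrinfo(host, service, &hints, &res) != 0)
        return -1;                     /* e.g. the host does not exist */

    /* Try each resolved address until one connects. */
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                     /* connected */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}
```

A return value of -1 is the point at which the proxy would send an error page to the client instead of giving up on the connection silently.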

Test Cases

This set of test cases will grow over the coming days.

1. Basic

wget http://www.google.com

2. Server failures

wget http://hostdoesnotexist.cs.uic.edu/

wget http://words.cs.uic.edu/doesnothaveawebserver/
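For both of these failures the client should still get a well-formed HTTP response. A minimal sketch of building one is shown below. The helper name build_error_response is made up, and the choice of status is an assumption: 502 Bad Gateway is one reasonable code for a server that cannot be reached, but the assignment does not prescribe one.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format a minimal HTTP error response into buf.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or -1 if the response does not fit. */
int build_error_response(char *buf, size_t size, int status,
                         const char *reason, const char *detail)
{
    char body[512];
    int body_len = snprintf(body, sizeof body,
        "<html><body><h1>%d %s</h1><p>%s</p></body></html>\n",
        status, reason, detail);
    if (body_len < 0 || (size_t)body_len >= sizeof body)
        return -1;

    /* Status line, headers, blank line, then the HTML body. */
    int n = snprintf(buf, size,
        "HTTP/1.0 %d %s\r\n"
        "Content-Type: text/html\r\n"
        "Content-Length: %d\r\n"
        "Connection: close\r\n"
        "\r\n"
        "%s",
        status, reason, body_len, body);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```

For example, build_error_response(buf, sizeof buf, 502, "Bad Gateway", "Could not reach the server.") fills buf with a response that can be written to the client socket as-is.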

3. Concurrency: start a bogus web server on port 9090 in a separate terminal:

nc -l 9090

wget http://localhost:9090/ &
sleep 1;
time wget http://www.google.com;

This should report a very short time to fetch google.com, not block awaiting the response from the bogus server.
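One way to pass this test is to give every accepted connection its own process, so a request stuck on the bogus server cannot hold up the others. The sketch below shows that pattern with fork. It is only one option (threads or select would also work), and the names serve_in_child, accept_loop, and echo_once are hypothetical; echo_once is a stand-in for the real proxy logic.

```c
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

typedef void (*conn_handler)(int client_fd);

/* Stand-in for real proxy work: echo back whatever the client sends first. */
void echo_once(int client_fd)
{
    char buf[256];
    ssize_t n = read(client_fd, buf, sizeof buf);
    if (n > 0 && write(client_fd, buf, (size_t)n) != n) {
        /* short or failed write: ignored in this sketch */
    }
}

/* Run handler for one connection in a child process.
 * Returns 0 in the parent on success, -1 if fork fails
 * (in which case client_fd is left open for the caller). */
int serve_in_child(int client_fd, conn_handler handler)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {              /* child: do the slow work, then exit */
        handler(client_fd);
        close(client_fd);
        _exit(0);
    }
    close(client_fd);            /* parent: the child holds its own copy */
    return 0;
}

/* Accept connections forever, never waiting on any single client. */
void accept_loop(int listen_fd, conn_handler handler)
{
    signal(SIGCHLD, SIG_IGN);    /* let the kernel reap finished children */
    for (;;) {
        int client_fd = accept(listen_fd, NULL, NULL);
        if (client_fd < 0)
            continue;
        if (serve_in_child(client_fd, handler) != 0)
            close(client_fd);
    }
}
```

Because the parent returns to accept immediately after fork, the second wget in the test above is handled while the first child is still waiting on port 9090.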

4. Anonymization (sort of):

Start a packet capture with Wireshark or tcpdump, then fetch http://www.google.com with a browser configured to use the proxy. There should be no Cookie: headers in the request going to Google, and the User-Agent should contain the word CS361.
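The header rewriting this test checks for can be done line by line on the request before it is forwarded. The following is a rough sketch under a few assumptions: the helper name anonymize_headers and the replacement value CS361-proxy are made up, the whole header block is already in memory as a NUL-terminated string with CRLF line endings, and a request that arrives without a User-Agent header is left without one.

```c
#include <stdio.h>
#include <string.h>
#include <strings.h>   /* strncasecmp */

/* Hypothetical helper: copy request headers from in to out, dropping
 * Cookie lines and substituting a fixed User-Agent. Returns the output
 * length, or -1 if out is too small. */
int anonymize_headers(const char *in, char *out, size_t out_size)
{
    static const char agent[] = "User-Agent: CS361-proxy\r\n";
    size_t used = 0;

    if (out_size == 0)
        return -1;

    while (*in != '\0') {
        /* Find the end of the current line, including its CRLF. */
        const char *eol = strstr(in, "\r\n");
        size_t len = eol ? (size_t)(eol - in) + 2 : strlen(in);
        const char *src = in;
        size_t n = len;

        if (strncasecmp(in, "Cookie:", 7) == 0) {
            n = 0;                       /* drop the line entirely */
        } else if (strncasecmp(in, "User-Agent:", 11) == 0) {
            src = agent;                 /* swap in the generic value */
            n = sizeof agent - 1;
        }

        if (used + n >= out_size)        /* keep room for the final NUL */
            return -1;
        memcpy(out + used, src, n);
        used += n;
        in += len;
    }
    out[used] = '\0';
    return (int)used;
}
```

Header names are compared case-insensitively because HTTP treats them that way; the request line and all other headers pass through unchanged.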


Turn-in Instructions

Create and add a directory called hw7 in your turn-in folder, in which you place a file called hw7.c. Add a Makefile that compiles hw7.c into an executable called hw7.

Topic revision: r5 - 2013-10-25 - 23:45:41 - Main.jakob
 