Homework 9 - parsing packet traces

In this homework, we use libpcap to analyze packet traces captured with tcpdump. You can use "man pcap" to learn about the pcap API. libpcap gives us one packet at a time, in the order that they originally arrived. It is up to us to process the packets, and try to learn something from them.

Our interest in this homework is to reconstruct the data flowing between hosts, based on tcpdump traces. We will focus on TCP flows, and a key aspect of the homework is the reassembly of TCP packets into the original data. A correct submission will contain the following:

  • A Makefile that, given simply the command "make", produces an executable called 'hw9'.
  • The hw9 binary takes two command line arguments: an input file (produced by tcpdump -w), and a directory for output files.
  • Running hw9 prints a table with one row per (unidirectional) flow, identified by src ip/port and dst ip/port. Each row lists the number of segments and data payload bytes sent in that direction, not counting duplicate packets.
  • Only count flows for which you see a SYN packet: ignore flows that started before the beginning of the dump.
  • Do count flows for which you do not have a FIN, if there are any.
  • In the output directory, hw9 creates one file per flow, named as follows: SRCIP.SRCPORT-DSTIP.DSTPORT.log
  • Each of these files should contain all payload data (no IP/TCP headers) sent over each flow. Take care to handle packet duplicates and reordering!
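
One possible way to avoid counting duplicate bytes is to remember which stream offsets have already been stored. The sketch below is only an illustration under stated assumptions: struct seen_map and seen_mark are hypothetical names, offsets are assumed to be relative to the start of the flow's payload, and the map spends one byte of memory per payload byte, which is simple but wasteful for large flows.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-flow bookkeeping: seen[i] is nonzero once payload
 * byte i of the flow has been received. Start with a zeroed struct. */
struct seen_map {
    unsigned char *seen;
    size_t cap;
};

/* Marks bytes [off, off + len) as received. Returns how many of them
 * were new, or (size_t)-1 if memory could not be allocated. */
size_t seen_mark(struct seen_map *m, size_t off, size_t len)
{
    size_t end = off + len;

    if (end > m->cap) {
        /* Grow geometrically and zero the newly added part. */
        size_t ncap = m->cap ? m->cap : 4096;
        while (ncap < end)
            ncap *= 2;
        unsigned char *p = realloc(m->seen, ncap);
        if (p == NULL)
            return (size_t)-1;
        memset(p + m->cap, 0, ncap - m->cap);
        m->seen = p;
        m->cap = ncap;
    }

    size_t fresh = 0;
    for (size_t i = off; i < end; i++) {
        if (!m->seen[i]) {
            m->seen[i] = 1;
            fresh++;
        }
    }
    return fresh;
}
```

With this bookkeeping, a segment whose seen_mark() result is 0 is a pure duplicate and could be skipped in both the byte and the segment count, while out-of-order segments are handled like any others.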

For example, running hw9 may print

~> hw9 thetrace thedirectory
SRC IP/PORT       DST IP/PORT       BYTES     PACKETS
a.b.c.d/8484      e.f.g.h/80        12205     2115
e.f.g.h/80        a.b.c.d/8484      3555      223
a.b.c.d/22        e.f.g.i/19495     1205      211
e.f.g.i/19495     a.b.c.d/22        335       32

and thedirectory would contain the files

~> ls thedirectory

a.b.c.d.8484-e.f.g.h.80
e.f.g.h.80-a.b.c.d.8484
a.b.c.d.22-e.f.g.i.19495
e.f.g.i.19495-a.b.c.d.22

An example tcpdump tracefile is included in the hw9 template directory. However, it would be advisable to record your own traces and try your solution on them as well. When doing the final grading, we will use this file as well as another dump containing some TCP flows.

Hints

Read the IP and TCP header structure definitions in /usr/include/netinet/ip.h and tcp.h.
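
As an illustration of those header layouts, the following sketch pulls out the fields this homework needs from one captured packet. It assumes pkt already points at the IPv4 header (for an Ethernet trace, the 14-byte link-layer header has been skipped) and that the BSD-style field names (ip_hl, th_off, ...) are available. The struct tcp_segment type and the parse_tcp_segment function are hypothetical names made up for this example; IP fragmentation is not handled, and packets truncated by the capture are simply rejected.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* ask glibc for the BSD-style field names */
#endif
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <netinet/tcp.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical summary of one TCP segment (not part of any library). */
struct tcp_segment {
    struct in_addr src, dst;
    uint16_t sport, dport;         /* host byte order */
    uint32_t seq;                  /* host byte order */
    uint8_t flags;                 /* test with TH_SYN, TH_FIN, ... */
    const unsigned char *payload;  /* points into the packet buffer */
    size_t payload_len;
};

/* Returns 0 and fills *out if pkt holds a complete IPv4/TCP segment,
 * -1 otherwise. caplen is the number of captured bytes from pkt on. */
int parse_tcp_segment(const unsigned char *pkt, size_t caplen,
                      struct tcp_segment *out)
{
    struct ip iph;
    struct tcphdr tcph;

    if (caplen < sizeof iph)
        return -1;
    memcpy(&iph, pkt, sizeof iph);   /* copy to avoid unaligned access */
    if (iph.ip_v != 4 || iph.ip_p != IPPROTO_TCP)
        return -1;

    size_t ip_hlen = (size_t)iph.ip_hl * 4;   /* header length in bytes */
    size_t ip_len = ntohs(iph.ip_len);        /* header plus data */
    if (ip_hlen < sizeof iph || ip_len < ip_hlen || ip_len > caplen)
        return -1;
    if (ip_len - ip_hlen < sizeof tcph)
        return -1;

    memcpy(&tcph, pkt + ip_hlen, sizeof tcph);
    size_t tcp_hlen = (size_t)tcph.th_off * 4;
    if (tcp_hlen < sizeof tcph || ip_hlen + tcp_hlen > ip_len)
        return -1;

    out->src = iph.ip_src;
    out->dst = iph.ip_dst;
    out->sport = ntohs(tcph.th_sport);
    out->dport = ntohs(tcph.th_dport);
    out->seq = ntohl(tcph.th_seq);
    out->flags = tcph.th_flags;
    /* Use the IP total length, not caplen: short frames may be padded. */
    out->payload = pkt + ip_hlen + tcp_hlen;
    out->payload_len = ip_len - ip_hlen - tcp_hlen;
    return 0;
}
```

Comparing the ports, sequence numbers, and payload lengths this produces against tcpdump or wireshark output for the same trace is a quick sanity check.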

Use tcpdump / wireshark to verify that your code is parsing the packets correctly.

Use lseek to jump to an arbitrary point in a file, even beyond its current size. fopen truncates the file when opening for writing, so open() may be a better idea.
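
A minimal sketch of that idea, assuming a hypothetical helper named write_at (not part of any library): it opens the file without O_TRUNC, seeks to the requested offset, and writes the data there. Seeking past the current end of the file is allowed; the gap reads back as zero bytes until it is filled in.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: writes len bytes from data into the file at path,
 * starting at byte offset `offset`. Returns 0 on success, -1 on error. */
int write_at(const char *path, off_t offset, const void *data, size_t len)
{
    assert(path != NULL && (len == 0 || data != NULL));

    /* O_CREAT without O_TRUNC keeps whatever was written earlier. */
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    if (lseek(fd, offset, SEEK_SET) == (off_t)-1) {
        close(fd);
        return -1;
    }

    /* write() may store fewer bytes than requested, so loop. */
    const char *p = data;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return close(fd);
}
```

Opening and closing the file for every segment keeps the example short; keeping one descriptor open per flow would likely be faster.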

inet_ntoa() is a handy function for printing IP addresses. However, beware: it returns a pointer to a static char array that is overwritten by the next call; no memory is allocated for the return value!
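
To illustrate the pitfall, the sketch below builds a log file name in the SRCIP.SRCPORT-DSTIP.DSTPORT.log format and copies each inet_ntoa() result into its own buffer before calling inet_ntoa() again. The function name flow_log_name and its parameter list are assumptions made for this example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: writes "SRCIP.SRCPORT-DSTIP.DSTPORT.log" into out.
 * Ports are given in host byte order. Returns 0 on success, or -1 if the
 * name does not fit into outlen bytes. */
int flow_log_name(char *out, size_t outlen,
                  struct in_addr src, uint16_t sport,
                  struct in_addr dst, uint16_t dport)
{
    char src_str[INET_ADDRSTRLEN];
    char dst_str[INET_ADDRSTRLEN];

    /* Copy each result immediately: the second inet_ntoa() call
     * reuses the buffer returned by the first one. */
    snprintf(src_str, sizeof src_str, "%s", inet_ntoa(src));
    snprintf(dst_str, sizeof dst_str, "%s", inet_ntoa(dst));

    int n = snprintf(out, outlen, "%s.%u-%s.%u.log",
                     src_str, (unsigned)sport, dst_str, (unsigned)dport);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

By contrast, passing inet_ntoa(src) and inet_ntoa(dst) directly as two arguments of the same printf() call typically prints one address twice, because both pointers refer to the same buffer.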

Make sure to store the initial sequence number of each flow when it gets established.
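
One way this could look, as a sketch rather than a required design: a small table of unidirectional flows in which an entry is created only when a SYN is seen, with the SYN's sequence number stored as the initial sequence number (ISN). The struct flow layout, the MAX_FLOWS limit, and the function names flow_for_segment and stream_offset are all assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_FLOWS 1024   /* arbitrary limit chosen for this sketch */

/* Hypothetical per-direction flow record. */
struct flow {
    uint32_t src_ip, dst_ip;       /* as found in the IP header */
    uint16_t src_port, dst_port;
    uint32_t isn;                  /* sequence number of the SYN */
};

static struct flow flows[MAX_FLOWS];
static size_t nflows;

/* Returns the flow a segment belongs to. A new flow is created only if
 * the segment carries SYN; segments of unknown flows yield NULL. */
struct flow *flow_for_segment(uint32_t src_ip, uint16_t src_port,
                              uint32_t dst_ip, uint16_t dst_port,
                              int is_syn, uint32_t seq)
{
    for (size_t i = 0; i < nflows; i++) {
        struct flow *f = &flows[i];
        if (f->src_ip == src_ip && f->src_port == src_port &&
            f->dst_ip == dst_ip && f->dst_port == dst_port)
            return f;
    }
    if (!is_syn || nflows == MAX_FLOWS)
        return NULL;

    struct flow *f = &flows[nflows++];
    f->src_ip = src_ip;
    f->src_port = src_port;
    f->dst_ip = dst_ip;
    f->dst_port = dst_port;
    f->isn = seq;
    return f;
}

/* Offset of a segment's first payload byte within the flow's data.
 * The SYN consumes one sequence number, so data starts at isn + 1.
 * Unsigned arithmetic makes this correct across 32-bit wraparound. */
uint32_t stream_offset(const struct flow *f, uint32_t seq)
{
    return seq - f->isn - 1u;
}
```

The offset returned by stream_offset() is the position at which a segment's payload belongs in the flow's output file. Because it is a 32-bit value, this sketch only works for flows that carry less than 4 GiB.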

Grading

The grading script is at /pub/grading-scripts/hw9/. To run the grading script:

./hw9_score username score_file

This script will create a directory with your username, download hw9 from svn and create score_file inside hw9. The trace file will be exported from svn, so you don't need to commit the trace file with your submission. The grading script redirects the stdout of your program to a file named output and checks the total flow count, and the size and packet count of individual flows, from that file. The correctness of an image file's content is verified using the md5sum of that image file as produced by extract_images.sh.

The output of the grading script will look like the following:

[SUCCESS] flow count correct? [1]: 1
[SUCCESS] 72.26.203.98.80-192.168.1.100.63052 size correct? [0.75]: 0.75
[SUCCESS] 72.26.203.98.80-192.168.1.100.63052 packet count correct? [0.75]: 0.75
[SUCCESS] 72.26.203.98.80-192.168.1.100.63052 md5 correct? [1]: 1
[SUCCESS] 98.124.60.211/80-192.168.1.100/63176 size correct? [0.75]: 0.75
[SUCCESS] 98.124.60.211/80-192.168.1.100/63176 packet count correct? [0.75]: 0.75
[SUCCESS] 98.124.60.211/80-192.168.1.100/63176 md5 correct? [1]: 1
Total score: 6

The grading script doesn't check everything exhaustively. However, we may look at other things manually, and points may be deducted if there are inconsistencies such as the following:

  • Images that are not checked by the grading script are not valid in terms of size, packet count, or md5
  • The program doesn't work for other traces
Topic revision: r4 - 2011-11-11 - 15:46:54 - Main.jakob
 