Homework 8 - who are the big fish on the Internet?

In this homework, we again use PlanetLab, but this time we look in more detail at the routes taken between hosts.

Using the command 'traceroute', collect the routes taken between each pair of hosts (using the same hosts as in homework 7). We want to compute some statistics about the various networks that make up the Internet. These networks are called Autonomous Systems, or AS:es, and each has its own AS number (ASN).

You can find out which AS a given IP address belongs to using the command 'whois'. I've found that 'whois -h whois.cymru.com' gives reasonably reliable ASN results, certainly much better than plain 'whois'.

For large sets of lookups, CYMRU provides a much faster 'bulk' service. Here is an example of how you use it. Make a file with all the IPs you want to look up, with an additional 'begin' line at the top, and 'end' line at the bottom:

begin
119.136.12.130
119.136.12.133
119.136.12.134
119.145.47.1
119.145.47.10
end

Say we call the file 'bulkwhois'. Then use netcat to do the bulk lookup:

nc whois.cymru.com 43 < bulkwhois

A lookup of 5000+ IPs took me about 3 seconds!
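The query file described above can be generated rather than written by hand. The following is only a sketch: the helper name make_bulk_query is made up for illustration, and it assumes the input has one IP address per line.

```shell
# Hypothetical helper: read IP addresses (one per line) on stdin and
# print a query for the bulk service, wrapped in 'begin' and 'end'.
make_bulk_query() {
    echo begin
    sort -u        # drop duplicate addresses so each is looked up once
    echo end
}

# Small self-contained demo; the repeated address appears only once.
printf '%s\n' 119.145.47.1 119.136.12.130 119.145.47.1 | make_bulk_query
```

In a real run you would redirect the output to the 'bulkwhois' file and then pass that file to nc as shown above.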

Submission requirements

We are interested in learning about the organizational structure of the Internet, in terms of autonomous systems. Submit a Makefile (and any necessary scripts) that collects traceroute data for each pair of hosts when 'make collect' is called. I would expect it to be 10000-20000 pairs. The exact number is not important, but it needs to be several thousand pairs.

Once the data is collected, we need to process it to learn the following statistics:

  1. Which 30 AS:es occur most frequently on the routes between our hosts? Take care not to double-count an ASN that appears more than once in a route.
  2. Which 30 AS:es have the largest 'degree', that is, the largest number of neighboring ASNs?
  3. Which 30 AS:es have the largest number of observed hosts?
  4. Which AS:es occur in all three lists above? Try googling for "ASxxxxx" to find out what networks these are!

Submit your code together with all files generated by the post-processing step (not the initial logs). Your Makefile should support the following commands:
  • make clean - remove all logs and temporary files
  • make collect - collect new traceroute measurements
  • make postprocess - do all time-consuming post-processing here
  • make report - generate a final report on stdout
'make report' should generate as output the 3 lists above, with descriptive titles. Submit an example of your output from 'make report' in a file called REPORT.txt.
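To make the "do not double-count" rule for the first list concrete, here is one possible sketch. It assumes an intermediate format that the assignment does not prescribe: one route per line, written as space-separated ASNs in hop order. The function name count_as_per_route is also just an illustration.

```shell
# Hypothetical helper: count in how many routes each ASN appears.
# Assumed input: one route per line, space-separated ASNs.
count_as_per_route() {
    awk '{
        split("", seen)                 # empty the per-route set
        for (i = 1; i <= NF; i++)
            if (!($i in seen)) {        # first time this ASN shows up in this route
                seen[$i] = 1
                count[$i]++
            }
    }
    END { for (asn in count) print count[asn], asn }' |
    sort -k1,1nr -k2,2n | head -30      # most frequent first, top 30 only
}

# Demo: 3356 occurs twice in the first route but is counted once for it.
printf '%s\n' '3356 174 3356 7018' '174 7018' '3356 174' | count_as_per_route
```

For the demo input this prints "3 174", then "2 3356" and "2 7018": AS174 is on all three routes, while AS3356 is on two of them even though it shows up three times in total.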

Hints

Sometimes traceroute does not finish due to nonresponsive hosts (asterisks). If you limit the max TTL to 25 hops, and the wait time to 2 seconds, it'll finish faster.
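As a rough sketch, those limits could be wrapped in a small helper so every measurement uses the same settings. The name trace_one is hypothetical, and the block only defines the function so that nothing touches the network until you call it.

```shell
# Hypothetical helper: trace the route to one destination.
#   -n      print IP addresses, skip reverse DNS lookups
#   -m 25   give up after 25 hops
#   -w 2    wait at most 2 seconds for each reply
trace_one() {
    traceroute -n -m 25 -w 2 "$1"
}
```

A call would then look like 'trace_one 192.0.2.1', with the output redirected to whatever log file your collection script uses.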

Use the whois.cymru.com whois database. If the ASN of a given IP does not exist in this database (it happens), you can ignore it.

'head -30' gives you the top 30 lines of a file or stdin.

'traceroute -n' gives you IP addresses instead of hostnames (and it runs faster).

To run a shell command in awk and get the results, use getline:

echo . | awk '{ "du -s "$0 | getline duresult; close("du -s "$0); print duresult}'

Don't forget to close the "file" as shown, or you'll have a ton of file descriptors open at once. The string passed to close() must be exactly the same as the command string that was piped into getline.
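One way to make sure close() always gets the right string is to build the command in a variable and use that variable for both the pipe and the close. The sketch below uses 'basename' as a harmless stand-in command; the function name basenames is made up for this example.

```shell
# Hypothetical demo: run one shell command per input line from inside awk.
basenames() {
    awk '{
        cmd = "basename " $0      # build the command string once
        cmd | getline result      # run it and read its first line of output
        close(cmd)                # same string, so the pipe really is closed
        print $0, result
    }'
}

printf '%s\n' /usr/bin /etc/hosts | basenames
```

This prints "/usr/bin bin" and "/etc/hosts hosts". Because the same variable is used in both places, the two strings cannot drift apart when the command is edited later.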

To concatenate strings in awk, put them next to each other: "one"$1"two"

If you want to use awk in a Makefile, be careful with your dollar signs. Make replaces any $X with the value of the variable X, so write every awk $ as $$.

If you start lots of processes with & from inside a script, you can wait for all of them to finish with "wait".
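A minimal sketch of that pattern follows. The measure function is only a stand-in that sleeps for a second (a real script would run traceroute there), and the host names and temporary directory are placeholders.

```shell
# Stand-in for one slow measurement; a real script would run traceroute here.
measure() {
    sleep 1
    echo "finished $1" > "$2/$1.log"
}

outdir=$(mktemp -d)                # scratch directory for the demo logs
for host in hostA hostB hostC; do
    measure "$host" "$outdir" &    # start each job in the background
done
wait                               # returns once every background job has exited
ls "$outdir"
```

All three jobs run at the same time, so the whole thing takes about one second instead of three, and the three log files are guaranteed to exist once "wait" returns.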

'traceroute -z 500' makes traceroute a little gentler with its transmissions by pausing 500 ms between probes.


Topic revision: r2 - 2013-01-03 - 00:11:35 - Main.ckanich
 