Homework 5: Recursive DNS resolver II: harder better faster stronger

In the last homework, we implemented a recursive DNS server. The most straightforward implementation produces an exchange that looks like:

  1. client sends request “what is www.google.com?”
  2. server requests to root: “what is www.google.com?”
  3. root responds: “If you want to ask about the com zone, ask these servers”
  4. server requests to com server: “what is www.google.com?”
  5. com server responds: “If you want to ask about the google.com zone, ask these servers”
  6. server requests to google.com server: “what is www.google.com?”
  7. google.com server responds: “www.google.com has address X.Y.Z.W”
  8. server responds to client: “www.google.com has address X.Y.Z.W”

As anyone who debugged with dig knows, if your code isn’t fast enough or one of the nameservers you contact isn’t functioning properly, dig will sometimes time out and send a new packet to your server while the server is still expecting a response from some other nameserver, and this throws a monkey wrench in the works.

The easiest solution to this problem is to put each interaction in its own flow of control, either a thread or a process of its own. However, for truly scalable network software, event-driven code is a much faster alternative.

The pseudocode for implementing the above algorithm recursively for a single client looks a little like:

function handle_request(requested_name, nameserver_list){
  query for requested_name at one of nameserver_list
  if (answer is found || no authority server is returned)
    return result
  else
    return handle_request(requested_name, authority_server_list)
}

This coding strategy implicitly carries state about each request with it: by pushing and popping stack frames, the program maintains authority server lists and other state variables, so that when a new packet comes in, it’s easy to continue where it left off.
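To make the implicit state concrete, here is a minimal runnable sketch of the recursive version in Python. The network is replaced by a hypothetical in-memory table (FAKE_SERVERS), and the server names, the reply format, and the trace argument are all assumptions made for illustration; none of them come from the reference solution.

```python
# Hypothetical stand-in for the network: each "nameserver" is a function of
# the queried name. Real code would send a UDP packet and block on recv().
FAKE_SERVERS = {
    "root": lambda name: {"answer": None, "authority": ["com-ns"]},
    "com-ns": lambda name: {"answer": None, "authority": ["google-ns"]},
    "google-ns": lambda name: {"answer": "X.Y.Z.W", "authority": []},
}


def query(name, server):
    """Blocking query: returns a dict with 'answer' and 'authority' keys."""
    return FAKE_SERVERS[server](name)


def handle_request(requested_name, nameserver_list, trace=None):
    """Follow referrals recursively until an answer or a dead end is found."""
    server = nameserver_list[0]
    if trace is not None:
        trace.append(server)  # record which servers were asked, in order
    reply = query(requested_name, server)
    if reply["answer"] is not None or not reply["authority"]:
        return reply["answer"]
    # The authority list for the next step lives only in this call's arguments.
    return handle_request(requested_name, reply["authority"], trace)
```

Calling handle_request("www.google.com", ["root"]) walks root, then the com server, then the google.com server. Note that the only record of how far the lookup has progressed is the call stack itself, which is why this shape cannot serve a second client while the first lookup is blocked.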

For this assignment, you’ll need to break that assumption: your code will need to handle any packet at any time. This code will typically look more like:

while true:
  receive packet
  if packet is part of a pending request:
    determine next step in algorithm
    update state for this instance of the algorithm
    send packet
  if packet is part of a new request:
    generate clean state variables for this instance
    send packet
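The loop above can be sketched in runnable form as follows. This is only an illustration under stated assumptions: packets are plain dicts pulled from an in-memory queue instead of a socket, pending requests live in a dict keyed by a made-up query id, and fake_nameserver_reply, DELEGATIONS and ANSWERS are hypothetical stand-ins for real nameservers.

```python
from collections import deque

# Hypothetical delegation data standing in for real nameservers.
DELEGATIONS = {"root": "com-ns", "com-ns": "google-ns"}
ANSWERS = {"google-ns": "X.Y.Z.W"}


def fake_nameserver_reply(server, query_id):
    """Build the response packet a nameserver would send (simulation only)."""
    return {"kind": "response", "id": query_id,
            "answer": ANSWERS.get(server),
            "authority": DELEGATIONS.get(server)}


def run_event_loop(inbox):
    """Handle packets in arrival order; return the replies sent to clients."""
    pending = {}   # query id -> state for one in-flight lookup
    outbox = []    # (client, name, answer) tuples "sent" back to clients
    next_id = 0
    while inbox:   # real code would loop forever around recv()
        packet = inbox.popleft()
        if packet["kind"] == "response" and packet["id"] in pending:
            state = pending[packet["id"]]
            if packet["answer"] is not None or packet["authority"] is None:
                # Lookup finished: drop the state and answer the client.
                del pending[packet["id"]]
                outbox.append((state["client"], state["name"], packet["answer"]))
            else:
                # Referral: update the state and ask the next server.
                state["server"] = packet["authority"]
                inbox.append(fake_nameserver_reply(state["server"], packet["id"]))
        elif packet["kind"] == "request":
            # New client request: create clean state and query the root.
            pending[next_id] = {"client": packet["client"],
                                "name": packet["name"], "server": "root"}
            inbox.append(fake_nameserver_reply("root", next_id))
            next_id += 1
        # Responses that match no pending request fall through and are dropped.
    return outbox
```

With two requests queued back to back, the responses for both lookups interleave in the queue, and each one is matched to its own state entry by id rather than by position on a call stack.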

Things to watch out for

  1. Plan your state variable. The easiest way to keep your pending request states in order is a linked list. When a new response comes in, walk the list to find which request (if any) it is responding to, and then perform any necessary actions. A function over (state_variable, new_data) should be able to make any decision that is necessary to run your algorithm.
  2. Timeouts: note that the reference implementation has a short timeout on the main call to recv(). Each state variable should keep a “deadline” timestamp so that it can be timed out and retried/failed as necessary. It is okay to walk your linked list to determine if any timeouts have occurred even though that is suboptimal. (A great interview question is how to improve upon that implementation.)
  3. In addition to prev/next links, the list should also contain dependency links: in the above example, the client generates a new request A. The server can then create a state entity for that request, as well as a state entity for the request to the root nameserver (call this B). B can have a dependent_query pointer, such that when a satisfactory answer has been found, the program will know to revisit that request element to advance its state.
  4. There are two types of dependency - the first is that answering a client request is dependent on a recursive query. The second is when an authoritative server’s IP needs to be resolved to continue a recursive lookup. These are two different codepaths (calls to resolve_name). You’ll probably want to have three different flavors of query for your state machine: a client query that sends its response back to the end user after finding the answer, a recursive query meant to help answer client queries, and a glue query that helps complete recursive queries (but also uses recursive queries!).
  5. Although result caching and event-driven design work very well together, the majority of your points are for getting the event-driven portion working correctly. I recommend working on that first and adding caching after.
  6. You don’t have to use the template code if you don’t want to. Different design decisions might have made alternative implementation strategies easier. Although I am suggesting certain details (linked list, dependency links), you are free to go your own way.
  7. Your cache must invalidate entries within 1 second of their TTL expiration - don’t worry about setting timers/alarms or efficiently finding expiring records. It is okay to walk the entire cache once per second to find and remove expiring domains. Style points will definitely be given for efficient implementations (e.g. priority queue), but sadly those don’t count toward your grade.
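One possible shape for the state entries described in items 1 through 4 is sketched below in Python. The Query class, its field names (other than dependent_query, which is suggested above), and both helper functions are hypothetical; the list is singly linked to keep the sketch short, whereas the suggestion above also includes a prev link.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)  # compare entries by identity, not by field values
class Query:
    kind: str          # "client", "recursive", or "glue"
    name: str          # the name being resolved
    deadline: float    # absolute time after which this query has timed out
    dependent_query: Optional["Query"] = None  # entry to revisit once answered
    next: Optional["Query"] = None             # next entry in the pending list


def find_expired(head, now):
    """Walk the whole list and collect entries past their deadline (O(n))."""
    expired = []
    node = head
    while node is not None:
        if node.deadline <= now:
            expired.append(node)
        node = node.next
    return expired


def dependency_chain(query):
    """Follow dependent_query links from an entry back to the client query."""
    chain = [query.kind]
    while query.dependent_query is not None:
        query = query.dependent_query
        chain.append(query.kind)
    return chain
```

For example, a glue query whose dependent_query is a recursive query, which in turn points at the client query, yields the chain glue, recursive, client. That is the order in which entries would be revisited as answers arrive.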

Requirements

  1. 6 points: refactor the reference solution so that new requests can be serviced in parallel with existing requests. You may only use one thread/process.
  2. 2 points: maintain a cache of at least 1000 entries and use it to respond to all queries.
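As a rough sketch of the cache requirement, the following Python class stores an absolute expiry time per entry and removes stale entries with a full walk, which is the simple approach permitted above. The class name, method names, and the eviction policy when the cache is full (drop the entry closest to expiring) are assumptions for illustration, not part of the assignment.

```python
class TTLCache:
    """Name -> address cache where each entry expires at now + ttl."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.entries = {}  # name -> (address, expires_at)

    def put(self, name, address, ttl, now):
        if name not in self.entries and len(self.entries) >= self.capacity:
            # Assumed policy: evict the entry that would expire soonest.
            victim = min(self.entries, key=lambda n: self.entries[n][1])
            del self.entries[victim]
        self.entries[name] = (address, now + ttl)

    def get(self, name, now):
        """Return the cached address, or None if missing or expired."""
        entry = self.entries.get(name)
        if entry is None or entry[1] <= now:
            return None
        return entry[0]

    def sweep(self, now):
        """Walk every entry and delete the expired ones; return the count."""
        expired = [n for n, (_, exp) in self.entries.items() if exp <= now]
        for n in expired:
            del self.entries[n]
        return len(expired)
```

Calling sweep() once per second from the main loop (for instance, whenever the recv() timeout fires) would keep expired entries from lingering more than about a second, and get() never returns an expired entry even between sweeps.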

Due Date

This assignment is due at 3pm on Wednesday, October 23rd.