CS 476/MCS 415: Programming Language Design

Spring 2007

Homework 3: Random Sentence Generator

Due: Monday April 16, 2007 at 11:59 pm

Your assignment is to write a Lisp program to generate random sentences. The assignment was written with Lisp in mind; however, if you wish to use a different functional programming language, contact the TA to get your language approved, and document this in a README file. You may work in groups of two on this assignment.

You should have a function named GENERATE-SENTENCE that takes no arguments and returns a list representing a sentence, where each element is a word. For example:
(generate-sentence)
RESULT --> (the boy sees a big red troll)

You can generate sentences using a probabilistic context-free grammar (PCFG). A PCFG is like a standard context-free grammar, except that each rule has a probability associated with it. The probabilities represent how likely a particular rule is to be "fired" during the generation process. Notice that the probabilities of the rules for the same nonterminal must always sum to 1.
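As a minimal sketch of the sum-to-1 requirement, the fragment below stores a few rules as a nested list and checks each nonterminal. The names *toy-rules*, rule-probability-sum and well-formed-p are hypothetical, and the NP probabilities are made up for illustration; only the PRON figures come from this handout.

```lisp
;; Hypothetical layout: (nonterminal (probability symbol...) ...).
;; The NP probabilities are assumptions, not part of the assignment.
(defparameter *toy-rules*
  '((pron (0.3 he) (0.3 she) (0.4 it))
    (np   (0.5 det n) (0.5 pron))))

(defun rule-probability-sum (nonterminal rules)
  "Add up the probabilities of every alternative for NONTERMINAL."
  (reduce #'+ (mapcar #'first (rest (assoc nonterminal rules)))))

(defun well-formed-p (rules &optional (tolerance 1e-6))
  "True when each nonterminal's probabilities sum to 1, within TOLERANCE."
  (every (lambda (entry)
           (< (abs (- 1 (rule-probability-sum (first entry) rules)))
              tolerance))
         rules))
```

The tolerance is there because floating-point sums such as 0.3 + 0.3 + 0.4 may not come out as exactly 1. Calling (well-formed-p *toy-rules*) is a cheap sanity check to run whenever you edit your grammar.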

Here is a simple example of a PCFG. You can start with this one, but feel free to add or modify rules as you choose. Make sure to explain how you came up with your grammar in the README file.

Notice that the nonterminals of the grammar can represent phrases (like NP and VP) or part-of-speech categories (like N, V, ADJ), and the terminals represent English words. This table explains the meaning of the abbreviations used in the example grammar:

Abbreviation  Stands for            Examples
S             sentence              the boy sees the big red hairy red troll
N             noun                  boy, ring, hobbit, troll, moon, telescope
V             verb                  hits, runs, sees
ADJ           adjective             big, red, hairy
ADV           adverb                quickly, quietly, repeatedly
DET           determiner            a, an, the, this, that, each, every
PRON          pronoun               he, she, it
PREP          preposition           in, on, around, about
CONJ          conjunction           and, but
NP            noun phrase           the boy, the man in the moon
VP            verb phrase           runs, hits the tree, sees the moon with the telescope
PP            prepositional phrase  in the moon, with the telescope

Here are some things you need to pay attention to:

Here are some things you should not worry about:

To represent your grammar, you can take two different approaches.

  1. Store it in a (nested) list representing your rules. To access it, you can create a "constant" function that returns the whole grammar, or you can define a global name (using setq) containing the grammar.
  2. Create a set of functions, each of them representing a rule. For example, you can have a function (pron) that returns "he" 30% of the time, "she" 30% of the time, or "it" 40% of the time. The apply function can be useful in this case.

    (apply fun args) Calls the function fun with the argument list args. For example, (apply 'cons '(a (b c))) gives (a b c), the same result as if we called (cons 'a '(b c)). This function is especially useful if you stored your rules as functions.

    Example of using apply with defun

    >(defun n () '(boy girl gorilla))
    n

    Now suppose y has the value 'n (this could be a parameter in some other function)
    > y
    n
    Then
    > (apply y ())
    (boy girl gorilla)
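The second approach could look something like the following sketch, which assumes the 30/30/40 split for PRON mentioned above. The helper expand-by-name is a hypothetical name used only to show where apply fits; it is not a required part of the assignment.

```lisp
(defun pron ()
  "Return HE 30% of the time, SHE 30% of the time, and IT 40% of the time."
  (let ((r (random 1.0)))          ; r is a float with 0 <= r < 1
    (cond ((< r 0.3) 'he)          ; first 30% of the interval
          ((< r 0.6) 'she)         ; next 30%
          (t 'it))))               ; remaining 40%

;; Hypothetical helper: call a rule function when all you have is its name.
(defun expand-by-name (rule-name)
  (apply rule-name '()))
```

With this in place, (expand-by-name 'pron) behaves the same as calling (pron) directly, which is useful when the nonterminal to expand is only known at run time.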

If you are having trouble getting started, here are some suggestions.

Spend some time to carefully plan the representation of your grammar. A clever choice of the representation can make the rest of the implementation much easier.

Start with the list '(S), because you want to generate sentences. Write a recursive function that finds a rule for each nonterminal and replaces (expands) it. The rule should be chosen randomly, but make sure the choice is consistent with the probabilities defined in the grammar.
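One common way to make a random choice respect the probabilities is to draw a number in [0, 1) and walk the alternatives while accumulating their probabilities. The sketch below assumes each alternative is stored as (probability symbol...); the name choose-alternative and the optional r parameter (handy for testing with a fixed draw) are hypothetical.

```lisp
(defun choose-alternative (alternatives &optional (r (random 1.0)))
  "Return the expansion whose cumulative probability range contains R.
ALTERNATIVES is a list of (probability symbol...) entries."
  (let ((cumulative 0))
    ;; If rounding leaves R just past the final total, fall back to the
    ;; last alternative instead of returning NIL.
    (dolist (alt alternatives (rest (first (last alternatives))))
      (incf cumulative (first alt))
      (when (< r cumulative)
        (return (rest alt))))))
```

For example, with the alternatives ((0.3 he) (0.3 she) (0.4 it)), a draw of 0.1 selects (he), a draw of 0.5 selects (she), and a draw of 0.95 selects (it).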

If you look at the rules above, S --> NP VP says to replace S with NP VP, so change your list from (S) to (NP VP). Next, NP is still a nonterminal, NP --> DET N | DET N PP | DET ADJLIST N, so you need to choose any one of these alternatives, say, DET N PP, and change your list from (NP VP) to (DET N PP VP). Continue in this fashion until you get a list composed entirely of terminals.

Here's a possible sequence:

Start with:                              (S)
Use the rule S --> NP VP                 (NP VP)
Use the rule NP --> DET N PP             (DET N PP VP)
Use the rule DET --> EACH                (EACH N PP VP)
Use the rule N --> BOY                   (EACH BOY PP VP)
Use the rule PP --> PREP NP              (EACH BOY PREP NP VP)
Use the rule PREP --> IN                 (EACH BOY IN NP VP)
Use the rule NP --> DET N                (EACH BOY IN DET N VP)
Continue...                              ...
...until you have only terminals left:   (EACH BOY IN EVERY RING SEES)

Basically, every time you have a nonterminal, you need to call a recursive function to turn it into a possible production based on the rules for that nonterminal. Eventually each nonterminal will result in a sequence of terminals. Putting all these terminals together gives you the final list.
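The expansion process described above can be sketched as follows, using the nested-list representation. This is one possible shape, not the required solution: the grammar is a cut-down, hypothetical version whose rule shapes follow the walkthrough but whose probabilities and vocabulary are assumptions, and the helper names pick-expansion and expand are made up for illustration.

```lisp
;; Hypothetical grammar: (nonterminal (probability symbol...) ...).
;; Any symbol without an entry here is treated as a terminal.
(defparameter *grammar*
  '((s    (1.0 np vp))
    (np   (0.7 det n) (0.3 det n pp))
    (vp   (0.5 v) (0.5 v np))
    (pp   (1.0 prep np))
    (det  (0.5 each) (0.5 every))
    (n    (0.5 boy) (0.5 ring))
    (v    (1.0 sees))
    (prep (1.0 in))))

(defun pick-expansion (alternatives)
  "Choose one expansion at random, weighted by its probability."
  (let ((r (random 1.0))
        (cumulative 0))
    (dolist (alt alternatives (rest (first (last alternatives))))
      (incf cumulative (first alt))
      (when (< r cumulative)
        (return (rest alt))))))

(defun expand (symbols)
  "Rewrite the leftmost nonterminal repeatedly until only terminals remain."
  (if (null symbols)
      '()
      (let ((entry (assoc (first symbols) *grammar*)))
        (if entry
            ;; Nonterminal: splice a chosen expansion in its place, continue.
            (expand (append (pick-expansion (rest entry)) (rest symbols)))
            ;; Terminal: keep the word and expand the rest of the list.
            (cons (first symbols) (expand (rest symbols)))))))

(defun generate-sentence ()
  (expand '(s)))
```

Because NP can contain a PP, which in turn contains another NP, the recursive alternative is given a lower probability here so that expansion is very likely to stop after a few steps. Keep this in mind when you assign probabilities to recursive rules in your own grammar.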

What and how to submit

Submit your code electronically with the turnin command on the CS machines, using the project name hw3. Submit two files, hw3.lisp and README:
     turnin -c cs476 -p hw3 hw3.lisp  README

You may work on this project by yourself or in a group of two. If you work on this with another member of the class, only submit your project once using turnin. Be sure that you clearly state in your README file the names and NETIDs of both members in the group.

This assignment was based on work done by Dave Matuszek at the University of Pennsylvania.