CS 476 - HW 3: Random Sentence Generator

CS 476/MCS 415: Programming Language Design

Spring 2007

Homework 3: Random Sentence Generator

Due: Monday April 16, 2007 at 11:59 pm

Your assignment is to write a Lisp program to generate random sentences. The assignment was written with the language of Lisp in mind; however, if you wish to use a different functional programming language, contact the TA to get approval of your language and document this in a README file. You may work in groups of two on this assignment.

You should have a function named GENERATE-SENTENCE that takes no arguments and returns a list representing a sentence, where each element is a word. For example:
(generate-sentence) RESULT --> (the boy sees a big red troll)

You can generate sentences using a probabilistic context-free grammar (PCFG). A PCFG is like a standard context-free grammar, except that each rule has a probability associated with it. The probability represents how likely a particular rule is to be "fired" during the generation process. Notice that the probabilities of all rules with the same nonterminal on the left-hand side must sum to 1.

Here is a simple example of a PCFG. You can start with this one, but feel free to add or modify rules as you choose. Make sure to explain how you came up with your grammar in the README file.

S --> NP VP [0.7]
S --> S CONJ S [0.3]
NP --> DET N [0.4]
NP --> DET N PP [0.2]
NP --> DET ADJLIST N [0.2]
NP --> PRON [0.2]
VP --> V [0.3]
VP --> V NP [0.5]
VP --> ADV VP [0.2]
PP --> PREP NP [1]
ADJLIST --> ADJ [0.8]
ADJLIST --> ADJ ADJLIST [0.2]
N --> boy [0.2]
N --> ring [0.2]
N --> hobbit [0.2]
N --> troll [0.2]
N --> moon [0.1]
N --> telescope [0.1]
V --> hits [0.3]
V --> runs [0.3]
V --> sees [0.4]
ADJ --> big [0.4]
ADJ --> red [0.3]
ADJ --> hairy [0.3]
ADV --> quickly [0.5]
ADV --> quietly [0.5]
DET --> a [0.2]
DET --> an [0.2]
DET --> the [0.2]
DET --> this [0.1]
DET --> that [0.1]
DET --> each [0.1]
DET --> every [0.1]
PRON --> he [0.3]
PRON --> she [0.3]
PRON --> it [0.4]
PREP --> in [0.25]
PREP --> on [0.25]
PREP --> around [0.25]
PREP --> about [0.25]
CONJ --> and [0.6]
CONJ --> but [0.4]
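
If you change the grammar, it is easy to break the rule that the probabilities for each nonterminal sum to 1. The following is a minimal sketch of a check for that, not part of the assignment. It assumes a hypothetical representation where each rule is a list of the form (lhs rhs probability); the names *toy-rules*, probability-mass and normalized-p are made up for illustration.

```lisp
;; Assumed (hypothetical) layout: each rule is (lhs rhs probability).
;; Only a few rules from the example grammar are listed here.
(defparameter *toy-rules*
  '((s    (np vp)     0.7)
    (s    (s conj s)  0.3)
    (conj (and)       0.6)
    (conj (but)       0.4)))

(defun probability-mass (lhs rules)
  "Sum the probabilities of all rules in RULES whose left-hand side is LHS."
  (loop for (rule-lhs nil prob) in rules   ; NIL skips the right-hand side
        when (eq rule-lhs lhs)
          sum prob))

(defun normalized-p (lhs rules &optional (tolerance 1e-6))
  "True when the rules for LHS sum to 1, within a floating-point TOLERANCE."
  (< (abs (- 1 (probability-mass lhs rules))) tolerance))
```

The tolerance is there because floating-point sums such as 0.7 + 0.3 are not guaranteed to equal 1 exactly.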

Notice that the nonterminals of the grammar can represent phrases (like NP and VP) or part-of-speech categories (like N, V, ADJ), while the terminals represent English words. This table explains the abbreviations used in the example grammar:

Abbreviation	Stands for	Examples
`S`	sentence	the boy sees the big red hairy red troll
`N`	noun	boy, ring, hobbit, troll, moon, telescope
`V`	verb	hits, runs, sees
`ADJ`	adjective	big, red, hairy
`ADV`	adverb	quickly, quietly, repeatedly
`DET`	determiner	a, an, the, this, that, each, every
`PRON`	pronoun	he, she, it
`PREP`	preposition	in, on, around, about
`CONJ`	conjunction	and, but
`NP`	noun phrase	the boy, the man in the moon
`VP`	verb phrase	runs, hits the tree, sees the moon with the telescope
`PP`	prepositional phrase	in the moon, with the telescope

Here are some things you need to pay attention to:

You don't want to generate infinitely long sentences. Some rules, such as those for S, NP and ADJLIST, can allow this to happen. Adjust your probabilities so that very long sentences are possible, but unlikely.
During the generation process, your sentences will probably end up as nested lists, like ((the boy) (sees (the (big (red (hairy troll)))))). Make sure you "flatten" the result into a simple list of words, e.g., (the boy sees the big red hairy troll).
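
The flattening step mentioned above can be sketched as a short recursive function. This is one possible version, not the required one; the name flatten-sentence is hypothetical.

```lisp
(defun flatten-sentence (tree)
  "Return a flat list of the atoms in TREE, in left-to-right order."
  (cond ((null tree) '())                 ; empty list contributes nothing
        ((atom tree) (list tree))         ; a single word becomes a one-element list
        (t (append (flatten-sentence (car tree))
                   (flatten-sentence (cdr tree))))))
```

For example, (flatten-sentence '((the boy) (sees (the (big (red (hairy troll))))))) returns (THE BOY SEES THE BIG RED HAIRY TROLL).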



Here are some things you should not worry about:

  Upper- and lower-case letters. All one case is fine.
  Punctuation (commas, periods, etc.)
  Perfect grammar. Some of your sentences will be grammatically wrong. 
    Improve on things a bit if you can, by tweaking either the grammar rules or 
    the vocabulary (for example, make all your sentences third-person singular), 
    but don't worry overmuch about it. I'd rather see more interesting sentences 
    than grammatically perfect, boring sentences.


To represent your grammar, you can take one of two approaches.

Store it in a (nested) list representing your rules. To access it, you can create a "constant" function that returns the whole grammar, or you can define a global name (using setq) containing the grammar.
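
As a rough illustration of the list-based approach, here is a sketch under assumptions of my own: each entry has the hypothetical layout (nonterminal (probability . right-hand-side) ...), only a handful of the example rules are included, and the global is defined with defparameter, which is the usual Common Lisp way to introduce a global variable (setq on an undeclared name also works in most implementations). The names *grammar*, rules-for and terminal-p are made up.

```lisp
;; Assumed (hypothetical) layout:
;;   (nonterminal (probability . right-hand-side) ...)
(defparameter *grammar*
  '((s    (0.7 np vp) (0.3 s conj s))
    (pp   (1.0 prep np))
    (pron (0.3 he) (0.3 she) (0.4 it))
    (conj (0.6 and) (0.4 but))))

(defun rules-for (symbol)
  "Return the weighted alternatives for SYMBOL, or NIL if it has no rules."
  (cdr (assoc symbol *grammar*)))

(defun terminal-p (symbol)
  "A symbol with no rules in *GRAMMAR* is treated as a terminal (a word)."
  (null (rules-for symbol)))
```

With this layout, (rules-for 'pron) returns ((0.3 HE) (0.3 SHE) (0.4 IT)), and (terminal-p 'he) is true because HE never appears on the left-hand side of a rule.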

Create a set of functions, each of them representing a rule. For example, you can have a function (pron) that returns "he" 30% of the time, "she" 30% of the time, or "it" 40% of the time. The apply function can be useful in this case.

   
(apply fun args)
    Calls the function fun with the argument list args. For example, (apply 'cons '(a (b c))) gives (a b c), the same result as if we called (cons 'a '(b c)). This function is especially useful if you stored your rules as functions.

Example of using apply with defun:

    > (defun n () '(boy girl gorilla))
    n

Now suppose y has the value 'n (this could be a parameter in some other function):

    > y
    n

Then:

    > (apply y ())
    (boy girl gorilla)
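
The function-per-rule approach, using the (pron) example described earlier, might look like the sketch below. It is an illustration only; the helper name expand-with is hypothetical.

```lisp
(defun pron ()
  "Return HE 30% of the time, SHE 30% of the time, and IT 40% of the time."
  (let ((r (random 1.0)))            ; uniform float, 0 <= r < 1
    (cond ((< r 0.3) 'he)            ; first 30% of the range
          ((< r 0.6) 'she)           ; next 30%
          (t 'it))))                 ; remaining 40%

(defun expand-with (rule-name)
  "Call the zero-argument rule function named by the symbol RULE-NAME."
  (apply rule-name '()))
```

Here (expand-with 'pron) behaves like (pron). Depending on the implementation, random may produce the same sequence on every run unless you reseed it, for example with (setf *random-state* (make-random-state t)).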
    
  



If you are having trouble getting started, here are some suggestions.

Spend some time carefully planning the representation of your grammar. A clever choice of representation can make the rest of the implementation much easier.

Start with the list '(S), because you want to generate sentences. Write a recursive function that finds a rule for each nonterminal and replaces (expands) it. The rule should be chosen randomly, but make sure the choice is consistent with the probabilities defined in the grammar.
If you look at the rules above, S --> NP VP says to replace S with NP VP, so change your list from (S) to (NP VP). Next, NP is still a nonterminal, with the rules NP --> DET N | DET N PP | DET ADJLIST N | PRON, so you need to choose one of these alternatives, say, DET N PP, and change your list from (NP VP) to (DET N PP VP). Continue in this fashion until you get a list composed entirely of terminals.
Here's a possible sequence:

   
Start with:	`(S)`
Use the rule `S --> NP VP`	`(NP VP)`
Use the rule `NP --> DET N PP`	`(DET N PP VP)`
Use the rule `DET --> EACH`	`(EACH N PP VP)`
Use the rule `N --> BOY`	`(EACH BOY PP VP)`
Use the rule `PP --> PREP NP`	`(EACH BOY PREP NP VP)`
Use the rule `PREP --> IN`	`(EACH BOY IN NP VP)`
Use the rule `NP --> DET N`	`(EACH BOY IN DET N VP)`
Continue...	...
...until you have only terminals left:	`(EACH BOY IN EVERY RING SEES)`
  

Basically, every time you have a nonterminal, you need to call a recursive 
  function to turn it into a possible production based on the rules for that
  nonterminal. Eventually each nonterminal will result in a sequence of terminals.
  Putting all these terminals together 
  gives you the final list.
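
The recursive expansion described above can be sketched as follows. This is one possible shape for the solution, not the expected one. It assumes a hypothetical grammar layout of (nonterminal (probability . right-hand-side) ...) and uses a deliberately small, non-recursive grammar so that every sentence is short. The names *mini-grammar*, choose-rhs and expand are made up for illustration.

```lisp
;; Assumed (hypothetical) layout:
;;   (nonterminal (probability . right-hand-side) ...)
(defparameter *mini-grammar*
  '((s    (1.0 np vp))
    (np   (0.7 det n) (0.3 pron))
    (vp   (0.4 v) (0.6 v np))
    (det  (0.5 a) (0.5 the))
    (n    (0.5 hobbit) (0.5 troll))
    (v    (0.5 sees) (0.5 hits))
    (pron (0.5 he) (0.5 she))))

(defun choose-rhs (alternatives)
  "Pick one right-hand side from ALTERNATIVES, a list of (probability . rhs),
with a chance equal to its probability."
  (let ((r (random 1.0)))
    (loop for (prob . rhs) in alternatives
          do (decf r prob)                 ; walk through the cumulative ranges
          when (< r 0) return rhs
          ;; Guard against rounding error: fall back to the last alternative.
          finally (return (cdr (car (last alternatives)))))))

(defun expand (symbol)
  "Expand SYMBOL into a flat list of terminals."
  (let ((alternatives (cdr (assoc symbol *mini-grammar*))))
    (if (null alternatives)
        (list symbol)                      ; terminal: nothing left to expand
        (loop for part in (choose-rhs alternatives)
              append (expand part)))))     ; splice the sub-results together

(defun generate-sentence ()
  "Return a random sentence as a flat list of words."
  (expand 's))
```

Because each recursive call returns a list and the results are appended, this version produces a flat list directly and needs no separate flattening pass. A version that conses the sub-results together instead would yield nested lists and would need one.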


What and how to submit

You are to submit your code electronically via turnin on the CS machines, using the project name hw3. You are to submit two files:

hw3.lisp - which is to contain the Lisp code for your assignment.
README - a plain text file with a description of how you approached and solved the problem.

You will submit your files using the turnin command on the CS server:
     turnin -c cs476 -p hw3 hw3.lisp  README



You may work on this project by yourself or in a group of two. 
If you work on this with another member of the class, only 
submit your project once using turnin. Be sure that you
clearly state in your README file the names and NETIDs of 
both members in the group.


This assignment was based on work done by Dave Matuszek at the
University of Pennsylvania.
