Your assignment is to write a Lisp program to generate random sentences. The assignment was written with the language of Lisp in mind; however, if you wish to use a different functional programming language, contact the TA to get approval of your language and document this in a README file. You may work in groups of two on this assignment.
You should have a function named GENERATE-SENTENCE that takes no arguments and returns a list representing a sentence, where each element is a word. For example:
(generate-sentence)
RESULT --> (the boy sees a big red troll)
You can generate sentences using a probabilistic context-free grammar (PCFG). A PCFG is like a standard context-free grammar, except that each rule has a probability associated with it. The probability represents how likely a particular rule is to be "fired" during the generation process. Notice that the probabilities of the rules for the same nonterminal must always sum to 1.
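The sum-to-1 requirement is easy to check mechanically. The following is only a sketch under assumptions: the triple format (left-hand side, right-hand side, probability) and the names *example-rules*, probability-total and normalized-p are made up for illustration and are not part of the assignment. It uses the VP and PP rules from the example grammar in this assignment.

```lisp
;; Hypothetical representation: each rule is (lhs rhs probability).
(defparameter *example-rules*
  '((vp (v)       0.3)
    (vp (v np)    0.5)
    (vp (adv vp)  0.2)
    (pp (prep np) 1.0)))

(defun probability-total (lhs rules)
  "Sum the probabilities of all rules in RULES whose left-hand side is LHS."
  (loop for (rule-lhs nil probability) in rules
        when (eq rule-lhs lhs)
          sum probability))

(defun normalized-p (lhs rules &optional (tolerance 1e-6))
  "True if the probabilities for LHS sum to 1, within TOLERANCE.
The tolerance allows for floating-point rounding."
  (< (abs (- 1 (probability-total lhs rules))) tolerance))
```

Calling (normalized-p 'vp *example-rules*) should return true for the rules above. A nonterminal with no rules at all has a total of 0, so it is reported as not normalized.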
Here is a simple example of a PCFG. You can start with this one, but feel free to add or modify rules as you choose. Make sure to explain how you came up with your grammar in the README file.
S --> NP VP [0.7]
S --> S CONJ S [0.3]
NP --> DET N [0.4]
NP --> DET N PP [0.2]
NP --> DET ADJLIST N [0.2]
NP --> PRON [0.2]
VP --> V [0.3]
VP --> V NP [0.5]
VP --> ADV VP [0.2]
PP --> PREP NP [1]
ADJLIST --> ADJ [0.8]
ADJLIST --> ADJ ADJLIST [0.2]
N --> boy [0.2]
N --> ring [0.2]
N --> hobbit [0.2]
N --> troll [0.2]
N --> moon [0.1]
N --> telescope [0.1]
V --> hits [0.3]
V --> runs [0.3]
V --> sees [0.4]
ADJ --> big [0.4]
ADJ --> red [0.3]
ADJ --> hairy [0.3]
ADV --> quickly [0.5]
ADV --> quietly [0.5]
DET --> a [0.2]
DET --> an [0.2]
DET --> the [0.2]
DET --> this [0.1]
DET --> that [0.1]
DET --> each [0.1]
DET --> every [0.1]
PRON --> he [0.3]
PRON --> she [0.3]
PRON --> it [0.4]
PREP --> in [0.25]
PREP --> on [0.25]
PREP --> around [0.25]
PREP --> about [0.25]
CONJ --> and [0.6]
CONJ --> but [0.4]
Notice that the nonterminals of the grammar can represent phrases (like NP and VP) or part-of-speech categories (like N, V, ADJ), and the terminals represent English words. This table explains the meaning of the abbreviations used in the example grammar:
Abbreviation | Stands for | Examples |
S | sentence | the boy sees the big red hairy red troll |
N | noun | boy, ring, hobbit, troll, moon, telescope |
V | verb | hits, runs, sees |
ADJ | adjective | big, red, hairy |
ADV | adverb | quickly, quietly, repeatedly |
DET | determiner | a, an, the, this, that, each, every |
PRON | pronoun | he, she, it |
PREP | preposition | in, on, around, about |
CONJ | conjunction | and, but |
NP | noun phrase | the boy, the man in the moon |
VP | verb phrase | runs, hits the tree, sees the moon with the telescope |
PP | prepositional phrase | in the moon, with the telescope |
Here are some things you need to pay attention to:
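Recursive rules, such as those for S, NP and ADJLIST, can make a sentence arbitrarily long. Adjust your probabilities so that very long sentences are possible, but unlikely.
Depending on how you build the result, you may end up with a nested list such as ((the boy) (sees (the (big (red (hairy troll)))))). Make sure you "flatten" it, i.e., return (the boy sees the big red hairy troll).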
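Flattening a nested list can be sketched with a short recursive function. This is one common way to write it, not necessarily the intended one. The name flatten is an assumption, since Common Lisp has no built-in function by that name.

```lisp
(defun flatten (tree)
  "Return a flat list of the atoms in TREE, in left-to-right order."
  (cond ((null tree) '())            ; empty list: nothing to add
        ((atom tree) (list tree))    ; a single word: wrap it in a list
        (t (append (flatten (car tree))
                   (flatten (cdr tree))))))
```

For example, (flatten '((the boy) (sees (the (big (red (hairy troll))))))) returns (THE BOY SEES THE BIG RED HAIRY TROLL). If your expansion function already appends its sub-results together, you may never build the nested form in the first place.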
Here are some things you should not worry about:
To represent your grammar, you can take two different approaches.
One is to define a variable (for example with setq) containing the grammar.
The other is to write one function per nonterminal, for example a function (pron) that returns "he" 30% of the time, "she" 30% of the time, or "it" 40% of the time. The apply function can be useful in this case.
(apply fun args) | Calls the function fun with the argument list args. For example, (apply 'cons '(a (b c))) gives (a b c), the same result as if we called (cons 'a '(b c)). This function is especially useful if you stored your rules as functions. |
Example of using apply with defun |
|
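As an illustration of rules stored as functions, here is a minimal sketch of a pron function defined with defun and then called through apply. The table *rule-functions* and the helper call-rule are hypothetical names introduced for this example. The thresholds 0.3 and 0.6 are the cumulative probabilities of the PRON rules.

```lisp
(defun pron ()
  "Return HE 30% of the time, SHE 30% of the time, and IT 40% of the time."
  (let ((r (random 1.0)))            ; r is a float in [0, 1)
    (cond ((< r 0.3) 'he)
          ((< r 0.6) 'she)
          (t 'it))))

;; Hypothetical table mapping each nonterminal to the function for its rules.
(defparameter *rule-functions* '((pron . pron)))

(defun call-rule (nonterminal &optional (args '()))
  "Look up the function for NONTERMINAL and call it with the argument list ARGS."
  (apply (cdr (assoc nonterminal *rule-functions*)) args))
```

Here (call-rule 'pron) calls (pron) with an empty argument list and returns one of HE, SHE or IT.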
If you are having trouble getting started, here are some suggestions.
Spend some time to carefully plan the representation of your grammar. A clever choice of the representation can make the rest of the implementation much easier.
Start with the list '(S), because you want to generate sentences. Write a recursive function that finds a rule for each nonterminal and replaces (expands) it. The rule should be chosen randomly, but make sure the choice is consistent with the probabilities defined in the grammar.
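One way to make a random choice that respects the probabilities is to draw a number in [0, 1) and walk through the alternatives, accumulating their probabilities until the running total exceeds the draw. The sketch below assumes each alternative is a list whose first element is the probability and whose remaining elements are the expansion. That format and the name choose-weighted are illustrative assumptions.

```lisp
(defun choose-weighted (alternatives &optional (r (random 1.0)))
  "ALTERNATIVES is a list of (probability . expansion) entries whose
probabilities sum to 1. Return the expansion selected by R, a number in [0, 1).
R defaults to a fresh random draw; passing it explicitly helps with testing."
  (let ((cumulative 0.0))
    ;; If rounding leaves the total slightly below R, fall back to the last entry.
    (dolist (alternative alternatives (cdr (first (last alternatives))))
      (incf cumulative (car alternative))
      (when (< r cumulative)
        (return (cdr alternative))))))
```

With the PRON rules written as '((0.3 he) (0.3 she) (0.4 it)), a draw of 0.1 selects (HE), 0.45 selects (SHE), and 0.95 selects (IT).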
If you look at the rules above, S --> NP VP says to replace S with NP VP, so change your list from (S) to (NP VP). Next, NP is still a nonterminal, with the rules NP --> DET N | DET N PP | DET ADJLIST N | PRON, so you need to choose one of these alternatives, say, DET N PP, and change your list from (NP VP) to (DET N PP VP). Continue in this fashion until you get a list composed entirely of terminals.
Here's a possible sequence:
Start with: | (S) |
Use the rule S --> NP VP | (NP VP) |
Use the rule NP --> DET N PP | (DET N PP VP) |
Use the rule DET --> EACH | (EACH N PP VP) |
Use the rule N --> BOY | (EACH BOY PP VP) |
Use the rule PP --> PREP NP | (EACH BOY PREP NP VP) |
Use the rule PREP --> IN | (EACH BOY IN NP VP) |
Use the rule NP --> DET N | (EACH BOY IN DET N VP) |
Continue... | ... |
...until you have only terminals left: | (EACH BOY IN EVERY RING SEES) |
Basically, every time you have a nonterminal, you need to call a recursive function to turn it into a possible production based on the rules for that nonterminal. Eventually each nonterminal will result in a sequence of terminals. Putting all these terminals together gives you the final list.
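The recursive idea can be sketched on a deliberately tiny grammar. Everything here is an assumption made for illustration: the grammar *toy-grammar* (a non-recursive subset with made-up probabilities, so expansion always terminates), its list format, and the function names pick-expansion and expand. It is not the required GENERATE-SENTENCE and does not cover the full example grammar.

```lisp
;; Hypothetical format: (nonterminal (probability symbol...) ...).
(defparameter *toy-grammar*
  '((s    (1.0 np vp))
    (np   (0.7 det n) (0.3 pron))
    (vp   (0.5 v) (0.5 v np))
    (det  (0.5 the) (0.5 every))
    (n    (0.5 boy) (0.5 ring))
    (v    (0.6 sees) (0.4 hits))
    (pron (1.0 it))))

(defun pick-expansion (alternatives)
  "Pick one expansion from ALTERNATIVES according to its probability."
  (let ((r (random 1.0))
        (cumulative 0.0))
    (dolist (alternative alternatives (cdr (first (last alternatives))))
      (incf cumulative (car alternative))
      (when (< r cumulative)
        (return (cdr alternative))))))

(defun expand (symbol)
  "Return a flat list of terminals derived from SYMBOL."
  (let ((alternatives (cdr (assoc symbol *toy-grammar*))))
    (if (null alternatives)
        (list symbol)                ; no rules: SYMBOL is a terminal
        ;; Expand every symbol of the chosen rule and splice the results.
        (loop for part in (pick-expansion alternatives)
              append (expand part)))))
```

Calling (expand 's) returns a flat list of two to five words, such as (EVERY BOY SEES IT). Because each recursive call returns a flat list and the results are appended, no separate flattening step is needed in this variant.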
turnin -c cs476 -p hw3 hw3.lisp README
You may work on this project by yourself or in a group of two. If you work on this with another member of the class, only submit your project once using turnin. Be sure that you clearly state in your README file the names and NETIDs of both members in the group.
This assignment was based on work done by Dave Matuszek at the University of Pennsylvania.