This is a link to a sample data file. The data file should be syntactically correct; send email to troy@eecs.uic.edu if it is not.
Write a simple recursive descent parser that validates the syntax of the following grammar. The reserved words are function, forward, void, int, float, char, if, else and while. The capital E denotes the empty string (epsilon). The start symbol is "program".
    program         -> global_elements
    global_elements -> global_elements ; global_element | global_element
    global_element  -> var_decl | function | prototype | E
    var_decl        -> type id
    function        -> function rtype id ( fparam_list ) { stmt_list }
    prototype       -> forward rtype id ( pparam_list )
    rtype           -> type | void
    type            -> int | float | char
    fparam_list     -> fparams | E
    fparams         -> fparams , var_decl | var_decl
    pparam_list     -> pparams | E
    pparams         -> pparams , type | type
    stmt_list       -> stmt_list ; stmt | stmt
    stmt            -> var_decl | assign | fcall | if_stmt | while_stmt | E
    assign          -> id = expr
    expr            -> expr + term | expr - term | term
    term            -> term * unary | term / unary | term % unary | unary
    unary           -> + factor | - factor | factor
    factor          -> const_value | id | fcall | ( expr )
    const_value     -> integer | floating_point | character
    fcall           -> id ( aparam_list )
    aparam_list     -> aparams | E
    aparams         -> aparams , expr | expr
    if_stmt         -> if ( expr ) stmts else_stmt
    else_stmt       -> else stmts | E
    while_stmt      -> while ( expr ) stmts
    stmts           -> stmt | { stmt_list }
Weird things about the above grammar:
Several of the above rules are left-recursive (for example, expr -> expr + term). To use them in a recursive descent parser, these rules must be rewritten to remove the left recursion.
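One common way to remove left recursion in a hand-written parser is to turn each left-recursive rule into a loop, e.g. expr -> term { ( + | - ) term }. The following is a minimal Python sketch of that idea for expr, term and unary, assuming tokens are plain strings; factor only recognizes identifiers, numbers and parenthesized expressions (fcall is left out because it needs the symbol table), and the names ExprParser and accepts are hypothetical.

```python
# Left-recursive rules rewritten as loops (EBNF-style):
#   expr   -> term  { ("+" | "-") term }
#   term   -> unary { ("*" | "/" | "%") unary }
#   unary  -> [ "+" | "-" ] factor
#   factor -> NUMBER | ID | "(" expr ")"      (fcall omitted in this sketch)

class ExprParser:
    def __init__(self, tokens):
        self.tokens = tokens  # list of token strings, e.g. ["a", "+", "1"]
        self.pos = 0

    def peek(self):
        """Current token, or None at end of input."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def match(self, expected):
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.peek()!r}")
        self.pos += 1

    def expr(self):
        self.term()
        while self.peek() in ("+", "-"):  # replaces expr -> expr + term | expr - term
            self.pos += 1
            self.term()

    def term(self):
        self.unary()
        while self.peek() in ("*", "/", "%"):  # replaces term -> term * unary | ...
            self.pos += 1
            self.unary()

    def unary(self):
        if self.peek() in ("+", "-"):  # optional sign
            self.pos += 1
        self.factor()

    def factor(self):
        tok = self.peek()
        if tok == "(":
            self.match("(")
            self.expr()
            self.match(")")
        elif tok is not None and (tok.isidentifier() or tok.replace(".", "", 1).isdigit()):
            self.pos += 1  # identifier or numeric constant
        else:
            raise SyntaxError(f"unexpected token {tok!r} in factor")


def accepts(tokens):
    """True if the whole token list is one syntactically valid expr."""
    parser = ExprParser(tokens)
    try:
        parser.expr()
        return parser.peek() is None  # every token must be consumed
    except SyntaxError:
        return False
```

The same loop transformation applies to the other left-recursive list rules (global_elements, fparams, pparams, stmt_list and aparams), with ; or , as the separator that keeps the loop going.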
The tokens id, integer, floating_point and character are defined the same way as in mp1.
There is an ambiguity in the grammar: variable names and function names are both identifiers. To resolve it, we will add a simple symbol table and require that every identifier be declared before it is used. An identifier is declared as a variable in a "var_decl" rule, and as a function in either a "function" or a "prototype" rule. When a declaration occurs, the identifier is stored in the symbol table together with whether it is a variable name or a function name. The formal parameters of a function count as variables.
To allow for multiple scopes (global and local), the symbol table is divided into parts: one for the global scope and one for each local scope. Multiple scopes allow identifier names to be reused; however, each identifier may be declared only once in each scope. The exception is function names: any number of prototype statements may declare the same identifier, along with at most one function statement.
When an identifier is encountered in an assign, fcall or factor rule, check the symbol table to see how it was declared. If it was not declared, print an error message. If it was declared as a function name, follow the rule to the fcall non-terminal. If it was declared as a variable name, follow the rule to the assign non-terminal or to the "factor -> id" rule. When looking up an identifier, first search the current local scope; if the identifier is not there, search the global scope.
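A minimal Python sketch of such a symbol table, assuming only two levels are ever active at once (the global scope plus the local scope of the function currently being parsed). The class SymbolTable, its methods and the helper stmt_rule_for are hypothetical names; the duplicate-declaration rules follow the description above.

```python
VAR, FUNC = "variable", "function"


class SymbolTable:
    """Global scope plus (at most) one local scope for the current function."""

    def __init__(self):
        self.global_scope = {}   # name -> VAR or FUNC
        self.local_scope = None  # dict while parsing a function body, else None
        self.defined = set()     # function names that already have a function statement

    def enter_function(self):
        self.local_scope = {}    # fresh scope for formal parameters and locals

    def leave_function(self):
        self.local_scope = None

    def declare_var(self, name):
        """Declare a variable (or formal parameter) in the current scope."""
        scope = self.local_scope if self.local_scope is not None else self.global_scope
        if name in scope:
            return False         # already declared in this scope
        scope[name] = VAR
        return True

    def declare_function(self, name, is_prototype):
        """Prototypes may repeat a function name; only one function statement may."""
        if self.global_scope.get(name) == VAR:
            return False         # name already used by a global variable
        if not is_prototype:
            if name in self.defined:
                return False     # second function statement for this name
            self.defined.add(name)
        self.global_scope[name] = FUNC
        return True

    def lookup(self, name):
        """Search the current local scope first, then the global scope."""
        if self.local_scope is not None and name in self.local_scope:
            return self.local_scope[name]
        return self.global_scope.get(name)  # None if undeclared


def stmt_rule_for(table, name):
    """Choose the stmt alternative for a statement that begins with identifier `name`."""
    kind = table.lookup(name)
    if kind is None:
        return None              # undeclared: the caller prints an error
    return "fcall" if kind == FUNC else "assign"
```

Declaring the function name before calling enter_function keeps the name visible inside its own body, so recursive calls resolve to fcall.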
The input to this program will be from a file whose name is given as a command line argument.
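Reading the input file named on the command line might look like the following Python sketch; the function name read_source and the wording of the usage message are assumptions, not part of the assignment.

```python
import sys


def read_source(argv):
    """Return the text of the file named by argv[1], or exit with a message."""
    prog = argv[0] if argv else "parser"
    if len(argv) != 2:
        sys.exit(f"usage: {prog} <input-file>")  # raises SystemExit
    try:
        with open(argv[1]) as f:
            return f.read()
    except OSError as err:
        sys.exit(f"{prog}: cannot open {argv[1]}: {err.strerror}")
```

A script's entry point would call read_source(sys.argv) and pass the returned text to the lexer.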
The output of this program is either a statement that there were no parsing (or lexical) errors in the given input file, or a list of all of the errors encountered. When a parsing error is encountered, create an error message that states which token was found, the value of the token, the line and column number where the token begins, and either the token expected or the rule being parsed. Use your own judgement as to whether the expected token or the current rule should be listed in your error message. As a general rule, if only one token can possibly come next, list that token; otherwise, list the current rule. An example error message could be:
At line 12, column 15: unexpected token of type: identifier, value: val1 encountered, expected token of type: operator, value: =
This makes for long error messages, but they should give the needed information.
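Both kinds of message can be produced by two small formatting helpers. In this Python sketch the Token fields are an assumption about what the lexer records, expected_token_error reproduces the example format above, and the wording of in_rule_error (for the case where the current rule is listed instead) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Token:
    kind: str    # token type, e.g. "identifier" or "operator" (assumed names)
    value: str   # the token's text
    line: int    # line where the token begins (1-based)
    column: int  # column where the token begins (1-based)


def expected_token_error(found, exp_kind, exp_value):
    """Message for the case where exactly one token could come next."""
    return (f"At line {found.line}, column {found.column}: unexpected token of type: "
            f"{found.kind}, value: {found.value} encountered, expected token of type: "
            f"{exp_kind}, value: {exp_value}")


def in_rule_error(found, rule):
    """Message for the case where several tokens could come next."""
    return (f"At line {found.line}, column {found.column}: unexpected token of type: "
            f"{found.kind}, value: {found.value} encountered while parsing rule: {rule}")
```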
When a lexical error is found, print an error message following the guidelines from machine problem 1, then find the next valid token in the input and give that token to the parser.
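A lexer can report a bad character and keep scanning so that the parser always receives a valid token. This Python sketch assumes simplified token patterns and an invented error message, since the real definitions and message format come from mp1; only the reserved-word list is taken from the grammar above.

```python
import re

# Assumed token patterns; the real definitions come from mp1.
TOKEN_SPEC = [
    ("floating_point", r"\d+\.\d+"),  # must precede integer
    ("integer", r"\d+"),
    ("character", r"'[^'\n]'"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("operator", r"[-+*/%=]"),
    ("punctuation", r"[;,(){}]"),
    ("skip", r"[ \t]+"),
    ("newline", r"\n"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))
KEYWORDS = {"function", "forward", "void", "int", "float", "char", "if", "else", "while"}


def tokenize(text, errors):
    """Yield (kind, value, line, column); record lexical errors and keep going."""
    line, line_start, pos = 1, 0, 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            # Lexical error: report it, skip one character, resume scanning.
            errors.append(f"At line {line}, column {pos - line_start + 1}: "
                          f"invalid character {text[pos]!r}")
            pos += 1
            continue
        kind, value = m.lastgroup, m.group()
        if kind == "newline":
            line, line_start = line + 1, m.end()
        elif kind != "skip":
            if kind == "identifier" and value in KEYWORDS:
                kind = "keyword"
            yield kind, value, line, m.start() - line_start + 1
        pos = m.end()
```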
After each error, your program should attempt to recover. First, skip the invalid token and try the next token from the input file; this allows easy recovery if the error was an extra token. If that does not resolve the error, skip ahead until the next semicolon is encountered. If the error was encountered inside a function, resume token matching with the semicolon in the stmt_list rule; if it was encountered outside a function, resume token matching with the semicolon in the global_elements rule. If you need to skip to the next semicolon, print a message saying so that also gives the line and column of that semicolon, as follows:
Skipping to the next semicolon at line 12, column 53
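The two-step recovery can be sketched as a Python function over a token list, assuming tokens are (kind, value, line, column) tuples and that the caller passes the set of token values that would let the current rule continue; the function recover and its acceptable parameter are hypothetical.

```python
def recover(tokens, pos, acceptable, messages):
    """Return the index where parsing resumes after an error at tokens[pos].

    tokens: list of (kind, value, line, column) tuples.
    acceptable: token values that would let the current rule continue.
    messages: list that collects any "Skipping ..." message.
    """
    # Step 1: drop the offending token; an extra token is then skipped cleanly.
    nxt = pos + 1
    if nxt < len(tokens) and tokens[nxt][1] in acceptable:
        return nxt
    # Step 2: skip ahead to the next semicolon (starting with the offending
    # token itself). The caller resumes in stmt_list inside a function, or in
    # global_elements outside one.
    for i in range(pos, len(tokens)):
        if tokens[i][1] == ";":
            _, _, line, column = tokens[i]
            messages.append(f"Skipping to the next semicolon at line {line}, column {column}")
            return i
    return len(tokens)  # no semicolon left: stop at end of input
```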
The parser does not do any type validation for expressions. For functions, you are not required to validate the number of parameters; if you wish to add code to do so, you may for 5 points extra credit. In that case, the number of parameters must match for all uses of that function identifier: every prototype, every function call, and the function statement itself.
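For the extra credit, one approach is to remember the parameter count from the first declaration of each function name and compare every later use against it: the number of types in a prototype's pparam_list, of var_decls in a function's fparam_list, and of exprs in a call's aparam_list. The helper check_param_count, its counts dictionary and the message wording below are hypothetical.

```python
def check_param_count(counts, name, n, line, column):
    """Record or verify the parameter count for function `name`.

    counts: dict mapping function name -> count from its first declaration.
    Returns an error message on a mismatch, otherwise None.
    """
    expected = counts.setdefault(name, n)  # first use records the count
    if expected != n:
        return (f"At line {line}, column {column}: function {name} used with "
                f"{n} parameter(s), previously declared with {expected}")
    return None
```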
For an additional 10 points extra credit, you can add rules for the relational operators (<, <=, >, >=, ==, and !=) and the boolean operators (&&, ||, !). The relational operators have lower precedence than addition. The boolean AND operator (&&) has lower precedence than the relational operators, and the boolean OR operator (||) has lower precedence than boolean AND. The boolean NOT operator (!) has the same precedence as the unary operators. To get these 10 points, you must submit a readme file that shows how the grammar was modified to allow for these operators. The readme file may be in ASCII text (with a .txt extension) or in HTML format (with a .htm or .html extension). Note that when only checking syntax, correct precedence may not be needed: a grammar with the wrong precedence may still check the syntax properly. The precedence rules are given to help you create the proper grammar rules.
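One possible layering that respects these precedences adds one non-terminal per level, lowest precedence first, and lets unary also accept !. The Python sketch below is an assumption about how the grammar could be modified, not the required answer; it builds nested tuples so that the resulting precedence is visible, and factor accepts only identifiers and integers.

```python
# One possible layering (lowest precedence first):
#   or_expr  -> and_expr { "||" and_expr }
#   and_expr -> rel_expr { "&&" rel_expr }
#   rel_expr -> expr { relop expr }        relop: < <= > >= == !=
#   expr     -> term { ("+" | "-") term }
#   term     -> unary { ("*" | "/" | "%") unary }
#   unary    -> [ "+" | "-" | "!" ] factor
#   factor   -> INTEGER | ID | "(" or_expr ")"
RELOPS = {"<", "<=", ">", ">=", "==", "!="}


class Parser:
    def __init__(self, tokens):
        self.toks, self.pos = tokens, 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def take(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def binary(self, operand, ops):
        """Parse operand { op operand } into a left-associative tuple tree."""
        left = operand()
        while self.peek() in ops:
            op = self.take()
            left = (op, left, operand())
        return left

    def or_expr(self):
        return self.binary(self.and_expr, {"||"})

    def and_expr(self):
        return self.binary(self.rel_expr, {"&&"})

    def rel_expr(self):
        return self.binary(self.expr, RELOPS)

    def expr(self):
        return self.binary(self.term, {"+", "-"})

    def term(self):
        return self.binary(self.unary, {"*", "/", "%"})

    def unary(self):
        if self.peek() in ("+", "-", "!"):  # ! shares the unary level
            op = self.take()
            return (op, self.factor())
        return self.factor()

    def factor(self):
        tok = self.take()
        if tok == "(":
            inner = self.or_expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return inner
        if tok.isidentifier() or tok.isdigit():
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")
```

With this layering, a < b + 1 && !c || d groups as ((a < (b + 1)) && (!c)) || d. The conditions in if_stmt and while_stmt would then presumably use or_expr instead of expr.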
Your program will be submitted electronically using turnin and must run on the EECS department computers. You must also submit a makefile to compile your program. Your program is to be the result of individual work and is expected to be written using good programming style.