EECS 473 - MP1: A Simple Lexical Analyzer

Due: January 29, 2001

For this assignment, you are to write a C/C++ program that will take as input a "program" and determine the tokens contained in it. You are not allowed to use lex or any other similar tools for this program.

The tokens for this assignment are as follows:

Your program is to be given the name of the file to analyse on the command line. Your program is to have a function getToken() that returns the next token and some information about it. Your main program is to call getToken() repeatedly until the EOF token is returned. Do not print anything when reaching the end of the file. The getToken() function is to ignore any blanks, tabs, newlines, formfeeds and comments (either /* ... */ style or // style comments) found in the program.
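One way to implement the skipping step is a loop that consumes ignorable text before each token while keeping the line and column current. The following is a minimal sketch under several assumptions: the whole input file has already been read into a NUL-terminated buffer, a tab counts as a single column, and the names Scanner, advance and skip_ignored are hypothetical. It also remembers where a slash-star comment opened, so that an unterminated comment can be reported with its starting position.

```c
#include <stddef.h>

/* Hypothetical scanner state over a NUL-terminated copy of the input file. */
typedef struct {
    const char *src;  /* whole input text */
    size_t pos;       /* index of the next unread character */
    int line;         /* 1-based line of src[pos] */
    int col;          /* 1-based column of src[pos]; a tab counts as one column */
} Scanner;

/* Consume one character, keeping line and column up to date. */
static void advance(Scanner *s)
{
    if (s->src[s->pos] == '\n') {
        s->line++;
        s->col = 1;
    } else {
        s->col++;
    }
    s->pos++;
}

/* Skip blanks, tabs, newlines, formfeeds, // comments and slash-star comments.
   Returns 0 when positioned at the next token (or at the terminating NUL).
   Returns -1 if a slash-star comment never ends; *open_line and *open_col
   then hold the position of its opening slash. */
static int skip_ignored(Scanner *s, int *open_line, int *open_col)
{
    for (;;) {
        char c = s->src[s->pos];
        char next = (c != '\0') ? s->src[s->pos + 1] : '\0';

        if (c == ' ' || c == '\t' || c == '\n' || c == '\f') {
            advance(s);
        } else if (c == '/' && next == '/') {
            /* Line comment: stop at the newline, which the next pass skips. */
            while (s->src[s->pos] != '\0' && s->src[s->pos] != '\n')
                advance(s);
        } else if (c == '/' && next == '*') {
            *open_line = s->line;
            *open_col = s->col;
            advance(s);
            advance(s);
            while (!(s->src[s->pos] == '*' && s->src[s->pos + 1] == '/')) {
                if (s->src[s->pos] == '\0')
                    return -1;
                advance(s);
            }
            advance(s);  /* the '*' */
            advance(s);  /* the '/' */
        } else {
            return 0;
        }
    }
}
```

getToken() would call such a routine first, then record s->line and s->col as the token's starting position before reading the token itself.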

For each token, you are to print out the token name (e.g. identifier, keyword, character, etc.), the token's value, the line number containing the start of the token, and the column where the token starts. If you encounter a character that is not part of any token, print the token name as "Unknown" and the value as "\xxx", where xxx is the octal value of the character.
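Both the output row and the octal form of an unknown character can be produced with snprintf. This is a sketch only: the column widths (12, 8, 4 and 4 characters) are an assumption chosen to resemble the sample output, and the helper names octal_escape and format_token_line are hypothetical. Which characters actually count as unknown depends on the token table.

```c
#include <stddef.h>
#include <stdio.h>

/* Write a backslash followed by exactly three octal digits, e.g. '@' -> \100.
   out must have room for 5 bytes (4 characters plus the terminating NUL). */
static void octal_escape(unsigned char c, char *out)
{
    snprintf(out, 5, "\\%03o", (unsigned)c);
}

/* Format one output row: token name, value, line and column.
   Returns the number of characters snprintf would write, as snprintf does. */
static int format_token_line(char *out, size_t n, const char *name,
                             const char *value, int line, int col)
{
    return snprintf(out, n, "%-12s %-8s %4d %4d", name, value, line, col);
}
```

For example, assuming '@' belongs to no token, octal_escape('@', buf) stores \100, since '@' is decimal 64, which is octal 100. Padding with %-12s and %4d keeps the columns aligned without tabs, whose rendered width varies between terminals.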

For the following input line (assume it is line 24):

     x = 3 + 5.31;  
Your program should produce something similar to:
   identifier   x      24    1
   operator     =      24    3
   integer      3      24    5
   operator     +      24    7
   float        5.31   24    9
   operator     ;      24   13
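To make the expected output concrete, here is a small tokenizer sketch that yields exactly the six tokens above for that line. Because the token table is not reproduced here, several parts are assumptions: the operator set +-*/=;(){}<>, and the names Kind, Token, Scanner and get_token are hypothetical; keywords, comments, character constants and strings are not handled; and a float is taken to be digits, a '.', and at least one more digit (no exponents).

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical token kinds; the assignment's token table defines the real set. */
typedef enum { TK_IDENTIFIER, TK_INTEGER, TK_FLOAT, TK_OPERATOR, TK_UNKNOWN, TK_EOF } Kind;

typedef struct {
    Kind kind;
    char value[64];  /* lexeme text (truncated if longer, in this sketch) */
    int line, col;   /* 1-based position where the token starts */
} Token;

/* Input is assumed to be a NUL-terminated buffer holding the whole file. */
typedef struct {
    const char *src;
    size_t pos;
    int line, col;
} Scanner;

/* Consume one character, updating line and column (a tab counts as one column). */
static void advance(Scanner *s)
{
    if (s->src[s->pos] == '\n') {
        s->line++;
        s->col = 1;
    } else {
        s->col++;
    }
    s->pos++;
}

static int is_blank(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\f';
}

/* Return the next token. Only whitespace is skipped here; comments are not. */
static Token get_token(Scanner *s)
{
    Token t;
    while (is_blank(s->src[s->pos]))
        advance(s);

    t.line = s->line;
    t.col = s->col;
    size_t start = s->pos;
    unsigned char c = (unsigned char)s->src[start];

    if (c == '\0') {
        t.kind = TK_EOF;
        t.value[0] = '\0';
        return t;
    }
    if (isalpha(c) || c == '_') {
        t.kind = TK_IDENTIFIER;
        while (isalnum((unsigned char)s->src[s->pos]) || s->src[s->pos] == '_')
            advance(s);
    } else if (isdigit(c)) {
        t.kind = TK_INTEGER;
        while (isdigit((unsigned char)s->src[s->pos]))
            advance(s);
        /* A '.' followed by a digit turns the integer into a float. */
        if (s->src[s->pos] == '.' && isdigit((unsigned char)s->src[s->pos + 1])) {
            t.kind = TK_FLOAT;
            advance(s);
            while (isdigit((unsigned char)s->src[s->pos]))
                advance(s);
        }
    } else if (strchr("+-*/=;(){}<>,", c) != NULL) {
        t.kind = TK_OPERATOR;
        advance(s);
    } else {
        /* Not part of any token: the value is a backslash plus three octal digits. */
        t.kind = TK_UNKNOWN;
        advance(s);
        snprintf(t.value, sizeof t.value, "\\%03o", (unsigned)c);
        return t;
    }

    size_t len = s->pos - start;
    if (len >= sizeof t.value)
        len = sizeof t.value - 1;
    memcpy(t.value, s->src + start, len);
    t.value[len] = '\0';
    return t;
}
```

Called repeatedly on x = 3 + 5.31; with the scanner starting at line 24, column 1, this returns an identifier, operator, integer, operator, float and operator at columns 1, 3, 5, 7, 9 and 13, then TK_EOF.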

Your program will be submitted electronically using turnin and must run on the EECS department computers. You must also submit a makefile to compile your program. Your program is to be the result of individual work and is expected to be written using good programming style.

Added 1/21/2001

Based on the conversation in class on Friday 1/19/2001, a few error messages should be used to help report a missing ending single quote, a missing ending double quote, and a missing ending star-slash of a C-style comment. These messages should include the line and column where the opening single quote, double quote, or slash-star is located. A fourth error message reporting an improper escape character sequence (for a character that begins with a backslash, e.g. '\x') can also be used. It was pointed out that the fourth error message and the missing ending single quote are in fact the same thing. Use your judgement on how to report such an error.
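A sketch of how a character constant or string literal might be scanned so that these errors carry the position of the opening quote. The following choices are assumptions, not part of the assignment: a literal may not span lines, so a newline ends the search just as end of file does; the accepted escapes are C's simple escapes plus \0; and the names Scanner and scan_quoted are hypothetical. This version reports a bad escape separately, at the backslash, before it looks for the closing quote; folding that case into the missing-quote message is an equally reasonable choice. It does not check that a character constant holds exactly one character.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical scanner state over a NUL-terminated copy of the input file. */
typedef struct {
    const char *src;
    size_t pos;
    int line, col;  /* 1-based position of src[pos] */
} Scanner;

static void advance(Scanner *s)
{
    if (s->src[s->pos] == '\n') {
        s->line++;
        s->col = 1;
    } else {
        s->col++;
    }
    s->pos++;
}

/* Scan a character constant or string literal; s must be at the opening quote.
   Returns 0 on success. On error, returns -1 and writes a message to err that
   names the line and column of the opening quote or of the bad escape. */
static int scan_quoted(Scanner *s, char *err, size_t errlen)
{
    char quote = s->src[s->pos];
    int start_line = s->line, start_col = s->col;
    const char *which = (quote == '\'') ? "single" : "double";

    advance(s);  /* skip the opening quote */
    for (;;) {
        char c = s->src[s->pos];
        if (c == '\0' || c == '\n') {
            snprintf(err, errlen,
                     "missing ending %s quote for literal starting at line %d, column %d",
                     which, start_line, start_col);
            return -1;
        }
        if (c == quote) {
            advance(s);  /* skip the closing quote */
            return 0;
        }
        if (c == '\\') {
            int esc_line = s->line, esc_col = s->col;
            advance(s);
            char e = s->src[s->pos];
            /* Assumed escape set; the e == '\0' test matters because
               strchr also matches the string's terminating NUL. */
            if (e == '\0' || strchr("ntvbrfa0\\'\"?", e) == NULL) {
                snprintf(err, errlen, "improper escape sequence at line %d, column %d",
                         esc_line, esc_col);
                return -1;
            }
        }
        advance(s);
    }
}
```

An unterminated slash-star comment can be reported the same way: save the line and column of the slash before searching for the star-slash, and print them if end of file is reached first.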