EECS 370 - Machine Problem 1

SML - Simple Markup Language

Due date: Thursday September 16, 1999 at 11:59 pm

Sample data

  1. data1a.sml
  2. data1b.sml

For this assignment, you are to write a text-based word processing program that uses SML, Simple Markup Language. SML is loosely based on the old nroff and troff commands and syntax. The syntax is easier to decode than HTML; therefore, this program should be easier to write than if the language were based on HTML.

SML's commands all start with a period as the first character on a line. The period is followed by a short sequence of characters to identify the command. If the command requires any arguments, they will follow the command and be separated by one or more blank characters. A line will only contain one command. The lines in the SML file that do not start with a period will contain the text that needs to be formatted by the program.
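As an illustration of the rule above, here is a minimal sketch of classifying one input line. It is written in Python only for brevity; the helper name parse_line and its return shape are assumptions and are not part of the assignment.

```python
def parse_line(line):
    """Classify one SML input line (hypothetical helper, not part of the spec).

    Returns ("command", name, args) when the first character is a period,
    otherwise ("text", words).
    """
    line = line.rstrip("\n")
    if line.startswith("."):
        # split() with no argument treats any run of blanks as one separator
        fields = line[1:].split()
        if not fields:
            # A lone period names no command; the caller should report an error
            return ("command", "", [])
        return ("command", fields[0], fields[1:])
    return ("text", line.split())
```

For example, parse_line(".ts  5   10 15") yields the command name "ts" with the arguments "5", "10" and "15", while any line that does not begin with a period comes back as a list of words.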

The executable file for your program is to be named sml, and you are required to use a makefile to compile your program. Your program is also required to use multiple source code files and to take command line arguments.

The command line to execute the sml program is as follows:

   sml  file1  file2  file3 ...  [-o outfile]

Following the sml command, there can be a list of one or more input files. These input files are to contain sml commands and text to be formatted. Following the input files there may be a -o flag followed by a filename. If this flag is present, the formatted output of the sml program is to be written to this file. If this flag is not present, the formatted output of the sml program is to be written to standard output.
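The command line described above could be split into input files and an optional output file roughly as follows. This is a sketch in Python for illustration only; the function name parse_args is hypothetical, and accepting -o at any position (rather than only after the input files) is an assumption of the sketch.

```python
def parse_args(argv):
    """Split argv (without the program name) into (input_files, outfile).

    outfile is None when no -o flag is given, meaning standard output.
    Raises ValueError when -o has no filename or no input file is listed.
    """
    inputs = []
    outfile = None
    i = 0
    while i < len(argv):
        if argv[i] == "-o":
            if i + 1 >= len(argv):
                raise ValueError("-o requires a filename")
            outfile = argv[i + 1]
            i += 2          # skip the flag and its filename
        else:
            inputs.append(argv[i])
            i += 1
    if not inputs:
        raise ValueError("at least one input file is required")
    return inputs, outfile
```

With this sketch, the arguments "data1a.sml data1b.sml -o paper.txt" produce two input files and the output file paper.txt; without -o the second result is None and output goes to standard output.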

If the sml program encounters any errors while processing an input file, an error message is to be printed to standard error. This error message must contain the name of the input file and the line number where the error was encountered, as well as a descriptive message explaining the error. Your program is NOT to exit when an error is encountered. Instead, it is to ignore any erroneous input and continue processing the remainder of the input.
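A minimal sketch of such an error report is shown below, in Python for illustration. The "file:line: message" layout and the names format_error and report_error are assumptions; the assignment only requires that the file name, line number and a descriptive message appear on standard error.

```python
import sys


def format_error(filename, line_number, message):
    """Build one error line containing the file name, line number and message."""
    return f"{filename}:{line_number}: {message}"


def report_error(filename, line_number, message, stream=sys.stderr):
    """Print the error and return, so the caller can keep processing input."""
    print(format_error(filename, line_number, message), file=stream)
```

Note that report_error returns normally instead of exiting, which matches the requirement that processing continues after an error.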

When multiple input files are given, after all of the contents of an input file have been processed, the program is to immediately begin processing the contents of the next input file without resetting any of the sml command settings to their initial default values. The output produced is to be the same as if one file that contained all of the information from the multiple files was given as input to the program. This would allow multiple people to create different parts of a paper simultaneously. If an input file does not exist, an error message should be printed while the program continues processing the other input files.
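One way to picture this behavior is a single settings record that lives outside the per-file loop. The sketch below is in Python for illustration; process_files, handle_line and report are hypothetical names, and the state dictionary holds only two of the settings to keep the example short.

```python
def process_files(filenames, handle_line, report):
    """Feed every line of every file to handle_line using one shared state.

    handle_line(state, filename, line_number, line) processes one line.
    report(filename, message) is called for a file that cannot be opened.
    """
    state = {"pl": 66, "pw": 80}    # created once, never reset between files
    for name in filenames:
        try:
            handle = open(name)
        except OSError:
            report(name, "cannot open input file")
            continue                # keep going with the remaining files
        with handle:
            for number, line in enumerate(handle, start=1):
                handle_line(state, name, number, line)
    return state
```

Because state is created before the loop, a .pw command in the first file still applies when the second file is read, and a missing file only produces a report before the loop moves on.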

The sml commands:

page length: .pl int
The integer number following the .pl command indicates the number of text lines that can fit on a page. The integer must be greater than zero. The initial default value is 66.

page width: .pw int
The integer number following the .pw command indicates the number of characters that can fit on a line. The integer must be greater than zero. The initial default value is 80.

top margin: .tm int
The integer number following the .tm command indicates the number of lines that will be left blank at the top of a page. The integer must be non-negative (zero or greater) and the sum of the top margin and bottom margin must be less than the page length. The initial default value is 3.

bottom margin: .bm int
The integer number following the .bm command indicates the number of lines that will be left blank at the bottom of a page. The bottom margin may contain a page number; see .pn below. The integer must be non-negative (zero or greater) and the sum of the top margin and bottom margin must be less than the page length. The initial default value is 3.

left margin: .lm int
The integer number following the .lm command indicates the number of spaces that are to be left blank at the beginning of a line. The integer must be non-negative (zero or greater) and the sum of the left margin and the right margin must be less than the page width. The initial default value is 3.

right margin: .rm int
The integer number following the .rm command indicates the number of spaces that are to be left blank at the end of a line. The integer must be non-negative (zero or greater) and the sum of the left margin and the right margin must be less than the page width. The initial default value is 3.
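The four margin commands share the same two checks, so one helper can serve all of them. The following Python sketch is for illustration only; check_margin and its parameter names are hypothetical.

```python
def check_margin(new_value, opposite_margin, extent):
    """Validate a margin against its opposite margin and the page extent.

    For .tm and .bm, extent is the page length; for .lm and .rm, it is the
    page width. Returns an error string, or None when the value is acceptable.
    """
    if new_value < 0:
        return "margin must be zero or greater"
    if new_value + opposite_margin >= extent:
        return "sum of the two margins must be less than the page extent"
    return None
```

With the default right margin of 3 and page width of 80, a left margin of 76 is accepted (76 + 3 = 79), while 77 is rejected because the sum would equal the page width.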

line spacing: .ls int
The integer number following the .ls command indicates the form of line spacing that should be used between two adjacent lines of text. The integer can have values of 1, 2 or 3 which correspond to 0, 1 or 2 blank lines following each line of text. The initial default value is 1.

new line: .nl
The .nl command indicates the word following the command is to be the first word on a new output line.

blank line: .bl [int]
The integer number following the .bl command indicates the number of blank lines that should be placed in the output. The number is optional; if it is left out, one blank line is to be printed in the output. The number (when present) must be zero or greater. A value of zero causes the same action as the new line command, .nl.
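The optional argument of .bl can be handled as in the short Python sketch below, which is for illustration only. The name blank_line_count is hypothetical, and treating more than one argument as an error is an assumption.

```python
def blank_line_count(args):
    """Return the number of blank lines requested by a .bl command.

    args is the list of argument strings that followed .bl on the line.
    Raises ValueError for a negative, non-numeric or surplus argument.
    """
    if not args:
        return 1                    # the number was left out
    if len(args) > 1:
        raise ValueError(".bl takes at most one argument")
    count = int(args[0])            # int() raises ValueError on bad digits
    if count < 0:
        raise ValueError(".bl value must be zero or greater")
    return count
```

A result of 0 means the command only ends the current output line, the same as .nl.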

paragraph: .p
The .p command indicates that the word following the command is to be the first word on a new output line. Note, this command does not cause a blank line to be printed. This command differs from the new line command, .nl, since the first word in a paragraph may be indented, see the paragraph indent command below.

paragraph indent: .pi int
The number following the .pi command indicates the number of spaces the first word in a paragraph is to be indented. This number must be zero or greater and the sum of the paragraph indent, the left margin and the right margin must be less than the page width. The default value is 0.

justification: .j ch
The character following the .j command indicates whether the following lines will be left justified, right justified, fully justified (both left and right justified) or centered. The character must be either: l, r, f or c. The initial default value is l.

tab: .t
This command causes enough space characters to be written on the line to fill up to the next tab stop. See the tab stops command below.

tab stops: .ts int1 int2 int3 ...
This command specifies which columns should be filled up to with space characters when the tab command, .t, is used. A value of 1 always refers to the first column after the left margin. There will always be a tab stop value of 1. The integers given after the .ts command indicate which additional columns will be set as tab stops. If the tab command, .t, is used and the current column is already greater than the largest tab stop value, or the next tab stop value places the column to the right of the right margin, your program is to use tab stop 1 on the next line of output.
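The tab stop rule can be sketched as a small lookup, shown here in Python for illustration. The name next_tab_stop is hypothetical. The sketch assumes columns are counted from 1 at the left margin, that text_width is the page width minus both margins, and that a tab always moves to a stop strictly to the right of the current column; the last point is an interpretation, not something the assignment states.

```python
def next_tab_stop(current_column, tab_stops, text_width):
    """Return (column, wraps) for a .t command.

    current_column: 1-based column, relative to the left margin, where the
                    next character would be written.
    tab_stops:      columns given to .ts (stop 1 is always added).
    text_width:     number of columns between the left and right margins.
    wraps is True when tab stop 1 on the next output line must be used.
    """
    for stop in sorted(set(tab_stops) | {1}):
        # First stop to the right of the current column that is still
        # inside the right margin
        if current_column < stop <= text_width:
            return stop, False
    return 1, True
```

With stops at 10 and 20 and a text width of 74, a tab at column 1 moves to column 10, and a tab at column 25 moves to tab stop 1 on the next line.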

new page: .np
The program is to print out enough blank lines to fill up the current page of output.

page numbering: .pn ch
The character following the .pn command indicates whether the page number of the current page is to be printed in the center of the bottom margin. The character must either be y or n, to indicate yes or no. Yes indicates the page number is to be printed, while no indicates the page number is not to be printed. If the bottom margin value is zero, then no page number can be printed. If the bottom margin has an odd number of lines, the page number will be placed on the center-most line of the bottom margin (line 1 when the bottom margin has 1 line, line 2 when the bottom margin has 3 lines, line 3 when the bottom margin has 5 lines, etc.). If the bottom margin has an even number of lines, the page number will be placed on the first line of the second half of the bottom margin (line 2 when the bottom margin has 2 lines, line 3 when the bottom margin has 4 lines, line 4 when the bottom margin has 6 lines, etc.). The initial default value is no (not to print page numbers).
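Both the odd and the even case above reduce to the same integer arithmetic: the page number goes on line (bottom margin divided by 2, rounded down) plus 1. The Python sketch below is for illustration only; the name page_number_line is hypothetical.

```python
def page_number_line(bottom_margin):
    """Return the 1-based line of the bottom margin that holds the page number.

    Returns None when the bottom margin is zero, since no page number
    can be printed in that case.
    """
    if bottom_margin <= 0:
        return None
    # Odd sizes give the center line; even sizes give the first line of
    # the second half
    return bottom_margin // 2 + 1
```

For bottom margins of 1 through 6 lines this gives lines 1, 2, 2, 3, 3 and 4, which agrees with the cases listed above.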

bullet begin item: .bb
A bullet item is a special form of a paragraph (which uses NONE of the settings from the paragraph command or the paragraph indent command). The first line of the bullet item paragraph is doubly indented: at the first indent mark, the bullet character is printed; at the second indent mark, the text begins. For all following lines of the bullet item paragraph, only the second indent is used to indent the text.

The text that comes between the bullet begin command, .bb, and the bullet end command, .be, is the text for the bullet item. Bullet item paragraphs cannot be nested inside any other type of paragraph.

bullet end item: .be
The text that comes between the bullet begin command, .bb, and the bullet end command, .be, is the text for the bullet item. Bullet item paragraphs cannot be nested inside any other type of paragraph.

bullet character: .bc ch
The .bc command specifies the character that will be printed at the first indent mark of the first line of a bullet item paragraph. The initial default character is a dash "-".

numbered begin item: .nb
A numbered item is a special form of a paragraph (which uses NONE of the settings from the paragraph command or the paragraph indent command). The first line of the numbered item paragraph is doubly indented: at the first indent mark, a numeric value followed by a period is printed; at the second indent mark, the text begins. For all following lines of the numbered item paragraph, only the second indent is used to indent the text.

The text that comes between the numbered begin command, .nb, and the numbered end command, .ne, is the text for the numbered item. Numbered item paragraphs cannot be nested inside any other type of paragraph.

numbered end item: .ne
The text that comes between the numbered begin command, .nb, and the numbered end command, .ne, is the text for the numbered item. Numbered item paragraphs cannot be nested inside any other type of paragraph.

numbered value: .nv int
The .nv command specifies the numeric value that is used for the next numbered item. If the .nv command is not specified before a numbered item, the default values are to be used: the first numbered item will default to the value of 1, and other numbered items will default to the value of the most recent numbered item + 1.
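The numbering rule can be kept in a small counter object, sketched here in Python for illustration. The class name ItemCounter and its method names are hypothetical.

```python
class ItemCounter:
    """Track the value printed for each numbered item."""

    def __init__(self):
        self.last = 0           # value of the most recent numbered item
        self.pending = None     # value set by .nv, if any

    def set_next(self, value):
        """Handle .nv: remember the value for the next numbered item."""
        self.pending = value

    def take(self):
        """Handle .nb: return the value for the item that is starting."""
        if self.pending is not None:
            value = self.pending
        else:
            value = self.last + 1
        self.pending = None     # a .nv value applies to one item only
        self.last = value
        return value
```

Starting from a fresh counter, two numbered items receive 1 and 2; after ".nv 10" the next two items receive 10 and 11.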

item indent: .ii int1 int2
The .ii command specifies the two indent marks to be used with the bullet item and numbered item paragraphs. This command takes two integer values: the number of spaces for the first and second indent marks. These values indicate how many spaces from the left margin the bullet item and numbered item paragraphs will be indented. The first integer value must be greater than or equal to zero. The second integer must be greater than the first integer value. The sum of the second integer value, the left margin and the right margin must be less than the page width. The initial default values are 2 and 5.
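To make the two indent marks concrete, the Python sketch below builds the start of the first line of an item. It is for illustration only. The name item_first_line is hypothetical, and the handling of a marker that is too wide to fit before the second indent mark (one space is inserted) is an assumption, since the assignment does not cover that case.

```python
def item_first_line(marker, first_word, left_margin, indent1, indent2):
    """Return the first output line of a bullet or numbered item.

    marker is the bullet character, or a number followed by a period.
    indent1 and indent2 are the two values given to .ii.
    """
    # The marker is printed at the first indent mark
    prefix = " " * (left_margin + indent1) + marker
    text_column = left_margin + indent2
    if len(prefix) < text_column:
        # Pad with spaces so the text begins at the second indent mark
        prefix = prefix.ljust(text_column)
    else:
        # Assumption: a marker that overruns the mark is followed by one space
        prefix += " "
    return prefix + first_word
```

With the default left margin of 3 and indents of 2 and 5, the dash is printed after 5 spaces and the text begins after 8 spaces.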

The commands can be divided into page commands and line commands. The page commands are those whose primary impact is on how a page (a collection of lines) is formatted. The line commands are those whose primary impact is on how a line is formatted. The following are the page commands:
   .pl  .pw  .tm  .bm  .np  .pn

The other commands would be the line commands (note: this list is for discussion purposes. If you feel that another command should be a page command, you are more than welcome to include it as such in your program).

When breaking down your program into multiple source code files, you need to come up with some method of organizing the functions within each source code file. For example, each source code file should contain functions that are somehow related. One way to start is to separate the page command functions from the line command functions. Note that each source code file should have roughly the same number of lines. Since there are more line command functions, perhaps you will need two source code files to hold all of these functions.

When a page command is encountered and we are currently in the middle of a page (after the first word that will go on the page has been encountered), the effects of the page command will take effect at the start of the next page. Line commands are to work in a similar manner. Note that a number of commands force the end of the current page or line, but a new page or line doesn't start until the first word of text for that page or line is encountered.
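One possible way to model this delayed effect is to keep two copies of each setting: the value in use for the current page or line, and the value requested for the next one. The Python sketch below is for illustration only; DeferredSetting and its method names are hypothetical.

```python
class DeferredSetting:
    """A setting whose changes wait for the next page or line if one is open."""

    def __init__(self, value):
        self.active = value     # value used by the current page or line
        self.pending = value    # value to use when the next one starts

    def set(self, value, unit_started):
        """Record a new value.

        unit_started is True once the first word of the current page
        (or line) has been encountered.
        """
        self.pending = value
        if not unit_started:
            self.active = value     # nothing placed yet, apply at once

    def start_next_unit(self):
        """Call when the first word of a new page or line is encountered."""
        self.active = self.pending
```

For example, a ".pw 60" seen in the middle of a page leaves the active width at 80 until start_next_unit is called for the following page.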

For justification, each word on a line in the output is to be separated by a single space character. Note that white space characters (spaces, tabs and newlines) between words of text in the input file are to be counted as a single space character in the output. So you will have to remove any extra white space characters (similar to the way HTML removes such characters). Once there are enough words to fill up the current line, the justification must occur.
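The filling and justification steps can be sketched as two small functions, shown here in Python for illustration. The names fill and justify are hypothetical. For full justification, the sketch gives any leftover spaces to the leftmost gaps; that distribution is an assumption, since the assignment does not say where the extra spaces go.

```python
def fill(words, width):
    """Group words into lines that fit in width, one space between words."""
    lines, current, length = [], [], 0
    for word in words:
        needed = len(word) if not current else length + 1 + len(word)
        if current and needed > width:
            lines.append(current)           # current line is full
            current, length = [word], len(word)
        else:
            current.append(word)
            length = needed
    if current:
        lines.append(current)
    return lines


def justify(words, width, mode):
    """Lay out one line of words; mode is 'l', 'r', 'f' or 'c'."""
    line = " ".join(words)
    pad = max(width - len(line), 0)
    if mode == "r":
        return " " * pad + line
    if mode == "c":
        return " " * (pad // 2) + line
    if mode == "f" and len(words) > 1:
        gaps = len(words) - 1
        base, extra = divmod(pad, gaps)
        out = words[0]
        for i, word in enumerate(words[1:]):
            # Every gap keeps its single space plus a share of the padding
            out += " " * (1 + base + (1 if i < extra else 0)) + word
        return out
    return line                             # 'l', or a single-word line
```

Input text can be reduced to words with "text.split()", which discards runs of spaces, tabs and newlines. A fully justified paragraph would normally leave its last line left justified; that choice is left to the caller in this sketch.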

Your program must be written in good programming style. This includes (but is not limited to) meaningful identifier names, a file header at the beginning of each source code file, a function header at the beginning of each function, proper use of blank lines and indentation to aid in the reading of your code, explanatory "value-added" in-line comments, etc.

The work you turn in must be 100% your own. You are not allowed to share code with any other person (inside this class or not). You may discuss the project with other persons; however, you may not show any code you write to another person nor may you look at any other person's written code.

You are also to write a one- to two-page program description. This write-up must be submitted with your program and be in ASCII text format (perhaps you can use this program to format your program description!). This description is to explain your internal data structures, code structures and the algorithms used in your program. Remember, this program description will be read by another student when the critiques are done for this assignment. Often the title of "readme" is used for these types of documents.

You are to submit this project using the turnin command on the EECS Department's UNIX machines. The project name for this assignment is mp1. Be sure to submit all source code and header files as well as your makefile and program description. Failure to turn in all required pieces will result in a lower grade for the assignment.