EECS 370 - Machine Problem 1
SML - Simple Markup Language
Due date: Thursday September 16, 1999 at 11:59 pm
Sample data
- data1a.sml
- data1b.sml
For this assignment, you are to write a text-based word processing
program that uses SML, Simple Markup Language. SML is loosely based
on the old nroff and troff commands and syntax. Its
syntax is easier to decode than HTML's; therefore, this program should be
easier to write than it would be if the language were based on HTML.
SML's commands all start with a period as the first character on a line.
The period is followed by a short sequence of characters to identify the
command. If the command requires any arguments, they will follow the
command and be separated by one or more blank characters. A line
will only contain one command.
The lines in the SML file that do not start with a period will contain
the text that needs to be formatted by the program.
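The command/text distinction above can be sketched in C as follows. This is a minimal sketch, not a required design: the helper names parse_command and parse_int_arg are hypothetical, and the code assumes each input line has already been read into a writable buffer (for example with fgets).

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper. If `line` is a command line (first character is a
   period), split it in place: *name gets the command letters without the
   period (e.g. "pl") and *args the text after the blanks that follow the
   name ("" if there are no arguments). Returns 1 for a command line and
   0 for a text line, in which case *name and *args are left unchanged. */
static int parse_command(char *line, char **name, char **args) {
    if (line[0] != '.')
        return 0;                            /* text to be formatted */
    char *p = line + 1;
    *name = p;
    while (*p != '\0' && !isspace((unsigned char)*p))
        p++;                                 /* find the end of the name */
    if (*p != '\0')
        *p++ = '\0';                         /* terminate the name */
    while (isspace((unsigned char)*p))
        p++;                                 /* skip one or more blanks */
    *args = p;
    char *end = p + strlen(p);               /* trim trailing blanks/newline */
    while (end > p && isspace((unsigned char)end[-1]))
        *--end = '\0';
    return 1;
}

/* Parse an argument string holding exactly one integer. Returns 1 on success.
   Range checks (e.g. "greater than zero" for .pl) are left to each command. */
static int parse_int_arg(const char *args, int *value) {
    char *end;
    errno = 0;
    long v = strtol(args, &end, 10);
    if (end == args || errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;
    while (isspace((unsigned char)*end))
        end++;
    if (*end != '\0')
        return 0;                            /* trailing junk, e.g. "12x" */
    *value = (int)v;
    return 1;
}
```

Splitting in place avoids extra allocation; a caller would then dispatch on the command name with a chain of strcmp checks or a lookup table.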
The executable file for your program must be named sml, and
you are required to use a makefile to compile your program.
Your program will also be required to use multiple source code files
and take command line arguments.
The command line to execute the sml program is as follows:
sml file1 file2 file3 ... [-o outfile]
Following the sml command, there can be a list of one or more input files.
These input files are to contain sml commands and text to be formatted.
Following the input files there may be a -o flag followed by a
filename. If this flag is present, the formatted output of the sml
program is to be written to this file. If this flag is not present, the
formatted output of the sml program is to be written to standard output.
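A minimal sketch of scanning the command line described above, assuming (as the usage line suggests) that -o, if present, comes after all input files and is followed by exactly one filename. The struct sml_args and the function parse_args are hypothetical names.

```c
#include <string.h>

/* Hypothetical result of scanning: sml file1 file2 ... [-o outfile] */
struct sml_args {
    char **inputs;          /* points into argv, first input file */
    int input_count;
    const char *outfile;    /* NULL means write to standard output */
};

/* Returns 0 on success, -1 on a usage error (no inputs, or a malformed -o). */
static int parse_args(int argc, char *argv[], struct sml_args *a) {
    a->inputs = argv + 1;
    a->input_count = 0;
    a->outfile = NULL;
    int i = 1;
    while (i < argc && strcmp(argv[i], "-o") != 0) {
        a->input_count++;   /* everything before -o is an input file */
        i++;
    }
    if (i < argc) {         /* found -o */
        if (i + 2 != argc)  /* it must be followed by exactly one name, last */
            return -1;
        a->outfile = argv[i + 1];
    }
    return a->input_count > 0 ? 0 : -1;
}
```

A main() could then open a.outfile with fopen(a.outfile, "w") when it is non-NULL and use stdout otherwise, passing the resulting FILE * to the formatting code.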
If the sml program encounters any errors while processing an input file,
an error message is to be printed to standard error. This error message
must contain the name of the input file and the line number where the
error was encountered, as well as a descriptive message explaining
the error. Your program is NOT to exit when an error is encountered.
Instead, it is to ignore any erroneous input and continue processing the
remainder of the input.
When multiple input files are given, after all of the contents of an input
file have been processed, the program is to immediately begin processing the
contents of the next input file without resetting any of the sml command
settings to their initial default values. The output produced is to be the same as if
one file that contained all of the information from the multiple files
was given as input to the program. This allows multiple people to
create different parts of a paper simultaneously. If an input file does
not exist, an error message should be printed while the program continues
processing the other input files.
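The error-handling and multiple-file rules can be combined in one loop. In this sketch the names formatter_state, report_error, process_stream and process_files are hypothetical, the "file:line: message" layout is just one possible format, and process_stream is a stub that only counts lines and flags a bare period; a real program would parse commands and format text there. The key point is that the state is created once and passed to every file, so nothing is reset between files.

```c
#include <stdio.h>

/* Hypothetical formatter state, created once before the first file and
   never reset, so settings carry over from one file to the next. */
struct formatter_state {
    int page_length;        /* stands in for all of the .xx settings */
    int lines_read;         /* total input lines seen across all files */
};

static int error_count = 0;

/* Print "file:line: message" on standard error and return, so the caller
   can skip the bad input and continue. Line 0 means "no specific line". */
static void report_error(const char *file, int line, const char *msg) {
    fprintf(stderr, "%s:%d: %s\n", file, line, msg);
    error_count++;
}

/* Stub processor: counts lines and reports a bare "." as an error.
   Assumes input lines are shorter than the buffer. */
static void process_stream(FILE *in, const char *name, struct formatter_state *st) {
    char buf[1024];
    int line_no = 0;
    while (fgets(buf, sizeof buf, in) != NULL) {
        line_no++;
        st->lines_read++;
        if (buf[0] == '.' && (buf[1] == '\n' || buf[1] == '\0'))
            report_error(name, line_no, "missing command name after '.'");
    }
}

/* Process every input file in order, sharing one state. */
static void process_files(int count, char *files[], struct formatter_state *st) {
    for (int i = 0; i < count; i++) {
        FILE *in = fopen(files[i], "r");
        if (in == NULL) {
            report_error(files[i], 0, "cannot open input file");
            continue;       /* keep going with the next file */
        }
        process_stream(in, files[i], st);
        fclose(in);
    }
}
```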
The sml commands:
- page length: .pl int
- The integer number following the .pl command indicates the number
of text lines that can fit on a page. The integer must be greater than
zero. The initial default value is 66.
- page width: .pw int
- The integer number following the .pw command indicates the number
of characters that can fit on a line. The integer must be greater than
zero. The initial default value is 80.
- top margin: .tm int
- The integer number following the .tm command indicates the number of
lines that will be left blank at the top of a page. The integer must
be non-negative (zero or greater) and the sum of the top margin and bottom
margin must be less than the page length. The initial default value is 3.
- bottom margin: .bm int
- The integer number following the .bm command indicates the number of
lines that will be left blank at the bottom of a page. The bottom
margin may contain a page number, see .pn below. The integer must
be non-negative (zero or greater) and the sum of the top margin and bottom
margin must be less than the page length. The initial default value is 3.
- left margin: .lm int
- The integer number following the .lm command indicates the number of
spaces that are to be left blank at the beginning of a line. The integer
must be non-negative (zero or greater) and the sum of the left margin and the
right margin must be less than the page width. The initial default value is 3.
- right margin: .rm int
- The integer number following the .rm command indicates the number of
spaces that are to be left blank at the end of a line. The integer
must be non-negative (zero or greater) and the sum of the left margin and the
right margin must be less than the page width. The initial default value is 3.
- line spacing: .ls int
- The integer number following the .ls command indicates the form of
line spacing that should be used between two adjacent lines of text.
The integer can have values of 1, 2 or 3 which correspond to 0, 1 or
2 blank lines following each line of text. The initial default value is 1.
- new line: .nl
- The .nl command indicates the word following the command is to
be the first word on a new output line.
- blank line: .bl [int]
- The integer number following the .bl command indicates the number
of blank lines that should be placed in the output. The number is
optional; if it is omitted, one blank line is printed. The number
(when present) must be zero or greater. A value of
zero causes the same action as the new line command, .nl.
- paragraph: .p
- The .p command indicates that the word following the command is to
be the first word on a new output line. Note, this command does not cause
a blank line to be printed. This command differs from the new line command,
.nl, since the first word in a paragraph may be indented, see the paragraph
indent command below.
- paragraph indent: .pi int
- The number following the .pi command indicates the number of spaces
the first word in a paragraph is to be indented. This number must be
zero or greater and the sum of the paragraph indent, the left margin and
the right margin must be less than the page width. The initial default value
is 0.
- justification: .j ch
- The character following the .j command indicates whether the following
lines will be left justified, right justified, fully justified (both left and right justified)
or centered. The character must be either: l, r, f or c. The initial default
value is l.
- tab: .t
- This command causes enough space characters to be written on the current
line to fill up to the next tab stop. See the tab stops command below.
- tab stops: .ts int1 int2 int3 ...
- This command specifies which columns should be filled in with space
characters when the tab command, .t, is used. A value of 1 always refers to
the first column after the left margin. There will always be a tab stop value
of 1. The integers given after the .ts command indicate which additional
columns will be set as tab stops. If the tab command, .t, is used and the
current column is already greater than the largest tab stop value or the
next tab stop value places the column to the right of the right margin, your
program is to use tab stop 1 on the next line of output.
- new page: .np
- The program is to print out enough blank lines to fill up the current
page of output.
- page numbering: .pn ch
- The character following the .pn command indicates whether the page number
of the current page is to be printed in the center of the bottom margin. The
character must either be y or n, to indicate yes or no. Yes indicates the
page number is to be printed, while no indicates the page number is not to
be printed. If the bottom margin value is zero, then no page number can be
printed. If the bottom margin has an odd number of lines, the page number
will be placed on the center-most line of the bottom margin (line 1 when the
bottom margin has 1 line, line 2 when the
bottom margin has 3 lines, line 3 when the bottom margin has 5 lines, etc.).
If the bottom margin has an even number of lines, the page number will be
placed on the first line of the second half of the bottom margin (line 2 when
the bottom margin has 2 lines, line 3 when the bottom margin has 4 lines, line
4 when the bottom margin has 6 lines, etc.). The initial default value is no (not
to print page numbers).
- bullet begin item: .bb
- A bullet item is a special form of a paragraph (which uses NONE of the
settings from the paragraph command or the paragraph indent command). The
first line of the bullet item paragraph is doubly indented: at the first indent
mark, the bullet character is printed; at the second indent mark, the text
begins. For all following lines of the bullet item paragraph, only the second
indent is used to indent the text.
The text that comes between the bullet begin command, .bb, and the
bullet end command, .be, is the text for the bullet item. Bullet item
paragraphs cannot be nested inside any other type of paragraph.
- bullet end item: .be
- The text that comes between the bullet begin command, .bb, and the
bullet end command, .be, is the text for the bullet item. Bullet item
paragraphs cannot be nested inside any other type of paragraph.
- bullet character: .bc ch
- The .bc command specifies the character that will be printed at the
first indent mark of the first line of a bullet item paragraph. The
initial default character is a dash "-".
- numbered begin item: .nb
- A numbered item is a special form of a paragraph (which uses NONE of the
settings from the paragraph command or the paragraph indent command). The
first line of the numbered item paragraph is doubly indented: at the first indent
mark, a numeric value followed by a period is printed; at the second indent mark, the text
begins. For all following lines of the numbered item paragraph, only the second
indent is used to indent the text.
The text that comes between the numbered begin command, .nb, and the
numbered end command, .ne, is the text for the numbered item. Numbered item
paragraphs cannot be nested inside any other type of paragraph.
- numbered end item: .ne
- The text that comes between the numbered begin command, .nb, and the
numbered end command, .ne, is the text for the numbered item. Numbered item
paragraphs cannot be nested inside any other type of paragraph.
- numbered value: .nv int
- The .nv command specifies the numeric value that is used for the next numbered
item. If the .nv command is not given before a numbered item, the default
values are used: the first numbered item defaults to the value 1, and
each later numbered item defaults to the value of the most recent numbered
item plus 1.
- item indent: .ii int1 int2
- The .ii command specifies the two indent marks to be used with the bullet
item and numbered item paragraphs. This command takes two integer values:
the number of spaces for the first and second indent marks. These values
indicate how many spaces from the left margin the bullet items and numbered
item paragraphs will be indented. The first integer value must be greater
than or equal to zero. The second integer must be greater than the first
integer value. The sum of the second integer value, the left margin and the
right margin must be less than the page width. The initial default values
are 2 and 5.
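The initial defaults and some of the constraints above can be collected in one place. This is a sketch with hypothetical names (sml_settings, default_settings and the validation helpers) and it covers only a few of the rules. The page-number line follows from the .pn description: for both odd and even bottom margins the rule reduces to the integer expression bottom_margin / 2 + 1 (1 -> 1, 2 -> 2, 3 -> 2, 4 -> 3, 5 -> 3, 6 -> 4).

```c
#include <stdbool.h>

/* Hypothetical settings record holding the initial defaults listed above. */
struct sml_settings {
    int page_length, page_width;
    int top_margin, bottom_margin;
    int left_margin, right_margin;
    int line_spacing, para_indent;
    char justify;           /* 'l', 'r', 'f' or 'c' */
    bool page_numbers;      /* .pn y / .pn n */
    char bullet_char;
    int item_indent1, item_indent2;
};

static struct sml_settings default_settings(void) {
    struct sml_settings s = {
        .page_length = 66, .page_width = 80,
        .top_margin = 3, .bottom_margin = 3,
        .left_margin = 3, .right_margin = 3,
        .line_spacing = 1, .para_indent = 0,
        .justify = 'l', .page_numbers = false,
        .bullet_char = '-',
        .item_indent1 = 2, .item_indent2 = 5,
    };
    return s;
}

/* .tm / .bm rule: both non-negative and top + bottom < page length. */
static bool valid_vertical_margins(const struct sml_settings *s, int top, int bottom) {
    return top >= 0 && bottom >= 0 && top + bottom < s->page_length;
}

/* .lm / .rm rule: both non-negative and left + right < page width. */
static bool valid_horizontal_margins(const struct sml_settings *s, int left, int right) {
    return left >= 0 && right >= 0 && left + right < s->page_width;
}

/* 1-based line of the bottom margin that holds the page number,
   or 0 when the bottom margin is empty and no number can be printed. */
static int page_number_line(int bottom_margin) {
    if (bottom_margin <= 0)
        return 0;
    return bottom_margin / 2 + 1;
}
```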
The commands can be divided into page commands and line commands. The
page commands are those whose primary impact is on how a page (a collection
of lines) is formatted. The line commands are those whose primary impact is
on how a line is formatted. The following are the page commands:
The other commands would be the line commands (note: this list is for
discussion purposes. If you feel that another command should be a page
command, you are more than welcome to include it as such in your program).
When breaking down your program into multiple source code files, you need
to come up with some method of organizing the functions within each source
code file. For example, each source code file should contain functions
that are somehow related. One way to do this is to first separate the
page command functions from the line command functions. Note that each source
code file should have roughly the same number of lines. Since there are
more line command functions, perhaps you will need two source code files
to hold all of these functions.
When a page command is encountered while we are in the middle of a page
(after the first word that will go on that page has been encountered), the
effects of the page command take effect at the start of the next page.
Line commands work in a similar manner. Note that a number of
commands force the end of the current page or line, but a new page or line
doesn't start until the first word of text for that page or line is encountered.
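One way to get this deferred behavior is to keep two copies of the page settings. In this sketch (hypothetical names; .pl is assumed, for illustration, to be handled as a page command), a page command only updates the pending copy, and the pending values become active when the first word of the next page arrives.

```c
/* Hypothetical snapshot of the page-level settings. */
struct page_params {
    int page_length;
    int top_margin;
    int bottom_margin;
};

/* "active" governs the page currently being printed; "pending" collects
   values from page commands read since that page began. */
struct page_state {
    struct page_params active;
    struct page_params pending;
};

/* Handler for .pl (assumed here to be a page command): record the new
   value but leave the page in progress untouched. */
static void set_page_length(struct page_state *ps, int value) {
    ps->pending.page_length = value;
}

/* Called when the first word destined for a new page is encountered,
   not when the previous page ends: that is when deferred values apply. */
static void begin_page(struct page_state *ps) {
    ps->active = ps->pending;
}
```

Line commands could follow the same pattern with a pending/active pair of line settings that is copied when the first word of a new output line arrives.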
For justification, each word on a line in the output is to be separated
by a single space character. Note that white space characters (spaces,
tabs and newlines) between words of text in the input file
are to be counted as a single space character in the output.
So you will have to remove any extra white space
characters (similar to the way HTML removes such characters).
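A sketch of the white-space rule using the standard strtok function, which treats any run of the listed delimiter characters as a single separator. The names split_words, words_that_fit and MAX_WORDS are hypothetical. Note that strtok modifies its argument and keeps internal state, so it should not be interleaved with another strtok scan.

```c
#include <string.h>

#define MAX_WORDS 256   /* hypothetical per-line limit */

/* Split one input text line into words; runs of spaces, tabs and
   newlines count as one separator. Modifies `text` in place. */
static int split_words(char *text, char *words[], int max_words) {
    int n = 0;
    for (char *w = strtok(text, " \t\n"); w != NULL && n < max_words;
         w = strtok(NULL, " \t\n"))
        words[n++] = w;
    return n;
}

/* How many of words[0..n) fit in `width` columns with one space between
   words. The first word is always accepted, even if it is too long, so a
   caller's fill loop always makes progress. */
static int words_that_fit(char *words[], int n, int width) {
    int used = 0, count = 0;
    while (count < n) {
        int need = (int)strlen(words[count]) + (count > 0 ? 1 : 0);
        if (used + need > width && count > 0)
            break;
        used += need;
        count++;
    }
    return count;
}
```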
Once there are enough words to fill up the current line, the justification
must occur.
- If the line is to be left justified, the line is to be printed
with any extra space at the end of the line.
- If the line is to be right justified, the current line is to be printed
with any extra space at the beginning of the line.
- If the line is to be centered, the current line is
to be printed with half of the extra space at the beginning of the line
(use a divide by two operation rounded down; 3 extra spaces puts 1 space at
the beginning of the line).
- If the line is to be fully justified, extra spaces need to be placed
between the words. An extra space is first inserted between
the last two words on the line, then between the third to last and
second to last words, then between the fourth to last and third
to last words, etc. If more spaces are needed after an extra space
has been placed between each pair of adjacent words, then repeat the process,
placing three spaces between the words (starting with the last two
words). If three spaces between words do not fill up the line, then
place four spaces, then five spaces, until the line is filled. For
example, if a line has 17 words (separated by 16 spaces) and there are
6 extra spaces at the end of the line, the first ten words would be
followed by one space and the next 6 words would be followed by 2 spaces.
If a line has 5 words (separated by 4 spaces) and there are 10 extra
spaces at the end of the line, the first two words would be followed by
3 spaces and the next 2 words would be followed by 4 spaces.
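The four justification modes can be sketched as one function. For full justification, the round-robin rule above (extra spaces added starting from the last gap, repeated until the line is full) gives every gap 1 + extra / gaps spaces, with the extra % gaps leftover spaces going to the last gaps. The function name justify_line and its interface are hypothetical. The sketch assumes the words already fit (letters plus single spaces do not exceed width) and builds only the text area, without the left margin.

```c
#include <string.h>

/* Hypothetical helper: writes exactly `width` characters (plus '\0') into
   `out`, which must hold at least width + 1 bytes. mode is 'l', 'r', 'c'
   or 'f'. Assumes the words fit: letters + (n - 1) single spaces <= width. */
static void justify_line(char *out, char *words[], int n, int width, char mode) {
    int letters = 0;
    for (int i = 0; i < n; i++)
        letters += (int)strlen(words[i]);
    int gaps = n > 1 ? n - 1 : 0;
    int extra = width - letters - gaps;     /* spaces beyond the single separators */

    int lead = 0;                           /* spaces before the first word */
    if (mode == 'r')
        lead = extra;
    else if (mode == 'c')
        lead = extra / 2;                   /* rounded down: 3 extra -> 1 leading */

    /* Gap i gets `base` spaces, plus one more if i >= bonus_from. For 'l', 'r'
       and 'c' every gap is a single space. For 'f', handing out extra spaces
       one at a time from the last gap backwards, over and over, is the same
       as giving every gap extra / gaps more and the last extra % gaps gaps
       one more on top of that. */
    int base = 1, bonus_from = gaps;
    if (mode == 'f' && gaps > 0) {
        base = 1 + extra / gaps;
        bonus_from = gaps - extra % gaps;
    }

    memset(out, ' ', (size_t)width);        /* unused columns stay blank */
    int pos = lead;
    for (int i = 0; i < n; i++) {
        size_t len = strlen(words[i]);
        memcpy(out + pos, words[i], len);
        pos += (int)len;
        if (i < gaps)
            pos += base + (i >= bonus_from ? 1 : 0);
    }
    out[width] = '\0';
}
```

With the two examples above (17 one-letter words in 39 columns, and 5 one-letter words in 19 columns) this yields ten single gaps followed by six double gaps, and gaps of 3, 3, 4 and 4 spaces. A single word on a fully justified line simply ends up left justified, since there are no gaps to widen.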
Your program must be written in good programming style. This includes
(but is not limited to) meaningful identifier names, a file header at
the beginning of each source code file, a function header at the beginning
of each function, proper use of blank lines and indentation to aid in the
reading of your code, explanatory "value-added" in-line comments, etc.
The work you turn in must be 100% your own. You are not allowed to
share code with any other person (inside this class or not). You may
discuss the project with other persons; however, you may not show any
code you write to another person nor may you look at any other person's
written code.
You are also to write a one- to two-page program description. This write-up
must be submitted with your program and be in ASCII text format (perhaps
you can use this program to format your program description!!). This
description is to explain your internal data structures, code structures
and the algorithms used in your program. Remember, this program description
will be read by another student when the critiques are done for this
assignment. Such documents are often titled "readme".
You are to submit this project using the turnin command on the EECS
Department's UNIX machines. The project name for this assignment is mp1.
Be sure to submit all source code and header files as well as your makefile
and program description.
Failure to turn in all required pieces will result in a lower grade for the assignment.