This summary was written by Jean Lazarou.
We received no solution for this quiz. Therefore we are going to
present the one we wrote.
The basic language reference we consider is the one from Dartmouth
college (back in October 1964). The language is rather simple, it only
computes numbers. It does not contain any string handling, except for
output printing. The variable naming is limited to a single letters
followed by an optional digit. It does not support interactive input.
The language
Let us start with a short description of the language. For a deeper
specification see annex or get the Dartmouth College=92s document.
Each line of a basic program is a statement. Most statements are
instructions to execute.
A line must start with a line number. After the line number comes a
word denoting the type of the statement. Basic has fifteen statement
types:
DATA
DEF
DIM
END
FOR
GOSUB
GOTO
IF
LET
NEXT
PRINT
READ
REM
RETURN
STOP
Our implementation does not support the DEF and DIM statements.
Source code does not require lines to be ordered. Basic uses only
capital letters and spaces are optional. Expression must appear on a
single line.
The END statement must be the highest numbered statement, including
the data statements. A number must contain up to nine digits with an
optional minus sign and an optional decimal point.
As basic programs have no input statements, a program uses the data
statements for data input.
A simple program
Next program outputs the first ten square roots.
10 LET X =3D 0
20 LET X =3D X + 1
30 PRINT X, SQR(X)
40 IF X <=3D 10 THEN 20
50 END
The parser
We are going to write a parser and an interpreter. Separating the
parsing phase from the interpretation (or the run) makes writing tests
easier, we separate the syntaxical aspects from the running aspects.
Parsing basic code is not very difficult, the syntax is not complex.
We use a regular expression to validate each line.
01 unless line =3D~ /^(\d+)
*(DATA|DEF|DIM|END|FOR|GOSUB|GOTO|IF|LET|NEXT|PRINT|READ|REM|RETURN|STOP)
*(.*)$/
02 raise error_message("INVALID STATEMENT", line_number, line)
03 end
The regular expression matches a string starting with digits, followed
by optional spaces, one of the valid basic keywords, optional spaces
and anyhting. We use regular expression=92s groups so that if the line
matches our rule the global variables $1 contains the line number, $2
contains the statement and $3 the rest of the line. We discard any
space in between.
After validating the statement type, we need statement specific
validation. We use a hash table that maps the statement type to a
method that parses each statements.
01 STATEMENT_PARSERS =3D {
02 'DATA' =3D>
arse_data,
03 'END' =3D>
arse_end,
04 'FOR' =3D>
arse_for,
05 'GOSUB' =3D>
arse_gosub,
06 'GOTO' =3D>
arse_goto,
07 'IF' =3D>
arse_if,
08 'LET' =3D>
arse_let,
09 'NEXT' =3D>
arse_next,
10 'PRINT' =3D>
arse_print,
11 'READ' =3D>
arse_read,
12 'REM' =3D>
arse_rem,
13 'RETURN' =3D>
arse_return,
14 'STOP' =3D>
arse_stop
15 }
Each parse method needs the statement line number and the rest of the
line (the part to parse). It also needs the current position in the
script and the whole line to be able to generate a valuable error
message.
01 statement =3D self.send(method, line_number, line, $1.to_i, $3)
02 statements << statement
The parser sends the message (the parse method) to it self. If the
parse method successfully parses the statement it returns a Satetement
object. We add the statement object to the statements array.
Once the parser consumes all the input program, we need to sort the
satements because the basic language does not require the statement to
be ordered.
01 statements.sort!
The sorting works because the Statement objects provide an
implementation of the comparison method (<=3D>). The comparison method
compares the line numbers. Here is the Statement base class.
01 class Statement
02
03 attr_reader :line, :arguments
04
05 def initialize line, type, arguments =3D nil
06 @line, @type, @arguments =3D line, type, arguments
07 end
08
09 def <=3D> other
10 @line - other.line
11 end
12
13 def to_s
14 "#{@line} #{@type} #{@arguments}".rstrip
15 end
16
17 end
We are not going to present all the parse methods, as an example we
are going to look at parse_goto which is rather simple. A goto
statement looks like: 10 GOTO 90. Which means: the effect of line 10
is to jump at line 90.
01 def parse_goto line_number, source, statement_line, arguments
02
03 scanner =3D BasicScanner.new(arguments)
04
05 scanner.skip_spaces
06 to_line =3D scanner.scan_line
07
08 raise error_message("MISSING TO LINE", line_number, source)
unless to_line
09
10 scanner.skip_spaces
11 raise error_message("INVALID 'GOTO' STATMENT", line_number,
source) unless scanner.eos?
12
13 GotoStatement.new(statement_line, to_line.to_i)
14
15 end
We use the BasicScanner class which derives from the StringScanner
class. The reason for extending StringScanner is to add methods, like
skip_spaces, that makes it easier to read the code, not to mention
that regular expressions are not easy to read.
The arguments parameter is a string containing everything following
the basic statement type. We first skip spaces (line 5), then we try
retrieving the line number to go to (line 6). If we don=92t find a line
number, we raise an error (line 8). Otherwise we skip trailing spaces
(line 10) and check if we reached the end of the input (line 11).
Finally, input was a valid GOTO statement and we return a
GotoStatement.
Expressions
When a statement expects an expression the parse methods use the
parse_expression method.
parse_expression
a number (a floating point value) for literals,
a string for operators (+, -, *, /, ^), parentheses and built-in functions
a symbol for variables
The array of tokens is still an infix expression. The reason is that
we can easier implement the to_s method use by the to_s method of a
Statement object which produces a valid basic source code.
The to_a method returns the array of tokens.
Expression
Testing the parser
Testing the parser is easy, here are some examples.
Testing the expression parsing.
01 expr =3D @parser.parse_expression('INT(X/Y) + 2')
02 assert_equal ['INT', '(', :X, '/', :Y, ')', '+', 2], expr.to_a
Testing converting the expression to the postfix version.
01 expr =3D @parser.parse_expression('SQR(4) + INT(Y/2) * 3')
02 assert_equal [4, 'SQR', :Y, 2, '/', 'INT', 3, '*', '+'], expr.to_postf=
ix
Testing that spaces are optional.
01 def test_program_witout_spaces
02
03 program =3D <<PROGRAM
04 10PRINT"HELLO"
05 20END
06 PROGRAM
07
08 statements =3D @parser.parse(program)
09
10 assert_equal 2, statements.length
11
12 assert_equal '10 PRINT "HELLO"', statements[0].to_s
13 assert_equal '20 END', statements[1].to_s
14
15 end
We use the to_s method of the Statement objects which returns a
canonical string to make the test is easier to read. (notice that
reading source code without spaces is not easy =96 line 4)
The interpreter
Now that a parser returns an array of Statement objects, executing the
program is easy. We loop through them and keep track of the program
position (we need a bit more, see later). The interpreter uses a
runtime object which stores the programs=92s variables and the program
position. Here is the main loop of the interpreter.
01 def run_statements statements
02
03 runtime =3D create_runtime(statements)
04
05 while runtime.running?
06
07 statement =3D statements[runtime.program_pos]
08
09 statement.execute(runtime)
10
11 runtime.program_pos +=3D 1
12
13 end
14
15 runtime
16
17 end
The runtime object has also a running state.
The interpreter expects the statement objects to implement the execute
method (line 9). The statement classes in the parser file
(basic_parser.rb) do not offer execute methods, we re-open the classes
here in the interpreter file (basic.rb). Therefore our files are
shorter and the separation of responsibilities does not imply doubling
the set of classes.
At the end of the method (line 15) we return the runtime to the
calling code so that it can access the variables.
The runtime object
Let us fisrt look at the runtime class layout.
01 class Runtime
02
03 attr_accessor
rogram_pos
04
05 def initialize console, data
06 end
07
08 def running?
09 end
10
11 def next_data
12 end
13
14 def end
15 end
16
17 def print line
18 end
19
20 def [] var
21 end
22
23 def []=3D var, value
24 end
25
26 def each_var
27 end
28
29 def save_pos name, subject
30 end
31
32 def restore_pos name
33 end
34
35 end
A Runtime instance needs a console where the output is sent and an
array of all the data available for the read statements.
The program position can be read or changed, for instance the GOTO
statement changes it. Notice that the program position is an index in
the program=92s array of statements, not the source code line number.
The next_data returns the sequence of data or nil when all the data is cons=
umed.
The end method puts the state to program ended.
The print method is used by the PRINT statement.
The [], []=3D and each_var give access to the current values of the
program=92s variables.
The last methods, save_pos and restore_pos are used to save the
current program position and to retrieve the saved position. A name is
accociated with each save, it is useful to validate the correct
sequence of the NEXT statements and the RETURN statements. The saved
positions are pushed on a stack and must be consumed in reverse order.
Creating the runtime object
The create_runtime method instantiate a Runtime object.
It loops through all the statements creating:
a map table, mapping the basic line numbers to the index in the array
of statements
a collection of the goto-like statements (GOTO, GOSUB and IF)
an array of all the data found in the statements
After the first loop, it starts a second loop to set the mapped line
number to the program position for all the goto-like statements. The
test to find what statements are goto-like is done by searching all
the objects that respond to the program_pos=3D message, needed to set
the actual program position.
Turn the Statement objects into runnable objects=85
In the interpreter file we re-open the Statement classes and add the
execute methods. The default implementation does nothing (some
statements like DATA and REM just inherit the no-operation execute).
01 class Statement
02
03 def execute runtime
04 end
05
06 end
One interresting thing in Ruby is that we do not need to redeclare
inheritance when we re-open a class. Therefore, if we ever need to
change the class hierarchy we only need to change it at one place. If
we declared the inheritance we would have to give exactly the same.
We are going to look at the STOP and the GOSUB statements.
01 class StopStatement
02
03 def execute runtime
04 runtime.end
05 end
06
07 end
(We do not repeat that the class derives from the Statement class.)
The implementation for stop is very simple, it only needs to notify
the runtime that the program stops running (line 4).
The gosub is just a bit more complicated.
01 class GosubStatement
02
03 def execute runtime
04 runtime.save_pos :gosub, self
05 runtime.program_pos =3D @to_program_pos
06 end
07
08 def program_pos=3D line
09 @to_program_pos =3D line - 1
10 end
11
12 end
We ask the runtime to save the current program position, because when
we later meet the return statement we must continue at the next
statement relative to the current position. We register the save
position with a special name, the :gosub symbol (line 4). And we set
the position where we want to jump (line 5). Actually we set the goto
index =96 1 (see the way we store the line index at line 9).
Saving the current position prepares the runtime for a later use when
the interpreter meets the return statement.
01 class ReturnStatement
02
03 def execute runtime
04
05 gosub =3D runtime.restore_pos
gosub)
06
07 raise "UNMATCHED 'GOSUB' AT LINE #{@line}" unless gosub
08
09 end
10
11 end
Testing the interpreter
The tests that validate the interpreter look like:
01
02 require 'test/unit'
03 require 'basic'
04
05 class TestBasic < Test::Unit::TestCase
06
07 def test_print_one_string
08
09 program =3D <<BASIC
10 10 PRINT 'HELLO'
11 20 END
12 BASIC
13
14 @basic.run program
15
16 assert_equal 1, @console.lines.length
17 assert_equal ["HELLO"], @console.lines
18
19 end
20
21 end
The test requires the unit test framework and the interpreter (line
3). We call the interpreter, at line 14, with the program. Next we
assert we printed out one line (at line 16) and the output content
(line 17). The console attribute is a false console that stores all
the printed lines. The interpreter needs a console and calls its print
method. We create a test console.
01 class TestConsole
02
03 attr_reader :lines
04
05 def initialize
06 @lines =3D []
07 end
08
09 def print line
10 @lines << line
11 end
12
13 end
The test setup creates the interpreter.
01 def setup
02 @console =3D TestConsole.new
03 @basic =3D BasicInterpreter.new(@console)
04 end
If the interpreter was as follow, the test would pass.
01 class BasicInterpreter
02
03 def initialize console
04 @console =3D console
05 end
06
07 def run program
08 @console.print "HELLO"
09 end
10
11 end
$>ruby basic_test.rb
Loaded suite basic_test
Started
E
Finished in 0.000369 seconds.
1) Error:
test_print_one_string(TestBasic):
NoMethodError: undefined method `run' for nil:NilClass
basic_test.rb:14:in `test_print_one_string'
1 tests, 0 assertions, 0 failures, 1 errors
Using programs for tests
We wrote some basic programs that we can use to validate our
interpreter in automated tests. By using the technique presented above
and adding some more methods, the tests become even easier to read.
Here if the factorial program.
10 REM
15 REM COMPUTE FACTORIAL
20 REM
25 LET F =3D 1
30 READ N
31 LET X =3D N
35 IF N =3D 0 THEN 55
40 LET F =3D F * N
45 LET N =3D N - 1
50 GOTO 35
55 PRINT "FACT", X, "IS", F
60 GOTO 25
65 DATA 1, 3, 5, 6, 20
99 END
Here is the test that runs the factorial program.
01 class TestRunningBasicPrograms < Test::Unit::TestCase
02
03 def test_factorial
04
05 script 'fact.bas'
06
07 expects 'FACT', 1, 'IS', 1
08 expects 'FACT', 3, 'IS', 6
09 expects 'FACT', 5, 'IS', 120
10 expects 'FACT', 6, 'IS', 720
11 expects 'FACT', 20, 'IS', 2432902008176640000
12
13 assert_output_as_expected
14
15 end
16
17 end
At line 5 we set the program file under test, from line 7 up to 11 we
declare the expectations and the last line (line 13) checks the
outcome.
Tests using the runtime
As the interpreter returns the runtime object, we can use it during
the assertion phase.
01 def test_1et
02
03 program =3D <<BASIC
04 10 LET A =3D (23 + 5) / 2
05 40 END
06 BASIC
07
08 runtime =3D @basic.run(program)
09
10 assert_equal 14, runtime[:A]
11
12 end
We run the program and store the runtime in the runtime variable (line
8). Then we assert that the value of the variable A of the program is
the one expected, using the runtime object.
Test results
The code attached to this summary contain tests. Here is the tests results.
$>ruby basic_parser_test.rb
Loaded suite basic_parser_test
Started
.............
Finished in 0.008335 seconds.
13 tests, 56 assertions, 0 failures, 0 errors
$>ruby basic_test.rb
Loaded suite basic_test
Started
.............
Finished in 0.020823 seconds.
13 tests, 23 assertions, 0 failures, 0 errors
You can download the code here.
Annex
Here is a BNF description of the language syntax.
<program> ::=3D <statement> {\n <statement>}
<statement> ::=3D <let> | <read> | <data> | <print> | <goto> | <if> |
<for> | <next> |
<end> | <stop> | <def> | <gosub> | <return> | <dim> | <r=
em>
<let> ::=3D LET <variable> =3D <expression>
<read> ::=3D READ <variable> {, <variable>}
<data> ::=3D DATA <number> {, <number>}
<print> ::=3D PRINT <label> | <label> <expression> | <expression>
<goto> ::=3D GOTO <line-number>
<if> ::=3D IF <expression> <relational> <expression> THEN <line-n=
umber>
<for> ::=3D FOR <unsubscripted-var> =3D <expression> TO
<expression> [STEP <expression>]
<next> ::=3D NEXT <unsubscripted-var>
<end> ::=3D END
<stop> ::=3D STOP
<def> ::=3D DEF FN <letter> (<unsubscripted-var>) =3D <expression>
<gosub> ::=3D GOSUB <line-number>
<return> ::=3D RETURN
<rem> ::=3D <string>
<label> ::=3D "<string>"
<relational> ::=3D < | =3D | > | <=3D | >=3D | <>
<unsubscripted-var> ::=3D <letter> [<digit>]
Expressions may contain the following arithmetic operations: +, -, *, /, ^.
Basic has next built-in functions: SIN, COS, TAN, ATN, EXP, ABS, LOG,
SQR, RND, INT