writing a compiler for Monster language using C language

S

Shravani

hi,
i have to write a compiler for Monster language using C.example
programs of the language are:

EXAMPLE 1

/* Test Program - Module Structure */

module fact

export N2N;
export fact;

type N2N is -> int ret int throws;

val fact : N2N;

fact := fun(n) { if n=0 then return 1 else return n*fact(n-1); };

end

module main

import fact.N2N;
import fact.fact;

type TLF is -> int ret throws;

val x:int;

val main : TLF;

main := fun(n) { val result:int; result:= fact(n); print result; };


end

EXAMPLE 2

module Stack

export Stack;

type OverFlowException is * errMsg : str * size : int;
type UnderFlowException is * errMsg : str;
type Stack is * push : -> int ret throws OverflowException
* pop : -> ret throws UnderFlowException
* top : -> ret int throws UnderFlowException
* size : -> ret int throws
* isEmpty : -> ret int throws;
type IntArray is [] int;


val s : Stack;

val maxSize : int;

val ofe : OverFlowException;
maxSize := 100;
ofe := { "push invalid: stack is full", 0 };

s := obj {
val contents : IntArray;
val _size : int;

contents := [0, 0, 0, 0 ,0];
_size := 0;

size := fun() { return _size; };
isEmpty := fun() { return (size()) = 0; };
top := fun() { val ufe : UnderFlowException;
ufe.errMsg := "top invalid: stack is empty" ;
if isEmpty() then throw ufe else return contents[size()];
};
pop := fun() { val ufe : UnderFlowException;
ufe.errMsg := "top invalid: stack is empty" ;
if isEmpty() then throw ufe else _size := _size - 1 ;
};
push := fun(value) { if _size<maxSize then { _size := _size+1;
contents[_size] := value; }
else throw ofe;
};
};

end

module mymodule

import Stack.Stack;
import Stack.s;

type TopLevelFun is -> ret throws;

val stackTest:TopLevelFun;

stackTest:=fun() { val x:int; s.push(0); s.push(1); s.push(2);
s.pop(); s.push(3);
x:= s.top();
print x; s.pop();
y:= s.top();
print y;
};



end



What i have to do is to write a compiler for such programs.in the
first phase i need to write a scanner that produces the token stream
of the input program.that is:
* Scanner
Requirements Specification:
Input: Program File
Output: Token Stream
Side Effects: Comments and White spaces removed
Exceptions: Invalid tokens, Invalid comment
Interface Requirements:
void initializeScanner(char *filename);
Token nextToken(); // should scan and return the next token void
printTokenStream(FILE f);

here's the list of tokens:
/* Keywords */
1. TK_KEY_MOD "module"
2. TK_KEY_END "end"
3. TK_KEY_IMPORT "import"
4. TK_KEY_EXPORT "export"
5. TK_KEY_TYPE "type"
6. TK_KEY_IS "is"
7. TK_KEY_INT "int"
8. TK_KEY_FLOAT "float"
9. TK_KEY_STR "str"
10. TK_KEY_RET "ret"
11. TK_KEY_THROWS "throws"
12. TK_KEY_VAL "val"
13. TK_KEY_FUN "fun"
14. TK_KEY_IF "if"
15. TK_KEY_THEN "then"
16. TK_KEY_ELSE "else"
17. TK_KEY_PRINT "print"
18. TK_KEY_OBJ "obj"
19. TK_KEY_RETURN "return"
20. TK_KEY_THROW "throw"
21. TK_KEY_THIS "this"


/* Separators */
1. TK_SEMI ';'
2. TK_LPAR '('
3. TK_RPAR ')'
4. TK_COMMA ','
5. TK_COLON ':'
6. TK_LSQUA '['
7. TK_RSQUA ']'
8. TK_LBRACE '{'
9. TK_RBRACE '}'


/* Operators */
1. TK_ARROW '->'
2. TK_STAR '*'
3. TK_ASSIGN ':='
4. TK_DOT '.'
5. TK_MINUS '-'
6. TK_PLUS '+'
7. TK_DIV '/'
8. TK_HASH '#'
9. TK_LESS '<'
10. TK_GREAT '>'
11. TK_EQUAL '='
12. TK_AND '&'
13. TK_OR '|'
14. TK_NOT '!'

/* Other */
1. TK_ID (ALPHA | UNDER) (ALPHA | DIGIT | UNDER)*
2. TK_INTVAL DIGIT+
3. TK_FLOATVAL DIGIT+ '.' DIGIT+
4. TK_STRVAL '"' (NOQUOTE)* '"'

where ALPHA is any alphabetic character,
DIGIT is any digit character,
UNDER is the underscore character, and
NOQUOTE is any character except the double-quote.


could anyone please help me out in writing this scanner?i need to take
the source program as input and produce the token stream,consisting of
tokens of the form described above,as the output.
 
M

Morris Dovey

Shravani said:
hi,
i have to write a compiler for Monster language using C.example
programs of the language are:

could anyone please help me out in writing this scanner?i need to take
the source program as input and produce the token stream,consisting of
tokens of the form described above,as the output.

You might find it helpful to write a BNF description of Monster.

You'll probably want:

A function to return the character at the current point in the
input stream.

A function to "eat" whitespace.

A function to match a specific character sequence at the current
point in the input stream.

A function to match a specific single character at the current
point in the input stream.

The ability to revert the current point in the input stream to
where it was before a specification failed, so that another
specification can be tested.

The ability to produce latent output.

The ability to promote latent output to actual output.

Function(s) for symbol table manipulation.
 
M

Malcolm McLean

Shravani said:
hi,
i have to write a compiler for Monster language using C.example
programs of the language are:
Use MiniBasic as the basis for your compiler.

Strip out all the BASIC keywords and add monster ones. You'll also need to
do a considerable amount of work on the variables (MiniBasic's are all
global). Basically start from a simple expression parser and work back up.

I've got a semi-compiled version of MiniBasic I could pass to you. This
tokenises expressions, indexes variables, and precalculates jumps. It also
needs to put expression into reverse Polish and optimise them, which isn't
done yet.
 
K

Keith Thompson

Shravani said:
i have to write a compiler for Monster language using C.example
programs of the language are:

EXAMPLE 1

/* Test Program - Module Structure */ [98 lines deleted]
end



What i have to do is to write a compiler for such programs.in the
first phase i need to write a scanner that produces the token stream
of the input program.that is: [74 lines deleted]


could anyone please help me out in writing this scanner?i need to take
the source program as input and produce the token stream,consisting of
tokens of the form described above,as the output.

What have you done so far? What specific problems have you run into?

Don't expect us to do your homework for you.
 
S

santosh

Shravani said:
hi,
i have to write a compiler for Monster language using C.example
programs of the language are: [...]
could anyone please help me out in writing this scanner?i need to take
the source program as input and produce the token stream,consisting of
tokens of the form described above,as the output.

See:

<http://www.compilers.net/>
<http://www.personal.kent.edu/~rmuhamma/Compilers/compiler.html>
<http://cs.wwc.edu/~aabyan/464/Book/>
<http://www.cs.sjsu.edu/~louden/cmptext/>
<http://channel9.msdn.com/ShowPost.aspx?PostID=305338>
<http://www.gnu.org/software/bison/>
<http://www.garshol.priv.no/download/text/bnf.html>
<http://www.cs.chalmers.se/~markus/BNFC/>

If you have problems in specific pieces of C code, you can post them
here.
 
B

Bartc

Shravani said:
hi,
i have to write a compiler for Monster language using C.example
programs of the language are:

Couldn't resist trying out my new C skills. My attempt at this tokeniser is
below.

The specification isn't quite the same as yours, function names are
different, the tokens are just printed to the console.

Not all punctuation symbols are recognised; these are easy to add. I've
added a couple extra tokens, tk_error, and tk_eof.

Some tokens with an associated value: tk_id, tk_intval, tk_strval, return
the value in global variables (this is frowned on apparently). Also floating
number constants are not recognised. And letter case I've assumed is
significant.

--
Bart

/* tokenise file "input" */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define tk_key_mod 1
#define tk_key_end 2
#define tk_key_import 3
#define tk_key_export 4
#define tk_key_type 5
#define tk_key_is 6
#define tk_key_int 7
#define tk_key_float 8
#define tk_key_str 9
#define tk_key_ret 10
#define tk_key_throws 11
#define tk_key_val 12
#define tk_key_fun 13
#define tk_key_if 14
#define tk_key_then 15
#define tk_key_else 16
#define tk_key_print 17
#define tk_key_obj 18
#define tk_key_return 19
#define tk_key_throw 20
#define tk_key_this 21
#define tk_semi 22
#define tk_lpar 23
#define tk_rpar 24
#define tk_comma 25
#define tk_colon 26
#define tk_lsqua 27
#define tk_rsqua 28
#define tk_lbrace 29
#define tk_rbrace 30
#define tk_arrow 31
#define tk_star 32
#define tk_assign 33
#define tk_dot 34
#define tk_minus 35
#define tk_plus 36
#define tk_div 37
#define tk_hash 38
#define tk_less 39
#define tk_great 40
#define tk_equal 41
#define tk_and 42
#define tk_or 43
#define tk_not 44
#define tk_id 45
#define tk_intval 46
#define tk_floatval 47
#define tk_strval 48
#define tk_eof 49
#define tk_error 50

#define eofchar EOF
#define maxid 257

char* tokennames[] = {
"",
"module", //tk_key_mod
"end", //tk_key_end
"import", //tk_key_import
"export", //tk_key_export
"type", //tk_key_type
"is", //tk_key_is
"int", //tk_key_int
"float", //tk_key_float
"str", //tk_key_str
"ret", //tk_key_ret
"throws", //tk_key_throws
"val", //tk_key_val
"fun", //tk_key_fun
"if", //tk_key_if
"then", //tk_key_then
"else", //tk_key_else
"print", //tk_key_print
"obj", //tk_key_obj
"return", //tk_key_return
"throw", //tk_key_throw
"this", //tk_key_this
";", //tk_semi
"(", //tk_lpar
")", //tk_rpar
",", //tk_comma
":", //tk_colon
"[", //tk_lsqua
"]", //tk_rsqua
"{", //tk_lbrace
"}", //tk_rbrace
"->", //tk_arrow
"*", //tk_star
":=", //tk_assign
".", //tk_dot
"-", //tk_minus
"+", //tk_plus
"/", //tk_div
"#", //tk_hash
"<", //tk_less
">", //tk_great
"=", //tk_equal
"&", //tk_and
"|", //tk_or
"!", //tk_not
"$ident", //tk_id
"$intconst", //tk_intval
"$realconst", //tk_floatval
"$stringconst", //tk_strval
"<EOF>", //tk_eof
"!ERROR!"}; //tk_error

int lookedahead;
int eofreached;
int nextchar;

char alphaname[maxid];
int numbervalue;
char stringvalue[maxid];

#define testfile "input"

void scannerinit(void);
int getnextchar(FILE*);
int peeknextchar(FILE*);
int getnexttoken(FILE*);
int lookupkeyword(char*);

int main(void)
{FILE *f;
int t;

f=fopen(testfile,"rb");
if (f==0)
{puts("Couldn't find input file");
return EXIT_FAILURE;
};

scannerinit();

while(1)
{
t=getnexttoken(f);

switch (t)
{
case tk_id:
printf("Id: %s\n",alphaname);
break;
case tk_intval:
printf("Int const: %d\n",numbervalue);
break;
case tk_strval:
printf("String: %s\n",stringvalue);
break;
default:
printf("Token: %s\n",tokennames[t]);
};

if (t==tk_eof) break;
};

fclose(f);
return EXIT_SUCCESS;
}

void scannerinit()
{
lookedahead=0;
eofreached=0;
}

int getnexttoken(FILE* f)
{int c,d;
char name[maxid];
int i,t;

while(1)
{
c=getnextchar(f);

if (c==eofchar) return tk_eof;

switch (c)
{
case 13: case 10: case 32: case 9:
break;

case ';': return tk_semi;
case '(': return tk_lpar;
case ')': return tk_rpar;
case '{': return tk_lbrace;
case '}': return tk_rbrace;
case ',': return tk_rpar;

case ':':
d=peeknextchar(f);
if (d=='=')
{ getnextchar(f);
return tk_assign;
};
return tk_colon;

case '/':
d=peeknextchar(f);
if (d!='*') return tk_div;
getnextchar(f);

while (1)
{
c=getnextchar(f);
if (c==eofchar) return tk_eof;
if (c=='*' && peeknextchar(f)=='/')
{ getnextchar(f);
break;
}
};
break;

case '"':
i=0;
while(1)
{c=getnextchar(f);
if (c==eofchar) break;
if (c=='"') break;
stringvalue[i++]=c;
};
stringvalue=0;

return tk_strval;
default:
goto xyzzy;

};

};

xyzzy:

if ((c>='A' && c<'Z') || (c>='a' && c<='z') || c=='_')
{
alphaname[0]=c;
i=1;

while(1)
{c=peeknextchar(f);
if ((c>='A' && c<'Z') || (c>='a' && c<='z') || c=='_' || (c>='0' &&
c<='9'))
{getnextchar(f);
alphaname[i++]=c;
}
else
break;
};
alphaname=0;

t=lookupkeyword(alphaname);
if (t) return t;
return tk_id;
};

if (c>='0' && c<'9')
{numbervalue=c-'0';
while (1)
{c=peeknextchar(f);
if (c>='0' && c<='9')
{getnextchar(f);
numbervalue = numbervalue*10+c-'0';
}
else
break;

};

return tk_intval;
};

return tk_error;
}

int getnextchar(FILE* f)
{
int c;

if (eofreached) return eofchar;

if (lookedahead)
{
lookedahead=0;
return nextchar;
}

c=fgetc(f);

if (c==EOF)
{ eofreached=1;
return eofchar;
};
return c;
}

int peeknextchar(FILE* f)
{
if (lookedahead) return nextchar;
nextchar=getnextchar(f);
lookedahead=1;
return nextchar;
}


int lookupkeyword(char* name)
{int i;

for (i=tk_key_mod; i<=tk_key_this; ++i)
if (strcmp(tokennames,name)==0)
return i;
return 0;
}
 
B

Bartc

Bartc said:
Couldn't resist trying out my new C skills. My attempt at this tokeniser
is below.


Oh, and I've assumed ASCII input. If this is homework you might lose marks
for making any such assumptions. The constants 13, 10, 32 and 9 represent
likely whitespace in ASCII.
 
K

Keith Thompson

Bartc said:
Oh, and I've assumed ASCII input. If this is homework you might lose marks
for making any such assumptions. The constants 13, 10, 32 and 9 represent
likely whitespace in ASCII.

And if he submits the solution you posted, he might lose marks for
plagiarism.

We traditionally don't do people's homework for them.
 
B

Bartc

Keith Thompson said:
And if he submits the solution you posted, he might lose marks for
plagiarism.

My code was just to give ideas. And there's enough missing or wrong bits to
give scope for much more work. Anyway I'm sure the OPs style will be very
different and he knows that. And f he's doing a whole compiler, it will get
a lot harder.
We traditionally don't do people's homework for them.

I was using it to exercise my own skills, then decided to upload my attempt.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,997
Messages
2,570,240
Members
46,828
Latest member
LauraCastr

Latest Threads

Top