Newbie help: lexer in C++

vaib

Hi to all. This may seem quite out of place. I've opted to build
a lexical analyzer in C++ for my summer project. I know the
required automata theory and I know C++ too. I've also read 'lex
and yacc', but I'm stuck: I still can't figure out how
to start the coding part or where to begin. I would
highly appreciate it if any of you experts out there could write me a
simple plan or algorithm to follow for this project. Also,
if you know any good book or internet resource that can help me
design a lexer, it would be very helpful. Thanking you all in
anticipation.
Vaibhav.
 
Ron AF Greve

Hi,

I take it that you are writing a hard-coded lexer (and not using flex/lex or
antlr).

I think the following algorithm would make a nice start for a lexer
recognizing floats, ints and keywords:

Assign every keyword and token type a code.
Create a record holding a string and a long (token and code) to return, etc.

Read a token from the input (stream >> string).
Check whether the token is a keyword (use a map for that); if it is found, return
record(token, code).
If not, check whether it is a float (integer dot integer); if so, return
record(token, code for float).
If not, check whether it is an integer (i.e. only 0-9); if so, return
record(token, code for integer).
If not, check whether it is an id (i.e. _ a-z A-Z followed by _ a-z A-Z 0-9);
if so, return record(token, code for id).
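
The steps above could be sketched in C++ roughly like this. It is only a minimal sketch, not a definitive implementation: the names Token, TokenCode, nextToken and the helper functions are made up for illustration, and the two keywords in the map are arbitrary placeholders.

```cpp
#include <cctype>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical token codes; names and values are illustrative only.
enum TokenCode : long {
    TOK_EOF = 0, TOK_IF, TOK_WHILE, TOK_FLOAT, TOK_INT, TOK_ID, TOK_UNKNOWN
};

// The "record with a string and a long" from the plan.
struct Token {
    std::string text;
    long code;
};

// True if s is non-empty and consists only of the digits 0-9.
static bool isDigits(const std::string& s) {
    if (s.empty()) return false;
    for (unsigned char c : s)
        if (!std::isdigit(c)) return false;
    return true;
}

// True if s has the shape "integer dot integer", e.g. 3.14.
static bool isFloat(const std::string& s) {
    std::string::size_type dot = s.find('.');
    if (dot == std::string::npos) return false;
    return isDigits(s.substr(0, dot)) && isDigits(s.substr(dot + 1));
}

// True if s starts with _ or a letter and continues with _, letters or digits.
static bool isIdentifier(const std::string& s) {
    if (s.empty()) return false;
    unsigned char first = static_cast<unsigned char>(s[0]);
    if (!(std::isalpha(first) || first == '_')) return false;
    for (unsigned char c : s)
        if (!(std::isalnum(c) || c == '_')) return false;
    return true;
}

// Reads one whitespace-delimited word and classifies it in the order
// given in the plan: keyword, float, integer, identifier.
Token nextToken(std::istream& in) {
    static const std::map<std::string, long> keywords = {
        {"if", TOK_IF}, {"while", TOK_WHILE}  // placeholder keywords
    };
    std::string word;
    if (!(in >> word)) return {"", TOK_EOF};
    std::map<std::string, long>::const_iterator kw = keywords.find(word);
    if (kw != keywords.end()) return {word, kw->second};
    if (isFloat(word)) return {word, TOK_FLOAT};
    if (isDigits(word)) return {word, TOK_INT};
    if (isIdentifier(word)) return {word, TOK_ID};
    return {word, TOK_UNKNOWN};
}
```

You would call nextToken in a loop on a std::ifstream or std::istringstream until it returns TOK_EOF. Note that stream >> string splits on whitespace only, so input like "a+b" arrives as a single word and comes back as TOK_UNKNOWN; a lexer for a real language would probably have to read character by character instead.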

Though I myself would always use flex/lex or antlr for this.


Regards, Ron AF Greve

http://www.InformationSuperHighway.eu
 
BobR

Here, I've re-arranged your message to what your thinking seems to be:

---------
ue.yawhgiHrepuSnoitamrofnI.www//:ptth

everG FA noR ,sdrageR

..siht rof rltna ro xel/xelf esu syawla dluow flesym I hguohT

)9-0 Z-A z-a _ yb dewollof Z-A z-a _ .e.i ( di na si ti fi kcehc ton fi
)regetni rof edoc ,nekot (drocer)9-0 ylno e,i( regetni a si fi kcehc ton fi
)taolf rof edoc ,nekot
(drocer nruter os fi )regetni tod regetni( taolf a si ti fi kcehc ton fi
)edoc ,nekot (drocer
eht nruter dnuof si ti fi )taht rof pam a esu( drow yek a si nekot fi kcehC
)gnirts >> maerts ( tupni morf nekot daeR

cte nruter ot )edoc dna nekot( gnol a dna gnirts a htiw drocer a etaerC
edoc a sepyt nekot dna sdrowyek lla ngissA
sdrowyek stni ,taolf gnizingocer
rexel a rof trats ecin a ekam dluow mhtirogla gniwollof eht kniht I

..)rltna ro xel/xelf gnisu ton dna( rexel dedoc drah a gnitirw era uoy taht
ti ekat I

,iH
 
dasjotre

There is so much code available that uses lex and yacc that it
shouldn't take you long to find something to help you learn how
they work. In a nutshell, lex and yacc take their input in
their own languages and output C code, which you then use
in your own code.

I.e.:
You write lexical definitions and run lex to produce
the lexical processing code.
You write a grammar definition file and run yacc
to produce the parser code, which calls the lexer from
the previous stage to get its tokens.
You use the produced code in your project as you need.

For C++ I would recommend the boost::spirit library. It is well
designed and documented and, unlike lex/yacc, the whole
process is done in C++ only, with no specialized languages
or intermediary steps.

Regards,

DS
 
vaib

Thank you very much, I think I've got it.
 
