How to organize the data structure in a parser?

A

Andy

Hi, all

I am trying to design a parser for C program using C++. Currently what
I did for syntax tree is to design a class for each nontermials in the
grammar, and use inherentance to link them. For example, as for the
expression part, the classes are something like the follows:

.....
class MultiplicativeExpr : public AdditiveExpr
{
MultiplicativeExpr * multi_expr;
int op_type;
PmExpr * pm_expr;
};

class AdditiveExpr : public ShiftExpr
{
int op_type;
AdditiveExpr * additive_expr;
MultiplicativeExpr * multi_expr;
};

class ShiftExpr : public RelationalExpr
{
int op_type;
ShiftExpr * shift_expr;
AdditiveExpr * additive_expr;
};

class RelationalExpr : public EqualityExpr
{
int op_type;
RelationalExpr * relational_expr;
ShiftExpr * shift_expr;
};

.....

I noticed that the bad thing about this solution is that using
inhereitance, the children in the leaf of inheritance tree will be of
large size because it comprised of all data fields of its ancestors.
How can I organize the data structure more efficienty?
Thanks a lot!

Andy
 
S

shez

Andy said:
Hi, all

I am trying to design a parser for C program using C++. Currently what
I did for syntax tree is to design a class for each nontermials in the
grammar, and use inherentance to link them. For example, as for the
expression part, the classes are something like the follows:

....
class MultiplicativeExpr : public AdditiveExpr
{
MultiplicativeExpr * multi_expr;
int op_type;
PmExpr * pm_expr;
};

class AdditiveExpr : public ShiftExpr
{
int op_type;
AdditiveExpr * additive_expr;
MultiplicativeExpr * multi_expr;
};

[snip]

You're going about this the wrong way. Why is 'MultiplicativeExpr' an
'AdditiveExpr'. I guess in your grammar you have something like this,
but it is not the same when designing C++ classes. You express it like
this in your grammar in order to enforce *operator precedence*, not to
enforce *relationships* (which is what OO inheritance expresses). A
more OO approach might be something like:

class Expr {
virtual double evaluate() = 0;
}

class AddExpr : public Expr {
Expr *left;
Expr *right;
virtual double evaluate() { return left->evaluate() +
right.evaluate(); }
}

class MultiplyExpr : public Expr {
Expr *left;
Expr *right;
virtual double evaluate() { return left->evaluate() *
right.evaluate(); }
}

.... and so on ...

Here, an 'AddExpr' IS AN Expr. Likewise a 'MultiplyExpr' IS AN Expr.
However, a 'MultiplyExpr' is NOT an 'AddExpr'.

-shez-
 
A

Andy

Yes, your method is one way to solve the problem. But it will
complicate the parsing process if we design the data structure in this
way. I think a hierarchical way for expressions and declarations will
be more suitable for parser, especially we want to do it in
recursive-descent way.

For example, the following gramar, can you find a proper way to deal
with it?

postfix-expression:
primary-expression
postfix-expression [ expression ]
postfix-expression ( expression-list_opt )
simple-type-specfier ( expression-list_opt )
typename ::_opt nested-name-specifier identifier (
expresion-list_opt )
postfix-expression . template_opt id-expression
postfix-expresion ->template_opt id-expression
.....
dynamic_cast <type-id> ( expression )
static_cast <type-id> ( expression )

Thanks!

Andy
 
J

Jeff Flinn

Andy said:
Hi, all

I am trying to design a parser for C program using C++. Currently what
I did for syntax tree is to design a class for each nontermials in the
grammar, and use inherentance to link them. For example, as for the
expression part, the classes are something like the follows:

Have you looked at the boost::spirit parser framework at www.boost.org?

Jeff
 
S

shez

Andy said:
Yes, your method is one way to solve the problem. But it will
complicate the parsing process if we design the data structure in this
way. I think a hierarchical way for expressions and declarations will
be more suitable for parser, especially we want to do it in
recursive-descent way.

For example, the following gramar, can you find a proper way to deal
with it?

postfix-expression:
primary-expression
postfix-expression [ expression ]
postfix-expression ( expression-list_opt )
simple-type-specfier ( expression-list_opt )
typename ::_opt nested-name-specifier identifier (
expresion-list_opt )
postfix-expression . template_opt id-expression
postfix-expresion ->template_opt id-expression
....
dynamic_cast <type-id> ( expression )
static_cast <type-id> ( expression )

Thanks!

Andy

You can have a class for each of these expressions, all of which derive
from the Expr class. Even if you write a recursive-descent parser,
your data structures can be independent from your grammar. This lets
you generate code that is not tied too closely with the syntax of the
source language. Try to make each expression a class with a virtual
function that can be used to evaluate the expression. This seems like
the easiest & most flexible approach to me.

-shez-
 
A

ab2305

shez said:
Andy said:
Hi, all

I am trying to design a parser for C program using C++. Currently what
I did for syntax tree is to design a class for each nontermials in the
grammar, and use inherentance to link them. For example, as for the
expression part, the classes are something like the follows:

....
class MultiplicativeExpr : public AdditiveExpr
{
MultiplicativeExpr * multi_expr;
int op_type;
PmExpr * pm_expr;
};

class AdditiveExpr : public ShiftExpr
{
int op_type;
AdditiveExpr * additive_expr;
MultiplicativeExpr * multi_expr;
};

[snip]

You're going about this the wrong way. Why is 'MultiplicativeExpr' an
'AdditiveExpr'. I guess in your grammar you have something like this,
but it is not the same when designing C++ classes. You express it like
this in your grammar in order to enforce *operator precedence*, not to
enforce *relationships* (which is what OO inheritance expresses). A
more OO approach might be something like:

class Expr {
virtual double evaluate() = 0;
}

class AddExpr : public Expr {
Expr *left;
Expr *right;
virtual double evaluate() { return left->evaluate() +
right.evaluate(); }
}

class MultiplyExpr : public Expr {
Expr *left;
Expr *right;
virtual double evaluate() { return left->evaluate() *
right.evaluate(); }
}

... and so on ...

Here, an 'AddExpr' IS AN Expr. Likewise a 'MultiplyExpr' IS AN Expr.
However, a 'MultiplyExpr' is NOT an 'AddExpr'.

-shez-


Shezan is that you -- from Columbia????
 
A

Andy

how can we deal with the following grammar:
.....
multiplicative-expr : pm-expr
| multiplicative-expr * pm-expr
| multiplicative-expr / pm-expr
| multiplicative-expr % pm-expr

additive-expr : multiplicative-expr
| additive-expr + multiplicative-expr
| additive-expr - multiplicative-expr

shift-expr : additive-expr
| shift-expr << additive-expr
| shift-expr >> additive-expr
......

If it happend to be
shift-expr ==> additive-expr ==> multiplicative-expr => pm-expr =>
.....
The syntax tree will be unnecessary high. I was thinking that with
inheritance, we can avoid the unnecessary heigth in sytax tree. If we
design a class for each nontermial and use Expr base class and virtual
functions, can we avoid it?

Thanks!

Andy
 
S

shez

Andy said:
how can we deal with the following grammar:
....
multiplicative-expr : pm-expr
| multiplicative-expr * pm-expr
| multiplicative-expr / pm-expr
| multiplicative-expr % pm-expr

additive-expr : multiplicative-expr
| additive-expr + multiplicative-expr
| additive-expr - multiplicative-expr

shift-expr : additive-expr
| shift-expr << additive-expr
| shift-expr >> additive-expr
.....

If it happend to be
shift-expr ==> additive-expr ==> multiplicative-expr => pm-expr =>
....
The syntax tree will be unnecessary high. I was thinking that with
inheritance, we can avoid the unnecessary heigth in sytax tree. If we
design a class for each nontermial and use Expr base class and virtual
functions, can we avoid it?

Thanks!

Andy

To me, a class for each _operation_ seems most useful.

-shez-
 
A

Andy

yes, you are right. The strucutre of the syntax tree can be independent
of the grammar.

Thanks!

andy
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,199
Messages
2,571,045
Members
47,643
Latest member
ashutoshjha_1101

Latest Threads

Top