Andrew James
Gentlemen,
I'm running into a problem whilst testing the parsing of a language I've
created with TPG. It seems that for some reason, TPG balks when I try
to parse an expression whose first letter is 't' (or, in fact, at any
time when 't' is at the beginning of a token). This doesn't happen with
any other letter (as far as I know), nor if the 'T' is capitalised.
My grammar looks like this:
# Tokens
separator space '\s+';
token Num '\d+(.\d+)?';
token Ident '[a-zA-Z]\w*';
token CharList '\'.*\'';
token CatUnOp '~';
token CatOp '[/\^~]';
token MetaOp '[=\+\-!]';
token Date '\d\d-\d\d-\d\d\d\d';
token FileID '(\w+\.\w+)';
# Rules
START -> CatExpr '\?' '[' MetaExpr ']'
       | CatExpr
       | FileID
       ;
CatExpr -> CatUnOp CatName
         | CatName (CatOp CatName)*
         | CatName
         ;
CatName -> Ident
         #| '(' CatExpr ')'
         ;
MetaExpr -> MetaCrit (',' MetaCrit)*
          ;
MetaCrit -> Ident MetaOp Value
          ;
Value -> CharList | Num | Date
       ;
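As a sanity check independent of TPG, the token patterns can be tried one at a time with Python's re module. The sketch below is only that: the TOKENS dictionary and both helper functions are hypothetical names introduced for illustration, and the last line rests on the assumption that TPG treats inline tokens such as '[' as regular expressions.

```python
import re

# Token patterns copied from the grammar above; raw strings keep the
# backslashes intact. TOKENS and the helpers are illustrative names only.
TOKENS = {
    "Num": r"\d+(.\d+)?",
    "Ident": r"[a-zA-Z]\w*",
    "CatOp": r"[/\^~]",
    "FileID": r"(\w+\.\w+)",
}


def matching_tokens(text):
    """Return the names of the patterns that match at the start of text."""
    return sorted(name for name, pat in TOKENS.items() if re.match(pat, text))


def compiles(pattern):
    """Return True if pattern is a valid regular expression on its own."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


print(matching_tokens("tanother"))   # ['Ident']
print(matching_tokens("another"))    # ['Ident']
# ASSUMPTION: inline tokens are regexes. A bare '[' is not a valid one.
print(compiles("["), compiles(r"\["))  # False True
```

Taken in isolation, Ident accepts a leading 't' just like any other letter, so the named token definitions alone do not seem to explain the failure. The bare '[' inline token is the one pattern here that does not stand on its own as a regular expression.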
My test script looks like this:
if __name__ == '__main__':
    """ For testing purposes only """
    parseTests = ('This/is/a/simple/test', 'another/simple/test',
                  "a/test/with/[author='drew']")
    for line in parseTests:
        try:
            print "\nParsing: %s \n%s\n" % (line,"="*(len(line)+9))
            qp = MFQueryParser()
            print qp(line)
        except Exception, inst:
            print "EXCEPTION: " + str(inst)
The output when using a letter which is not 't':
Parsing: another/simple/test
============================
[ 1][ 2]START.CatExpr: (1,1) Ident another != CatUnOp
[ 2][ 3]START.CatExpr.CatName: (1,1) Ident another == Ident
[ 3][ 2]START.CatExpr: (1,8) CatOp / == CatOp
[ 4][ 3]START.CatExpr.CatName: (1,9) Ident simple == Ident
[ 5][ 2]START.CatExpr: (1,15) CatOp / == CatOp
[ 6][ 3]START.CatExpr.CatName: (1,16) _tok_2 t != Ident
[ 7][ 1]START: (1,15) CatOp / != _tok_1
[ 8][ 2]START.CatExpr: (1,1) Ident another != CatUnOp
[ 9][ 3]START.CatExpr.CatName: (1,1) Ident another == Ident
[ 10][ 2]START.CatExpr: (1,8) CatOp / == CatOp
[ 11][ 3]START.CatExpr.CatName: (1,9) Ident simple == Ident
[ 12][ 2]START.CatExpr: (1,15) CatOp / == CatOp
[ 13][ 3]START.CatExpr.CatName: (1,16) _tok_2 t != Ident
EXCEPTION: SyntacticError at line 1, row 16: Syntax error near t
The output when using 't' as the first letter:
Parsing: tanother/simple/test
=============================
[ 1][ 2]START.CatExpr: (1,1) _tok_2 t != CatUnOp
[ 2][ 3]START.CatExpr.CatName: (1,1) _tok_2 t != Ident
[ 3][ 3]START.CatExpr.CatName: (1,1) _tok_2 t != Ident
[ 4][ 2]START.CatExpr: (1,1) _tok_2 t != CatUnOp
[ 5][ 3]START.CatExpr.CatName: (1,1) _tok_2 t != Ident
[ 6][ 3]START.CatExpr.CatName: (1,1) _tok_2 t != Ident
[ 7][ 1]START: (1,1) _tok_2 t != FileID
EXCEPTION: SyntacticError at line 1, row 1: Syntax error near t
I'm not sure whether this is something I'm doing wrong in my regular
expressions or whether something is being escaped in the TPG code, or
whether... I just don't know!
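One possible explanation for the "_tok_2 t" entries in the trace can be sketched with plain Python. It assumes, without having confirmed it in the TPG source, that the lexer joins every token pattern into one alternation of named groups, with the inline tokens '\?', '[' and ']' numbered _tok_1 to _tok_3 and placed first. The names build_lexer and first_token are hypothetical.

```python
import re

# ASSUMPTION (not taken from the TPG source): all token patterns are joined
# into a single alternation of named groups, inline tokens first.
def build_lexer(tokens):
    return re.compile("|".join("(?P<%s>%s)" % (name, pat) for name, pat in tokens))


IDENT = ("Ident", r"[a-zA-Z]\w*")

# Inline tokens as written in the START rule: '\?', '[' and ']'.
unescaped = [("_tok_1", r"\?"), ("_tok_2", "["), ("_tok_3", "]"), IDENT]
# The same tokens with the brackets escaped.
escaped = [("_tok_1", r"\?"), ("_tok_2", r"\["), ("_tok_3", r"\]"), IDENT]


def first_token(tokens, text):
    """Return (token name, matched text) for the first token of text."""
    m = build_lexer(tokens).match(text)
    return (m.lastgroup, m.group())


# With the bare brackets, '[' opens a character class that runs up to the
# ']' of the next token, i.e. [)|(?P<_tok_3>], which contains 't'.
print(first_token(unescaped, "tanother"))  # ('_tok_2', 't')
print(first_token(unescaped, "another"))   # ('Ident', 'another')
print(first_token(escaped, "tanother"))    # ('Ident', 'tanother')
```

If that assumption holds, the accidental character class would also swallow a leading 'o', 'k', 'P', '_' or '3', and writing the inline tokens as '\[' and '\]' would be the change to try first.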
I'm going through the TPG code at the moment but am not very hopeful of
finding the problem. Could someone possibly just let me know if I've
made an obvious mistake somewhere?
Many thanks,
Andrew