Help improve program for parsing simple rules

pruebauno

Another interesting task for those that are looking for some
interesting problem:
I inherited a rule system, now being ported, that checks the outputs
of programmers' programs: given some simple rules and the current
values, it has to determine whether a program is still working
correctly and report the details of the values involved. If you have a
better idea of how to do this kind of parsing, please chime in. I am
using tokenize, but that might be more complex than it needs to be.
This is what I have come up with so far:

rules=[
         '( A - B ) = 0',
         '(A + B + C + D + E + F + G + H + I) = J',
         '(A + B + C + D + E + F + G + H) = I',
         '(A + B + C + D + E + F) = G',
         '(A + B + C + D + E) = (F + G + H + I + J)',
         '(A + B + C + D + E) = (F + G + H + I)',
         '(A + B + C + D + E) = F',
         '(A + B + C + D) = (E + F + G + H)',
         '(A + B + C) = (D + E + F)',
         '(A + B) = (C + D + E + F)',
         '(A + B) = (C + D)',
         '(A + B) = (C - D + E - F - G + H + I + J)',
         '(A + B) = C',
         '(A + B) = 0',
         '(A+B+C+D+E) = (F+G+H+I+J)',
         '(A+B+C+D) = (E+F+G+H)',
         '(A+B+C+D)=(E+F+G+H)',
         '(A+B+C)=(D+E+F)',
         '(A+B)=(C+D)',
         '(A+B)=C',
         '(A-B)=C',
         '(A/(B+C))',
         '(G + H) = I',
         '-0.99 LE ((A+B+C)-(D+E+F+G)) LE 0.99',
         '-0.99 LE (A-(B+C)) LE 0.99',
         '-1000.00 LE A LE 0.00',
         '-5000.00 LE A LE 0.00',
         'A < B',
         'A < 7000',
         'A = -(B)',
         'A = C',
         'A = 0',
         'A GT 0',
         'A GT 0.00',
         'A GT 7.00',
         'A LE B',
         'A LT -1000.00',
         'A LT -5000',
         'A LT 0',
         'A=(B+C+D)',
         'A=B',
         'I = (G + H)',
         '0.00 LE A LE 4.00',
         '4.00 LT A LE 7.00'
         ]
vars_={'A': 0, 'B': 1.1, 'C': 2.2, 'D': 3.3, 'E': 4.4, 'F': 5.5,
       'G': 6.6, 'H': 7.7, 'I': 8.8, 'J': 9.9}

import tokenize as TOK
import StringIO as SIO

COMP_REPLACERS={'LT':'<', 'GT':'>', 'LE':'<=', 'GE':'>=', '=':'==',
                '=>':'=>', '=<':'=<'}

def get_tokens(string):
    return [x[:2] for x in
            TOK.generate_tokens(SIO.StringIO(string).readline)][:-1]

def replace_comps(toks,repl):
    return [(TOK.OP,repl[x[1]]) if x[1] in repl else x for x in toks]

def replace_names(norm,vars_):
    return [(TOK.NUMBER,str(vars_.get(x[1],x[1]))) if x[0]==TOK.NAME else x
            for x in norm]

def split_seccions(subs,comp_split):
    l=[]
    g=[]
    for type_,value in subs:
        if value in comp_split:
            g.append((l,value))
            l=[]
        else:
            l.append((type_,value))
    g.append((l,''))
    return g

def all_seccions(grps):
    return [(TOK.untokenize(lst),comper) for lst,comper in grps]

def calc_seccions(rep):
    return [(str(eval(comp,{},{})),comper) for comp,comper in rep]

def calc_deltas(calc):
    return [eval(calc[0][0]+'-'+calc[i+1][0],{},{})
            for i in range(len(calc)-1)]

def main():
    for cur_rule in rules[20:26]:
        tokens=get_tokens(cur_rule)
        normal=replace_comps(tokens,COMP_REPLACERS)
        subst=replace_names(normal,vars_)
        groups=split_seccions(subst,COMP_REPLACERS.values())
        rep=all_seccions(groups)
        rep_out=''.join(x[0]+x[1] for x in rep)
        calc=calc_seccions(rep)
        calc_out=''.join(x[0]+x[1] for x in calc)
        deltas=calc_deltas(calc)
        result=eval(calc_out,{},{})
        print
        print 'Values:',', '.join(str(key)+': '+str(val)
                                  for key,val in sorted(vars_.iteritems()))
        print 'Read rule:   ',cur_rule
        print 'Used rule:   ',TOK.untokenize(normal)
        print 'Substitution:',rep_out
        print 'Calculation: ',calc_out
        print 'Result:      ','Successful' if result else 'Failed'
        if not result and '==' in calc_out:
            print 'Difference: ',', '.join(map(str,deltas))
        print '='*40


if __name__=='__main__': main()
 
Aaron Brady

Another interesting task for those that are looking for some
interesting problem:
snip
def main():
    for cur_rule in rules[20:26]:
        tokens=get_tokens(cur_rule)
        normal=replace_comps(tokens,COMP_REPLACERS)
        subst=replace_names(normal,vars_)
        groups=split_seccions(subst,COMP_REPLACERS.values())
        rep=all_seccions(groups)
        rep_out=''.join(x[0]+x[1] for x in rep)
        calc=calc_seccions(rep)
        calc_out=''.join(x[0]+x[1] for x in calc)
        deltas=calc_deltas(calc)
        result=eval(calc_out,{},{})
snip

You are using 'eval' which isn't safe if its inputs are dangerous. If
you are controlling the inputs, you might be interested in the
optional arguments to 'eval'.

The syntax is slightly different for Python 2. For the replacement of
'LE', I assume you require spaces on both sides.

If you need something more flexible, the 'ast' module gives you more
options. The string has to be a valid Python module to start with.
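For what it's worth, here is a minimal sketch of that ast route. Note the
assumptions: the LE/GT words have already been rewritten to Python
operators, and mode='eval' is used so the string only has to be a valid
expression rather than a whole module. The rule is one from the list above.

```python
import ast

# Parse one rewritten rule as an expression; a chained comparison like
# '-0.99 <= X <= 0.99' comes back as a single Compare node, so each
# side of each comparison can be pulled out for intermediate reporting.
tree = ast.parse('-0.99 <= (A - (B + C)) <= 0.99', mode='eval')
comp = tree.body                                  # an ast.Compare node
operands = [comp.left] + comp.comparators         # one AST per operand
print(len(operands))                              # 3
print([type(op).__name__ for op in comp.ops])     # ['LtE', 'LtE']
```

Each element of `operands` is a sub-expression that could be compiled and
evaluated on its own, which is exactly the intermediate-value output the
original poster asked for.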

FYI, have you checked order of precedence in your custom rule set to
match Python's?
 
John Machin

COMP_REPLACERS={'LT':'<', 'GT':'>', 'LE':'<=', 'GE':'>=', '=':'==',
'=>':'=>', '=<':'=<'}

What do the '=>' and '=<' represent? Why are you replacing each by
itself?
 
pruebauno

Another interesting task for those that are looking for some
interesting problem:
snip
def main():
    for cur_rule in rules[20:26]:
        tokens=get_tokens(cur_rule)
        normal=replace_comps(tokens,COMP_REPLACERS)
        subst=replace_names(normal,vars_)
        groups=split_seccions(subst,COMP_REPLACERS.values())
        rep=all_seccions(groups)
        rep_out=''.join(x[0]+x[1] for x in rep)
        calc=calc_seccions(rep)
        calc_out=''.join(x[0]+x[1] for x in calc)
        deltas=calc_deltas(calc)
        result=eval(calc_out,{},{})

snip

You are using 'eval' which isn't safe if its inputs are dangerous.  If
you are controlling the inputs, you might be interested in the
optional arguments to 'eval'.

The syntax is slightly different for Python 2.  For the replacement of
'LE', I assume you require spaces on both sides.

If you need something more flexible, the 'ast' module gives you more
options.  The string has to be a valid Python module to start with.

FYI, have you checked order of precedence in your custom rule set to
match Python's?

I know about eval's safety implications. The rules are defined by the
programmers, so I don't worry too much about it at this point. They
should know better than to mess up their application server. Unless
there is some easier way to do it, I am willing to take the risk.
Precedence is standard math; we can always add some extra parentheses
to the rules, and I don't think the old system would mind.

I thought about using eval with a locals dictionary, but they want
output of the intermediate values. I want to avoid the situation where
the intermediate output does not match what eval is doing.
 
pruebauno

What do the '=>' and '=<' represent? Why are you replacing each by
itself?

because of this:
groups=split_seccions(subst,COMP_REPLACERS.values())

I didn't want to create a separate variable. Those are dubious anyway;
I haven't seen one in any of our current rules, and they generate a
syntax error in Python, so I took them out:

COMP_REPLACERS={'LT':'<', 'GT':'>', 'LE':'<=', 'GE':'>=', '=':'=='}

And before somebody else points it out: this line:

(TOK.NUMBER,str(vars_.get(x[1],x[1]))) if x[0]==TOK.NAME

should really be:

(TOK.NUMBER,str(vars_[x[1]])) if (x[0]==TOK.NAME and x[1] in vars_)
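
A tiny illustration of the difference (hand-rolled (type, string) pairs
standing in for real tokenize output; 'X' is deliberately not in vars_).
With the corrected guard, an unknown name passes through untouched instead
of being re-tagged as a NUMBER with its own spelling as the "value":

```python
# Hypothetical token pairs; only names actually present in vars_
# get substituted and re-tagged, everything else is left alone.
vars_ = {'A': 0, 'B': 1.1}
toks = [('NAME', 'A'), ('OP', '+'), ('NAME', 'X')]
fixed = [('NUMBER', str(vars_[v])) if (t == 'NAME' and v in vars_) else (t, v)
         for t, v in toks]
print(fixed)   # [('NUMBER', '0'), ('OP', '+'), ('NAME', 'X')]
```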
 
Paul McGuire

Another interesting task for those that are looking for some
interesting problem:
snip

I've been meaning to expand on pyparsing's simpleArith.py example for
a while, to include the evaluation of the parsed tokens. Here is the
online version, http://pyparsing.wikispaces.com/file/view/eval_arith.py,
it will be included in version 1.5.2 (coming shortly). I took the
liberty of including your rule set as a list of embedded test cases.

-- Paul
 
John Machin

I've been meaning to expand on pyparsing's simpleArith.py example for
a while, to include the evaluation of the parsed tokens.  Here is the
online version, http://pyparsing.wikispaces.com/file/view/eval_arith.py,
it will be included in version 1.5.2 (coming shortly).  I took the
liberty of including your rule set as a list of embedded test cases.

Hi Paul,

I don't see how it can handle the chained relop in the last two
test cases, e.g. '0.00 LE A LE 4.00' -- unless relops are chained by
default in your parser.

Cheers,
John
 
Paul McGuire

I don't see how it can handle the chained relop in the last two
test cases, e.g. '0.00 LE A LE 4.00' -- unless relops are chained by
default in your parser.

John -

First of all, to respect precedence of operations, higher level
precedences are parsed and grouped first. If you left off the parse
actions and just printed out the parse tree created by the example
(using asList()), for "A + B * C" you would get ['A', '+', [ 'B', '*',
'C' ]]. If you expand that test case to "A + B * C + D", you would
get ['A', '+', [ 'B', '*', 'C' ], '+', 'D' ]. This is counter to the
conventional infix parser that would create [['A', '+', [ 'B', '*',
'C' ]], '+', 'D' ], in which binary operators typically return
'operand' 'operator' 'operand' triples, and either operand might be a
nested parse tree.
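
The flat shape is easy to experiment with using plain lists. This sketch
involves no pyparsing at all, just nested Python lists mimicking what
asList() returns, and only handles '+' and '*'; it evaluates the flat
chain the same way the parse actions described below do:

```python
# Evaluate a flat parse chain like ['A', '+', ['B', '*', 'C'], '+', 'D'],
# where a nested list is a higher-precedence sub-expression.
def ev(node, env):
    if isinstance(node, str):
        return env[node]                  # a bare name: look it up
    total = ev(node[0], env)
    for i in range(1, len(node), 2):      # walk operator/operand pairs
        op, rhs = node[i], ev(node[i + 1], env)
        total = total + rhs if op == '+' else total * rhs  # '+' or '*'
    return total

env = {'A': 1, 'B': 2, 'C': 3, 'D': 4}
print(ev(['A', '+', ['B', '*', 'C'], '+', 'D'], env))  # 11
```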

As it happens, when using pyparsing's operatorPrecedence helper, *all*
binary operators at the same precedence level are actually parsed in a
single chain.

This is why you see this logic in EvalAddOp.eval:

def eval(self):
    sum = self.value[0].eval()
    for op,val in operatorOperands(self.value[1:]):
        if op == '+':
            sum += val.eval()
        if op == '-':
            sum -= val.eval()
    return sum

operatorOperands is a little generator that returns operator-operand
pairs, beginning at the second (that is, the "1th") token in the
list. You can't just do the simple evaluation of "operand1 operator
operand2", you have to build up the sum by first evaluating operand1,
and then iterating over the operator-operand pairs in the rest of the
list. Same thing for the multiplication operators.
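
The generator itself is short. Here is a sketch of the idea; the name
mirrors the one used in the eval_arith.py example, but treat this as an
illustration rather than the shipped code:

```python
def operator_operands(tokenlist):
    """Yield (operator, operand) pairs from a flat parsed slice like
    self.value[1:], i.e. [op, operand, op, operand, ...]."""
    it = iter(tokenlist)
    while True:
        try:
            yield (next(it), next(it))
        except StopIteration:
            break   # ran out of pairs

print(list(operator_operands(['+', 2.2, '-', 3.3])))  # [('+', 2.2), ('-', 3.3)]
```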

For the comparison operators, things are a little more involved.
"operand1 operator1 operand2 operator2 operand3" (as in "0.00 LE A LE
4.00") has to evaluate as

op1 operator1 op2 AND op2 operator2 op3

So EvalComparisonOp's eval method looks like:

def eval(self):
    val1 = self.value[0].eval()
    ret = True
    for op,val in operatorOperands(self.value[1:]):
        fn = EvalComparisonOp.opMap[op]
        val2 = val.eval()
        ret = ret and fn(val1,val2)
        val1 = val2
    return ret

The first term is evaluated and stored in val1. Then each
comparisonop-operand pair is extracted, the operand is eval()'ed and
stored in val2, and the comparison method that is mapped to the
comparisonop is called using val1 and val2. Then, to move on to
the next comparison, val2 is stored into val1, and we iterate to
the next comparison-operand pair. In fact, not only does this handle
"0.00 LE A LE 4.00", but it could also evaluate "0.00 LE A LE 4.00 LE
E > D". (I see that I should actually do some short-circuiting here -
if ret is false after calling fn(val1,val2), I should just break out
at that point. I'll have that fixed in the online version shortly.)
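
The short-circuit fix described above would look roughly like this --
a sketch with illustrative names, not pyparsing's actual classes; only
the stdlib operator module is assumed:

```python
import operator

def chained_compare(first, pairs, op_map):
    """Short-circuiting chained comparison: bail out at the first
    False pair instead of AND-ing every comparison in the chain."""
    val1 = first
    for op, val2 in pairs:
        if not op_map[op](val1, val2):
            return False          # later comparisons are never evaluated
        val1 = val2               # right operand becomes next left operand
    return True

op_map = {'LE': operator.le, 'LT': operator.lt, 'GT': operator.gt}
print(chained_compare(0.00, [('LE', 3.3), ('LE', 4.00)], op_map))  # True
print(chained_compare(0.00, [('GT', 3.3), ('LE', 4.00)], op_map))  # False
```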

-- Paul
 
Aaron Brady

On Apr 16, 10:57 am, (e-mail address removed) wrote:
Another interesting task for those that are looking for some
interesting problem:
snip
def main():
    for cur_rule in rules[20:26]:
        tokens=get_tokens(cur_rule)
        normal=replace_comps(tokens,COMP_REPLACERS)
        subst=replace_names(normal,vars_)
        groups=split_seccions(subst,COMP_REPLACERS.values())
        rep=all_seccions(groups)
        rep_out=''.join(x[0]+x[1] for x in rep)
        calc=calc_seccions(rep)
        calc_out=''.join(x[0]+x[1] for x in calc)
        deltas=calc_deltas(calc)
        result=eval(calc_out,{},{})
snip
a= '-1000.00 < A < 0.00'
eval( a, { 'A': -100 } )
snip
a= '-1000.00 LE A LE 0.00'
b= a.replace( ' LE ', ' <= ' )
snip

snip

I thought about using eval with a locals dictionary, but they want
output of the intermediate values. I want to avoid the situation where
the intermediate output does not match what eval is doing.

I take it you need the operands of the individual comparison
operators; that is, each side of each comparison. There might be a
way using abstract syntax trees. Or a basic 're' split can simplify
it.
>>> import re
>>> a= '-1000.00 LE A LE 0.00'
>>> b= re.split( r'\s(LT|GT|LE|GE|=|<|>|<=|>=)\s', a )
>>> b
['-1000.00', 'LE', 'A', 'LE', '0.00']
>>> COMP_REPLACERS={'LT':'<', 'GT':'>', 'LE':'<=', 'GE':'>=', '=':'=='}
>>> c= [ COMP_REPLACERS.get( x, x ) for x in b ]
>>> c
['-1000.00', '<=', 'A', '<=', '0.00']
>>> vars_={'A': 0, 'B': 1.1, 'C': 2.2, 'D': 3.3, 'E': 4.4, 'F': 5.5,
...        'G': 6.6, 'H': 7.7, 'I': 8.8, 'J': 9.9}
>>> d= [ str( vars_.get( x, x ) ) for x in c ]
>>> d
['-1000.00', '<=', '0', '<=', '0.00']
>>> eval( ''.join( d ) )
True

The 'dict.get( x, x )' expression returns the value of the entry for
x, or x itself if it is not present.

No promises. I didn't think it all the way through.
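
Pushing the split idea one step further toward the intermediate-value
reporting the original poster wants: after the substitutions, the even
slots of the split list are operand expressions and the odd slots are
operators, so each side can be evaluated separately before the
comparisons. A sketch, with the same eval caveats as above:

```python
# Split-and-substituted rule from the session above: operand
# expressions at even indices, comparison operators at odd indices.
d = ['-1000.00', '<=', '0', '<=', '0.00']
sides, ops = d[0::2], d[1::2]
values = [eval(s, {}, {}) for s in sides]          # intermediate values
checks = [eval('%r %s %r' % (values[i], ops[i], values[i + 1]), {}, {})
          for i in range(len(ops))]                # one bool per comparison
print(values)        # [-1000.0, 0, 0.0]
print(all(checks))   # True
```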
 
Aaron Brady

not only does this handle
"0.00 LE A LE 4.00", but it could also evaluate "0.00 LE A LE 4.00 LE
E > D".  (I see that I should actually do some short-circuiting here -
if ret is false after calling fn(val1,val2), I should just break out
at that point.  I'll have that fixed in the online version shortly.)

Hi, not to offend; I don't know your background. One thing I like
about Python is that it and the docs are careful about short-circuiting
conditions. ISTR that C left some of those details up to the compiler
at one point.
>>> def f():
...     print( 'in f' )
...     return 10
...
>>> 0<f()<20
in f
True
>>> 0<f() and f()<20
in f
in f
True

Therefore, if op{n} has side effects, 'op1 operator1 op2 AND op2
operator2 op3' is not equivalent to 'op1 operator1 op2 operator2 op3'.
 
Paul McGuire

Hi, not to offend; I don't know your background.  

Courtesy on Usenet!!! I'm going to go buy a lottery ticket!

Not to worry, I'm a big boy. People have even called my baby ugly,
and I manage to keep my blood pressure under control.
One thing I like
about Python is it and the docs are careful about short-circuiting
conditions.  ISTR that C left some of those details up to the compiler
at one point.


>>> def f():
...     print( 'in f' )
...     return 10
...
>>> 0<f()<20
in f
True
>>> 0<f() and f()<20
in f
in f
True

Therefore, if op{n} has side effects, 'op1 operator1 op2 AND op2
operator2 op3' is not equivalent to 'op1 operator1 op2 operator2 op3'.

Interesting point, but I don't remember that "A < B < C" is valid C
syntax, are you perhaps thinking of a different language?

By luck, my implementation of EvalComparisonOp.eval does in fact
capture the post-eval value of op2, so that if its evaluation caused
any side effects, they would not be repeated.

-- Paul
 
pruebauno

I've been meaning to expand on pyparsing's simpleArith.py example for
a while, to include the evaluation of the parsed tokens.  Here is the
online version, http://pyparsing.wikispaces.com/file/view/eval_arith.py,
it will be included in version 1.5.2 (coming shortly).  I took the
liberty of including your rule set as a list of embedded test cases.

-- Paul

That is fine with me. I don't know how feasible it is for me to use
pyparsing for this project, considering I don't have admin access on
the box that is eventually going to run this. To add insult to injury,
Python is in the version 2->3 transition (I really would like to push
the admins to install 3.1 by the end of the year, before the amount of
code written by us gets any bigger), meaning that any third-party
library is an additional burden on the future upgrade. I can't
remember if pyparsing is pure Python. If it is, I might be able to
include it alongside my code if it is not too big.
 
Aaron Brady

Courtesy on Usenet!!!  I'm going to go buy a lottery ticket!

Not to worry, I'm a big boy.  People have even called my baby ugly,
and I manage to keep my blood pressure under control.

Interesting point, but I don't remember that "A < B < C" is valid C
syntax, are you perhaps thinking of a different language?

By luck, my implementation of EvalComparisonOp.eval does in fact
capture the post-eval value of op2, so that if its evaluation caused
any side effects, they would not be repeated.

-- Paul

It was very hazy in my memory and is easily solved by a web search.
Perhaps I was thinking of the short-circuiting of 'and', and mixing it
up with the warning about parentheses in macros.

-Macros? NO, Batman.
-I'm afraid so Robin. Macros.
 
Paul McGuire

That is fine with me. I don't know how feasible it is for me to use
pyparsing for this project considering I don't have admin access on
the box that is eventually going to run this. To add insult to injury
Python is in the version 2->3 transition (I really would like to push
the admins to install 3.1 by the end of the year before the amount of
code written by us gets any bigger) meaning that any third party
library is an additional burden on the future upgrade. I can't
remember if pyparsing is pure Python. If it is I might be able to
include it alongside my code if it is not too big.

It *is* pure Python, and consists of a single source file for the very
purpose of ease-of-inclusion. A number of projects include their own
versions of pyparsing for version compatibility management, matplotlib
is one that comes to mind.

The upcoming version 1.5.2 download includes a pyparsing_py3.py file
for Python 3 compatibility, I should have that ready for users to
download *VERY SOON NOW*!

-- Paul
 
pruebauno

It *is* pure Python, and consists of a single source file for the very
purpose of ease-of-inclusion.  A number of projects include their own
versions of pyparsing for version compatibility management, matplotlib
is one that comes to mind.

The upcoming version 1.5.2 download includes a pyparsing_py3.py file
for Python 3 compatibility, I should have that ready for users to
download *VERY SOON NOW*!

-- Paul

Thanks,
I will consider it. I have to admit that although it looks like a
very good solution, it is also longer and more complex than my
current code. Having to explicitly define standard Python evaluation
and semantics is a bit overkill.
 
