Hello:
I have been searching for an easy solution, hopefully one that
has already been written, since I don't want to reinvent the wheel.
Suppose I have a string of expressions such as:
"((($IP = "127.1.2.3") AND ($AX < 15)) OR (($IP = "127.1.2.4") AND ($AY != 0)))"
I would like to split it up into something like:
[ "OR",
"(($IP = "127.1.2.3") AND ($AX < 15))",
"(($IP = "127.1.2.4") AND ($AY != 0))" ]
which I may then decide whether or not to split further into:
[ "OR",
["AND", "($IP = "127.1.2.3")", "($AX < 15)"],
["AND", "($IP = "127.1.2.4")", "($AY != 0)"] ]
Is there an easy way to do this?
I tried using regular expressions (the re module), but I don't think they
are recursive enough. I really want to break it up from:
(E1 AND_or_OR E2) and make that into [AND_or_OR, E1, E2]
and apply the same to E1 and E2 recursively until E1[0] != '('
But the main problem I am running into is: how do I split this up
by outer parentheses, so that I get the proper '(' and ')' to split
this up correctly?
Thanks in advance:
Michael Yanowitz
This problem is right down the pyparsing fairway! Pyparsing is a
module for defining recursive-descent parsers, and it has some built-
in help just for applications such as this.
You start by defining the basic elements of the text to be parsed. In
your sample text, you are combining a number of relational
comparisons, made up of variable names and literal integers and quoted
strings. Using pyparsing classes, we define these:
varName = Word("$",alphas, min=2)
integer = Word("0123456789").setParseAction( lambda t : int(t[0]) )
varVal = dblQuotedString | integer
varName is a "word" starting with a $, followed by 1 or more alphas.
integer is a "word" made up of 1 or more digits, and we add a parsing
action to convert these to Python ints. varVal shows that a value can
be an integer or a dblQuotedString (a common expression included with
pyparsing).
Next we define the set of relational operators, and the comparison
expression:
relationalOp = oneOf("= < > >= <= !=")
comparison = Group(varName + relationalOp + varVal)
The comparison expression is grouped so as to keep tokens separate
from surrounding expressions.
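As a quick check of the pieces defined so far, the comparison expression can be tried on a single clause. This is only a minimal sketch: it assumes pyparsing is installed, and the two sample input strings are chosen for illustration.

```python
from pyparsing import Word, alphas, nums, dblQuotedString, oneOf, Group

# Same definitions as above, repeated so the snippet runs on its own.
varName = Word("$", alphas, min=2)
integer = Word(nums).setParseAction(lambda t: int(t[0]))
varVal = dblQuotedString | integer
relationalOp = oneOf("= < > >= <= !=")
comparison = Group(varName + relationalOp + varVal)

# Group keeps the three tokens of each comparison in their own sublist.
int_result = comparison.parseString('$AX < 15').asList()
str_result = comparison.parseString('$IP = "127.1.2.3"').asList()
print(int_result)   # the integer has been converted by the parse action
print(str_result)   # dblQuotedString keeps the surrounding quotes
```

Note that the quoted string keeps its double quotes in the result, while the integer comes back as a Python int.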
Now for the most complicated part: using the operatorPrecedence helper
from pyparsing. It is possible to create the recursive grammar
explicitly, but this kind of expression is so common that
pyparsing includes a helper for it too. Here is your set of
operations defined using operatorPrecedence:
boolExpr = operatorPrecedence( comparison,
    [
    ( "AND", 2, opAssoc.LEFT ),
    ( "OR", 2, opAssoc.LEFT ),
    ])
operatorPrecedence takes 2 arguments: the base-level or atom
expression (in your case, the comparison expression), and a list of
tuples listing the operators in descending priority. Each tuple gives
the operator, the number of operands (1 or 2), and whether it is right
or left associative.
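For comparison, the hand-written recursion described in the question (split on the outer parentheses, then recurse into E1 and E2) can be sketched in plain Python with a depth counter. This is not pyparsing and not part of the solution above. The function names split_expr and strip_outer_parens are made up for this sketch, and it assumes every AND/OR has exactly two operands, each fully parenthesized, as in the sample string.

```python
def strip_outer_parens(s):
    """Remove one enclosing pair of parentheses if it wraps the whole string."""
    s = s.strip()
    if not (s.startswith("(") and s.endswith(")")):
        return s
    depth = 0
    in_quotes = False
    for i, ch in enumerate(s):
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0 and i < len(s) - 1:
                    return s  # the first '(' closes early: nothing to strip
    return s[1:-1].strip()


def split_expr(expr):
    """Turn '(E1 AND E2)' or '(E1 OR E2)' into [OP, E1, E2], recursively."""
    inner = strip_outer_parens(expr)
    depth = 0
    in_quotes = False
    for i, ch in enumerate(inner):
        if ch == '"':
            in_quotes = not in_quotes
        elif in_quotes:
            continue  # ignore parentheses and keywords inside quoted strings
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == " " and depth == 0:
            # Only a space outside all parentheses can precede the operator.
            for op in ("AND", "OR"):
                if inner.startswith(op + " ", i + 1):
                    left = inner[:i]
                    right = inner[i + len(op) + 2:]
                    return [op, split_expr(left), split_expr(right)]
    return expr.strip()  # no top-level AND/OR: leave the clause as it is


test = ('((($IP = "127.1.2.3") AND ($AX < 15)) '
        'OR (($IP = "127.1.2.4") AND ($AY != 0)))')
tree = split_expr(test)
print(tree)
```

The sketch splits at the first top-level operator it finds and knows nothing about operator priority or NOT, which is the kind of bookkeeping operatorPrecedence takes care of.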
Now the only thing left to do is use boolExpr to parse your test
string:
results = boolExpr.parseString(
    '((($IP = "127.1.2.3") AND ($AX < 15)) '
    'OR (($IP = "127.1.2.4") AND ($AY != 0)))')
pyparsing returns parsed tokens as a rich object of type
ParseResults. This object can be accessed as a list, dict, or object
instance with named attributes. For this example, we'll actually
create a nested list using ParseResults' asList method. Passing this
list to the pprint module we get:
import pprint
pprint.pprint( results.asList() )
prints
[[[['$IP', '=', '"127.1.2.3"'], 'AND', ['$AX', '<', 15]],
  'OR',
  [['$IP', '=', '"127.1.2.4"'], 'AND', ['$AY', '!=', 0]]]]
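The nested list above is in infix order (operand, operator, operand), while the question asked for the operator first. A small post-processing step can reorder it. The helper below is a hypothetical sketch in plain Python, not part of pyparsing; it assumes the only boolean operators are AND, OR and NOT, and that all operators within one group are the same. The input is the list printed above, copied in literally so the snippet runs without pyparsing.

```python
BOOL_OPS = ("AND", "OR")


def infix_to_prefix(node):
    """Rewrite [a, 'AND', b] as ['AND', a, b], recursing into operands."""
    if not isinstance(node, list):
        return node
    # Unary case: ['NOT', operand]
    if len(node) == 2 and node[0] == "NOT":
        return ["NOT", infix_to_prefix(node[1])]
    # Binary case: every odd position holds a boolean operator.
    if len(node) >= 3 and all(tok in BOOL_OPS for tok in node[1::2]):
        return [node[1]] + [infix_to_prefix(arg) for arg in node[0::2]]
    # Anything else (such as a comparison like ['$AX', '<', 15]) is a leaf.
    return node


parsed = [[[['$IP', '=', '"127.1.2.3"'], 'AND', ['$AX', '<', 15]],
           'OR',
           [['$IP', '=', '"127.1.2.4"'], 'AND', ['$AY', '!=', 0]]]]

# parsed[0] skips the outermost list that wraps the whole result.
prefix = infix_to_prefix(parsed[0])
print(prefix)
```

Comparisons are left untouched as three-element lists, so only the AND/OR structure changes shape.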
Here is the whole program in one chunk (I also added support for NOT -
higher priority than AND, and right-associative):
test = ('((($IP = "127.1.2.3") AND ($AX < 15)) '
        'OR (($IP = "127.1.2.4") AND ($AY != 0)))')
from pyparsing import oneOf, Word, alphas, dblQuotedString, nums, \
Literal, Group, operatorPrecedence, opAssoc
varName = Word("$",alphas, min=2)
integer = Word(nums).setParseAction( lambda t : int(t[0]) )
varVal = dblQuotedString | integer
relationalOp = oneOf("= < > >= <= !=")
comparison = Group(varName + relationalOp + varVal)
boolExpr = operatorPrecedence( comparison,
    [
    ( "NOT", 1, opAssoc.RIGHT ),
    ( "AND", 2, opAssoc.LEFT ),
    ( "OR", 2, opAssoc.LEFT ),
    ])
import pprint
pprint.pprint( boolExpr.parseString(test).asList() )
The pyparsing wiki includes some related examples, SimpleBool.py and
SimpleArith.py - go to
http://pyparsing.wikispaces.com/Examples.
-- Paul