Regexp parser and generator

George Sakkis

Is there any package that parses regular expressions and returns an
AST? Something like:
Regex('i ', Or('love', 'hate'), ' h', Or('is', 'er'), ' ', Or('cat',
'dog'), Optional('s'), ZeroOrMore(r'\s'), OneOrMore('!'))

Given such a structure, I want to create a generator that can generate
all strings matched by this regexp. Obviously if the regexp contains a
'*' or '+' the generator is infinite, and although it can be
artificially constrained by, say, a maxdepth parameter, for now I'm
interested in finite regexps only. It shouldn't be too hard to write
one from scratch, but if someone has already done it, so much the
better.

George
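
George's from-scratch approach can be sketched in Python roughly as follows. This is a minimal sketch, not an existing package: the node names come from his example, but their definitions here are hypothetical. Plain strings are treated as literal text (so r'\s' would mean a backslash followed by s, not whitespace), and the hypothetical maxdepth parameter caps how many times ZeroOrMore and OneOrMore repeat, as suggested above.

```python
import itertools


class Regex:
    """Hypothetical AST node: concatenation of its parts (a str is literal text)."""
    def __init__(self, *parts):
        self.parts = parts


class Or(Regex):          # any one of the parts
    pass


class Optional(Regex):    # the parts, zero or one time
    pass


class ZeroOrMore(Regex):  # the parts, zero to maxdepth times
    pass


class OneOrMore(Regex):   # the parts, one to maxdepth times
    pass


def generate(node, maxdepth=2):
    """Yield every string matched by node, repeating * and + at most maxdepth times."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, Or):
        for part in node.parts:
            yield from generate(part, maxdepth)
    elif isinstance(node, (Optional, ZeroOrMore, OneOrMore)):
        lo = 1 if isinstance(node, OneOrMore) else 0
        hi = 1 if isinstance(node, Optional) else maxdepth
        body = Regex(*node.parts)
        for count in range(lo, hi + 1):
            # repeat the body `count` times as a plain concatenation
            yield from generate(Regex(*[body] * count), maxdepth)
    else:  # plain Regex: every combination of its parts, left to right
        choices = [list(generate(part, maxdepth)) for part in node.parts]
        for combo in itertools.product(*choices):
            yield "".join(combo)
```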

Skip

George> Is there any package that parses regular expressions and returns
George> an AST?

Maybe not directly, but compiling with the re.DEBUG flag (value 128)
might provide a starting point for building such a beast:
>>> import re
>>> re.compile("[ab]", 128)
in
  literal 97
  literal 98
>>> re.compile("ab*c[xyz]", 128)
literal 97
max_repeat 0 65535
  literal 98
literal 99
in
  literal 120
  literal 121
  literal 122
<_sre.SRE_Pattern object at 0x371f90>

Skip
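
The 128 passed to re.compile() above is the value of the re.DEBUG flag. The flag prints the parse tree to standard output instead of returning it, so one way to get it as a string, assuming Python 3, is to capture stdout with contextlib.redirect_stdout. debug_dump is a hypothetical helper name, and the exact dump format differs between Python versions, so treat the result as informal text rather than a stable format.

```python
import contextlib
import io
import re


def debug_dump(pattern):
    """Hypothetical helper: return the text re.DEBUG prints for pattern."""
    buf = io.StringIO()
    # re.DEBUG writes its dump with print(), so redirecting stdout captures it
    with contextlib.redirect_stdout(buf):
        re.compile(pattern, re.DEBUG)
    return buf.getvalue()


print(debug_dump("ab*c[xyz]"))
```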

Peter Otten

George said:
Is there any package that parses regular expressions and returns an
AST? Something like:

Regex('i ', Or('love', 'hate'), ' h', Or('is', 'er'), ' ', Or('cat',
'dog'), Optional('s'), ZeroOrMore(r'\s'), OneOrMore('!'))

Seen today, on planet python:
[('in', [('literal', 97), ('literal', 98)])]


Peter
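
The nested list above looks like the output of the undocumented sre_parse.parse() function; since Python 3.11 the same code lives in the private re._parser module. Assuming that structure, a minimal sketch of turning it into the kind of AST George asked for might look like this. The tuple shapes, the regex_ast/to_ast names and the Regex/Or/Optional/ZeroOrMore/OneOrMore/Repeat labels are hypothetical; anything not recognised (character sets, \s, anchors, lazy repeats) is passed through as a raw (opcode, argument) pair.

```python
try:  # Python 3.11+: the parser lives in private submodules of re
    from re import _parser as sre_parse, _constants as sre_c
except ImportError:  # older Python 3 releases
    import sre_parse
    import sre_constants as sre_c

# Hypothetical node names for the common repeat counts, keyed by (min, max).
_REPEAT_NAMES = {
    (0, 1): "Optional",
    (0, sre_c.MAXREPEAT): "ZeroOrMore",
    (1, sre_c.MAXREPEAT): "OneOrMore",
}


def _append(nodes, node):
    """Append node, merging it into the previous one if both are literal strings."""
    if isinstance(node, str) and nodes and isinstance(nodes[-1], str):
        nodes[-1] += node
    else:
        nodes.append(node)


def _unwrap(ast):
    """Replace ('Regex', x) with plain x when a sequence has a single part."""
    return ast[1] if len(ast) == 2 else ast


def to_ast(items):
    """Convert parser output into nested tuples like ('Regex', 'i ', ('Or', ...))."""
    nodes = []
    for op, av in items:
        if op == sre_c.LITERAL:
            _append(nodes, chr(av))
        elif op == sre_c.SUBPATTERN:  # inline the group's contents (last element of av)
            for node in to_ast(av[-1])[1:]:
                _append(nodes, node)
        elif op == sre_c.BRANCH:  # av == (None, [branch, branch, ...])
            nodes.append(("Or",) + tuple(_unwrap(to_ast(b)) for b in av[1]))
        elif op == sre_c.MAX_REPEAT:  # av == (min, max, body)
            lo, hi, body = av
            name = _REPEAT_NAMES.get((lo, hi))
            child = _unwrap(to_ast(body))
            nodes.append((name, child) if name else ("Repeat", lo, hi, child))
        else:  # anything else is kept as a raw (opcode name, argument) pair
            nodes.append((getattr(op, "name", op), av))
    return ("Regex",) + tuple(nodes)


def regex_ast(pattern):
    return to_ast(sre_parse.parse(pattern))
```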

Paul McGuire

Is there any package that parses regular expressions and returns an
AST? Something like:

Regex('i ', Or('love', 'hate'), ' h', Or('is', 'er'), ' ', Or('cat',
'dog'), Optional('s'), ZeroOrMore(r'\s'), OneOrMore('!'))

Given such a structure, I want to create a generator that can generate
all strings matched by this regexp. Obviously if the regexp contains a
'*' or '+' the generator is infinite, and although it can be
artificially constrained by, say, a maxdepth parameter, for now I'm
interested in finite regexps only. It shouldn't be too hard to write
one from scratch, but if someone has already done it, so much the
better.

George

Check out this pyparsing regex inverter: http://pyparsing.wikispaces.com/file/view/invRegex.py

Here is what it generates for the finite part of your example:
i (love|hate) h(is|er) (cat|dog)s?
Parse time: 0.17 seconds
16
i love his cat
i love his cats
i love his dog
i love his dogs
i love her cat
i love her cats
i love her dog
i love her dogs
i hate his cat
i hate his cats
i hate his dog
i hate his dogs
i hate her cat
i hate her cats
i hate her dog
i hate her dogs

-- Paul
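
Using Python's own private regex parser instead of pyparsing, a finite-only inverter can be sketched as below. This is not Paul's invRegex.py: invert is a hypothetical name, it relies on the undocumented sre_parse / re._parser output, and it only handles literals, [...] sets of literals and ranges, alternation, groups and bounded repeats. Anything unbounded such as * or + raises ValueError. For the pattern shown above it yields the same 16 strings in the same order.

```python
import itertools

try:  # Python 3.11+: the parser lives in private submodules of re
    from re import _parser as sre_parse, _constants as sre_c
except ImportError:  # older Python 3 releases
    import sre_parse
    import sre_constants as sre_c


def _expand_seq(items):
    """Yield every string matched by a sequence of parsed (op, av) pairs."""
    choices = [list(_expand_item(op, av)) for op, av in items]
    for combo in itertools.product(*choices):
        yield "".join(combo)


def _expand_item(op, av):
    """Yield every string matched by one parsed node."""
    if op == sre_c.LITERAL:
        yield chr(av)
    elif op == sre_c.IN:
        for member_op, member_av in av:
            if member_op == sre_c.LITERAL:
                yield chr(member_av)
            elif member_op == sre_c.RANGE:
                first, last = member_av
                for code in range(first, last + 1):
                    yield chr(code)
            else:  # NEGATE, CATEGORY (\s, \d, ...) are not enumerated here
                raise ValueError("unsupported set member in [...]")
    elif op == sre_c.BRANCH:  # av == (None, [branch, branch, ...])
        for branch in av[1]:
            yield from _expand_seq(branch)
    elif op == sre_c.SUBPATTERN:  # the group's contents are the last element
        yield from _expand_seq(av[-1])
    elif op in (sre_c.MAX_REPEAT, sre_c.MIN_REPEAT):
        lo, hi, body = av
        if hi == sre_c.MAXREPEAT:  # *, + or {n,}: infinitely many strings
            raise ValueError("unbounded repeat; only finite regexps are supported")
        for count in range(lo, hi + 1):
            yield from _expand_seq(list(body) * count)
    else:
        raise ValueError("unsupported construct: %r" % (op,))


def invert(pattern):
    """Yield every string matched by a finite regular expression."""
    yield from _expand_seq(sre_parse.parse(pattern))
```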

George Sakkis

George said:
Is there any package that parses regular expressions and returns an
AST? Something like:
Regex('i ', Or('love', 'hate'), ' h', Or('is', 'er'), ' ', Or('cat',
'dog'), Optional('s'), ZeroOrMore(r'\s'), OneOrMore('!'))

Seen today, on planet python:

[('in', [('literal', 97), ('literal', 98)])]

Peter

Thanks, that's rather low level and undocumented, but it does the job.

Best,
George