Generating text from a regular expression

Nathan Harmston · Mar 31, 2010

Hi everyone,

I have a moderately complicated, medium-sized regular expression and I
want to generate all possible words that it can match (to compare the
performance of regex against an acora-based matcher), in effect using
the regular expression as a grammar to generate all words in its
language. I was wondering whether this is possible in Python, or with
any other tool. Google doesn't seem to give any obvious answers.

Many thanks in advance,

Nathan

Grant Edwards · Mar 31, 2010

> I have a slightly complicated/medium sized regular expression and I
> want to generate all possible words that it can match
>
> I was wondering if this is possible in Python or possible using
> anything. Google doesn't seem to give any obvious answers.

We did this one a couple weeks ago.

It's not possible in the general case (there are an infinite number of
matching words for many/most regular expressions).

Paul McGuire · Mar 31, 2010

> Hi everyone,
>
> I have a slightly complicated/medium sized regular expression and I
> want to generate all possible words that it can match (to compare
> performance of regex against an acora based matcher).

The pyparsing wiki Examples page includes this regex inverter:
http://pyparsing.wikispaces.com/file/view/invRegex.py

From the module header:
# Supports:
# - {n} and {m,n} repetition, but not unbounded + or * repetition
# - ? optional elements
# - [] character ranges
# - () grouping
# - | alternation

-- Paul
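
To illustrate why the bounded constructs listed in that header are straightforward to expand, here is a minimal sketch in plain Python. It is not the code from invRegex.py: the helper names (charset, optional, repeat, sequence) are made up for this example, and the pattern [ab]{1,2}c? is translated into calls by hand rather than parsed.

```python
import re
from itertools import product

def charset(chars):
    """[...] : one word per character in the set."""
    return list(chars)

def optional(words):
    """x? : the empty string, or any word of x."""
    return [''] + list(words)

def repeat(words, lo, hi):
    """x{lo,hi} : every concatenation of lo..hi words of x."""
    out = []
    for n in range(lo, hi + 1):
        out.extend(''.join(combo) for combo in product(words, repeat=n))
    return out

def sequence(*parts):
    """Concatenation: one word from each part, in order."""
    return [''.join(combo) for combo in product(*parts)]

# Hand translation of the pattern [ab]{1,2}c? (hypothetical example pattern)
pattern = r'[ab]{1,2}c?'
words = sequence(repeat(charset('ab'), 1, 2), optional(charset('c')))

print(words)
# Sanity check: every generated word is matched in full by the pattern.
print(all(re.fullmatch(pattern, w) for w in words))
```

Every helper here returns a finite list, which is why this style of expansion cannot handle unbounded + or * repetition.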

Nathan Harmston · Apr 3, 2010

Thanks everyone, the invRegexInf is perfect.

Thanks again,

Nathan

En Wed said:

> > I have a slightly complicated/medium sized regular expression and I
> > want to generate all possible words that it can match (to compare
> > performance of regex against an acora based matcher).

> The pyparsing wiki Examples page includes this regex inverter:
> http://pyparsing.wikispaces.com/file/view/invRegex.py
>
> From the module header:
> # Supports:
> # - {n} and {m,n} repetition, but not unbounded + or * repetition
> # - ? optional elements
> # - [] character ranges
> # - () grouping
> # - | alternation

I took the liberty of modifying your invRegex.py example, adding support
for infinite repeaters. It depends on two other modules:

mergeinf.py (from http://code.activestate.com/recipes/577041) provides the
infinite merge operation.

enumre.py provides the basic functions (merge, prod, repeat, closure)
necessary to enumerate the language generated by a given regular
expression, even if it contains unbounded repeaters such as * and +. The
key is to generate shorter strings before longer ones, so that for 'a*|b*'
it doesn't get stuck generating an endless run of a's before any b.
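
The "shorter strings first" idea can be sketched with the standard library alone. This is not mergeinf.py or enumre.py: it uses heapq.merge (with its key argument, available since Python 3.5) as a stand-in for the infinite merge, and the names star and merge_by_length are invented for the example.

```python
import heapq
from itertools import count, groupby, islice

def star(ch):
    """Infinite generator for ch*: '', ch, chch, ... (already ordered by length)."""
    return (ch * n for n in count())

def merge_by_length(*langs):
    """Lazily merge generators that are each sorted by (length, text)."""
    merged = heapq.merge(*langs, key=lambda w: (len(w), w))
    # Equal words come out adjacent, so groupby drops duplicates such as ''.
    return (word for word, _ in groupby(merged))

# First few words of a*|b*: both branches make progress.
first = list(islice(merge_by_length(star('a'), star('b')), 7))
print(first)  # ['', 'a', 'b', 'aa', 'bb', 'aaa', 'bbb']
```

Simply chaining the two generators one after the other would never reach the first 'b', because the generator for a* never ends.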

For example, "(a|bc)*d" corresponds to this code:

prod(
    closure(
        merge(
            'a',
            prod('b', 'c'))),
    'd')

which returns an infinite generator starting with:

d
ad
aad
bcd
aaad
abcd
bcad
aaaad
aabcd
abcad
bcaad
bcbcd
aaaaad
aaabcd
aabcad
...

I got the idea from
http://userweb.cs.utexas.edu/users/misra/Notes.dir/RegExp.pdf
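
For readers without the modules at hand, the enumeration above can be reproduced with a small self-contained sketch. This is an assumption-laden stand-in, not enumre.py: the names merge, prod and closure are borrowed from the description above, lit and enumerate_language are invented here, and instead of lazily merging infinite generators each language is modelled as a function that returns the sorted words of one exact length.

```python
from functools import lru_cache
from itertools import count, islice

def lit(s):
    """Language containing only the string s."""
    return lambda n: [s] if len(s) == n else []

def merge(a, b):
    """Union of two languages (regex '|')."""
    return lambda n: sorted(set(a(n)) | set(b(n)))

def prod(a, b):
    """Concatenation: split the length n between a and b in every way."""
    def words(n):
        found = {x + y for k in range(n + 1) for x in a(k) for y in b(n - k)}
        return sorted(found)
    return words

def closure(a):
    """Kleene star: zero or more non-empty words of a."""
    @lru_cache(maxsize=None)
    def words(n):
        if n == 0:
            return ('',)
        found = {x + y
                 for k in range(1, n + 1)
                 for x in a(k)
                 for y in words(n - k)}
        return tuple(sorted(found))
    return words

def enumerate_language(lang):
    """Yield all words, shortest first, alphabetically within each length."""
    for n in count():
        yield from lang(n)

# (a|bc)*d, written as the same nesting of calls as in the post
expr = prod(closure(merge(lit('a'), prod(lit('b'), lit('c')))), lit('d'))
first15 = list(islice(enumerate_language(expr), 15))
print('\n'.join(first15))
```

Note that enumerate_language never terminates, even for a finite language (it keeps probing longer lengths), so bound it with islice or a maximum length.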

Finally, invRegexInf.py is based on your original regex parser. I only
modified the generation part, taking advantage of the above
infrastructure; the parser itself remains almost the same. It essentially
saves you the very tedious work of converting a regular expression by hand
into the equivalent sequence of function calls shown above. (I hope I
got it right: I like pyparsing a lot and use it whenever I feel it's
appropriate, but not often enough to remember the details...)
