compound regex

spir · Feb 9, 2009

Hello,

(new here)

Below is an extension to the standard module re. The point is to allow writing and testing sub-expressions individually, then nesting them into a super-expression. It is more or less like using a parser generator, but it keeps the regex grammar and power.
I used the format {sub_expr_name} for inclusion: in standard regexes, {} is only used to express a repetition count, so a pair of curly braces enclosing an identifier should not conflict.
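
A quick way to check that claim, as a minimal sketch: the inclusion pattern here is the same one used in the module further down, and it only matches braces that enclose an identifier, so repetition counts such as {5} or {2,3} are left alone. The sample string is made up for illustration.

```python
import re

# Same inclusion pattern as in the module: braces around an identifier.
inclusion = re.compile(r"{[a-zA-Z_][a-zA-Z_0-9]*}")

# Hypothetical format string mixing inclusions and ordinary repetition counts.
sample = r"{time} \S{5} x{2,3} {desc}"

# Only the identifier-style braces are picked up; {5} and {2,3} are ignored.
found = inclusion.findall(sample)
print(found)
```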

The extension is new and has barely been tested. I would welcome comments, criticism, etc. I would also like to know whether you find such a feature useful. You will probably find the code simple enough ;-)

Denis
------
la vida e estranya

===============
# coding: utf-8

''' super_regex

Define & check sub-patterns individually,
then include them in a global super-pattern.

Uses format {name} for inclusion:
    sub1 = Regex(...)
    sub2 = Regex(...)
    super_format = "...{sub1}...{sub2}..."
    # final regex object:
    super_regex = superRegex(super_format)
'''

from re import compile as Regex

# sub-pattern inclusion format
sub_pattern = Regex(r"{[a-zA-Z_][a-zA-Z_0-9]*}")

# sub-pattern expander: called by sub() once per {name} match
def sub_pattern_expansion(inclusion, dic=None):
    # strip the enclosing braces to get the sub-pattern name
    name = inclusion.group()[1:-1]
    # namespace dict may be specified -- else globals()
    if dic is None:
        dic = globals()
    if name not in dic:
        raise NameError("Cannot find sub-pattern '%s'." % name)
    return dic[name].pattern

# super-pattern generator
def superRegex(format):
    expanded_format = sub_pattern.sub(sub_pattern_expansion, format)
    return Regex(expanded_format)

if __name__ == "__main__":  # purely artificial example use
    # pattern
    time = Regex(r"\d\d:\d\d:\d\d")  # hh:mm:ss
    code = Regex(r"\S{5}")           # non-whitespace x 5
    desc = Regex(r"[\w\s]+$")        # alphanum|space --> EOL
    ref_format = "^ref: {time} #{code} --- {desc}"
    ref_regex = superRegex(ref_format)
    # output (single-argument print calls run under both Python 2 and 3)
    print('super pattern:\n"%s" ==>\n"%s"\n' % (ref_format, ref_regex.pattern))
    text = "ref: 12:04:59 #%+.?% --- foo 987 bar"
    result = ref_regex.match(text)
    print('text: "%s" ==>\n"%s"' % (text, result.group()))
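
Two limits of the code above may be worth noting. Sub-patterns are pasted in as-is, so an alternation such as cm|mm inside a sub-pattern would spill into the surrounding pattern. Also, sub() calls the expander with the match object only, so the dic argument cannot actually be supplied through superRegex. The following is a minimal sketch of one possible variant, not part of the module as posted: the names super_regex_grouped, subs, unit and num are hypothetical, the namespace is passed explicitly, and each sub-pattern is wrapped in a non-capturing group.

```python
import re

# Same inclusion syntax as above, with the identifier captured as group 1.
INCLUSION = re.compile(r"{([a-zA-Z_][a-zA-Z_0-9]*)}")

def super_regex_grouped(fmt, namespace):
    """Expand {name} inclusions from an explicit dict (hypothetical variant).

    Each sub-pattern is wrapped in (?:...) so that an alternation inside it
    cannot leak into the surrounding pattern.
    """
    def expand(match):
        name = match.group(1)
        if name not in namespace:
            raise NameError("Cannot find sub-pattern '%s'." % name)
        return "(?:%s)" % namespace[name].pattern
    return re.compile(INCLUSION.sub(expand, fmt))

# Made-up sub-patterns for illustration.
subs = {
    "unit": re.compile(r"cm|mm"),  # alternation: needs the grouping
    "num": re.compile(r"\d+"),
}
length = super_regex_grouped(r"^{num} ?{unit}$", subs)
print(length.pattern)
print(bool(length.match("12 mm")), bool(length.match("mm")))
```

Without the grouping, the expanded pattern would read ^\d+ ?cm|mm$, which also matches a bare "mm"; with it, the unit alternatives stay confined to their own group.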
