N
Noah Hoffman
I have been trying to write a regular expression that identifies a
block of text enclosed by (potentially nested) parentheses. I've found
solutions using other regular expression engines (for example, my text
editor, BBEdit, which uses the PCRE library), but have not been able
to replicate it using python's re module.
Here's a version that works using the PCRE syntax, along with the
python error message. I'm hoping for this to identify the string '(foo
(bar) (baz))'
% python -V
Python 2.5.1
% python
py> import re
py> text = 'buh (foo (bar) (baz)) blee'
py> no_ws = lambda s: ''.join(s.split())
py> rexp = r"""(?P<parens>
.... \(
.... (?>
.... (?> [^()]+ ) |
.... (?P>parens)
.... )*
.... \)
.... )"""
py> print re.findall(no_ws(rexp), text)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/2.5/lib/
python2.5/re.py", line 167, in findall
return _compile(pattern, flags).findall(string)
File "/Library/Frameworks/Python.framework/Versions/2.5/lib/
python2.5/re.py", line 233, in _compile
raise error, v # invalid expression
sre_constants.error: unexpected end of pattern
From what I understand of the PCRE syntax, the (?>) construct is a non-
capturing subpattern, and (?P>parens) is a recursive call to the
enclosing (named) pattern. So my best guess at a python equivalent is
this:
py> rexp2 = r"""(?P<parens>
.... \(
.... (?=
.... (?= [^()]+ ) |
.... (?P=parens)
.... )*
.... \)
.... )"""
py> print re.findall(no_ws(rexp2), text)
[]
....which results in no match. I've played around quite a bit with
variations on this theme, but haven't been able to come up with one
that works.
Can anyone help me understand how to construct a regular expression
that does the job in python?
Thanks -
block of text enclosed by (potentially nested) parentheses. I've found
solutions using other regular expression engines (for example, my text
editor, BBEdit, which uses the PCRE library), but have not been able
to replicate it using python's re module.
Here's a version that works using the PCRE syntax, along with the
python error message. I'm hoping for this to identify the string '(foo
(bar) (baz))'
% python -V
Python 2.5.1
% python
py> import re
py> text = 'buh (foo (bar) (baz)) blee'
py> no_ws = lambda s: ''.join(s.split())
py> rexp = r"""(?P<parens>
.... \(
.... (?>
.... (?> [^()]+ ) |
.... (?P>parens)
.... )*
.... \)
.... )"""
py> print re.findall(no_ws(rexp), text)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/2.5/lib/
python2.5/re.py", line 167, in findall
return _compile(pattern, flags).findall(string)
File "/Library/Frameworks/Python.framework/Versions/2.5/lib/
python2.5/re.py", line 233, in _compile
raise error, v # invalid expression
sre_constants.error: unexpected end of pattern
From what I understand of the PCRE syntax, the (?>) construct is a non-
capturing subpattern, and (?P>parens) is a recursive call to the
enclosing (named) pattern. So my best guess at a python equivalent is
this:
py> rexp2 = r"""(?P<parens>
.... \(
.... (?=
.... (?= [^()]+ ) |
.... (?P=parens)
.... )*
.... \)
.... )"""
py> print re.findall(no_ws(rexp2), text)
[]
....which results in no match. I've played around quite a bit with
variations on this theme, but haven't been able to come up with one
that works.
Can anyone help me understand how to construct a regular expression
that does the job in python?
Thanks -