regular expression for nested parentheses

Noah Hoffman · Dec 9, 2007

I have been trying to write a regular expression that identifies a
block of text enclosed by (potentially nested) parentheses. I've found
solutions using other regular expression engines (for example, my text
editor, BBEdit, which uses the PCRE library), but have not been able
to replicate it using python's re module.

Here's a version that works using the PCRE syntax, along with the
python error message. I'm hoping for this to identify the string '(foo
(bar) (baz))'

% python -V
Python 2.5.1
% python
py> import re
py> text = 'buh (foo (bar) (baz)) blee'
py> no_ws = lambda s: ''.join(s.split())
py> rexp = r"""(?P<parens>
.... \(
.... (?>
.... (?> [^()]+ ) |
.... (?P>parens)
.... )*
.... \)
.... )"""
py> print re.findall(no_ws(rexp), text)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/2.5/lib/
python2.5/re.py", line 167, in findall
return _compile(pattern, flags).findall(string)
File "/Library/Frameworks/Python.framework/Versions/2.5/lib/
python2.5/re.py", line 233, in _compile
raise error, v # invalid expression
sre_constants.error: unexpected end of pattern

From what I understand of the PCRE syntax, the (?>) construct is a non-
capturing subpattern, and (?P>parens) is a recursive call to the
enclosing (named) pattern. So my best guess at a python equivalent is
this:

py> rexp2 = r"""(?P<parens>
.... \(
.... (?=
.... (?= [^()]+ ) |
.... (?P=parens)
.... )*
.... \)
.... )"""
py> print re.findall(no_ws(rexp2), text)
[]

....which results in no match. I've played around quite a bit with
variations on this theme, but haven't been able to come up with one
that works.

Can anyone help me understand how to construct a regular expression
that does the job in python?

Thanks -

John Machin · Dec 9, 2007

I have been trying to write a regular expression that identifies a
block of text enclosed by (potentially nested) parentheses. I've found
solutions using other regular expression engines (for example, my text
editor, BBEdit, which uses the PCRE library), but have not been able
to replicate it using python's re module.

A pattern that can validly be described as a "regular expression"
cannot count and thus can't match balanced parentheses. Some "RE"
engines provide a method of tagging a sub-pattern so that a match must
include balanced () (or [] or {}); Python's doesn't.

Looks like you need a parser; try pyparsing.

[snip]

py> rexp = r"""(?P<parens>
... \(
... (?>
... (?> [^()]+ ) |
... (?P>parens)
... )*
... \)
... )"""
py> print re.findall(no_ws(rexp), text)

Ummm ... even if Python's re engine did do what you want, wouldn't you
need flags=re.VERBOSE in there?

Noah Hoffman · Dec 9, 2007

A pattern that can validly be described as a "regular expression"
cannot count and thus can't match balanced parentheses. Some "RE"
engines provide a method of tagging a sub-pattern so that a match must
include balanced () (or [] or {}); Python's doesn't.

Okay, thanks for the clarification. So recursion is not possible using
python regular expressions?

Ummm ... even if Python's re engine did do what you want, wouldn't you
need flags=re.VERBOSE in there?

Ah, thanks for letting me know about that flag; but removing
whitespace as I did with the no_ws lambda expression should also work,
no?

John Machin · Dec 9, 2007

A pattern that can validly be described as a "regular expression"
cannot count and thus can't match balanced parentheses. Some "RE"
engines provide a method of tagging a sub-pattern so that a match must
include balanced () (or [] or {}); Python's doesn't.

Click to expand...

Okay, thanks for the clarification. So recursion is not possible using
python regular expressions?

Ummm ... even if Python's re engine did do what you want, wouldn't you
need flags=re.VERBOSE in there?

Click to expand...

Ah, thanks for letting me know about that flag; but removing
whitespace as I did with the no_ws lambda expression should also work,
no?

Under a very limited definition of "work". That technique would not
produce correct answers on patterns that contain any *significant*
whitespace e.g. you want to match "foo" and "bar" separated by one or
more spaces (but not tabs, newlines etc) ....
pattern = r"""
foo
[ ]+
bar
"""

MRAB · Dec 10, 2007

A pattern that can validly be described as a "regular expression"
cannot count and thus can't match balanced parentheses. Some "RE"
engines provide a method of tagging a sub-pattern so that a match must
include balanced () (or [] or {}); Python's doesn't.

Click to expand...

Click to expand...

Okay, thanks for the clarification. So recursion is not possible using
python regular expressions?

Click to expand...

Ah, thanks for letting me know about that flag; but removing
whitespace as I did with the no_ws lambda expression should also work,
no?

Click to expand...

Under a very limited definition of "work". That technique would not
produce correct answers on patterns that contain any *significant*
whitespace e.g. you want to match "foo" and "bar" separated by one or
more spaces (but not tabs, newlines etc) ....
pattern = r"""
foo
[ ]+
bar
"""

You can also escape a literal space:

pattern = r"""
foo
\ +
bar
"""

John Machin · Dec 10, 2007

A pattern that can validly be described as a "regular expression"
cannot count and thus can't match balanced parentheses. Some "RE"
engines provide a method of tagging a sub-pattern so that a match must
include balanced () (or [] or {}); Python's doesn't.
Okay, thanks for the clarification. So recursion is not possible using
python regular expressions?
Ummm ... even if Python's re engine did do what you want, wouldn't you
need flags=re.VERBOSE in there?
Ah, thanks for letting me know about that flag; but removing
whitespace as I did with the no_ws lambda expression should also work,
no?

Click to expand...

Click to expand...

Under a very limited definition of "work". That technique would not
produce correct answers on patterns that contain any *significant*
whitespace e.g. you want to match "foo" and "bar" separated by one or
more spaces (but not tabs, newlines etc) ....
pattern = r"""
foo
[ ]+
bar
"""

Click to expand...

You can also escape a literal space:

pattern = r"""
foo
\ +
bar
"""

I know that. *Any* method of putting in a literal significant space is
clobbered by the OP's "trick" of removing *all* whitespace instead of
using the VERBOSE flag, which also permits comments:
pattern = r"""
\ + # ugly
[ ]+ # not quite so ugly
"""

OverflowError: regular expression code size limit exceeded	0	Apr 16, 2008
regular expression for getting content between parentheses	0	May 8, 2009
whats wrong with my reg expression ?	1	Jan 15, 2007
Python regular expression	1	Dec 5, 2006
Request for Feedback; a module making it easier to use regular expressions.	1	Jan 31, 2005
Suggestion for a new regular expression extension	4	Nov 20, 2003
regular expressions eliminating filenames of type foo.thumbnail.jpg	7	Jun 25, 2007
Strange re problem on OSX but Not Linux	3	Nov 8, 2006

regular expression for nested parentheses

Noah Hoffman

John Machin

Noah Hoffman

John Machin

MRAB

John Machin

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads