non-terminating regex match

  • Thread starter Maurizio Vitale
  • Start date
M

Maurizio Vitale

Has to be something really stupid, but the following never finish
(running Python 2.5.1 (r251:54863, Jan 10 2008, 18:00:49)
[GCC 4.2.1 (SUSE Linux)] on linux2).

The intention is to match C++ identifiers, with or without namespace
qualification, with or without arguments (e.g. variables, functions and
macros).
The following should be accepted:
main
main(int,char**)
::main
std::cout
::std::cout
NDEBUG

Thanks for any help.
And yes, I'm a total beginner when it comes to Python, but it seems
very strange to me that a regex match on a finite length string
doesn't terminate
Regards,

Maurizio

#!/usr/bin/env python
# -*- Python -*-

import re

if __name__ == '__main__':
r = re.compile (
r'(?:(?P<scope>(?:(?:::)?\w+)*)::)?'
r'(?P<name>\w+)'
r'(?:\((?P<arguments>[^\)]*)\))?'
)
match = r.search ('WITH_ALOHA_EXCEPTION_HANDLERS')
 
M

Marc 'BlackJack' Rintsch

And yes, I'm a total beginner when it comes to Python, but it seems
very strange to me that a regex match on a finite length string
doesn't terminate

It does terminate, you just don't wait long enough. Try it with fewer
characters and then increase the identifier character by character and
watch the time of the runs grow exponentially.

Ciao,
Marc 'BlackJack' Rintsch
 
M

Maurizio Vitale

Marc 'BlackJack' Rintsch said:
It does terminate, you just don't wait long enough. Try it with fewer
characters and then increase the identifier character by character and
watch the time of the runs grow exponentially.

Ok. Now, my assumption was that re.compile would produce a DFA (or NFA)
and the match would then being linear in the size of the string.

Apparently this is not the case, so is there anything that can be done
to the regex itself to make sure that whatever re.compile produces is
more efficient?

Thanks,

Maurizio
 
M

MRAB

Has to be something really stupid, but the following never finish
(running Python 2.5.1 (r251:54863, Jan 10 2008, 18:00:49)
[GCC 4.2.1 (SUSE Linux)] on linux2).

The intention is to match C++ identifiers, with or without namespace
qualification, with or without arguments (e.g. variables, functions and
macros).
The following should be accepted:
main
main(int,char**)
::main
std::cout
::std::cout
NDEBUG

Thanks for any help.
And yes, I'm a total beginner when it comes to Python, but it seems
very strange to me that a regex match on a finite length string
doesn't terminate
Regards,

Maurizio

#!/usr/bin/env python
# -*- Python -*-

import re

if __name__ == '__main__':
r = re.compile (
r'(?:(?P<scope>(?:(?:::)?\w+)*)::)?'
r'(?P<name>\w+)'
r'(?:\((?P<arguments>[^\)]*)\))?'
)
match = r.search ('WITH_ALOHA_EXCEPTION_HANDLERS')

I think the problem is with this bit: '(?:(?:::)?\w+)*'. The '::' is
optional and also in a repeated group, so if it tries to match, say,
'abc' it can try and then backtrack all of these possibilities: abc,
ab c, a bc, a b c. The longer the string, the more possibilities to
try. Try this instead:

r = re.compile (
r'(?P<scope>(?:::)?(?:\w+::)*)?'
r'(?P<name>\w+)'
r'(?:\((?P<arguments>[^\)]*)\))?'
)
 
M

Maurizio Vitale

MRAB said:
I think the problem is with this bit: '(?:(?:::)?\w+)*'. The '::' is
optional and also in a repeated group, so if it tries to match, say,
'abc' it can try and then backtrack all of these possibilities: abc,
ab c, a bc, a b c. The longer the string, the more possibilities to
try. Try this instead:

r = re.compile (
r'(?P<scope>(?:::)?(?:\w+::)*)?'
r'(?P<name>\w+)'
r'(?:\((?P<arguments>[^\)]*)\))?'
)

That was indeed the problem. Your version is ok w/ the minor difference
that the named group <scope> includes the '::' at the end. This can be
easily stripped off. Or maybe the regexp can be massaged further.
Thanks a lot,

Maurizio
 
G

Gabriel Genellina

En Wed, 02 Apr 2008 13:01:59 -0300, Maurizio Vitale
The intention is to match C++ identifiers, with or without namespace
qualification, with or without arguments (e.g. variables, functions and
macros).
The following should be accepted:
main
main(int,char**)
::main
std::cout
::std::cout
NDEBUG

What about foo(int(*f)(int)) ?
You can't match a function definition with a regular expression alone, due
to the nested (). The r.e. can recognize an identifier; you can later see
whether there is a ( and scan up to the matching ).
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,995
Messages
2,570,230
Members
46,819
Latest member
masterdaster

Latest Threads

Top