regex searching hangs

  • Thread starter Fortepianissimo
  • Start date
F

Fortepianissimo

Could someone explains why the following code hangs (Python 2.3.3)?

--- CODE STARTS ---
import re

p=re.compile(r'\S+(?:\.\S+)+\.com')

t='......................................'
p.search(t)
--- CODE ENDS ---

From the syntax and semantics of regex, the entire t should be
consumed by the pattern '\S+(?:\.\S+)+' and the search() call should
return None. Is search() doing a depth-first search and can't pulling
itself out? It doesn't seem right to me, or I'm missing something?

BTW, I don't think any regex matching should ever hang in any
circumstances - correct me if I'm wrong: is halting problem a problem
here?

Thanks!
 
J

Jan Erik Breimo

Fortepianissimo said:
Could someone explains why the following code hangs (Python 2.3.3)?

--- CODE STARTS ---
import re

p=re.compile(r'\S+(?:\.\S+)+\.com')

t='......................................'
p.search(t)
--- CODE ENDS ---

The search doesn't hang, it just takes a long time to compute - the time it
takes almost doubles for each dot you add to t.
If you replace \S in the expression with [^ \t\n\r\f\v.] (ie. no whitespace
and no dots) you'll get the computation time you expect.
 
R

Rob Renaud

Could someone explains why the following code hangs (Python 2.3.3)?

--- CODE STARTS ---
import re

p=re.compile(r'\S+(?:\.\S+)+\.com')

t='......................................'
p.search(t)
--- CODE ENDS ---

From the syntax and semantics of regex, the entire t should be
consumed by the pattern '\S+(?:\.\S+)+' and the search() call should
return None. Is search() doing a depth-first search and can't pulling
itself out? It doesn't seem right to me, or I'm missing something?

BTW, I don't think any regex matching should ever hang in any
circumstances - correct me if I'm wrong: is halting problem a problem
here?

Thanks!


It doesn't halt, it just seems you found an exponential time search,
searching at length i seems to take about 1.6 times the time it takes
to search at length i-1.

import re
import time
p=re.compile(r'\S+(?:\.\S+)+\.com')
t = '.'
for i in range(100):
start = time.time()
p.search(t*i)
print i, time.time() - start

How should you improve it? I don't know, I can't even read the regexp
:)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,995
Messages
2,570,228
Members
46,818
Latest member
SapanaCarpetStudio

Latest Threads

Top