re.findall() hangs in Python

  • Thread starter silverburgh.meryl

silverburgh.meryl

Hi,

I have the following regular expression.
It works when 'data' contains the pattern, and I see 'match2' printed
out.
But when 'data' does not contain the pattern, it just hangs at
're.findall'.

pattern = re.compile("(.*)<img (.*?) src=\"(.*?)img(.*?)\"(.*?)", re.S)

print("before find all")

match = re.findall(pattern, data)

if match:
    print("match2")

Can you please tell me why that is?
 
Peter Otten

Could it be that it is just slow? If not, post a small example of data that
provokes findall() to hang.

Peter
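One way to tell "slow" from "hung" is to time the call on inputs of growing size. The following is only a rough sketch: the `time_findall` helper and the all-`x` test data are made up for illustration, and the sizes are kept small so the run finishes quickly.

```python
import re
import time

# The pattern from the original post, unchanged.
pattern = re.compile("(.*)<img (.*?) src=\"(.*?)img(.*?)\"(.*?)", re.S)


def time_findall(size):
    """Return (matches, seconds) for findall() on `size` characters with no <img tag."""
    data = "x" * size  # hypothetical input that cannot match
    start = time.perf_counter()
    matches = pattern.findall(data)
    return matches, time.perf_counter() - start


results = {}
for size in (500, 1000, 2000):
    results[size] = time_findall(size)
    print(size, "chars:", round(results[size][1], 4), "seconds")
```

If the time grows much faster than the input size, the call is probably slow rather than stuck, and a larger 'data' would simply take longer to finish.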
 
7stud

It doesn't hang when I try it. Why don't you post a complete example
that hangs?

Also, you might consider putting single quotes around your
string so that you don't have to escape the double quotes inside
it.
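As a small illustration of the quoting suggestion, the sketch below writes the original pattern both ways and checks that the two strings are identical. The variable names are invented for this example.

```python
import re

# The original pattern, with the inner double quotes escaped.
escaped = "(.*)<img (.*?) src=\"(.*?)img(.*?)\"(.*?)"

# The same pattern inside single quotes: no escaping needed.
single_quoted = '(.*)<img (.*?) src="(.*?)img(.*?)"(.*?)'

print(escaped == single_quoted)

# Both spellings compile to the same regular expression.
same_pattern = re.compile(escaped, re.S).pattern == re.compile(single_quoted, re.S).pattern
print(same_pattern)
```

A raw string (r'...') is a further option once the pattern contains backslash sequences such as \s.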
 
Gabriel Genellina

Could it be that it is just slow? If not, post a small example of data that
provokes findall() to hang.

I bet it is very slooooooow!
To the OP: do you actually need all those groups? Especially the first and
last (.*); they match all the surrounding text.
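To make the point about the outer groups concrete, here is a minimal sketch that compares the original pattern with a trimmed version that drops the first and last groups. The sample 'data' string is invented for this example and kept short on purpose.

```python
import re

# Hypothetical sample input, short enough that both patterns run instantly.
data = 'text <img alt="x" src="/img/a.png"> more text'

# The original pattern, including the leading (.*) and trailing (.*?).
original = re.compile('(.*)<img (.*?) src="(.*?)img(.*?)"(.*?)', re.S)

# The same pattern without the first and last groups.
trimmed = re.compile('<img (.*?) src="(.*?)img(.*?)"', re.S)

print(original.findall(data))
print(trimmed.findall(data))  # [('alt="x"', '/', '/a.png')]
```

The middle three groups come out the same either way; the outer groups only add the surrounding text (and extra work for the regex engine).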
 
irstas

But when 'data' does not contain pattern, it just hangs at
're.findall'

pattern = re.compile("(.*)<img (.*?) src=\"(.*?)img(.*?)\"(.*?)", re.S)

That pattern is just really slow to evaluate. What you want is
probably something more like this:

re.compile(r'<img [^>]*src\s*=\s*"([^"]*img[^"]*)"')

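"dot" is usually not so great: with re.S it can run across the whole
input, which means a lot of backtracking when nothing matches. Prefer
"NOT end-character", like [^>] or [^"].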
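A short usage sketch of the suggested pattern follows. The two input strings and the variable names are made up for illustration; only the regular expression itself comes from the post above.

```python
import re

# The pattern suggested above: negated character classes instead of "dot".
img_src = re.compile(r'<img [^>]*src\s*=\s*"([^"]*img[^"]*)"')

# Hypothetical inputs: one with a matching tag, one without any <img tag.
with_match = '<p>intro</p><img alt="logo" src="/img/logo.png"><p>end</p>'
without_match = '<p>no image tags here</p>' * 1000

print(img_src.findall(with_match))     # ['/img/logo.png']
print(img_src.findall(without_match))  # []
```

Because [^>] cannot run past the end of the tag and [^"] cannot run past the closing quote, the engine has far fewer positions to retry when the input contains no match.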
 
silverburgh.meryl

Thank you. Your suggestion solves my problem!
 