Johnny Lee
Hi,
I've run into a problem matching a regular expression in Python. I hope
some of you can help. Here are the details:
I have many tags like this:
xxx<a href="http://xxx.xxx.xxx" xxx>xxx
xxx<a href="wap://xxx.xxx.xxx" xxx>xxx
xxx<a href="http://xxx.xxx.xxx" xxx>xxx
.....
I want to extract every "http://xxx.xxx.xxx" URL, so I do it
like this:
import re
httpPat = re.compile("(<a )(href=\")(http://.*)(\")")
result = httpPat.findall(data)
I use this to inspect the output:
for i in result:
    print(i[2])
Surprisingly, some of the output looks like this:
http://xxx.xxx.xxx">xxx</a>xxx
It is extracted from source like this:
<a href="http://xxx.xxx.xxx">xxx</a>xxx"
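Here is a self-contained reproduction. The `data` string is a hypothetical stand-in built from the sample tags above, since the real input isn't shown:

```python
import re

# Hypothetical sample input standing in for the real `data`:
# one tag per line, copied from the examples above.
data = '\n'.join([
    'xxx<a href="http://xxx.xxx.xxx" xxx>xxx',
    'xxx<a href="wap://xxx.xxx.xxx" xxx>xxx',
    '<a href="http://xxx.xxx.xxx">xxx</a>xxx"',
])

# Same pattern as in the question; group 3 is the greedy (http://.*) part.
httpPat = re.compile("(<a )(href=\")(http://.*)(\")")
result = httpPat.findall(data)

for i in result:
    print(i[2])
```

On the first line, `.*` runs to the end of the line and backtracks to the only `"` after the URL, so the result is clean. On the last line it backtracks only as far as the final `"` on that line, which drags in `">xxx</a>xxx`. The `wap://` line does not match at all.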
Some of the results are correct, though. How can I get all of the
answers clean, like "http://xxx.xxx.xxx"? Thanks for your help.
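One likely fix, sketched under the assumption that every href value is double-quoted and contains no literal `"`: replace the greedy `.*` with `[^"]*`, which cannot run past the closing quote. The pattern is also trimmed to a single capture group (a choice made here for illustration), so `findall` returns plain strings instead of tuples. The `data` string is again a hypothetical sample:

```python
import re

# Hypothetical sample input, same shape as the tags above.
data = '\n'.join([
    'xxx<a href="http://xxx.xxx.xxx" xxx>xxx',
    'xxx<a href="wap://xxx.xxx.xxx" xxx>xxx',
    '<a href="http://xxx.xxx.xxx">xxx</a>xxx"',
])

# [^"]* matches any run of non-quote characters, so the capture stops at
# the first closing quote instead of backtracking from the end of the line.
# Assumes href values are double-quoted and never contain a literal '"'.
httpPat = re.compile(r'<a href="(http://[^"]*)"')

# With exactly one capture group, findall returns a list of strings.
urls = httpPat.findall(data)
for url in urls:
    print(url)
```

A non-greedy `(http://.*?)` in the original pattern should give the same results on these samples, since it stops at the first `"` it can match. For messier real-world HTML, a proper parser such as the standard library's `html.parser` is generally more robust than a regex.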
Regards,
Johnny