My re is as follows:
_____________________________________________
r=re.compile(ur'valign=top>(?P<number>\d{1,2})</td><td[^>]*>\s{0,2}'
ur'<a href="(?P<url>[^<>]+\.mp3)"( )target=_blank>'
ur'(?P<name>.+)</td>',re.UNICODE|re.IGNORECASE)
_____________________________________________
There should be over 30 matches in the HTML, but r.finditer(html) finds
nothing, because the last line of my re is wrong. I can't use
"(?P<name>.+)</td>" because there are many "</td>" in the HTML,
and I want the ".+" to match only what comes before the first "</td>".
So I think that if there were some way to exclude a word, this would be
solved. Assuming a "NOT(WORD)" construct existed, I would just write the
last line of the re as "(?P<name>(NOT(</td>))+)</td>".
But I still have no idea how to do it, after thinking and trying for a
very long time.
In other words, I want the "</td>" of "(?P<name>.+)</td>" to be
exactly the first "</td>" after the name in each match. There is more
than one match in this HTML, so this must be done with re.
And I can't use any of your ideas, because what I have to deal with is a
very complicated HTML page, not just a single line of text.
I could copy part of the HTML here, but it's rather lengthy.
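One way to get the effect of the wished-for "NOT(</td>)" is sketched below, under some assumptions: the two sample rows are made up to stand in for the real page, and the code is written for Python 3 (plain r'' strings instead of ur''). It tries two standard re features, the non-greedy quantifier ".+?" and a negative lookahead "(?!</td>)" checked before each character.

```python
import re

# Hypothetical sample rows standing in for the real page (not the poster's HTML).
html = (
    '<tr><td valign=top>1</td><td width=200> '
    '<a href="songs/first.mp3" target=_blank>First Song</a></td>'
    '<td>3:45</td></tr>\n'
    '<tr><td valign=top>2</td><td> '
    '<a href="songs/second.mp3" target=_blank>Second Song</a></td>'
    '<td>4:10</td></tr>'
)

# The first two lines of the posted pattern, unchanged apart from the string prefix.
prefix = (r'valign=top>(?P<number>\d{1,2})</td><td[^>]*>\s{0,2}'
          r'<a href="(?P<url>[^<>]+\.mp3)"( )target=_blank>')

# Option 1: non-greedy ".+?" takes as few characters as possible,
# so the match ends at the first "</td>" after the name.
lazy = re.compile(prefix + r'(?P<name>.+?)</td>', re.IGNORECASE | re.DOTALL)

# Option 2: consume one character at a time, but only where "</td>" does
# not start; this is the closest spelling of NOT(</td>).
tempered = re.compile(prefix + r'(?P<name>(?:(?!</td>).)+)</td>',
                      re.IGNORECASE | re.DOTALL)

lazy_names = [m.group('name') for m in lazy.finditer(html)]
tempered_names = [m.group('name') for m in tempered.finditer(html)]
print(lazy_names)
print(tempered_names)
```

On this sample both patterns capture the same names. Each name still carries the trailing "</a>", because the posted pattern never consumes it; adding "</a>" before "</td>" in the pattern would drop it.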
could said:
In re, a "^" at the start of a character class can exclude a single
character, but I want to exclude a whole word now. For example, I have
the string "hi, how are you. hello" and I want to extract everything
before the word "hello". I can't use ".*[^hello]" because "^" only
excludes the single characters "h", "e", "l" or "o". Will somebody tell
me how to do it? Thanks.
(1) Why must you use re? It's often a good idea to use string methods
where they can do the job you want.
(2) What do you want to have happen if "hello" is not in the string?
Example:
C:\junk>type upto.py
def upto(strg, what):
    k = strg.find(what)
    if k > -1:
        return strg[:k]
    return None # or raise an exception
helo = "hi, how are you? HELLO I'm fine, thank you hello hello hello. that's it"
print repr(upto(helo, "HELLO"))
print repr(upto(helo, "hello"))
print repr(upto(helo, "hi"))
print repr(upto(helo, "goodbye"))
print repr(upto("", "goodbye"))
print repr(upto("", ""))
C:\junk>upto.py
'hi, how are you? '
"hi, how are you? HELLO I'm fine, thank you "
''
None
None
''
HTH,
John
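If a regular expression is wanted after all, the same "everything before the word" extraction can be sketched roughly as follows. This is an illustration for Python 3, not part of the reply above; upto_re is a hypothetical name chosen to mirror upto.

```python
import re

def upto_re(strg, what):
    # re.escape makes "what" match literally; the non-greedy "(.*?)"
    # captures as little as possible, i.e. the text before the FIRST
    # occurrence of "what". re.DOTALL lets "." cross line breaks.
    m = re.match(r'(.*?)' + re.escape(what), strg, re.DOTALL)
    return m.group(1) if m else None  # None when "what" is absent

helo = "hi, how are you? HELLO I'm fine, thank you hello hello hello. that's it"
print(repr(upto_re(helo, "HELLO")))
print(repr(upto_re(helo, "hello")))
print(repr(upto_re(helo, "goodbye")))
```

For a fixed literal word the string-method version is simpler; the regex form mainly pays off when the delimiter is itself a pattern.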