URL listers

P. Daniell

I have the following HTML document

<html>
<body>
<a href="http://www.yahoo.com">I don't give a hoot</a>
</body>
</html>

I want my HTMLParser subclass (code below) to output

http://www.yahoo.com I don't give a hoot

Instead it outputs

http://www.yahoo.com I don
http://www.yahoo.com '
http://www.yahoo.com t give a hoot


Would anyone care to give me some guidance on how to fix this?

Thanks,
PD



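import formatter
from htmllib import HTMLParser

class URLLister(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self, formatter.NullFormatter())
        self.in_a = 0        # 1 while inside an <a> element
        self.tempurl = ''    # href of the current <a>

    def anchor_bgn(self, href, name, type):
        self.in_a = 1
        self.tempurl = href

    def anchor_end(self):
        self.in_a = 0

    def handle_data(self, data):
        # Prints once per call, so each text chunk lands on its own line
        if self.in_a == 1:
            print self.tempurl, data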
 
Peter Otten

P. Daniell said:

Would anyone care to give me some guidance on how to fix this?

handle_data() can be called multiple times inside <tag>...</tag>, so you
must collect the chunks (see the text attribute below) and only print them
in the anchor_end() method:

import formatter
import htmllib

class URLLister(htmllib.HTMLParser):
    def __init__(self):
        htmllib.HTMLParser.__init__(self, formatter.NullFormatter())
        self.in_a = 0
        self.tempurl = ''
        self.text = []       # chunks of link text seen so far

    def anchor_bgn(self, href, name, type):
        self.in_a = 1
        self.tempurl = href

    def anchor_end(self):
        # The whole link text is known only here, so print it now
        print self.tempurl, "".join(self.text)
        del self.text[:]     # empty the list for the next link
        self.in_a = 0

    def handle_data(self, data):
        if self.in_a:
            self.text.append(data)


By the way, there is another HTMLParser in the HTMLParser module,
which I think is superior.

Peter
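
For anyone who wants to try the other parser mentioned above, here is a minimal sketch of the same collect-then-emit idea. It assumes Python 3, where the HTMLParser module is named html.parser and htmllib no longer exists. The attribute names links, _href and _chunks are illustrative only and do not come from the thread.

```python
from html.parser import HTMLParser


class URLLister(HTMLParser):
    """Collect (href, link text) pairs from <a> elements."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._chunks = []    # text pieces seen inside the open <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) tuples
            self._href = dict(attrs).get("href")
            self._chunks = []

    def handle_data(self, data):
        # May fire several times per element, so only accumulate here
        if self._href is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        # Emit the pair once the closing </a> is reached
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._chunks)))
            self._href = None


parser = URLLister()
parser.feed('<html><body>'
            '<a href="http://www.yahoo.com">I don\'t give a hoot</a>'
            '</body></html>')
parser.close()
for href, text in parser.links:
    print(href, text)
```

Storing the pairs in a list rather than printing inside the handler is a design choice made here so the result can be reused; printing in handle_endtag() would mirror the anchor_end() version more closely.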
 
