P. Daniell
I have the following HTML document:
<html>
<body>
<a href="http://www.yahoo.com">I don't give a hoot</a>
</body>
</html>
I want my HTMLParser subclass (code below) to output:
http://www.yahoo.com I don't give a hoot
Instead, it outputs:
http://www.yahoo.com I don
http://www.yahoo.com '
http://www.yahoo.com t give a hoot
Would anyone care to give me some guidance on how to fix this?
Thanks,
PD
from htmllib import HTMLParser
import formatter

class URLLister(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self, formatter.NullFormatter())
        self.in_a = 0
        self.tempurl = ''

    def anchor_bgn(self, href, name, type):
        self.in_a = 1
        self.tempurl = href

    def anchor_end(self):
        self.in_a = 0

    def handle_data(self, data):
        if self.in_a == 1:
            print self.tempurl, data
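
The output above suggests handle_data is being called once per chunk of text rather than once per link, so one possible direction is to buffer the chunks and emit them only when the anchor closes. The following is a rough sketch of that idea, not a drop-in fix: it assumes Python 3 and swaps in html.parser.HTMLParser (with handle_starttag/handle_endtag) in place of the htmllib parser and its anchor_bgn/anchor_end hooks used above. The pieces and links attributes are hypothetical names introduced for illustration.

```python
from html.parser import HTMLParser


class URLLister(HTMLParser):
    """Collect (href, link text) pairs, joining text that arrives in pieces."""

    def __init__(self):
        super().__init__()
        self.in_a = False
        self.tempurl = ''
        self.pieces = []  # hypothetical: text chunks seen inside the current <a>
        self.links = []   # hypothetical: finished (href, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self.in_a = True
            # attrs is a list of (name, value) pairs; value may be None
            self.tempurl = dict(attrs).get('href') or ''
            self.pieces = []

    def handle_data(self, data):
        # May be called more than once for a single run of text,
        # so only buffer here instead of printing.
        if self.in_a:
            self.pieces.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self.in_a:
            self.in_a = False
            self.links.append((self.tempurl, ''.join(self.pieces)))


parser = URLLister()
parser.feed('<html><body>'
            '<a href="http://www.yahoo.com">I don\'t give a hoot</a>'
            '</body></html>')
parser.close()
for url, text in parser.links:
    print(url, text)
```

The same buffering idea should carry over to the htmllib version: append to a list in handle_data and print the joined text from anchor_end.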