How to extract the title of links

raver2046 · Apr 26, 2005

Here I have a link: <a href="http://raver2046.ath.cx/CV/">cv network admin</a>

How can I extract the link text "cv network admin"?

Here is the code I found to extract the links, but it does not extract the title of each link:

----------------------------
import htmllib, formatter, urllib
class x(htmllib.HTMLParser):
    def dump(self, tag, attrs):
        #print tag,
        for a, v in attrs:
            if a in ['a', 'src', 'href']:
                print v,

        print
    #def do_img(self, attrs):
    #    self.dump('img', attrs)
    def start_a(self, attrs):
        self.dump('a', attrs)
    #def start_form(self, attrs):
    #    self.dump('form', attrs)

y = x(formatter.NullFormatter())
y.feed(urllib.urlopen('http://www.aquabase.org/fish/dump.php3').read())
y.close()

prasad · Apr 26, 2005

import htmllib, formatter, urllib
class x(htmllib.HTMLParser):
    inanchor = False # indicates whether we are inside an anchor element
    def dump(self, tag, attrs):
        #print tag,
        for a, v in attrs:
            if a in ['a', 'src', 'href']:
                print v,

        print
    #def do_img(self, attrs):
    #    self.dump('img', attrs)
    def start_a(self, attrs):
        self.dump('a', attrs)
        self.inanchor = True # yes, now we are inside an anchor element

    def handle_data(self, data):
        if self.inanchor:
            print data # print the text inside the anchor element
            self.inanchor = False # we handled the anchor element data
            # This is not a nice way: self.inanchor should be set to False
            # when </a> is reached. Try doing that in end_a(self) ...

    #def start_form(self, attrs):
    #    self.dump('form', attrs)

y = x(formatter.NullFormatter())
y.feed(urllib.urlopen('http://www.aquabase.org/fish/dump.php3').read())
y.close()
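
For anyone reading this on Python 3, where htmllib no longer exists, the same idea can be sketched with the standard html.parser module. This is only a rough sketch, not a drop-in replacement for the code above: the names LinkTextParser and extract_links are made up for illustration, and the sample string is a link like the one in the question, with a line break inside the title to show the whitespace handling. Here the flag is cleared when </a> is reached, so text split into several pieces (for example by nested tags) is still collected.

```python
from html.parser import HTMLParser


class LinkTextParser(HTMLParser):
    """Collect (href, text) pairs for every <a> element fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the anchor currently open, if any
        self._chunks = None  # text pieces seen inside the open anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._chunks = []

    def handle_data(self, data):
        # Only keep text while an anchor is open.
        if self._chunks is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._chunks is not None:
            # Join the pieces and collapse runs of whitespace, so a title
            # wrapped across lines comes out as a single string.
            text = " ".join("".join(self._chunks).split())
            self.links.append((self._href, text))
            self._href = None
            self._chunks = None


def extract_links(html):
    """Hypothetical helper: return a list of (href, text) pairs from html."""
    parser = LinkTextParser()
    parser.feed(html)
    parser.close()
    return parser.links


sample = '<a href="http://raver2046.ath.cx/CV/">cv network\nadmin</a>'
for href, text in extract_links(sample):
    print(href, text)
```

To run it on a real page, the downloaded HTML would have to be decoded to a str before being passed to extract_links.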

Larry Bates · Apr 26, 2005

You should take a look at BeautifulSoup at:

http://www.python.org/pypi/BeautifulSoup/2.0.2

Larry Bates
