UTF8 & HTMLParser

Jan Danielsson · Dec 1, 2006

Hello all,

I'm writing a Python script that fetches an HTML page (using wget)
and then parses the retrieved page with a custom htmllib HTMLParser.

The page I fetch is encoded in UTF-8, and my text handler currently
looks like this:

    def handle_data(self, text):
        if self.inOption:
            self.currentName = text

However, I would like to convert "text" (which is UTF-8) to
Latin-1. How do I do that? I've been trying to figure it out for some
time now, and I'm just getting frustrated. :-(

Jan Danielsson · Dec 1, 2006

I should have mentioned: the problem appears to be that I can't
find a way to make Python understand that "text" (the argument above)
is in fact already UTF-8.

Klaus Alexander Seistrup · Dec 1, 2006

Jan said:
However, I would like to convert the "text" (which is utf8)
to latin-1. How do I do that?

How about:

latin = unicode(text, 'utf-8').encode('iso-8859-1')

Please see help(u''.encode) for details about error handling. You
might also want to trap errors in a try-except statement.
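
A minimal sketch of that error handling, under a few assumptions: the
helper name utf8_to_latin1 is made up for illustration, and the decode
step is spelled raw.decode('utf-8') rather than unicode(raw, 'utf-8')
so the same lines also run on Python 3, where the unicode() builtin no
longer exists. The "replace" fallback is just one of the standard error
handlers; "ignore" or "xmlcharrefreplace" may suit your data better.

```python
def utf8_to_latin1(raw, errors="strict"):
    """Decode UTF-8 bytes, then re-encode the text as ISO-8859-1 (Latin-1)."""
    return raw.decode("utf-8").encode("iso-8859-1", errors)


# "café" as UTF-8 bytes: the e-acute is two bytes in UTF-8, one in Latin-1.
raw = b"caf\xc3\xa9"
latin = utf8_to_latin1(raw)

# The euro sign exists in UTF-8 but has no Latin-1 code point, so a
# strict encode raises UnicodeEncodeError; fall back to "replace".
euro = b"\xe2\x82\xac"
try:
    fallback = utf8_to_latin1(euro)
except UnicodeEncodeError:
    fallback = utf8_to_latin1(euro, "replace")
```

Inside handle_data you would call the helper on "text" in the same
way. Input that is not valid UTF-8 raises UnicodeDecodeError from the
decode step instead, so you may want to trap that as well.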

Cheers,
