Trouble Encoding

fingermark · Jun 7, 2005

I'm using feedparser to parse the following:

<div class="indent text">Adv: Termite Inspections! Jenny Moyer welcomes
you to her HomeFinderResource.com TM A "MUST See &hellip;</div>

I'm receiving the following error when I try to print the result of
feedparser parsing the above text:

UnicodeEncodeError: 'latin-1' codec can't encode character u'\u201c' in
position 86: ordinal not in range(256)

Why is this happening and where does the problem lie?

thanks

deelan · Jun 7, 2005

It seems that the Unicode character U+201C isn't part
of the Latin-1 charset, see:

"LEFT DOUBLE QUOTATION MARK"
<http://www.fileformat.info/info/unicode/char/201c/index.htm>

try to encode the feedparser output to UTF-8 instead, or
use the "replace" option for the encode() method.

OK, let's try "replace": u'\u201c'.encode('latin-1', 'replace') returns
'?'

Using "replace" will not throw an error, but it will replace
the offending character with a question mark.
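As a minimal sketch of the three behaviours mentioned above (strict encoding, "replace", and UTF-8), assuming Python 3 syntax rather than the Python 2 of this thread; the sample string is only an illustration built from the feed text:

```python
# Python 3 syntax; in the Python 2 of this thread the literal would be
# written u'A \u201cMUST See'.
text = 'A \u201cMUST See'   # U+201C LEFT DOUBLE QUOTATION MARK

try:
    text.encode('latin-1')  # strict error handling is the default
except UnicodeEncodeError as exc:
    print('strict encode failed:', exc.reason)

# Lossy: characters outside Latin-1 are replaced with '?'.
replaced = text.encode('latin-1', 'replace')

# Lossless: UTF-8 can represent every Unicode character.
as_utf8 = text.encode('utf-8')

print(replaced)
print(as_utf8)
```

The "replace" route silently loses the quote, so UTF-8 is usually the safer choice when the consumer of the output can read it.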

HTH.

fingermark · Jun 7, 2005

Why is it even trying Latin-1 at all? I don't see it anywhere in
feedparser.py or my code.

Jarek Zgoda · Jun 7, 2005

Check your site.py or sitecustomize.py module; you may have a non-standard
default encoding set there.
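A quick way to check this, sketched under the assumption that you can run a short script with the same interpreter. Note that on Python 3 the default encoding is always 'utf-8'; only on Python 2 could site.py or sitecustomize.py change it.

```python
import sys

# The interpreter-wide default used for implicit str/unicode conversions.
default_encoding = sys.getdefaultencoding()

# True if a sitecustomize module was found and imported at startup.
has_sitecustomize = 'sitecustomize' in sys.modules

print('default encoding:', default_encoding)
print('sitecustomize loaded:', has_sitecustomize)
```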

John Roth · Jun 7, 2005

Several different things are going on here. First, when you try to
print a unicode string using str() or a similar function, Python is going to
use the default encoding to render it. The default encoding is usually
ASCII-7. Why it's trying to use Latin-1 in this case is somewhat
of a mystery.

The quote in front of the word MUST is a "smart quote", that is, a
curly quote, and it is not a valid character in either ASCII or
Latin-1. Use Windows-1252 explicitly, and it should render
properly. Alternatively use UTF-8, as one of the other posters
suggested. Then it's up to whatever software you use to actually
put the ink on the paper to render it properly, but that's a different
issue.
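To illustrate the difference between the codecs, here is a small sketch in Python 3 syntax; can_encode is a hypothetical helper written for this example, not part of any library:

```python
quote = '\u201c'  # the curly quote in front of MUST


def can_encode(char, codec):
    """Return True if char is representable in the named codec."""
    try:
        char.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


for codec in ('ascii', 'latin-1', 'cp1252', 'utf-8'):
    print(codec, can_encode(quote, codec))

# Windows-1252 assigns the curly quote to a single byte, 0x93.
cp1252_bytes = quote.encode('cp1252')
print(cp1252_bytes)
```

Latin-1 leaves the 0x80 to 0x9F range to control characters, which is why the same character that fits in Windows-1252 fails there.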

John Roth

Kent Johnson · Jun 8, 2005

John said:
Several different things are going on here. First, when you try to
print a unicode string using str() or a similar function, Python is going to
use the default encoding to render it. The default encoding is usually
ASCII-7. Why it's trying to use Latin-1 in this case is somewhat
of a mystery.

Actually I believe it will use sys.stdout.encoding for this, which is presumably latin-1 on fingermark's machine.
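A rough sketch of how one might inspect that encoding and avoid the error when printing, in Python 3 syntax; printable is a hypothetical helper made up for this example, and the 'ascii' fallback is an assumption for streams that report no encoding:

```python
import sys


def printable(text, stream=None, errors='replace'):
    """Return text with characters the stream cannot encode replaced."""
    stream = sys.stdout if stream is None else stream
    # Some streams (pipes, custom objects) report no encoding at all.
    encoding = getattr(stream, 'encoding', None) or 'ascii'
    return text.encode(encoding, errors).decode(encoding)


print('stdout encoding:', sys.stdout.encoding)
print(printable('A \u201cMUST See'))
```

On a terminal whose encoding is latin-1 the curly quote comes out as a question mark; on a UTF-8 terminal the text passes through unchanged.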

Kent
