Unicode and MoinMoin

gdetre · Feb 27, 2006

Dear all,

My lab has been using a Movable Type blog for internal communication
and announcements for a couple of years, but we've now seen the light
and I've set up a MoinMoin wiki. Everything's installed beautifully, so
I'm writing scripts to export all our Movable Type blog posts to wiki
pages. So far so good.

The only issue I'm having relates to Unicode. MoinMoin and Python are
pretty unforgiving about files that contain characters that aren't
properly encoded. I've spent hours reading about
Unicode, and playing with different encoding/decoding commands, but at
this point, I just want a hacky solution that will ignore the
improperly coded characters or replace them with placeholders.

Can anyone recommend a simple surefire Unix/Python/Perl command that
will help me avoid errors like the one below? Any suggestions would be
hugely appreciated.

Thank you very much for your time,

Yours,
Greg

----

'utf8' codec can't decode byte 0x96 in position 4910: unexpected code
byte

* args = ('utf8', 'AUTHOR: blahblah\n\nTITLE: Reading Course
Readings... G. A. \x96 For references see blahblah.\n\n\n-----\n\n',
4910, 4911, 'unexpected code byte')
* encoding = 'utf8'
* end = 4911
* object = 'AUTHOR: blahblah\n\nTITLE: Reading Course Readings...
G. A. \x96 For references see blahblah.\n\n\n-----\n\n'
* reason = 'unexpected code byte'
* start = 4910
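
The fields in the dump above (start, end, object, reason) are attributes of the UnicodeDecodeError itself, so a script can report exactly where a file goes wrong before deciding how to handle it. A minimal sketch, assuming Python 3, where the raw data is a bytes object; the sample bytes and the helper name describe_bad_byte are made up for illustration:

```python
# Sketch only: `raw` is a made-up sample modelled on the post above.
raw = b"TITLE: Reading Course Readings... G. A. \x96 For references see blahblah."


def describe_bad_byte(data, encoding="utf-8", context=10):
    """Report the first undecodable byte, or return None if data decodes cleanly."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as err:
        lo = max(err.start - context, 0)
        return {
            "position": err.start,                    # index of the offending byte
            "byte": data[err.start:err.end],          # the offending byte(s)
            "reason": err.reason,                     # the codec's explanation
            "context": data[lo:err.end + context],    # surrounding bytes
        }
    return None


report = describe_bad_byte(raw)
print(report)
```

Looking at the surrounding bytes is usually enough to guess which legacy encoding the stray byte came from.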

Neil Hodgson · Feb 27, 2006

Greg:

> The only issue I'm having relates to Unicode. MoinMoin and Python are
> pretty unforgiving about files that contain characters that aren't
> properly encoded. I've spent hours reading about
> Unicode, and playing with different encoding/decoding commands, but at
> this point, I just want a hacky solution that will ignore the
> improperly coded characters or replace them with placeholders.

Call the codec with the errors argument set to "ignore" or "replace".

A. \x96 For references see blahblah.\n\n\n-----\n\n', 'utf8')
Traceback (most recent call last):
  File "<interactive input>", line 1, in ?
  File "c:\python24\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 58: unexpected code byte

A. \x96 For references see blahblah.\n\n\n-----\n\n', 'utf8', 'replace')
u'AUTHOR: blahblah\n\nTITLE: Reading Course Readings... G. A. \ufffd For
references see blahblah.\n\n\n-----\n\n'
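
For comparison, here is a small sketch of the two lenient error handlers side by side. It assumes Python 3, where the input is a bytes object and the call is bytes.decode; the sample bytes are a shortened, made-up version of the string in the traceback.

```python
# Sketch only: a shortened sample containing the same stray 0x96 byte.
raw = b"G. A. \x96 For references see blahblah."

# "replace" substitutes U+FFFD (the replacement character) for each bad byte.
replaced = raw.decode("utf-8", "replace")

# "ignore" silently drops the bad byte.
ignored = raw.decode("utf-8", "ignore")

print(repr(replaced))
print(repr(ignored))
```

"replace" leaves a visible marker where data was lost, which makes the damaged pages easy to find later; "ignore" leaves no trace.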

BTW, it's probably Windows-1252, where byte 0x96 is an en dash (U+2013).
Depending on your context, it may pay to handle the exception instead of
using "replace" and try interpreting the data as Windows-1252.
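
One way that fallback might look, as a sketch rather than a definitive recipe. It assumes Python 3 and that each post is either entirely UTF-8 or entirely Windows-1252; the helper name decode_with_fallback is hypothetical.

```python
def decode_with_fallback(data, primary="utf-8", fallback="cp1252"):
    """Try the primary encoding strictly; on failure, reinterpret as the fallback."""
    try:
        return data.decode(primary)
    except UnicodeDecodeError:
        # cp1252 leaves a few byte values undefined (0x81, for example),
        # so "replace" is used here as a last resort instead of raising again.
        return data.decode(fallback, "replace")


text = decode_with_fallback(b"G. A. \x96 For references see blahblah.")
print(repr(text))
```

Note that this reinterprets the whole string on failure, so a post mixing valid UTF-8 sequences with stray Windows-1252 bytes would have its UTF-8 parts mangled.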

Neil

Fredrik Lundh · Feb 27, 2006

Neil said:

> Call the codec with the errors argument set to "ignore" or "replace".
>
> A. \x96 For references see blahblah.\n\n\n-----\n\n', 'utf8')
> Traceback (most recent call last):
>   File "<interactive input>", line 1, in ?
>   File "c:\python24\lib\encodings\utf_8.py", line 16, in decode
>     return codecs.utf_8_decode(input, errors, True)
> UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 58: unexpected code byte
>
> A. \x96 For references see blahblah.\n\n\n-----\n\n', 'utf8', 'replace')
> u'AUTHOR: blahblah\n\nTITLE: Reading Course Readings... G. A. \ufffd For
> references see blahblah.\n\n\n-----\n\n'
>
> BTW, it's probably Windows-1252, where byte 0x96 is an en dash (U+2013).
> Depending on your context, it may pay to handle the exception instead of
> using "replace" and try interpreting the data as Windows-1252.

here's one way to explicitly deal with 1252 gremlins:

http://effbot.org/zone/unicode-gremlins.htm
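
As a rough illustration of the general idea (not a copy of the code on that page): if text was decoded as Latin-1, the Windows-1252 bytes end up as C1 control characters in the range U+0080 to U+009F, and each can be mapped back through the cp1252 codec. This sketch assumes Python 3; the function name fix_cp1252_gremlins is made up.

```python
import re

# C1 control characters: where cp1252 punctuation lands after a Latin-1 decode.
_C1_CONTROLS = re.compile("[\x80-\x9f]")


def _reinterpret(match):
    ch = match.group(0)
    try:
        # Turn the character back into its byte value and decode it as cp1252.
        return bytes([ord(ch)]).decode("cp1252")
    except UnicodeDecodeError:
        # A handful of byte values are undefined in cp1252; leave them alone.
        return ch


def fix_cp1252_gremlins(text):
    """Replace stray C1 control characters with their cp1252 meanings."""
    return _C1_CONTROLS.sub(_reinterpret, text)


sample = b"G. A. \x96 For references".decode("latin-1")
fixed = fix_cp1252_gremlins(sample)
print(repr(fixed))
```

Because it only touches characters in that narrow range, text that is already correct passes through unchanged.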

</F>
