gdetre
Dear all,
My lab has been using a Movable Type blog for internal communication
and announcements for a couple of years, but we've now seen the light
and I've set up a MoinMoin wiki. Everything's installed beautifully, so
I'm writing scripts to export all our Movable Type blog posts to wiki
pages. So far so good.
The only issue I'm having relates to Unicode. MoinMoin and Python are
pretty unforgiving about files that contain bytes that aren't valid in
the expected encoding (UTF-8 here). I've spent hours reading about
Unicode, and playing with different encoding/decoding commands, but at
this point, I just want a hacky solution that will ignore the
improperly coded characters or replace them with placeholders.
Can anyone recommend a simple surefire Unix/Python/Perl command that
will help me avoid errors like the one below? Any suggestions would be
hugely appreciated.
Thank you very much for your time,
Yours,
Greg
----
'utf8' codec can't decode byte 0x96 in position 4910: unexpected code
byte
* args = ('utf8', 'AUTHOR: blahblah\n\nTITLE: Reading Course
Readings... G. A. \x96 For references see blahblah.\n\n\n-----\n\n',
4910, 4911, 'unexpected code byte')
* encoding = 'utf8'
* end = 4911
* object = 'AUTHOR: blahblah\n\nTITLE: Reading Course Readings...
G. A. \x96 For references see blahblah.\n\n\n-----\n\n'
* reason = 'unexpected code byte'
* start = 4910
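
For concreteness, here is a minimal sketch of the kind of hacky fix being asked about, written for Python 3. It rests on one guess that the traceback does not confirm: that the stray 0x96 byte is a Windows-1252 en dash pasted in from a word processor. The function name decode_lenient is made up for illustration. It tries strict UTF-8 first, then Windows-1252, and only then falls back to replacing undecodable bytes with the U+FFFD placeholder.

```python
def decode_lenient(raw: bytes) -> str:
    """Decode bytes to text without ever raising UnicodeDecodeError.

    Hypothetical helper: the Windows-1252 fallback is an assumption about
    where the bad bytes came from, not something the error message proves.
    """
    try:
        # Well-formed UTF-8 passes through untouched.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    try:
        # Assumed legacy encoding; 0x96 maps to an en dash (U+2013) here.
        return raw.decode("cp1252")
    except UnicodeDecodeError:
        # Last resort: keep what decodes, mark the rest with U+FFFD.
        return raw.decode("utf-8", errors="replace")


sample = b"G. A. \x96 For references see blahblah."
text = decode_lenient(sample)

# Re-encode as clean UTF-8 before handing the page text to the wiki.
clean_bytes = text.encode("utf-8")
```

If placeholders alone are enough, raw.decode("utf-8", errors="replace") does the job in one call, and errors="ignore" silently drops the bad bytes instead.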