kettle
Hi,
I am rather new to Python, and am currently struggling with some
encoding issues. I have some UTF-8-encoded text which I need to
encode as ISO-2022-JP before sending it out to the world. I am using
Python's encode method:
--
var = var.encode("iso-2022-jp", "replace")
print var
--
I am using the 'replace' argument because there seem to be a couple
of Japanese characters in the UTF-8 text which Python can't convert to
ISO-2022-JP. The output looks like this:
↓東京???日比谷線?北千住行
However, if I use Perl's Encode module to re-encode the exact same bit
of text:
--
$var = encode("iso-2022-jp", decode("utf8", $var));
print $var;
--
I get proper output (no unsightly question-marks):
↓東京メトロ日比谷線・北千住行
So, what's the deal? Why can't Python properly encode some of these
characters? I know there are a host of different ISO-2022-JP
variants; could Python be using a different one than I think (the
default)? I'm quite liking Python at the moment for a variety of
reasons (I suspect Perl will forever win when it comes to
regular expressions, but everything else is pretty darn nice), but this
is a bit worrying.
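
For anyone trying to reproduce this, here is a minimal sketch (Python 3
syntax, assuming the text has already been decoded to a Unicode string)
that lists exactly which characters the codec rejects, instead of hiding
them behind question marks. The helper name `unencodable_chars` and the
sample string are hypothetical. The sample uses halfwidth katakana only as
an example of characters the plain iso-2022-jp codec does not cover, which
is an assumption and may not match the real data.

```python
def unencodable_chars(text, codec="iso-2022-jp"):
    """Return the characters of `text` that `codec` cannot encode, in order."""
    rejected = []
    for ch in text:
        try:
            # Encode one character at a time so a failure pinpoints the culprit.
            ch.encode(codec)
        except UnicodeEncodeError:
            rejected.append(ch)
    return rejected


# Hypothetical sample, not the real data: two kanji followed by three
# halfwidth katakana (U+FF92, U+FF84, U+FF9B).
sample = "\u6771\u4eac\uff92\uff84\uff9b"
for ch in unencodable_chars(sample):
    # Print the code point so the character can be looked up.
    print("U+%04X %r" % (ord(ch), ch))
```

Running the same helper with a different codec name as the second argument
would show whether one of the other ISO-2022-JP variants accepts the
characters in question.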
-Joe