Dietrich Bollmann
Hi,
Are there any functions in Python to convert between different Japanese
coding systems?
I would like to convert between (at least) ISO-2022-JP, UTF-8, EUC-JP
and Shift_JIS (SJIS). I also need a function to encode and decode
base64-encoded strings.
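For reference, a minimal sketch of both operations with the Python 3 standard library, where encoded data is `bytes` and text is `str`. The built-in codecs `iso2022_jp`, `euc_jp`, `shift_jis` and `utf-8` cover the four encodings, and the `base64` module handles base64. The helper name `convert_coding_system` is hypothetical; the conversion always goes through Unicode (decode with the source encoding, then encode with the target).

```python
import base64

def convert_coding_system(data: bytes, from_enc: str, to_enc: str) -> bytes:
    """Hypothetical helper: re-encode bytes from one encoding to another.

    Decodes with the source encoding to a Unicode str, then encodes
    that str with the target encoding.
    """
    return data.decode(from_enc).encode(to_enc)

text = "ひらがなカタカナ漢字"

# Chain the conversions EUC-JP -> Shift_JIS -> ISO-2022-JP -> UTF-8
euc = text.encode("euc_jp")
sjis = convert_coding_system(euc, "euc_jp", "shift_jis")
jis = convert_coding_system(sjis, "shift_jis", "iso2022_jp")
utf8 = convert_coding_system(jis, "iso2022_jp", "utf-8")

# base64 turns arbitrary bytes into ASCII bytes and back
encoded = base64.b64encode(euc)
decoded = base64.b64decode(encoded)
```

Characters that do not exist in the target encoding raise `UnicodeEncodeError` unless an `errors=` argument such as `'replace'` is passed to `encode`.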
I get the strings (which are actually emails) from a server on the
Internet with:
import urllib
server = urllib.urlopen(serverURL, parameters)
email = server.read()
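`urllib.urlopen` is the Python 2 API. In Python 3 the equivalent lives in `urllib.request`, and a request body must be `bytes`. A sketch, assuming `parameters` is a dict meant to be sent as a form-encoded POST body (the post does not say what it contains); `fetch_email` is a hypothetical name. `read()` returns raw bytes, which is what the later parsing and decoding steps need.

```python
import urllib.parse
import urllib.request

def fetch_email(server_url, parameters=None):
    """Hypothetical helper: fetch a raw message as bytes.

    If parameters is given (assumed here to be a dict), it is form-encoded
    and sent as the POST body, like urllib.urlopen(url, data) in Python 2.
    Without parameters a plain GET is made.
    """
    data = None
    if parameters is not None:
        data = urllib.parse.urlencode(parameters).encode("ascii")
    with urllib.request.urlopen(server_url, data) as response:
        return response.read()  # raw bytes, not yet decoded
```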
The coding systems are declared in the headers of the response, for example:
email = '''[...]
Subject:
=?UTF-8?Q?romaji=E3=81=B2=E3=82=89=E3=81=8C=E3=81=AA=E3=82=AB=E3=82=BF?=
=?UTF-8?Q?=E3=82=AB=E3=83=8A=E6=BC=A2=E5=AD=97?=
[...]
Content-Type: text/plain; charset=EUC-JP
[...]
Content-Transfer-Encoding: base64
[...]
cm9tYWpppNKk6aSspMqlq6W/paulyrTBu/oNCg0K
'''
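The `=?UTF-8?Q?...?=` parts of the Subject are MIME encoded-words (RFC 2047), which the standard `email.header` module can decode. A sketch using the example subject, with its two encoded-words placed on one line separated by a space (whitespace between adjacent encoded-words is ignored when decoding):

```python
from email.header import decode_header, make_header

raw_subject = (
    "=?UTF-8?Q?romaji=E3=81=B2=E3=82=89=E3=81=8C=E3=81=AA=E3=82=AB=E3=82=BF?= "
    "=?UTF-8?Q?=E3=82=AB=E3=83=8A=E6=BC=A2=E5=AD=97?="
)

# decode_header returns (bytes, charset) pairs with the Q-encoding undone
parts = decode_header(raw_subject)

# make_header decodes each pair with its charset; str() joins them into text
subject = str(make_header(parts))
```

Here `subject` is a Unicode `str`; `subject.encode("utf-8")` gives UTF-8 bytes.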
My idea is to first parse the 'email' string to extract the body as well
as the values of the 'Subject:', 'Content-Type:' and
'Content-Transfer-Encoding:' headers, and then use these values to
convert the subject and body to some other coding system.
Something along the lines of:
(subject, contentType, contentTransferEncoding, content) = parseEmail(email)
to = 'utf-8'
subjectUtf8 = decodeSubject(subject, to)
fromCode = contentType  # 'from' is a reserved word in Python
contentUtf8 = convertCodingSystem(decodeBase64(content), fromCode, to)
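The standard `email` package can carry out most of this plan. A sketch, assuming Python 3.6+ (for `email.policy.default`, which decodes encoded-word headers automatically) and a single-part message; `parse_email` is a hypothetical stand-in for `parseEmail` above. Unlike the pseudocode, it returns only the `charset` parameter instead of the whole Content-Type value, and the body already base64-decoded and converted to a `str`, so getting UTF-8 is a final `.encode("utf-8")`.

```python
from email import message_from_bytes, policy

def parse_email(raw: bytes):
    """Hypothetical helper: split a raw single-part message into its parts.

    Returns (subject, charset, transfer_encoding, body), where subject and
    body are Unicode str objects.
    """
    msg = message_from_bytes(raw, policy=policy.default)
    # policy.default decodes =?UTF-8?Q?...?= encoded-words in headers
    subject = str(msg["Subject"]).strip()
    charset = msg.get_content_charset()  # lowercased, e.g. "euc-jp"
    transfer_encoding = str(msg.get("Content-Transfer-Encoding", "7bit"))
    # decode=True undoes base64 / quoted-printable, leaving charset-encoded bytes
    payload = msg.get_payload(decode=True)
    body = payload.decode(charset or "ascii")
    return subject, charset, transfer_encoding, body

# The example message, with the folded Subject line indented as mail requires
raw = (
    b"Subject: =?UTF-8?Q?romaji=E3=81=B2=E3=82=89=E3=81=8C=E3=81=AA=E3=82=AB=E3=82=BF?=\n"
    b" =?UTF-8?Q?=E3=82=AB=E3=83=8A=E6=BC=A2=E5=AD=97?=\n"
    b"Content-Type: text/plain; charset=EUC-JP\n"
    b"Content-Transfer-Encoding: base64\n"
    b"\n"
    b"cm9tYWpppNKk6aSspMqlq6W/paulyrTBu/oNCg0K\n"
)

subject, charset, transfer_encoding, body = parse_email(raw)
subject_utf8 = subject.encode("utf-8")
body_utf8 = body.encode("utf-8")
```

Multipart messages would need `msg.walk()` to visit each part; this sketch leaves that out.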
The only problem is that I could not find any standard functionality to
convert between different Japanese coding systems.
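The charset labels that appear in mail headers can be passed to Python's codec machinery as they are: names are matched case-insensitively, and aliases such as `SJIS` map to the built-in Japanese codecs. A quick check with `codecs.lookup`, which returns a `CodecInfo` whose `.name` is the canonical codec name:

```python
import codecs

labels = ["ISO-2022-JP", "EUC-JP", "Shift_JIS", "SJIS", "UTF-8"]

# Map each header-style label to the canonical name of the codec Python uses
canonical = {label: codecs.lookup(label).name for label in labels}

for label, name in canonical.items():
    print(f"{label:12} -> {name}")
```

An unknown label raises `LookupError`, which is a convenient way to reject an unsupported charset before trying to decode.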
Thanks,
Dietrich Bollmann