W. Trevor King
Hello list!
I'm trying to figure out how to flatten a MIMEText message to bytes
using an 8bit Content-Transfer-Encoding in Python 3.3. Here's what
I've tried so far:
    # -*- encoding: utf-8 -*-
    import email.encoders
    from email.charset import Charset
    from email.generator import BytesGenerator
    from email.mime.text import MIMEText
    import sys

    body = 'Ζεύς'
    encoding = 'utf-8'
    charset = Charset(encoding)
    charset.body_encoding = email.encoders.encode_7or8bit

    message = MIMEText(body, 'plain', encoding)
    del message['Content-Transfer-Encoding']
    message.set_payload(body, charset)
    try:
        BytesGenerator(sys.stdout.buffer).flatten(message)
    except UnicodeEncodeError as e:
        print('error with string input:')
        print(e)

    message = MIMEText(body, 'plain', encoding)
    del message['Content-Transfer-Encoding']
    message.set_payload(body.encode(encoding), charset)
    try:
        BytesGenerator(sys.stdout.buffer).flatten(message)
    except TypeError as e:
        print('error with byte input:')
        print(e)
The `del m[…]; m.set_payload()` bits work around #16324 [1] and should
be orthogonal to the encoding issues. It's possible that #12553 is
trying to address this issue [2,3], but that issue's comments are a
bit vague, so I'm not sure.
The problem with the string payload is that
email.generator.BytesGenerator.write is getting the Unicode string
payload unencoded and trying to encode it as ASCII. It may be
possible to work around this by encoding the payload so that anything
that doesn't encode (using the body charset) to a 7bit value is
replaced with a surrogate escape, but I'm not sure how to do that.
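For what it's worth, here is a minimal sketch of that surrogate-escape idea. The helper name `flatten_8bit` is made up for illustration, and the sketch assumes a CPython whose BytesGenerator writes surrogate-escaped text payloads back out as raw bytes under the default 8bit policy; I haven't checked it against 3.3 specifically or against charsets other than UTF-8.

```python
import io
from email.generator import BytesGenerator
from email.mime.text import MIMEText


def flatten_8bit(body, encoding='utf-8'):
    """Hypothetical helper: flatten a text/plain message to bytes as 8bit."""
    message = MIMEText(body, 'plain', encoding)
    # Drop the transfer encoding that MIMEText chose for us.
    del message['Content-Transfer-Encoding']
    # Encode with the body charset, then decode as ASCII so that every
    # non-ASCII byte becomes a lone surrogate (U+DC80..U+DCFF).
    escaped = body.encode(encoding).decode('ascii', 'surrogateescape')
    # No charset argument here, so the string is stored as-is.
    message.set_payload(escaped)
    message['Content-Transfer-Encoding'] = '8bit'
    buffer = io.BytesIO()
    BytesGenerator(buffer).flatten(message)
    return buffer.getvalue()


flattened = flatten_8bit('Ζεύς')
```

The generator encodes each line as ASCII with the surrogateescape handler, which should turn the surrogates back into the original bytes of the body charset.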
The problem with the byte payload is that _has_surrogates (used in
email.generator.Generator._handle_text and
BytesGenerator._handle_text) chokes on byte input:
    TypeError: can't use a string pattern on a bytes-like object
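The same class of error is easy to reproduce with plain `re`. The pattern below is only a stand-in that I made up to match lone low surrogates; it is not the pattern the email package uses.

```python
import re

# Stand-in pattern for lone low surrogates (an assumption, not the stdlib's).
has_surrogates = re.compile('[\udc80-\udcff]').search

# A surrogate-escaped string is searched without trouble.
text_payload = 'Ζεύς'.encode('utf-8').decode('ascii', 'surrogateescape')
found_in_text = has_surrogates(text_payload) is not None

# A bytes payload cannot be searched with a string pattern.
try:
    has_surrogates('Ζεύς'.encode('utf-8'))
    bytes_error = None
except TypeError as error:
    bytes_error = error
```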
For UTF-8, you can get away with:
    message.as_string().encode(message.get_charset().get_output_charset())
because the headers are encoded into 7 bits, so re-encoding them with
UTF-8 is a no-op. However, if the body charset is UTF-16-LE or any
other encoding that remaps 7bit characters, this hack breaks down.
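A quick way to see the difference, using a sample header line chosen only for illustration:

```python
header = 'Content-Transfer-Encoding: 8bit\n'

ascii_bytes = header.encode('ascii')
utf8_bytes = header.encode('utf-8')
utf16_bytes = header.encode('utf-16-le')

# UTF-8 leaves 7bit characters untouched, so the header survives re-encoding.
same_in_utf8 = utf8_bytes == ascii_bytes
# UTF-16-LE widens every character to two bytes, so the header is rewritten.
same_in_utf16 = utf16_bytes == ascii_bytes

print(same_in_utf8, same_in_utf16)  # True False
```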
Thoughts?
Trevor
[1]: http://bugs.python.org/issue16324
[2]: http://bugs.python.org/issue12553
[3]: http://bugs.python.org/issue12552#msg140294
--
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy