Email headers and non-ASCII characters

Christoph Haas

Hello, everyone...

I'm trying to send an email to people with non-ASCII characters in their
names. A recipient's address may look like:

"Jörg Nørgens" <joerg@nowhere>

My example code:

=================================
import smtplib
from email.Header import Header
from email.MIMEText import MIMEText

def sendmail(sender, recipient, body, subject):
    message = MIMEText(body)
    message['Subject'] = Header(subject, 'iso-8859-1')
    message['From'] = Header(sender, 'iso-8859-1')
    message['To'] = Header(recipient, 'iso-8859-1')

    s = smtplib.SMTP()
    s.connect()
    s.sendmail(sender, recipient, message.as_string())
    s.close()
=================================

However, Header() encodes the whole expression in ISO-8859-1:

=?iso-8859-1?q?=22J=C3=B6rg_N=C3=B8rgens=22_=3Cjoerg=40nowhere=3E?=

I had expected something like:

"=?utf-8?q?J=C3=B6rg?= =?utf-8?q?_N=C3=B8rgens?=" <joerg@nowhere>

Of course my mail transfer agent is not happy with the first string
although I see that Header() is just doing its job. I'm looking for a way
though to encode just the non-ASCII parts like any mail client does. Does
anyone have a recipe on how to do that? Or is there a method in
the "email" module of the standard library that does what I need? Or
should I split by regular expression to extract the email address
beforehand? Or use a list comprehension to just look for non-ASCII characters
and Header() them? Sounds dirty.

Hints welcome.

Regards
Christoph
 
Max M

Christoph Haas wrote:
Of course my mail transfer agent is not happy with the first string


Why "of course"? But it seems that you are passing the Header object a
UTF-8 encoded string, not a Latin-1 encoded one.

You are telling Header which encoding the string is already in, not asking
it to encode.
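
A minimal sketch of that distinction, written for Python 3 (the thread's own code targets the older Python 2 email package, so treat this as an illustration rather than the exact session). The variable names and the decode_whole helper are made up for the example. It encodes the same address to bytes in two ways and passes both to Header with the same charset label:

```python
from email.header import Header, decode_header

address = '"Jörg Nørgens" <joerg@nowhere>'

# Same text, two different byte sequences.
utf8_bytes = address.encode("utf-8")         # ö becomes b"\xc3\xb6"
latin1_bytes = address.encode("iso-8859-1")  # ö becomes b"\xf6"

# For a byte string, the charset argument says how the bytes are
# already encoded; it does not convert them.
mislabelled = Header(utf8_bytes, "iso-8859-1").encode()
consistent = Header(latin1_bytes, "iso-8859-1").encode()


def decode_whole(header_text):
    """Decode a header that consists of a single encoded word."""
    raw, charset = decode_header(header_text)[0]
    return raw.decode(charset)


print(mislabelled)                # contains =C3=B6: UTF-8 bytes, Latin-1 label
print(consistent)                 # contains =F6
print(decode_whole(mislabelled))  # garbled name, since the label is wrong
print(decode_whole(consistent))   # the original address
```

Only the header built from Latin-1 bytes decodes back to the original text; the mislabelled one is what a receiving mail client would show as garbage.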

--

hilsen/regards Max M, Denmark

http://www.mxm.dk/
IT's Mad Science
 
Christoph Haas

Max M wrote:
Why "of course"?

Because my MTA doesn't care about MIME. It just transports the email. And
it expects an email address in <...>.

Max M wrote:
But it seems that you are passing the Header object a
UTF-8 encoded string, not a Latin-1 encoded one.
You are telling the header the encoding. Not asking it to encode.

Uhm, okay. Let's see:

u'"Jörg Nørgens" <joerg@nowhere>'.encode('latin-1')

=> '"J\xc3\xb6rg N\xc3\xb8rgens" <joerg@nowhere>'

So far so good. Now run Header() on it:

=> '=?utf-8?b?IkrDtnJnIE7DuHJnZW5zIiA8am9lcmdAbm93aGVyZT4=?='

Still nothing like <...> in it and my MTA is unhappy again. What am I
missing? Doesn't anyone know how mail clients handle that encoding?

Desperately,
Christoph
 
Leo Kislov

Christoph said:
should I split by regular expression to extract the email address
beforehand? Or a list comprehension to just look for non-ASCII characters
and Header() them? Sounds dirty.

Why dirty?

from email.Header import Header
from itertools import groupby

h = Header()
addr = u'"Jörg Nørgens" <joerg@nowhere>'

def is_ascii(char):
    return ord(char) < 128

# Append each run of ASCII or non-ASCII characters as its own chunk,
# so only the non-ASCII runs end up as encoded words.
for ascii, group in groupby(addr, is_ascii):
    h.append(''.join(group), "latin-1")

print h
=>
"J =?iso-8859-1?q?=F6?= rg N =?iso-8859-1?q?=F8?= rgens"
<joerg@nowhere>
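
An alternative sketch, assuming Python 3.3 or later, where email.utils.formataddr accepts a charset argument. Instead of grouping characters, it splits the address first so that only the display name is encoded and the <...> part stays literal. The helper names encode_address and decode_display_name are hypothetical, chosen for this example:

```python
from email.header import decode_header
from email.utils import formataddr, parseaddr


def encode_address(address, charset="utf-8"):
    """Return the address with only its display name RFC 2047-encoded."""
    # parseaddr splits '"Name" <user@host>' into ('Name', 'user@host').
    name, addr = parseaddr(address)
    # formataddr leaves an ASCII name alone and encodes a non-ASCII one;
    # the address itself is never encoded.
    return formataddr((name, addr), charset)


def decode_display_name(address):
    """Recover the readable display name from an encoded address."""
    name, _ = parseaddr(address)
    raw, charset = decode_header(name)[0]
    # Names without encoded words come back as plain str with charset None.
    return raw.decode(charset) if charset else raw


encoded = encode_address('"Jörg Nørgens" <joerg@nowhere>')
print(encoded)                       # ASCII only, ends with <joerg@nowhere>
print(decode_display_name(encoded))  # the original display name
```

The result can be assigned to message['To'] as an ordinary string, while the bare address returned by parseaddr is what belongs in the SMTP envelope.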

-- Leo
 
Max M

Christoph Haas wrote:
Still nothing like <...> in it and my MTA is unhappy again. What am I
missing? Doesn't anyone know how mail clients handle that encoding?

'=?iso-8859-1?q?=22J=F6rg_N=F8rgens=22_=3Cjoerg=40nowhere=3E?='

Is this not correct?

At least roundtripping works:
>>> from email.Header import decode_header
>>> hdr = '=?iso-8859-1?q?=22J=F6rg_N=F8rgens=22_=3Cjoerg=40nowhere=3E?='
>>> encoded, coding = decode_header(hdr)[0]
>>> encoded, coding
('"J\xf6rg N\xf8rgens" <joerg@nowhere>', 'iso-8859-1')
>>> encoded.decode(coding)
u'"J\xf6rg N\xf8rgens" <joerg@nowhere>'

And parsing the address works too.
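
For reference, a rough Python 3 equivalent of the round trip plus the address parsing step, assuming the modern lowercase module names (email.header, email.utils). The variable names are illustrative:

```python
from email.header import decode_header, make_header
from email.utils import parseaddr

hdr = "=?iso-8859-1?q?=22J=F6rg_N=F8rgens=22_=3Cjoerg=40nowhere=3E?="

# decode_header() yields (bytes, charset) pairs; make_header() joins them
# into a Header object whose str() is the decoded text.
decoded = str(make_header(decode_header(hdr)))

# The literal < and > only exist after decoding, so the address parser
# has to run on the decoded text, not on the raw header.
name, addr = parseaddr(decoded)

print(decoded)
print(name)
print(addr)
```

This shows that the fully encoded header is decodable, but also that software which does not decode MIME words sees no literal <...> in it.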

--

hilsen/regards Max M, Denmark

http://www.mxm.dk/
IT's Mad Science
 
