email bug?

Stuart D. Gathman

Running the following with Python 2.2.2:

import email

txt = """Subject: IE is Evil
Content-Type: image/pjpeg; name="Jim&amp;&amp;Jill"

<html>
</html>
"""

msg = email.message_from_string(txt)
print msg.get_params()

I get:
[('image/pjpeg', ''), ('name', '"Jim&amp'), ('&amp', ''), ('Jill"', '')]

What IE apparently gets is:

[('image/pjpeg', ''), ('name', '"Jim&amp;&amp;Jill"')]

Is this a bug (in the email package, I mean - obviously IE is buggy)?

Do I have to write my own custom param parsing routines to handle this?
 
Andrew Dalke

Stuart D. Gathman:
Content-Type: image/pjpeg; name="Jim&amp;&amp;Jill"
What IE apparently gets is:

[('image/pjpeg', ''), ('name', '"Jim&amp;&amp;Jill"')]

Is this a bug (in the email package, I mean - obviously IE is buggy)?

Do I have to write my own custom param parsing routines to handle this?

BTW, I verified this in 2.3.

Looks like the Content-Type syntax is defined in
http://www.faqs.org/rfcs/rfc2045.html
5.1. Syntax of the Content-Type Header Field

content := "Content-Type" ":" type "/" subtype
           *(";" parameter)

parameter := attribute "=" value

value := token / quoted-string

token := 1*<any (US-ASCII) CHAR except SPACE, CTLs,
            or tspecials>

tspecials := "(" / ")" / "<" / ">" / "@" /
             "," / ";" / ":" / "\" / <">
             "/" / "[" / "]" / "?" / "="
             ; Must be in quoted-string,
             ; to use within parameter values

So the ";" must be in a quoted string. That's defined in
RFC 822, http://www.faqs.org/rfcs/rfc822.html
(now obsolete)

quoted-string = <"> *(qtext/quoted-pair) <">

qtext = <any CHAR excepting <">,   ; => may be folded
         "\" & CR, and including
         linear-white-space>

CHAR = <any ASCII character>

The ';' is in CHAR and is not "\" nor CR so it's in qtext,
so it's part of quoted-string, so it's allowed in a value
without extra interpretation.

It looks like RFC 2822 (the updated version of RFC 822) at
http://www.faqs.org/rfcs/rfc2822.html agrees.

So I think it's a bug in the email module's parser.
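To make the grammar argument concrete, here is a minimal sketch of a splitter that only treats ';' as a separator when it is outside a quoted-string, and that skips over quoted-pairs (a backslash followed by any CHAR). The name split_params is hypothetical and is not part of the email package; the sketch ignores comments and header folding, and is written to run on Python 3.

```python
def split_params(value):
    """Split a header value on ';' characters that are outside quoted-strings."""
    parts = []
    current = []
    in_quotes = False   # True while between an opening and closing '"'
    escaped = False     # True right after a backslash inside a quoted-string
    for ch in value:
        if escaped:
            # Second half of a quoted-pair: take it literally.
            current.append(ch)
            escaped = False
        elif ch == '\\' and in_quotes:
            current.append(ch)
            escaped = True
        elif ch == '"':
            current.append(ch)
            in_quotes = not in_quotes
        elif ch == ';' and not in_quotes:
            # A real parameter separator.
            parts.append(''.join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append(''.join(current).strip())
    return parts


print(split_params('image/pjpeg; name="Jim&amp;&amp;Jill"'))
# prints ['image/pjpeg', 'name="Jim&amp;&amp;Jill"']
```

With this reading of the grammar, the header from the original post yields exactly one parameter after the content type, which is what IE appears to see.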

The actual bug is in email/Parser.py with

# Regular expression used to split header parameters. BAW: this may be too
# simple. It isn't strictly RFC 2045 (section 5.1) compliant, but it catches
# most headers found in the wild. We may eventually need a full fledged
# parser eventually.
paramre = re.compile(r'\s*;\s*')
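A short check of what that pattern does to the header from the original post, run on its own outside the email package (the variable names header_value and pieces are just for illustration). It splits on every ';', quoted or not, which matches the broken result reported above:

```python
import re

# Same pattern as the one quoted from the email package.
paramre = re.compile(r'\s*;\s*')

header_value = 'image/pjpeg; name="Jim&amp;&amp;Jill"'
pieces = paramre.split(header_value)
print(pieces)
# prints ['image/pjpeg', 'name="Jim&amp', '&amp', 'Jill"']
```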

A quick scan of the code suggests that it isn't a quick fix (e.g.,
not just a matter of tweaking that regexp).

Could you file a bug report against it?

Andrew
 
Stuart D. Gathman

Andrew Dalke:
A quick scan of the code suggests that it isn't a quick fix (e.g., not
just a matter of tweaking that regexp).

Here is my quick (and probably incorrect) fix:

from email.Message import Message
from email.Utils import unquote

# helper to split params while ignoring ';' inside quotes
def _parseparam(s):
    plist = []
    while s[:1] == ';':
        s = s[1:]
        end = s.find(';')
        # an odd number of '"' before this ';' means it is inside quotes
        while end > 0 and (s.count('"', 0, end) & 1):
            end = s.find(';', end + 1)
        if end < 0:
            end = len(s)
        f = s[:end]
        if '=' in f:
            i = f.index('=')
            f = f[:i].strip().lower() + \
                '=' + f[i+1:].strip()
        plist.append(f.strip())
        s = s[end:]
    return plist

class MimeMessage(Message):

    def getparam(self, name, header='content-type'):
        for key, val in self.getparams(header):
            if key == name:
                return unquote(val)
        return None

    # like get_params but obey quotes
    def getparams(self, header='content-type'):
        "Return all parameter names and values. Use parser that handles quotes."
        val = self.get(header)
        result = []
        if val:
            plist = _parseparam(';' + val)
            for p in plist:
                i = p.find('=')
                if i >= 0:
                    result.append((p[:i].lower(), unquote(p[i+1:])))
        return result
 
Andrew Dalke

Stuart D. Gathman:
Here is my quick (and probably incorrect) fix:

There's a test suite in email.test. It includes tests for
get_params, and I see some commented-out code which
lists a test known to fail.

You could use that to check the validity of your code.
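A test of that kind might look like the sketch below. It is not taken from the email package's own test suite: the class name QuotedSemicolonTest and the helper run_tests are made up for illustration, and it is written for a Python 3 interpreter, where it exercises the standard library's own get_params and get_param rather than the replacement class posted above.

```python
import email
import unittest

TXT = '''Subject: IE is Evil
Content-Type: image/pjpeg; name="Jim&amp;&amp;Jill"

<html>
</html>
'''


class QuotedSemicolonTest(unittest.TestCase):
    def test_semicolon_inside_quoted_value(self):
        msg = email.message_from_string(TXT)
        params = msg.get_params()
        # The content type plus exactly one parameter: the quoted value
        # must not be broken apart at its embedded ';' characters.
        self.assertEqual(len(params), 2)
        self.assertEqual(params[0], ('image/pjpeg', ''))
        self.assertEqual(msg.get_param('name'), 'Jim&amp;&amp;Jill')


def run_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(QuotedSemicolonTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_tests()
```

To check a replacement parser instead, the same assertions can be pointed at its methods.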

Andrew
 
