How can I get the text of the body (payload) of an email?

andrew blah

Hello,

I need to get the text of the body (the payload) of an email.

As I understand it, an email has headers at the top, then a blank line,
then the body of the message.

I want to get the text of the body - every character from the new line
after the headers until the end of the message.

My objective is to do an SHA hash on the body text so the get_payload
method isn't what I am after.

Can anyone suggest a convenient way to get access to the raw message
payload?

Thanks in advance for your help.

Andrew Stuart
 
Paul Rubin

andrew blah said:
Can anyone suggest a convenient way to get access to the raw message
payload?

If you're using the mailbox module, the body text is what you get
from message.fp.read() where message is an rfc822 message object
from reading the mailbox. Is that what you wanted to know?
 
andrew blah

I'm puzzled. Josiah suggested that this would allow me to get the
payload of an email message.

body = message.split('\r\n\r\n', 1)[1]

As I understand it, the headers of an email are terminated by a blank
line, after which comes the message payload. A blank line being
represented by \r\n\r\n

After trying Josiah's above suggestion on many emails and failing to
get it to work, I found that in fact the following works:

self.raw_data.split('\n\n', 1)[1]

But this doesn't agree with my understanding of the RFC822 email
format, which is that the blank line should be represented by \r\n\r\n

Can anyone suggest where my understanding is wrong?
Thanks

Andrew Stuart
 
Jeffrey Froman

andrew said:
I want to get the text of the body - every character from the new line
after the headers until the end of the message.

My objective is to do an SHA hash on the body text so the get_payload
method isn't what I am after.

Funny, I recently undertook the same task. Here's my solution:

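# Python 2-era code: the sha and email.Iterators modules are gone in Python 3
import email.Iterators
import sha

msg = email.message_from_string(foo)  # foo is the raw message text
x = sha.new()
for line in email.Iterators.body_line_iterator(msg):
    # each body line is fed to the hash; header lines are never yielded
    x.update(line)
hash = x.digest()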

This very cool iterator returns every body line, but skips all the headers,
including the headers present in each sub-part of the email. If you only
want plain text parts, you might combine this iterator with
email.Iterators.typed_subpart_iterator().
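
On Python 3 the same approach might look like the sketch below. It assumes the raw message is available as a str; body_sha1 is a hypothetical helper name, hashlib.sha1 stands in for the old sha module, and email.iterators is the lowercase spelling of the module used above. Because the iterator skips sub-part headers and MIME boundaries, the digest covers body lines only, not the raw bytes that follow the first blank line.

```python
import email
import email.iterators
import hashlib


def body_sha1(raw_message: str) -> str:
    """Hash every body line of a message, skipping all headers."""
    msg = email.message_from_string(raw_message)
    digest = hashlib.sha1()
    for line in email.iterators.body_line_iterator(msg):
        # hashlib wants bytes; surrogateescape round-trips any odd bytes
        digest.update(line.encode("utf-8", "surrogateescape"))
    return digest.hexdigest()


if __name__ == "__main__":
    sample = "Subject: hi\n\nHello\nWorld\n"
    print(body_sha1(sample))
```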

Jeffrey
 
Josiah Carlson

I'm puzzled. Josiah suggested that this would allow me to get the
payload of an email message.

body = message.split('\r\n\r\n', 1)[1]

As I understand it, the headers of an email are terminated by a blank
line, after which comes the message payload. A blank line being
represented by \r\n\r\n

After trying Josiah's above suggestion on many emails and failing to
get it to work, I found that in fact the following works:

self.raw_data.split('\n\n', 1)[1]

But this doesn't agree with my understanding of the RFC822 email
format, which is that the blank line should be represented by \r\n\r\n

Can anyone suggest where my understanding is wrong?
Thanks


Your understanding isn't wrong, but somehow you are acquiring emails
with only line feed line endings. This may be caused by opening a
file in text mode and getting universal line-ending support (which
drops the '\r'), or by some other processing step of yours stripping
it out (I don't use the email package, so I don't know what it may or
may not be doing).

A known method of normalizing line endings for data that could come from
anywhere is through the use of regular expressions:

email = re.sub('(\r\n|\r|\n)', '\r\n', email_with_ambiguous_line_endings)
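
A bytes-based variant of the same substitution, as a minimal sketch for Python 3; normalize_to_crlf is a hypothetical name. Listing \r\n first in the alternation means an existing CRLF is matched as one unit rather than as a separate \r and \n. Keep in mind that a hash taken after normalizing will differ from a hash of the untouched bytes whenever the endings actually change.

```python
import re


def normalize_to_crlf(data: bytes) -> bytes:
    """Rewrite every CRLF, lone CR, or lone LF as CRLF."""
    return re.sub(rb"\r\n|\r|\n", b"\r\n", data)


if __name__ == "__main__":
    mixed = b"Subject: x\n\nline one\r\nline two\r"
    print(normalize_to_crlf(mixed))
```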


If you know your data to be good on disk, perhaps it would be better to
open files as 'rb' to make sure that universal line ending support is
not used.
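
As a rough sketch of that advice on Python 3: read the file in binary mode so no bytes are altered, split at the first blank line whichever line ending is in use, and hash what follows. The function names and the file path argument are hypothetical, and hashlib.sha1 is only one possible choice of SHA variant.

```python
import hashlib
import re

# A blank line is two consecutive line breaks, each either CRLF or bare LF.
_BLANK_LINE = re.compile(rb"\r?\n\r?\n")


def split_raw_message(raw: bytes):
    """Return (headers, body), split at the first blank line."""
    match = _BLANK_LINE.search(raw)
    if match is None:
        # No blank line found: treat the whole message as headers.
        return raw, b""
    return raw[:match.start()], raw[match.end():]


def raw_body_sha1(raw: bytes) -> str:
    """SHA-1 hex digest of the untouched body bytes."""
    _, body = split_raw_message(raw)
    return hashlib.sha1(body).hexdigest()


def raw_body_sha1_from_file(path: str) -> str:
    # 'rb' avoids any newline translation while reading.
    with open(path, "rb") as handle:
        return raw_body_sha1(handle.read())
```

Since nothing is normalized here, the same message stored once with CRLF and once with LF endings will produce two different digests.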

- Josiah
 
M.E.Farmer

andrew blah said:
I need to get the text of the body (the payload) of an email.
As I understand it, an email has headers at the top, then a blank line,
then the body of the message.
I want to get the text of the body - every character from the new line
after the headers until the end of the message.

[headers]
[blank line]
[body]

You explained how to do it ;)
I want to get the text of the body - every character from the new line
after the headers until the end of the message.

If you just find the first blank line then the next line is the start
of the email body ;)

import poplib
Mail = poplib.POP3('mail.yourserver.net')
Mail.user('username')
Mail.pass_("userpass")
# just get the first message
MyMessage=Mail.retr(1)
FullText=""
PastHeaders=0
# MyMessage[1] is the list of message lines, without line terminators
for MsgLine in MyMessage[1]:
    if PastHeaders==0:
        # the first empty line marks the end of the headers
        if (len(MsgLine)==0):
            PastHeaders = 1
    else:
        FullText +=MsgLine+'\n'
Mail.quit()
print FullText

This is from Python 2.1 Bible (Dave Brueck, Stephen Tanner) ;)
That book is an awesome reference still today!
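
The header-skipping loop can also be tried without a mail server. The sketch below is a Python 3 adaptation, not the book's code: body_from_lines is a hypothetical helper, and it assumes the lines arrive as bytes without line terminators, which is how poplib's retr() hands them back on Python 3.

```python
def body_from_lines(lines):
    """Join every line after the first empty line, one '\\n' per line."""
    body_lines = []
    past_headers = False
    for line in lines:
        if not past_headers:
            # The first empty line ends the headers; it is not part of the body.
            if len(line) == 0:
                past_headers = True
        else:
            body_lines.append(line + b"\n")
    return b"".join(body_lines)


if __name__ == "__main__":
    sample = [b"Subject: test", b"From: a@example.com", b"", b"hello", b"", b"bye"]
    print(body_from_lines(sample))
```

Rejoining with '\n' means the result may not match the bytes the server stored if the original used CRLF, which matters when the goal is a hash of the body.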
HTH,
M.E.Farmer :)
 
