How can I get text of the body (payload) of an email?

andrew blah · Oct 16, 2004

Hello,

I need to get the text of the body (the payload) of an email.

As I understand it, an email has headers at the top, then a blank line,
then the body of the message.

I want to get the text of the body - every character from the new line
after the headers until the end of the message.

My objective is to do an SHA hash on the body text so the get_payload
method isn't what I am after.

Can anyone suggest a convenient way to get access to the raw message
payload?

Thanks in advance for your help.

Andrew Stuart

Josiah Carlson · Oct 16, 2004

Can anyone suggest a convenient way to get access to the raw message
payload?

body = message.split('\r\n\r\n', 1)[1]
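A slightly more defensive sketch of the same split, assuming Python 3, that the whole raw message is already in memory as a bytes object, and that it may use either CRLF or bare LF line endings. The names raw_body and body_sha1 are made up for illustration and are not part of any library.

```python
import hashlib
import re

# Headers end at the first empty line; accept CRLF or bare LF endings.
_BLANK_LINE = re.compile(rb"\r?\n\r?\n")

def raw_body(message_bytes):
    """Return everything after the first blank line, or b"" if there is none."""
    parts = _BLANK_LINE.split(message_bytes, maxsplit=1)
    return parts[1] if len(parts) == 2 else b""

def body_sha1(message_bytes):
    """SHA-1 hex digest of the untouched body bytes (hypothetical helper)."""
    return hashlib.sha1(raw_body(message_bytes)).hexdigest()
```

Hashing the bytes exactly as received means that two copies of the same message with different line endings will produce different digests.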

- Josiah

Paul Rubin · Oct 16, 2004

andrew blah said:
Can anyone suggest a convenient way to get access to the raw message
payload?

If you're using the mailbox module, the body text is what you get
from message.fp.read() where message is an rfc822 message object
from reading the mailbox. Is that what you wanted to know?
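
The rfc822 module and its message.fp attribute no longer exist in Python 3, so the following is a swapped-in approach rather than the same technique: a rough sketch that reads each message's raw bytes from an mbox file with mailbox.mbox.get_bytes() and keeps what follows the first blank line. It assumes the file uses the platform's native line endings, which get_bytes() converts to "\n". The function name mbox_raw_bodies is hypothetical.

```python
import mailbox

def mbox_raw_bodies(path):
    """Return the raw body bytes of every message in an mbox file."""
    box = mailbox.mbox(path, create=False)
    try:
        bodies = []
        for key in box.keys():
            # Headers plus body, without the mbox "From " separator line.
            raw = box.get_bytes(key)
            # Everything after the first blank line is the body.
            _headers, _sep, body = raw.partition(b"\n\n")
            bodies.append(body)
        return bodies
    finally:
        box.close()
```

Each returned body can then be passed to hashlib.sha1(body).hexdigest() to get the digest.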

andrew blah · Oct 16, 2004

I'm puzzled. Josiah suggested that this would allow me to get the
payload of an email message.

body = message.split('\r\n\r\n', 1)[1]

As I understand it, the headers of an email are terminated by a blank
line, after which comes the message payload. A blank line being
represented by \r\n\r\n

After trying Josiah's above suggestion on many emails and failing to
get it to work, I found that in fact the following works:

self.raw_data.split('\n\n', 1)[1]

But this doesn't agree with my understanding of the RFC822 email
format, which is that the blank line should be represented by \r\n\r\n

Can anyone suggest where my understanding is wrong?
Thanks

Andrew Stuart

Jeffrey Froman · Oct 16, 2004

andrew said:
I want to get the text of the body - every character from the new line
after the headers until the end of the message.

My objective is to do an SHA hash on the body text so the get_payload
method isn't what I am after.

Funny, I recently undertook the same task. Here's my solution:

import email.Iterators
import sha

# foo holds the complete message text (headers and body)
msg = email.message_from_string(foo)
x = sha.new()
for line in email.Iterators.body_line_iterator(msg):
    x.update(line)
hash = x.digest()

This very cool iterator returns every body line, but skips all the headers,
including the headers present in each sub-part of the email. If you only
want plain text parts, you might combine this iterator with
email.Iterators.typed_subpart_iterator().
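
For anyone on Python 3, here is a sketch of the same loop, assuming the lowercase module name email.iterators and hashlib in place of the old sha module. The iterator yields str lines when the message is parsed from a string, so each line is encoded before hashing; UTF-8 is an assumption here, and the function name body_lines_sha1 is made up.

```python
import email
import email.iterators
import hashlib

def body_lines_sha1(raw_text):
    """SHA-1 hex digest over the body lines of a message given as str."""
    msg = email.message_from_string(raw_text)
    digest = hashlib.sha1()
    for line in email.iterators.body_line_iterator(msg):
        # hashlib wants bytes; the encoding is an assumption of this sketch.
        digest.update(line.encode("utf-8"))
    return digest.hexdigest()
```

Because the sub-part headers and MIME boundaries are skipped, this digest will generally differ from a hash of the raw bytes after the first blank line for multipart messages.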

Jeffrey

Josiah Carlson · Oct 16, 2004

I'm puzzled. Josiah suggested that this would allow me to get the
payload of an email message.

body = message.split('\r\n\r\n', 1)[1]

As I understand it, the headers of an email are terminated by a blank
line, after which comes the message payload. A blank line being
represented by \r\n\r\n

After trying Josiah's above suggestion on many emails and failing to
get it to work, I found that in fact the following works:

self.raw_data.split('\n\n', 1)[1]

But this doesn't agree with my understanding of the RFC822 email
format, which is that the blank line should be represented by \r\n\r\n

Can anyone suggest where my understanding is wrong?
Thanks

Your understanding isn't wrong, but somehow you are acquiring emails
with only line feed line endings. This may be caused by opening a file
with universal line-ending support (which tosses '\r'), or by some other
processing you do that strips it out (I don't use the email package, so
I don't know what it may or may not be doing).

A known method of normalizing line endings for data that could come from
anywhere is through the use of regular expressions:

import re
email = re.sub('(\r\n|\r|\n)', '\r\n', email_with_ambiguous_line_endings)

If you know your data to be good on disk, perhaps it would be better to
open files as 'rb' to make sure that universal line ending support is
not used.
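
A small sketch of both suggestions for Python 3, assuming the message is handled as bytes throughout. The helper names to_crlf and read_raw are hypothetical. The alternation tries CRLF first, so an existing CRLF pair is replaced once rather than doubled.

```python
import re

# CRLF must come first so it is matched as one unit, not as CR then LF.
_ANY_EOL = re.compile(rb"\r\n|\r|\n")

def to_crlf(data):
    """Rewrite every CRLF, lone CR, or lone LF in a bytes object as CRLF."""
    return _ANY_EOL.sub(b"\r\n", data)

def read_raw(path):
    """Read a file in binary mode so no newline translation takes place."""
    with open(path, "rb") as fh:
        return fh.read()
```

After normalizing, splitting on b"\r\n\r\n" should behave the same no matter where the message came from. If the hash has to match one computed elsewhere, both sides need to agree on whether they hash the normalized or the original bytes.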

- Josiah

M.E.Farmer · Oct 16, 2004

andrew blah said:
I need to get the text of the body (the payload) of an email.
As I understand it, an email has headers at the top, then a blank line,
then the body of the message.
I want to get the text of the body - every character from the new line
after the headers until the end of the message.

[headers]
[blank line]
[body]

You explained how to do it

I want to get the text of the body - every character from the new line
after the headers until the end of the message.

If you just find the first blank line, the next line is the start
of the email body:

import poplib
Mail = poplib.POP3('mail.yourserver.net')
Mail.user('username')
Mail.pass_("userpass")
# just get the first message
MyMessage=Mail.retr(1)
FullText=""
PastHeaders=0
for MsgLine in MyMessage[1]:
    if PastHeaders==0:
        if (len(MsgLine)==0):
            PastHeaders = 1
    else:
        FullText +=MsgLine+'\n'
Mail.quit()
print FullText

This is from Python 2.1 Bible (Dave Brueck, Stephen Tanner).

That book is an awesome reference still today!
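
For Python 3, where poplib's retr() returns the message as a list of bytes lines without their line terminators, the header-skipping loop can be sketched on its own so it runs without a mail server. The function name body_from_lines and the eol parameter are made up for this sketch and are not from the book.

```python
def body_from_lines(lines, eol=b"\n"):
    """Join the lines that follow the first empty line, each ending in eol."""
    past_headers = False
    body = []
    for line in lines:
        if past_headers:
            # Everything after the first blank line belongs to the body,
            # including any later blank lines.
            body.append(line + eol)
        elif len(line) == 0:
            # The first empty line marks the end of the headers.
            past_headers = True
    return b"".join(body)
```

With a live connection this would be called as body_from_lines(Mail.retr(1)[1]). Passing eol=b"\r\n" rebuilds the body with CRLF endings, which may matter if the hash must match the bytes as sent.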

My objective is to do an SHA hash on the body text so the get_payload
method isn't what I am after.
Can anyone suggest a convenient way to get access to the raw message
payload?
Thanks in advance for your help.

HTH,
M.E.Farmer
