Parsing Email Headers

T

All I'm looking to do is to download messages from a POP account and
retrieve the sender and subject from their headers. Right now I'm 95%
of the way there, except I can't seem to figure out how to *just* get
the headers. Problem is, certain email clients also include headers
in the message body (i.e. if you're replying to a message), and these
are all picked up as additional senders/subjects. So, I want to avoid
processing anything from the message body.

Here's a sample of what I have:

    # For each line in message
    for j in M.retr(i+1)[1]:
        # Create email message object from returned string
        emailMessage = email.message_from_string(j)
        # Get fields
        fields = emailMessage.keys()
        # If email contains "From" field
        if emailMessage.has_key("From"):
            # Get contents of From field
            from_field = emailMessage.__getitem__("From")

I also tried using the following, but got the same results:
    emailMessage = email.Parser.HeaderParser().parsestr(j, headersonly=True)

Any help would be appreciated!
 
MRAB

If you're using poplib then use ".top" instead of ".retr".
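
For reference, a minimal Python 3 sketch of this suggestion. It assumes `conn` is an already logged-in `poplib.POP3` (or `POP3_SSL`) connection; `fetch_sender_and_subject` is a hypothetical helper name, not part of poplib. In Python 3, poplib returns the lines as bytes, so the sketch uses `email.parser.BytesHeaderParser`.

```python
from email.parser import BytesHeaderParser


def fetch_sender_and_subject(conn, msg_num):
    """Return (From, Subject) for one message using TOP instead of RETR.

    `conn` is assumed to be a logged-in poplib.POP3 / POP3_SSL connection.
    """
    # TOP n 0 asks the server for the headers plus zero body lines.
    _response, lines, _octets = conn.top(msg_num, 0)
    # Join all lines back into one block before parsing; parsing line by
    # line would treat every line as a separate message.
    headers = BytesHeaderParser().parsebytes(b"\r\n".join(lines))
    return headers["From"], headers["Subject"]
```

The key difference from the code in the question is that the lines are joined and parsed once, rather than being parsed one at a time inside the loop.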
 
Grant Edwards

T said:
All I'm looking to do is to download messages from a POP account and
retrieve the sender and subject from their headers. Right now I'm 95%
of the way there, except I can't seem to figure out how to *just* get
the headers.

The headers are separated from the body by a blank line.

T said:
Problem is, certain email clients also include headers in the message
body (i.e. if you're replying to a message), and these are all picked
up as additional senders/subjects. So, I want to avoid processing
anything from the message body.

Then stop when you see a blank line.

Or retrieve just the headers.
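
The "stop at the blank line" idea could look roughly like the sketch below. `header_block` and the sample lines are made up for illustration, and the lines are plain strings here; under Python 3, poplib hands back bytes, which would need decoding first.

```python
import email


def header_block(lines):
    """Return only the lines before the first blank line, joined as one string."""
    kept = []
    for line in lines:
        if not line.strip():
            break  # first blank line: the headers are over, the body begins
        kept.append(line)
    return "\n".join(kept) + "\n"


# Illustrative message: a reply whose body quotes the original headers.
raw_lines = [
    "From: alice@example.com",
    "Subject: Re: lunch",
    "",
    "On Monday Bob wrote:",
    "From: bob@example.com",
    "Subject: lunch",
]
msg = email.message_from_string(header_block(raw_lines))
print(msg["From"], "|", msg["Subject"])  # alice@example.com | Re: lunch
```

Only the real header block reaches the parser, so the quoted "From:" and "Subject:" lines in the body are never seen.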
 
T

MRAB said:
If you're using poplib then use ".top" instead of ".retr".

I'm still having the same issue, even with .top. Am I missing
something?

    for j in M.top(i+1, 0)[1]:
        emailMessage = email.message_from_string(j)
        #emailMessage = email.Parser.HeaderParser().parsestr(j, headersonly=True)
        # Get fields
        fields = emailMessage.keys()
        # If email contains "From" field
        if emailMessage.has_key("From"):
            # Get contents of From field
            from_field = emailMessage.__getitem__("From")

Is there another way I should be using to retrieve only the headers
(not those in the body)?
 
MRAB

The documentation does say:

"""unfortunately, TOP is poorly specified in the RFCs and is
frequently broken in off-brand servers."""

All I can say is that it works for me with my ISP! :)
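
Given that caveat, one defensive option is to try TOP and fall back to RETR. This is only a sketch: `header_lines` is a hypothetical helper, `conn` is assumed to be a logged-in `poplib.POP3` connection, and it assumes a server without working TOP replies with an error, which poplib raises as `poplib.error_proto`. A server that misbehaves in some other way would need different handling.

```python
import poplib


def header_lines(conn, msg_num):
    """Fetch header lines with TOP, falling back to RETR if TOP is refused."""
    try:
        _resp, lines, _octets = conn.top(msg_num, 0)
    except poplib.error_proto:
        # Assumption: a server without working TOP answers -ERR, which
        # poplib raises as error_proto. RETR downloads the whole message.
        _resp, lines, _octets = conn.retr(msg_num)
    # Some servers return body lines anyway; cut at the first empty line.
    if b"" in lines:
        lines = lines[: lines.index(b"")]
    return lines
```

Either way the result is truncated at the first empty line, so it does not matter whether the server sent extra body lines.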
 
T

Thanks for your suggestions! Here's what seems to be working - it's
basically the same thing I originally had, but first checks to see if
the line is blank:

    response, lines, bytes = M.retr(i+1)
    # For each line in message
    for line in lines:
        if not line.strip():
            M.dele(i+1)
            break

        emailMessage = email.message_from_string(line)
        # Get fields
        fields = emailMessage.keys()
        # If email contains "From" field
        if emailMessage.has_key("From"):
            # Get contents of From field
            from_field = emailMessage.__getitem__("From")
 
Thomas Guettler

Hi T,

Wait, this code looks strange.

You delete the email if it contains an empty line? I use something like this:

    message = '\n'.join(connection.retr(msg_num)[1])

Your code:

    emailMessage = email.message_from_string(line)

creates an email object from only *one* line!

You retrieve the whole message (so you don't save bandwidth), but maybe that's
what you want.


Thomas
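
To illustrate the point about parsing one line at a time, here is a small comparison on a made-up message (the addresses and variable names are illustrative only). Parsing each line separately reports the quoted sender from the body as well; parsing the joined message lets the email package split headers from body itself.

```python
import email

# Illustrative message: the body quotes a "From:" header from an earlier mail.
lines = [
    "From: alice@example.com",
    "Subject: Re: lunch",
    "",
    "From: bob@example.com",
]

# Per-line parsing: every line becomes its own one-header "message", so the
# quoted From: line in the body is picked up as a sender too.
per_line = [email.message_from_string(line)["From"] for line in lines]
per_line = [sender for sender in per_line if sender is not None]

# Whole-message parsing: the parser stops reading headers at the blank line.
whole = email.message_from_string("\n".join(lines))

print(per_line)               # ['alice@example.com', 'bob@example.com']
print(whole.get_all("From"))  # ['alice@example.com']
```

The quoted line still exists in the parsed message, but as part of the payload rather than as a header.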
 
