Working with the email and mailbox modules

Nirnimesh

I want to extract emails from an mbox-type file which contains a number
of individual emails.

I tried the Python mailbox and email modules individually, but I'm
unable to combine them to get what I want. Mailbox lets me iterate
over all the mails but doesn't give me access to the individual messages
of a multipart mail. The email.Message module provides this, but I'm
unable to iterate through all the messages with this module.

Here's what I want:

Get a list of all messages from the mbox file.
For each message, be able to read the headers or body individually (so
that I can apply some operation to them).

Does someone have experience in doing something of this sort?
 
Steve Holden

Nirnimesh said:
Does someone have experience in doing something of this sort?

When you create your mailbox you need to provide a factory function;
otherwise you get rfc822.Message objects.

It's not obvious to me what that factory should be. I'm guessing you
could get away with something like

mymailbox = mailbox.UnixMailbox(fp, email.parser.Parser().parse)

but I am far from convinced that will work, and have no time for testing
right now.
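
For readers on Python 3: UnixMailbox and rfc822 no longer exist there, but mailbox.mbox accepts the same kind of factory argument. The following is a minimal sketch under that assumption, not a tested answer for the original Python 2 setup. The names parse_message and open_mbox are hypothetical, chosen only for illustration.

```python
import email
import email.policy
import mailbox


def parse_message(fp):
    """Factory: mailbox passes in a binary file-like object holding one message."""
    return email.message_from_binary_file(fp, policy=email.policy.default)


def open_mbox(path):
    """Open an existing mbox file whose messages are built by parse_message."""
    # create=False makes a missing file an error instead of creating an empty mailbox.
    return mailbox.mbox(path, factory=parse_message, create=False)
```

Iterating over the returned mailbox then yields one parsed message per mail, and header lookups such as msg["Subject"] work on each of them.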

regards
Steve
 
Rob Williscroft

Nirnimesh wrote in @d34g2000cwd.googlegroups.com in comp.lang.python:
Does someone have experience in doing something of this sort?

Not really, but this is what I came up with the other day to read
one of my newsreader's mbx files:

MBX = r"<<<-insert-path-to-your-mbx->>>"

import mailbox, email

fmbx = open( MBX, 'rb' )
# the second argument is the factory that builds each message object
mbx = mailbox.PortableUnixMailbox( fmbx, email.message_from_file )

for i, msg in enumerate( mbx ):
    print msg.__class__
    for name in msg.keys(): # gets header names
        print name
    break # only look at the first message

fmbx.close()


http://docs.python.org/lib/module-email.Message.html
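
To get at the individual parts of a multipart mail, the Message API documented at the link above offers walk() and get_payload(). Below is a rough sketch of how the two requirements from the question could be combined, assuming Python 3 and its mailbox.mbox class (the code above is Python 2). The helper names text_parts and summarize are hypothetical, and falling back to UTF-8 when a part declares no charset is an assumption.

```python
import mailbox


def text_parts(msg):
    """Yield the decoded text of every text/plain part of a message."""
    for part in msg.walk():  # visits the message itself and all nested subparts
        if part.get_content_type() != "text/plain":
            continue  # skips multipart containers, HTML, attachments
        payload = part.get_payload(decode=True)  # bytes, transfer encoding undone
        charset = part.get_content_charset() or "utf-8"  # assumed fallback
        yield payload.decode(charset, errors="replace")


def summarize(path):
    """Return one (headers, bodies) pair per message in the mbox file."""
    box = mailbox.mbox(path, create=False)
    try:
        # dict() keeps only the last value of a repeated header such as Received.
        return [(dict(msg.items()), list(text_parts(msg))) for msg in box]
    finally:
        box.close()
```

Each pair holds the headers as a dictionary and the plain-text bodies as a list, so an operation can be applied to either one separately.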



Rob.
 
