Mrmaster
Hello,
I'm creating an application that will parse mbox files, extract the
data, and put it into a db. I have a couple of problems. For those of
you who are not familiar with mbox files, just think of one text file
that stores all of the emails in text format.
1) mbox files keep being updated, so how do I notify my script that new data
has come in? Do I rerun the script with a placeholder marking where it last
finished? That would require rescanning the whole mbox file to find the
placeholder, which seems like pretty bad design.
2) What is the most efficient way to read the emails into memory before
putting them into a db? Since there are multiple emails in each mbox file,
should I just read one email, store it in memory, dump it into the db, then
replace the email in memory with the next one?
Thank you for all of your help.
P.S. I realize there are scripts that already do this. I'm creating this
for my own learning experience.
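
For what it's worth, here is a rough sketch of one way both ideas could fit together; it is an illustration, not a definitive implementation. The placeholder can be a byte offset, so a rerun can seek straight to it without rescanning the file. Emails are read and stored one at a time, so only one message is in memory at once. The sketch assumes the mbox file is only ever appended to, that every message starts with a "From " line, and that body lines beginning with "From " have been escaped by the mail software. The function names (read_new_messages, sync), the table names (emails, progress), and the use of SQLite as the db are all made up for the example.

```python
import sqlite3
from email import message_from_bytes


def read_new_messages(mbox_path, start_offset=0):
    """Yield (raw_message, end_offset) for each message after start_offset."""
    with open(mbox_path, "rb") as f:
        f.seek(start_offset)  # jump to the placeholder, no rescan needed
        lines = []
        while True:
            pos = f.tell()
            line = f.readline()
            # A "From " separator line (or end of file) closes the
            # message collected so far.
            if (not line or line.startswith(b"From ")) and lines:
                yield b"".join(lines), pos
                lines = []
            if not line:
                break
            lines.append(line)


def sync(mbox_path, db_path):
    """Store messages added since the last run; return how many were new."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS emails (sender TEXT, subject TEXT, raw BLOB)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS progress "
        "(mbox_path TEXT PRIMARY KEY, byte_offset INTEGER)"
    )
    row = conn.execute(
        "SELECT byte_offset FROM progress WHERE mbox_path = ?", (mbox_path,)
    ).fetchone()
    offset = row[0] if row else 0

    count = 0
    for raw, end_offset in read_new_messages(mbox_path, offset):
        msg = message_from_bytes(raw)  # only this one message is in memory
        conn.execute(
            "INSERT INTO emails VALUES (?, ?, ?)",
            (str(msg["From"] or ""), str(msg["Subject"] or ""), raw),
        )
        # Save the placeholder in the same transaction as the message.
        conn.execute(
            "INSERT OR REPLACE INTO progress VALUES (?, ?)",
            (mbox_path, end_offset),
        )
        conn.commit()
        count += 1
    conn.close()
    return count
```

Calling sync("inbox.mbox", "mail.db") from a cron job or a polling loop would then pick up only the messages appended since the previous call. If a message is still being delivered while the script runs, the last one could be stored incomplete, so real code would need file locking or a size check first. Python's standard mailbox module can also iterate over an mbox file if writing the parser by hand is not the goal.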