Benjamin Han
A while ago I asked whether anyone knew of a module for parsing Received:
headers in emails. Apparently my guess (that someone had already written one
in Python) was wrong. I got an email pointing me to the Spambayes project, but
its tokenizer doesn't seem to do much with the Received headers (especially
compared to SpamAssassin's code).
So I wrote a small set of scripts for doing this:
http://www.cs.cmu.edu/~benhdj/Code/receivedDB.v0_1-20040114.tar.gz
It's based on SpamAssassin's Received.pm script, but I separated patterns
from the code. Patterns of all known headers are kept in a text database
file so new entries can be added without touching anything else.
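
To illustrate the idea of keeping patterns apart from the code, here is a
minimal sketch. It is not taken from the tarball: the "name := regex" database
format, the two sample patterns, and the function names load_patterns and
parse_received are all assumptions made up for this example.

```python
import re

# Hypothetical pattern database: one "name := regex" entry per line.
# This format is an assumption for illustration only; in practice the
# text would live in its own file and be read from disk.
PATTERN_DB = r"""
# name := regex
helo_rdns_ip := ^from (?P<helo>\S+) \((?P<rdns>\S+) \[(?P<ip>[0-9.]+)\]\) by (?P<by>\S+)
helo_ip := ^from (?P<helo>\S+) \(\[(?P<ip>[0-9.]+)\]\) by (?P<by>\S+)
"""


def load_patterns(text):
    """Compile every non-comment, non-blank entry of the database text."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, regex = line.partition(" := ")
        patterns.append((name, re.compile(regex)))
    return patterns


def parse_received(header, patterns):
    """Return (pattern_name, fields) for the first matching pattern, else None."""
    # Unfold the header: collapse line breaks and runs of whitespace.
    value = " ".join(header.split())
    for name, regex in patterns:
        match = regex.match(value)
        if match:
            return name, match.groupdict()
    return None


patterns = load_patterns(PATTERN_DB)
result = parse_received(
    "from mail.example.com (mx.example.com [192.0.2.1])\n"
    "\tby mx.example.org with ESMTP; Wed, 14 Jan 2004 10:00:00 -0500",
    patterns,
)
```

With a layout like this, covering a new Received format only means adding one
more line to the database text; the matching code stays untouched.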
I hope this is useful to someone else too - and of course any patch to
increase the coverage of the database is welcome!
Ben