Mike
I would like some suggestions and/or references to already-written code
to assist with writing an email filter. I use a web-based email client
(questmail.futurequest.net) and want to write an intelligent filter to
(hopefully) discourage some of the repeat junk mail that I receive (>80
messages/day). I'm fairly new to Perl and am still sifting through the
standard packages. Here's a description of what I want this filter to
do:
1. Anyone on my 'white-list' will have their message allowed through.
If they have included a binary attachment (rare), I can scan it
manually for viruses.
2. I want to have a dynamic 'off-white-list' that will consist of
addresses and subject lines of recently sent email. If I can somehow
read my "Sent" folder, that will essentially be the off-white-list.
3. I want a 'black-list' that not only holds the actual email address
(not the display "From") but also holds the date of the last email sent
from that address and a counter of the number of sends.
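To make the three lists concrete, this is roughly how I picture them as Perl hashes. Everything here is a placeholder of mine (the hash layout, the names record_offense and list_for, the example.com addresses); it is a sketch of the idea, not working filter code.

```perl
use strict;
use warnings;

# White-list: a simple set of lower-cased addresses (dummy values).
my %white_list = map { ( lc($_) => 1 ) } ('friend@example.com', 'boss@example.org');

# Off-white-list: addresses and subjects taken from recently sent mail.
my %off_white = (
    addresses => { 'client@example.net' => 1 },
    subjects  => { 'lunch on friday'    => 1 },
);

# Black-list: keyed by the real address, holding the time of the last
# message from that address and a count of how many it has sent.
my %black_list;

# Record one more offense for an address and return the updated count.
sub record_offense {
    my ($list, $address, $when) = @_;
    my $key   = lc $address;
    my $entry = $list->{$key} ||= { count => 0, last_seen => 0 };
    $entry->{count}++;
    $entry->{last_seen} = $when;
    return $entry->{count};
}

# Report which list, if any, an address is currently on.
sub list_for {
    my ($address) = @_;
    my $key = lc $address;
    return 'white'     if $white_list{$key};
    return 'off-white' if $off_white{addresses}{$key};
    return 'black'     if $black_list{$key};
    return 'unknown';
}

record_offense(\%black_list, 'Spammer@Example.com', time);
record_offense(\%black_list, 'spammer@example.com', time);
print list_for('spammer@example.com'), ' ',
      $black_list{'spammer@example.com'}{count}, "\n";
```

Lower-casing the key means the two differently capitalized sends above count against the same entry.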
The date information in #3 will be used to remove entries from the
black-list once they've cleaned up their act and haven't sent an email
in, say, one month. The 'send-count' will be used to progressively send
more obnoxious email kickbacks to the sender. On the first three
occurrences, the kickback will consist of either an email notifying the
sender why their email was rejected or a mockup of the message returned
from email servers for non-existing addresses (I haven't yet decided
which). Beginning with offense number four, N-minus-3 emails will be
sent (N being the number of occurrences stored in the black-list) and
will include a 1 MB junk attachment file. So on the fifth offense, the
sender will receive two identical email kickbacks, each with a 1 MB
attachment; on the sixth they will receive three, and so on. Even if
the actual sender ignores the kickbacks,
eventually someone should notice their email server filling up.
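The bookkeeping behind that paragraph might look something like the sketch below. It only covers the expiry and the counting; nothing here sends mail. I am assuming "one month" means 30 days and that each black-list entry is a hash with count and last_seen fields; the names expire_entries, kickback_count and with_attachment are made up for illustration.

```perl
use strict;
use warnings;

use constant EXPIRY_SECONDS => 30 * 24 * 60 * 60;   # "one month", taken as 30 days

# Remove entries whose last message is older than the expiry window.
# Returns the addresses that were removed, sorted.
sub expire_entries {
    my ($list, $now) = @_;
    my @expired = grep { $now - $list->{$_}{last_seen} > EXPIRY_SECONDS } keys %$list;
    delete @{$list}{@expired};
    @expired = sort @expired;
    return @expired;
}

# Number of kickbacks for offense N: one for offenses 1 to 3, N-3 from 4 on.
sub kickback_count {
    my ($offense) = @_;
    return $offense <= 3 ? 1 : $offense - 3;
}

# Kickbacks carry the junk attachment from the fourth offense onward.
sub with_attachment {
    my ($offense) = @_;
    return $offense >= 4 ? 1 : 0;
}

my $now = time;
my %black_list = (
    'old@example.com'    => { count => 2, last_seen => $now - 45 * 24 * 60 * 60 },
    'recent@example.com' => { count => 5, last_seen => $now -  2 * 24 * 60 * 60 },
);

my @removed = expire_entries(\%black_list, $now);
print "removed: @removed\n";
printf "offense %d: %d kickback(s), attachment: %s\n",
    $_, kickback_count($_), with_attachment($_) ? 'yes' : 'no'
    for 1 .. 6;
```

With the dummy data above, only old@example.com is removed, and offenses four, five and six yield one, two and three kickbacks respectively.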
I will need to handle the (hopefully) rare case of the black-list email
entry being a "No Reply" type of address, lest my code get into an
infinite loop. For now, the only thing that I can think to do is
remove such addresses (if I can identify them) from the black-list. If
anyone has any slicker suggestions for this, please let me know.
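The only identification method I have come up with so far is matching the part of the address before the "@" against a few patterns, along these lines. The pattern list is my own guess and certainly incomplete, and is_no_reply is just a name I picked.

```perl
use strict;
use warnings;

# Guessed patterns for the local part of automated senders (not exhaustive).
my @no_reply_patterns = (
    qr/^no[-_.]?reply\b/i,
    qr/^do[-_.]?not[-_.]?reply\b/i,
    qr/^mailer-daemon$/i,
    qr/^postmaster$/i,
    qr/^bounces?\b/i,
);

# True if the address looks like one that will never read a reply.
sub is_no_reply {
    my ($address) = @_;
    # Take everything before the "@"; give up if there is no local part.
    my ($local) = $address =~ /^([^\@\s]+)\@/ or return 0;
    for my $pattern (@no_reply_patterns) {
        return 1 if $local =~ $pattern;
    }
    return 0;
}

for my $address ('no-reply@example.com', 'DoNotReply@example.org', 'mike@example.net') {
    printf "%-26s %s\n", $address, is_no_reply($address) ? 'skip' : 'normal';
}
```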
Incoming email not already on the black-list will be sent there based
on the following criteria:
1. One or more keywords (from a list) in the subject line; this will
help quickly throw out porn.
2. A subject line of the format "RE: {something}" where the {something}
is NOT a subject of any of my sent messages (from the off-white-list).
3. A subject line containing 'filter-defeaters': words with zeroes in
place of the letter "O", dollar signs in place of the letter "S", and
so on. I'm not sure how I'm going to write this logic yet, but it
sounds like a complex exercise in regular
expressions.
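Here is my first rough attempt at the three criteria, so there is something concrete to criticize. The keyword list, the sent-subject set and the character substitutions are all placeholders I made up. For criterion 3 I am assuming it is enough to undo the common look-alike characters with tr/// and then rerun the keyword check, which may well be too naive.

```perl
use strict;
use warnings;

# Placeholder data; the real lists would be loaded from files.
my @keywords      = qw(casino lottery);
my %sent_subjects = map { ( lc($_) => 1 ) } ('Lunch on Friday', 'Project status');

# Undo common look-alike substitutions, e.g. "c4$in0" becomes "casino".
# The mapping (0->o, 1->i, 3->e, 4->a, 5->s, 7->t, $->s, @->a, !->i) is a guess.
sub normalize {
    my ($text) = @_;
    $text = lc $text;
    $text =~ tr/013457$@!/oieastsai/;
    return $text;
}

# Criterion 1: does the text contain any keyword as a whole word?
sub has_keyword {
    my ($text) = @_;
    for my $word (@keywords) {
        return 1 if $text =~ /\b\Q$word\E\b/;
    }
    return 0;
}

# Criterion 2: a "RE:" subject that does not match anything I sent.
sub is_unsolicited_reply {
    my ($subject) = @_;
    my ($rest) = $subject =~ /^\s*(?:re:\s*)+(.*?)\s*$/i or return 0;
    my $key = lc $rest;
    return $sent_subjects{$key} ? 0 : 1;
}

# Return the reason a subject should be black-listed, or '' if none applies.
sub blacklist_reason {
    my ($subject) = @_;
    return 'keyword'         if has_keyword(lc $subject);
    return 'filter-defeater' if has_keyword(normalize($subject));
    return 'unsolicited RE'  if is_unsolicited_reply($subject);
    return '';
}

for my $subject ('Win at the CASINO', 'Best c4$in0 bonus',
                 'Re: your account', 'RE: Lunch on Friday') {
    printf "%-22s => %s\n", $subject, blacklist_reason($subject) || 'ok';
}
```

One weakness I can already see: normalizing also mangles legitimate numbers in a subject, so the normalized text should only be used for the keyword check and never stored or shown.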
Any feedback would be appreciated. I'm particularly looking for tips
on parsing the incoming (and outgoing) email and/or reusable code from
someone who has done anything remotely similar.
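For reference, the crudest parsing approach I can think of is below: split the raw message at the first blank line, unfold continued header lines, and pull the bare address out of the From header with a regex. I am assuming I can get at the raw message text at all, and the names parse_message and bare_address are mine. I realize a regex is not a real address parser, which is partly why I am asking.

```perl
use strict;
use warnings;

# Split a raw message into a header hash (lower-cased names) and a body.
# Assumes headers end at the first blank line and that continuation
# lines begin with a space or tab.
sub parse_message {
    my ($raw) = @_;
    $raw =~ s/\r\n/\n/g;                         # normalize line endings
    my ($head, $body) = split /\n\n/, $raw, 2;
    $head = '' unless defined $head;
    $body = '' unless defined $body;
    $head =~ s/\n[ \t]+/ /g;                     # unfold continuation lines
    my %headers;
    for my $line (split /\n/, $head) {
        my ($name, $value) = $line =~ /^([^:\s]+):\s*(.*)$/ or next;
        my $key = lc $name;
        $headers{$key} = $value unless exists $headers{$key};   # keep the first occurrence
    }
    return (\%headers, $body);
}

# Extract the bare address from a From value such as
# '"Some Name" <user@example.com>', ignoring the display name.
sub bare_address {
    my ($from) = @_;
    return lc($1) if $from =~ /<\s*([^<>\s]+\@[^<>\s]+)\s*>/;          # address in angle brackets
    return lc($1) if $from =~ /([^\s<>()",;]+\@[^\s<>()",;]+)/;        # otherwise first thing with an @
    return;
}

my $raw = join "\n",
    'From: "Totally Legit" <Sender@Example.COM>',
    'Subject: RE: your order',
    ' is ready',
    '',
    'Body text here.',
    '';

my ($headers, $body) = parse_message($raw);
print bare_address($headers->{from}), "\n";
print $headers->{subject}, "\n";
```

Preferring the address in angle brackets is what keeps the display name out of the black-list, even when the display name itself is made to look like an address.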
Thanks in advance.
Mike McIntyre