I recently wrote and deployed a Usenet bot which identifies
multiposted messages.
Per several helpful suggestions (thanks), the bot has been considerably
refined. Several people who had originally expressed reservations have
given positive feedback on the changes.
The big problem with the original bot (as I now realize) was the long
and heavy message. That issue was fixed a couple of days ago, and the
message text has now been further refined per an additional suggestion.
As several folks also suggested, the message cross-references now
include groupnames (indented for clarity).
The bot now ignores messages which contain e-mail addresses and certain
URIs. I believe this should be very effective in preventing the bot
from flagging spam (without admitting many false negatives). Keyword
filtering (inclusive) was suggested, but I believe the e-mail/URI
approach would be more robust, based on a bit of research into old
multiposts.
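For anyone curious what an e-mail/URI check of this kind might look like, here is a minimal sketch in Python. It is an illustration only, not the bot's actual code: the patterns, the choice of URI schemes, and the function name `contains_address_or_uri` are all assumptions, and the check is meant to be run on the message body rather than the headers (which always carry a From address).

```python
import re

# Assumed patterns: the exact address and URI forms to match are a guess.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")
URI_RE = re.compile(r"\b(?:https?://|ftp://|www\.)\S+", re.IGNORECASE)


def contains_address_or_uri(body: str) -> bool:
    """Return True if the message body holds an e-mail address or a URI."""
    return bool(EMAIL_RE.search(body) or URI_RE.search(body))


def should_ignore(body: str) -> bool:
    """Hypothetical hook: skip probable spam instead of flagging it."""
    return contains_address_or_uri(body)
```

A multipost that merely asks a question passes through, while one that advertises a site or a contact address is skipped. The cost is that a genuine multipost quoting a URL is skipped too.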
The "References" headers have been tweaked so the I-R-T is always the
last item listed. Some readers weren't properly threading the bot's
reply; there was some speculation that re-ordering the References in
this manner might help (verification appreciated).
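The re-ordering can be sketched roughly as follows, in Python. This is a hypothetical helper, not the bot's real code: the name `order_references` and the idea that the header is handled as a whitespace-separated string of message-IDs are assumptions.

```python
def order_references(references: str, in_reply_to: str) -> str:
    """Return a References value with the in-reply-to message-ID listed last.

    `references` is a whitespace-separated string of message-IDs;
    `in_reply_to` is the message-ID of the article being answered.
    """
    # Drop any earlier occurrence of the parent ID, keep the rest in order.
    ids = [mid for mid in references.split() if mid != in_reply_to]
    # The article being replied to always goes at the end.
    ids.append(in_reply_to)
    return " ".join(ids)
```

For example, `order_references("<a@x> <b@x> <c@x>", "<b@x>")` gives `"<a@x> <c@x> <b@x>"`. Newsreaders that thread on the last References entry should then attach the reply to the right parent.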
It was also suggested that the bot not reply to messages which already
have a reply. I'm looking into that - it would require some significant
changes to program logic; it's not a quick-n-easy thing to do (as were
these other things). And I'm not 100% convinced it's even a good idea
(multiposts with other replies are often flagged manually, right?).
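If this were attempted, the core test might look something like the Python sketch below. It is purely illustrative: it assumes the bot already has, for each group, a list of (message-ID, References) pairs from the server's overview data, and the name `has_reply` is made up.

```python
def has_reply(message_id, overview):
    """Return True if any article in `overview` references `message_id`.

    `overview` is an iterable of (article_message_id, references) pairs,
    where `references` is the raw whitespace-separated References value.
    """
    return any(message_id in refs.split() for _, refs in overview)
```

The check itself is simple. The expensive part is gathering and refreshing the overview data for every group involved, which is where the changes to program logic would come in.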
I have not had a chance to look into ignoring control.cancel items yet.
But I've observed that cancels on some servers don't ever show up at
(or are not honored by) my provider (GigaNews), so even that would not
be guaranteed effective for all servers, given the oddities of Usenet
(however, I believe that most spam-related (non-)cancels would be
ignored by the e-mail/URI filtering anyway).
You may see an example of the current behavior of the bot at:
http://tinyurl.com/oll3u
or see recent "Lorem Ipsum" postings in alt.test.test.
To tell you the truth, if I knew then what I know now, I don't think I
would have ever written this bot in the first place. But I *have*
written it, and it's getting pretty well refined (thanks to many
suggestions), and it seems to have settled into something that is
favorable (or at least not patently objectionable) to many folks. John
Bokma has suggested some sort of vote, and I like that idea (though I'm
still not sure how to conduct it), but before attempting anything like
that, I'd like to let the bot run for 30 days or so to prove it out (so
folks have a more informed idea of exactly what they're voting
for/against). And September is just around the corner, so there should
be some good test cases popping up soon...
Further input, of course, is always welcomed and appreciated.