[Claire McLister]
I've made the script available on our downloads page at:
http://www.zeesource.net/downloads/e2i
[Alan Kennedy]
[Claire McLister]
Me too. Please let me know how we should modify the script.
Having examined your script, I'm not entirely sure what your input
source is, so I'm assuming it's an mbox file of the archives from
python-list, e.g. as appears on this page:
http://mail.python.org/pipermail/python-list/
or at this URL:
http://mail.python.org/pipermail/python-list/2005-November.txt
Those messages are the email versions, so all of the NNTP headers, e.g.
NNTP-Posting-Host, will have been dropped. You will need these in order
to get the geographic location of posts that have been made through NNTP.
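To check whether a given message still carries the header, something like the following might do. It is only a minimal sketch, assuming the message is available as raw text; the helper name `posting_host` and both sample messages are made up for illustration and are not taken from your script.

```python
from email import message_from_string


def posting_host(raw_message):
    """Return the NNTP-Posting-Host header value, or None if it is absent."""
    msg = message_from_string(raw_message)
    value = msg.get("NNTP-Posting-Host")  # header lookup is case-insensitive
    return value.strip() if value else None


# Made-up sample messages (not real posts), one with the header, one without.
usenet_style = (
    "From: someone@example.com\n"
    "Newsgroups: comp.lang.python\n"
    "NNTP-Posting-Host: 192.0.2.10\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
email_style = "From: someone@example.com\nSubject: test\n\nbody\n"

print(posting_host(usenet_style))
print(posting_host(email_style))
```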
In order to be able to get those headers, you somehow need to get the
NNTP originals of messages that originated on UseNet. You can see an
example of the format, i.e. your message to which I am replying, at this URL:
http://groups.google.com/group/comp.lang.python/msg/56e3baabcd4498f2?dmode=source
The NNTP-Posting-Host for that message is '194.109.207.14', which
reverses to 'bag.python.org', which is presumably the machine that
gatewayed the message from python-list onto comp.lang.python.
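The reverse lookup can be done from Python with `socket.gethostbyaddr`. The sketch below is a rough illustration, not tested against your data: the function names and the suffix check for python.org are my own assumptions, and the resolver is passed in as a parameter so the logic can be tried with a stub instead of a live DNS query.

```python
import socket


def reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return the lower-cased hostname for ip, or None if the lookup fails."""
    try:
        hostname, _aliases, _addresses = resolver(ip)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return None
    return hostname.lower()


def is_python_org_gateway(ip, resolver=socket.gethostbyaddr):
    """Guess whether ip belongs to a python.org machine (assumed heuristic)."""
    hostname = reverse_lookup(ip, resolver)
    if hostname is None:
        return False
    return hostname == "python.org" or hostname.endswith(".python.org")


def stub_resolver(ip):
    """Stand-in for DNS that mimics the result described in this message."""
    return ("bag.python.org", [], [ip])


print(is_python_org_gateway("194.109.207.14", stub_resolver))
```

Calling `is_python_org_gateway("194.109.207.14")` without the stub performs a real DNS query, so the answer depends on whatever the reverse record says at the time.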
So there are a couple of different approaches:
1. Get an archive of the UseNet postings to comp.lang.python (anybody
   know where?)
   A: Messages sent through email will have the NNTP-Posting-Host as
      a machine at python.org, so fall back to your original algorithm
      for those messages.
   B: Messages sent through UseNet, or a web gateway to it, will have
      an NNTP-Posting-Host elsewhere than python.org, so do your
      geo-lookup on that IP address.
2. Get the python-list archive
   A: Figure out which messages came through the python.org NNTP gateway
      (not sure offhand if this is possible). Automate a query to Google
      Groups to find the NNTP-Posting-Host (using a URL like the one
      above). This requires being able to map the python-list message-id
      to the Google Groups message-id. Do your geo-lookup on that
      NNTP-Posting-Host value.
   B: Use your original algorithm for messages sent through email.
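The decision in approach 1 could be sketched roughly as follows. This is an assumption-laden outline rather than a drop-in change to your script: `choose_lookup`, the "email"/"geo" labels, and the `is_gateway` callback are all hypothetical names, and the callback stands for whatever test you use to recognise a python.org machine.

```python
from email import message_from_string


def choose_lookup(raw_message, is_gateway):
    """Decide how to locate the poster of one raw UseNet message.

    Returns ("email", None) when the original email-based algorithm should
    be used, or ("geo", host) when host should be geo-located directly.
    """
    msg = message_from_string(raw_message)
    host = msg.get("NNTP-Posting-Host")
    if host is None:
        # No NNTP header at all: treat it as an email message (case 1A).
        return ("email", None)
    host = host.strip()
    if is_gateway(host):
        # Posted by the python.org gateway, so it came in by email (case 1A).
        return ("email", None)
    # Posted from somewhere else, so the host is worth geo-locating (case 1B).
    return ("geo", host)


def sample_is_gateway(host):
    """Hypothetical gateway test using only the address named in this message."""
    return host == "194.109.207.14"


# Made-up sample message, not a real post.
sample = "From: a@example.com\nNNTP-Posting-Host: 192.0.2.10\n\nhello\n"
print(choose_lookup(sample, sample_is_gateway))
```

Note that the header value is sometimes a hostname rather than an IP address, so the geo-lookup step may need to resolve it first.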
The message-id lookup in 2A should be achievable through the advanced
Google Groups search, at this URL:
http://groups.google.com/advanced_search?q=&
See "Lookup the message with message ID" at the bottom of that page.
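Getting the message-id out of a python-list message, ready to feed into such a lookup, might look like this. It is a small sketch under assumptions: `lookup_key` is a made-up helper name, the sample message is invented, and I have not checked which query parameter the search form expects, so only the cleaned and percent-encoded id is produced here.

```python
from email import message_from_string
from urllib.parse import quote


def lookup_key(raw_message):
    """Return the Message-ID without angle brackets, or None if it is missing."""
    msg = message_from_string(raw_message)
    message_id = msg.get("Message-ID")
    if not message_id:
        return None
    return message_id.strip().strip("<>")


# Made-up sample message, not a real post.
sample = "From: a@example.com\nMessage-ID: <abc.123@example.com>\n\nhello\n"

key = lookup_key(sample)
print(key)
# Percent-encode the id so it can be embedded safely in a query string.
print(quote(key, safe=""))
```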
Sorry I don't have time to supply code for any of this. Perhaps someone
can add more details, or better still some code?