Andrew Stuart
Hello all,
I am releasing a utility called catchmail as open source under a BSD-style
license.
Here is the catchmail homepage:
http://www.users.bigpond.net.au/mysite/catchmail.htm
Catchmail is a Python utility that writes emails into a Postgres database.
Catchmail's SQL schema is based on an extended version of the Yukatan data
model (a SQL schema for relational storage of email RFC822 messages).
However, catchmail needs real-world testing and feedback before it can
progress beyond beta. It needs more people to try it and check it out
before a full-scale public release.
If anyone has the time or inclination, I would value a code review and
advice on how to do things differently or better.
I'm no great Python programmer, so volunteers interested in helping to
enhance and support catchmail would be much appreciated. I have set up a
newsgroup at http://groups-beta.google.com/catchmail
There is also one known problem that I would value advice on.
Everything seems to be working fine except one thing: Unicode.
If I create the Postgres database with this command, everything seems to
run fine and I can import 4000 emails:
createdb -U postgres catchmail;
If I create the Postgres database using this command instead, Postgres
starts to come back with Unicode errors when I do the import:
createdb --encoding=UNICODE -U postgres test
The import process starts to fail on lots of messages with this error:
Database error: ERROR: invalid byte sequence for encoding "UNICODE":
0xe92062
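For readers trying to interpret that error, here is a small sketch of what the three reported bytes look like from Python. It assumes a modern Python 3 interpreter (not necessarily what catchmail runs on), and the guess that the source text was ISO-8859-1 is only one plausible reading: 0xE9 is "é" in Latin-1, followed by a space and the letter "b". The helper name is_valid_utf8 is made up for this illustration.

```python
# The three bytes reported in the Postgres error message.
raw = bytes([0xE9, 0x20, 0x62])


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# In UTF-8, 0xE9 opens a three-byte sequence, so it must be followed by two
# continuation bytes. 0x20 (space) is not one, hence the rejection.
print(is_valid_utf8(raw))

# Assumption: the text was really Latin-1. Decoding with that charset and
# re-encoding as UTF-8 gives bytes a Unicode database would accept.
text = raw.decode("latin-1")
print(repr(text))
print(text.encode("utf-8"))
```

If that reading is right, the bytes are not corrupt. They are simply in a different encoding from the one the database was told to expect.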
The objective is to have the database in Unicode, so I suppose it's quite an
important problem to resolve. It looks to me like some sort of
encoding/decoding requirement, but although I had a good look I couldn't
sort it out.
I'm afraid I don't really understand how Unicode is meant to be used in this
sort of application. If you can throw any light on it for me, it would be
appreciated.
How SHOULD unicode be implemented for a utility such as this? I'd like
catchmail to be as flexible as possible and to lose as little data as
possible through things like character set conversions.
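One possible approach, sketched here rather than taken from catchmail's actual code, is to decode each text part using the charset declared in its own headers and only then hand Unicode text to the database layer. The sketch assumes Python 3 and the standard library email package. The function names part_to_text and text_parts and the Latin-1 fallback are illustrative choices, not anything from the catchmail source.

```python
import email
from email.message import Message
from typing import List


def part_to_text(part: Message, fallback: str = "latin-1") -> str:
    """Decode one text part to a Unicode string (hypothetical helper)."""
    # decode=True undoes base64 / quoted-printable and returns raw bytes.
    payload = part.get_payload(decode=True)
    if payload is None:  # multipart containers carry no payload of their own
        return ""
    # Use the charset declared in Content-Type, else the assumed fallback.
    charset = part.get_content_charset() or fallback
    try:
        return payload.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrongly declared charset: Latin-1 maps every byte to
        # some character, so this branch cannot raise.
        return payload.decode(fallback, errors="replace")


def text_parts(raw_message: bytes) -> List[str]:
    """Return the decoded text of every text/* part in a raw message."""
    msg = email.message_from_bytes(raw_message)
    return [part_to_text(p) for p in msg.walk()
            if p.get_content_maintype() == "text"]


sample = (b"Content-Type: text/plain; charset=iso-8859-1\r\n"
          b"Content-Transfer-Encoding: 8bit\r\n"
          b"\r\n"
          b"caf\xe9 bar\r\n")
print(text_parts(sample))
```

To lose as little as possible, it may also be worth keeping the original undecoded bytes alongside the decoded text (for example in a binary column), since a wrongly declared charset cannot be corrected later once the bytes are gone. Whether the schema has room for that is a design decision for the project.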
I found some references to client encoding and multibyte support in the
Postgres docs (below), but maybe it should be fixed in the Python code?
SET CLIENT_ENCODING TO 'value';
http://jamesthornton.com/postgres/7.3/postgres/multibyte.html
http://www.postgresql.org/docs/7.4/static/multibyte.html#MULTIBYTE-TRANSLATION-TABLE
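As a rough illustration of the client-encoding route, the statement above could be issued once per connection through any DB-API cursor, so that the server converts incoming text to the database encoding itself. This is only a sketch: the function name, the small whitelist and the stand-in cursor are all made up for the example, and no particular database driver is assumed.

```python
# Hypothetical whitelist: only a few encoding names known to Postgres.
ALLOWED_ENCODINGS = {"LATIN1", "UNICODE", "SQL_ASCII"}


def set_client_encoding(cursor, encoding: str) -> str:
    """Tell Postgres which encoding this connection will send (sketch)."""
    name = encoding.upper()
    if name not in ALLOWED_ENCODINGS:
        raise ValueError("unsupported client encoding: %r" % encoding)
    # The name is checked against the whitelist above, so building the
    # statement by string formatting is safe here.
    statement = "SET CLIENT_ENCODING TO '%s'" % name
    cursor.execute(statement)
    return statement


class RecordingCursor:
    """Stand-in for a real DB-API cursor, so the sketch runs without a database."""

    def __init__(self):
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)


cur = RecordingCursor()
print(set_client_encoding(cur, "latin1"))
```

One caveat: a mailbox usually mixes messages in many charsets, so a single client encoding per connection is a blunt tool. Decoding each message in Python according to its own headers is likely to be the more flexible of the two options.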
The latest version of catchmail is the one found on the website at
http://www.users.bigpond.net.au/mysite/catchmail.htm
Any feedback on catchmail, or on your experience using it, is valued.
Thanks to the great work of Mark Hammond and Jukka Zitting!
Andrew Stuart
a n d r e w . s t u a r t @ x s e . c o m . a u