character sets? unicode?


Michael

I'm trying to import text from email I've received, run some regular
expressions on it, and save the text into a database. I'm trying to
figure out how to handle the issue of character sets. My regular
expressions have had problems with email that uses non-Western
character sets. Korean text seems to be filled with a lot of '=3D=21'
type of stuff. This doesn't look like Unicode (or am I wrong?), so does
anyone know how I should handle it? Do I need to do anything special
when passing text with non-ASCII characters to re, MySQLdb, or any other
libraries? Is it better to save the text as-is in my db and store the
character set name alongside it, or should I try to convert all text to
some default format like UTF-8? Any advice? Thanks.
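
For reference, sequences like '=3D=21' look like the quoted-printable transfer encoding (where =3D stands for '=' and =21 for '!'), which is separate from the character set. Below is a minimal sketch, assuming Python 3 and the standard-library email package, of one way to undo the transfer encoding, decode the bytes with the charset the message declares, and only then run a regular expression. The sample message RAW, the helper name text_parts, and the latin-1 fallback are made up for illustration and are not from the original post.

```python
import email
import re

# Hypothetical sample: a Korean (EUC-KR) body sent as quoted-printable.
RAW = (
    b"From: sender@example.com\r\n"
    b"Subject: test\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: text/plain; charset="euc-kr"\r\n'
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"=C7=D1=B1=DB a=3Db=21\r\n"
)


def text_parts(raw_bytes, fallback="latin-1"):
    """Return the text/plain parts of a raw message as Unicode strings."""
    msg = email.message_from_bytes(raw_bytes)
    texts = []
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        # decode=True undoes quoted-printable or base64 and returns bytes.
        payload = part.get_payload(decode=True)
        # Charset declared in the Content-Type header, if any.
        charset = part.get_content_charset() or fallback
        try:
            texts.append(payload.decode(charset, errors="replace"))
        except LookupError:
            # Charset name unknown to Python: use the assumed fallback.
            texts.append(payload.decode(fallback, errors="replace"))
    return texts


texts = text_parts(RAW)

# Regular expressions run on the decoded str, so \w matches Hangul too.
words = re.findall(r"\w+", texts[0])

# One possible storage choice: a single encoding (UTF-8) for every row.
utf8_bytes = texts[0].encode("utf-8")
```

Whether to store UTF-8 or the original bytes plus the charset name is a design choice. Converting once at import time keeps the regular expressions and the database working on a single representation, at the cost of losing the exact original bytes.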
 
