i18n hell

fyleow

I just spent hours trying to figure out why, even after I set my SQL
table attributes to UTF-8, only garbage kept getting inserted into the
database. Apparently you need to execute "SET NAMES 'utf8'" before
inserting into the tables.
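
A rough sketch of the kind of mismatch that "SET NAMES" avoids, simulated in pure Python 3 with no database involved. It assumes (my guess, not something stated above) that the connection was treating incoming bytes as latin-1 while the page was sending UTF-8; the variable names are made up for illustration.

```python
# Assumption: the connection interprets incoming bytes as latin-1,
# but the client is really sending UTF-8. No real database is used here.
original = "é"
sent = original.encode("utf-8")     # bytes the client puts on the wire
misread = sent.decode("latin-1")    # how a latin-1 connection reads them
stored = misread.encode("utf-8")    # what lands in the UTF-8 column

print(repr(misread))
print(stored)
```

If that guess is right, the column ends up holding four bytes for a one-character string, which is the "garbage" you see when reading it back.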

Does anyone have experience working with other languages using Django
or Turbogears? I just need to be able to retrieve and enter text to
the database from my page without it being mangled. I know these
frameworks employ an ORM so you don't need to write SQL, and that
worries me because I tried this on Rails and it wouldn't work.

Thanks.
 
Serge Orlov

fyleow said:
I just spent hours trying to figure out why, even after I set my SQL
table attributes to UTF-8, only garbage kept getting inserted into the
database. Apparently you need to execute "SET NAMES 'utf8'" before
inserting into the tables.

Does anyone have experience working with other languages using Django
or Turbogears? I just need to be able to retrieve and enter text to
the database from my page without it being mangled. I know these
frameworks employ an ORM so you don't need to write SQL, and that
worries me because I tried this on Rails and it wouldn't work.

A frequently asked question for people who are burning in i18n hell:
are you using unicode strings or byte strings? A unicode string means
that type(your_string) is unicode; it does not mean you keep UTF-8
encoded text in Python byte strings.
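
A small sketch of that distinction, written for Python 3 (an assumption on my part: there the unicode type is called str and the byte string type is called bytes). The sample text is arbitrary.

```python
# Python 3 naming: `str` is the unicode type, `bytes` is the byte string type.
text = "naïve café"            # unicode string: a sequence of code points
data = text.encode("utf-8")    # byte string holding the UTF-8 encoding of it

print(type(text).__name__, len(text))   # length counted in characters
print(type(data).__name__, len(data))   # length counted in bytes
```

The two lengths differ because each accented letter takes two bytes in UTF-8, which is one quick way to notice you are holding encoded bytes rather than text.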

AFAIK Ruby has only byte strings that have the same set of
issues/problems/traps as Python byte strings.
 
Martin Blais

Serge Orlov said:
A frequently asked question for people who are burning in i18n hell:
are you using unicode strings or byte strings? A unicode string means
that type(your_string) is unicode; it does not mean you keep UTF-8
encoded text in Python byte strings.

I used to live in i18n hell, a while ago, until I understood this:
every time you keep a reference to some kind of string object, ALWAYS
ALWAYS ALWAYS be AWARE of whether it is not encoded (a unicode object)
or an encoded string (a str object), and if so, which encoding it is
in. Then deal with the conversion between the two domains EXPLICITLY
(e.g. encode(), decode()). If you hold onto a str or unicode object
and you don't know which it is, you are inevitably bound to face
unicode hell at some point. You can use a prefix convention if that
makes it easier for you, but the point is that you CANNOT just "wing
it". Python makes it too easy to just "wing it" and that creates a
lot of surprises, especially since some methods hide the conversions,
e.g. str.join.
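
To make the prefix convention concrete, here is a minimal sketch. It is written for Python 3, where mixing the two kinds raises TypeError instead of converting silently, so the hidden-conversion trap shows up as an explicit error. The b_/u_ prefixes and the sample name are just one possible convention, not a standard.

```python
# Hypothetical naming convention: b_ = encoded bytes, u_ = decoded text.
b_name = b"Zo\xc3\xab"                  # bytes, known to be UTF-8
u_name = b_name.decode("utf-8")         # explicit conversion, done once

u_greeting = ", ".join(["Hello", u_name])   # text joined with text
b_out = u_greeting.encode("utf-8")          # explicit conversion on the way out

# Mixing the two domains: Python 3 refuses rather than guessing an encoding.
try:
    ", ".join(["Hello", b_name])
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(u_greeting, b_out, mixed_ok)
```

Under Python 2 the mixed join would attempt an implicit ASCII decode instead, which is the kind of hidden conversion described above.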

W.r.t. DB storage, that depends on the specific database and the
DBAPI module you're using. Read up on it, write a few tests on your
corresponding DBAPI (simple tests, easy peasy), and know what kinds of
strings you're sending in and reading back. I'm using PostgreSQL
often and my configuration always stores strings in UTF-8 in the
database. I have a lightweight mapping module that disambiguates and
does the encoding/decoding automatically in a consistent way (that
decision belongs in the client code for now, unfortunately, but is
centralized using my table declaration that lists the desired
conversions for each column). See
http://furius.ca/antiorm/ for something simple that works well.
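
The kind of simple DBAPI test meant here could look roughly like this. It uses the standard library's sqlite3 module purely as a stand-in so the sketch runs anywhere; it is not the PostgreSQL setup described above and it does not use antiorm. The table name, column name and round_trip helper are invented for the example, and it is written for Python 3.

```python
import sqlite3

def round_trip(conn, value):
    """Insert one value, read it back, and return what the driver hands us."""
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS notes (body TEXT)")
    cur.execute("DELETE FROM notes")
    cur.execute("INSERT INTO notes (body) VALUES (?)", (value,))
    cur.execute("SELECT body FROM notes")
    return cur.fetchone()[0]

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
sample = "Zażółć gęślą jaźń"          # deliberately non-ASCII text
result = round_trip(conn, sample)
conn.close()

# What type comes back, and did the text survive unchanged?
print(type(result).__name__, result == sample)
```

Swapping in your own DBAPI module and connection should tell you quickly whether you get text or encoded bytes back, and whether the round trip is lossless.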

cheers,
 
Jarek Zgoda

Martin Blais said:
See
http://furius.ca/antiorm/ for something simple that works well.

I'd like to know what this module/library is good for *before* I start
downloading it. "Almost like ORM but not exactly" is a rather vague
term and can denote anything. Is it a dishwasher? Or a microwave oven?

BTW, I haven't had any problems with character encodings since I
started using unicode objects internally in my programs. A database is
the same kind of data source as regular files, sockets or ttys -- you
have to know the client encoding before you start receiving data. Then
decode it to unicode and you are fine.
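
A minimal sketch of decoding at the boundary, in Python 3. An in-memory byte stream stands in for the file, socket or database connection, and ISO-8859-2 is only an example of a client encoding you would have to know in advance.

```python
import io

client_encoding = "iso-8859-2"   # assumption: known before any data arrives

# Stand-in for a file/socket/tty: a stream that yields raw bytes.
raw_stream = io.BytesIO("Zażółć".encode(client_encoding))

# Decode once, at the edge; everything past this point sees only text.
reader = io.TextIOWrapper(raw_stream, encoding=client_encoding)
text = reader.read()

print(type(text).__name__, text)
```

From here on the program works with unicode only, and encodes again only when writing back out.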
 
