MySql+UTF8 woes

R

Ronald Fischer

I am experiencing a strange problem when retrieving Unicode values from
my MySql5 database on Windows. Here is what I have:

* All programs/modules in my application require in the beginning a file =
containing

$KCODE=3D'u'
require 'jcode'

in the beginning, ensuring that "everything is done in Unicode".

* I checked that the MySql database is "as UTF-8 as possible", in
particular
- in the table editor of the MySql Administrator, the character set
for the rows is listed as utf8 (with sorting order utf8_general_ci)
- The "start variables" tab of the Administrator says utf8 for the =
default
character set
- mysql.ini says=20
default-character-set=3Dutf8
- On the mysql command line, I see

mysql> SHOW VARIABLES LIKE 'character\_set\_%';
+--------------------------+--------+
| Variable_name | Value |
+--------------------------+--------+
| character_set_client | utf8 |=20
| character_set_connection | utf8 |=20
| character_set_database | utf8 |=20
| character_set_filesystem | binary |=20
| character_set_results | utf8 |=20
| character_set_server | utf8 |=20
| character_set_system | utf8 |=20
+--------------------------+--------+
=20

Indeed, the following works well:

When I store (from Ruby, via 'mysql' gem) a UTF-8 string
into the database, and later retrieve it, I get back exactly
what I had put in. This works equally well, whether I type
(using my UTF-8 enabled text editor) the character (for=20
example, a German umlaut) directly into the Ruby string,=20
which I then store, i.e. (table X having only one column)

sqlstatement=3D"INSERT INTO X VALUES ('=F6')"
dbhandle.query(sqlstatement)

or whether I write the code point in hexadecimal:
=20
uml_o=3D"\xc3\xb6" # =F6, i.e. German umlaut-o in UTF-8 encoding
sqlstatement=3D"INSERT INTO X VALUES ('#{uml_o}')"
dbhandle.query(sqlstatement)

In both cases, when I retrieve the rows with=20
dbhandle.query("SELECT * FROM X")
I eventually get back the row with my "=F6" as a UTF-8=20
2-Byte-sequence C3 B6.

So far OK. But there are several things which do not work:

(1) When I look at the database table using the SQL Administrator,
I do not see the letter '=F6', but instead the characters =
corresponding
to the Windows ANSI code sequence for C3 and B6 characters.

(2) When I manually enter data into the database using MySql
Administrator, by submitting the SQL statement
INSERT INTO X VALUES ('=F6')
the =F6 is displayed fine when looking at the table with a SELECT
using the Administrator, but when I read the table from Ruby, I do
not get back a 2 byte UTF-8 sequence for the umlaut, but only one=20
byte with hex code F6. Note that this F6 corresponds to the ANSI=20
code for =F6 in Windows.

(3) When I put the unicode data into a file, and use a Java application
to read the file and write the data into the database (using JDBC), =
and=20
I look at the database then with MySql Administrator, I can see the=20
German characters; and when I then read them back with my Ruby=20
application, I get it encoded in Windows ANSI, not Unicode.

Observations (1) and (2) could be explained in the way that maybe the=20
MySql administrator interface was not written with UTF-8 in minde. What
surprises me more is that when writing data using Java (which is =
inherently
Unicode aware) and reading it back with Ruby, causes somewhere a =
conversion
from UTF-8 into Windows ANSI.=20

Does someone have experienced a similar problem and can point out a =
possible
reason for it?


Ronald
--=20
Ronald Fischer <[email protected]>
Phone: +49-89-452133-162
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,994
Messages
2,570,223
Members
46,815
Latest member
treekmostly22

Latest Threads

Top