mySQLdb versus platform problem

Robin Becker · Mar 16, 2007

I am seeing different outcomes from simple requests against a common database
when run from a FreeBSD machine and a win32 box.

The test script is
#######################
import MySQLdb, sys
print sys.version
print MySQLdb.__version__
db=MySQLdb.connect(host='appx',db='sc_0',user='user',passwd='secret',use_unicode=True)
cur=db.cursor()
cur.execute('select * from sc_accomodation where id=31')
data=cur.fetchall()

for i,t in enumerate(data[0]):
    if isinstance(t,(str,unicode)): print i,repr(t)
#######################

The table in question is declared charset='latin1'; however, the original owners
stored some Windows-specific characters in it, e.g. 0x92 (a curly quote).

In the Windows version I see this kind of string in the output:

2.4.4 (#71, Oct 18 2006, 08:34:43) [MSC v.1310 32 bit (Intel)]
1.2.1_p2
..........
14 u'Built entirely of mahogany, Acajou seeks to introduce a new concept of
living in the midst of nature on the C\xf4te d\x92Or beach which stretches
along the island\x92s northern coast.\r\n\r\nThe hotel\x92s 24 standard
and 4 superior ro........

The FreeBSD machine produces:

2.4.3 (#2, Sep 7 2006, 09:34:29)
[GCC 3.4.4 [FreeBSD] 20050518]
............
14 u'Built entirely of mahogany, Acajou seeks to introduce a new concept of
living in the midst of nature on the C\xf4te d\u2019Or beach which stretches
along the island\u2019s northern coast.\r\n\r\nThe hotel\u2019s 24 standard
and 4 superior rooms.......

So the Windows version seems to leave the \x92 as is, and the FreeBSD version
converts it to its correct value (\u2019).
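
The two results above match what you get when the same byte is decoded with two
different codecs. A minimal sketch, assuming the stored byte really is 0x92 as
described; it only illustrates the codecs and makes no claim about what MySQLdb
does internally on either platform. (MySQL documents its own "latin1" as cp1252,
which would explain a server-side conversion yielding \u2019.)

```python
# The raw bytes as stored in the table (0x92 is the curly apostrophe).
raw = b"d\x92Or"

# ISO-8859-1 maps every byte to the code point with the same value,
# so 0x92 becomes the C1 control character U+0092.
as_latin1 = raw.decode("latin-1")

# Windows-1252 assigns 0x92 to RIGHT SINGLE QUOTATION MARK (U+2019).
as_cp1252 = raw.decode("cp1252")

print(repr(as_latin1))
print(repr(as_cp1252))
```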

This is already bad enough as I expected the outcomes to be the same, but given
that the encoding of the database is wrong I expected some problems.

However, if I don't have use_unicode=True in the above script I get back
byte strings, and this time the difference is larger.

Windows
2.4.4 (#71, Oct 18 2006, 08:34:43) [MSC v.1310 32 bit (Intel)]
1.2.1_p2
........
2 "C\xf4te d'Or\r\nPraslin"
........

Unix

2.4.3 (#2, Sep 7 2006, 09:34:29)
[GCC 3.4.4 [FreeBSD] 20050518]
1.2.1_p2
.......
2 "C\xc3\xb4te d'Or\r\nPraslin"
........

So here the string returned on Unix appears to have been automatically converted to UTF-8.
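
The two byte strings above are the same text in two encodings. A short sketch,
assuming the underlying text is u"C\xf4te d'Or"; it shows only the encodings,
not where in the client/server chain the conversion happens:

```python
# The text as Unicode: U+00F4 is "o with circumflex".
text = u"C\xf4te d'Or"

# One byte per character in Latin-1: what the Windows box printed.
latin1_bytes = text.encode("latin-1")

# Two bytes (0xC3 0xB4) for U+00F4 in UTF-8: what the FreeBSD box printed.
utf8_bytes = text.encode("utf-8")

print(repr(latin1_bytes))
print(repr(utf8_bytes))
```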

My questions are

1) why the difference in the unicode version?
2) why does the unix version convert to utf8?

The database being common it seems it's either the underlying libraries or the
compiled extension or python that causes these differences, but which?

John Nagle · Mar 16, 2007

Try:

db=MySQLdb.connect(host='appx',db='sc_0',user='user',passwd='secret',
use_unicode=True, charset = "utf8")

The distinction is that "use_unicode" tells Python to convert to Unicode,
but Python doesn't know the MySQL table type. 'charset="utf8"' tells
MySQL to do the conversion to UTF8, which can be reliably converted
to Unicode.

John Nagle

Robin Becker · Mar 16, 2007

John said:
Try:

db=MySQLdb.connect(host='appx',db='sc_0',user='user',passwd='secret',
use_unicode=True, charset = "utf8")

The distinction is that "use_unicode" tells Python to convert to Unicode,
but Python doesn't know the MySQL table type. 'charset="utf8"' tells
MySQL to do the conversion to UTF8, which can be reliably converted
to Unicode.

John Nagle

.......

OK that seems to help. However, my database has tables with different
encodings. Does MySQLdb ignore the table encoding? That would be a bit lame.

Also it still doesn't explain the different behaviours between unix &
win32 (or perhaps different defaults are somehow magically decided upon).
-things were so much easier when bytes were bytes-ly yrs-
Robin Becker

John Nagle · Mar 17, 2007

Robin said:
OK that seems to help. However, my database has tables with different
encodings. Does MySQLdb ignore the table encoding? That would be a bit
lame.

MySQLdb, the client, doesn't know the table encoding. The
server end does.

Also it still doesn't explain the different behaviours between unix &
win32 (or perhaps different defaults are somehow magically decided upon).

The default encoding is an environment thing. It comes, somehow, from
the locale your system thinks it is in.
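
One way to see which defaults each machine actually ended up with is to ask the
server what it thinks the connection's character sets are. A minimal sketch,
assuming a DB-API cursor such as the one from the test script; the helper name
connection_charsets is made up for illustration:

```python
def connection_charsets(cursor):
    """Return the server's character_set_* variables for this connection.

    `cursor` is assumed to be a DB-API cursor (e.g. from MySQLdb).
    The helper name is hypothetical, not part of MySQLdb.
    """
    # SHOW VARIABLES returns (name, value) rows, so dict() pairs them up.
    cursor.execute("SHOW VARIABLES LIKE 'character_set_%'")
    return dict(cursor.fetchall())
```

Running connection_charsets(db.cursor()) on both boxes and comparing
character_set_client and character_set_results should show whether the two
clients negotiated different connection encodings.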

-things were so much easier when bytes were bytes-ly yrs-
Robin Becker

So convert the database to Unicode/UTF-8 and have everything
be consistent. MySQL 5 can do that dynamically with an ALTER
TABLE statement.
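
A sketch of what such a statement could look like, built per table so it can be
run for each table that has a different encoding. The helper name
convert_table_sql is hypothetical, and the name check is only a crude guard;
back up the data before running a conversion like this:

```python
def convert_table_sql(table, charset="utf8"):
    """Build an ALTER TABLE statement converting `table` to `charset`.

    Hypothetical helper for illustration. Identifiers cannot be passed
    as query parameters, so only simple names are accepted.
    """
    for name in (table, charset):
        if not name.replace("_", "").isalnum():
            raise ValueError("unexpected identifier: %r" % (name,))
    return "ALTER TABLE `%s` CONVERT TO CHARACTER SET %s" % (table, charset)


print(convert_table_sql("sc_accomodation"))
```

The resulting string can be passed to cur.execute() as in the test script.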

John Nagle
