Robin Becker
I am seeing different results from the same simple query against a common
database when it is run from a FreeBSD machine and from a Win32 box.
The test script is:
#######################
import MySQLdb, sys
print sys.version
print MySQLdb.__version__
db=MySQLdb.connect(host='appx',db='sc_0',user='user',passwd='secret',use_unicode=True)
cur=db.cursor()
cur.execute('select * from sc_accomodation where id=31')
data=cur.fetchall()
for i,t in enumerate(data[0]):
    if isinstance(t,(str,unicode)): print i,repr(t)
#######################
The table in question is charset='latin1'; however, the original owners put some
special Windows characters in it, e.g. 0x92 (a quote).
In the Windows version I see this kind of string in the output:
2.4.4 (#71, Oct 18 2006, 08:34:43) [MSC v.1310 32 bit (Intel)]
1.2.1_p2
..........
14 u'Built entirely of mahogany, Acajou seeks to introduce a new concept of
living in the midst of nature on the C\xf4te d\x92Or beach which stretches
along the island\x92s northern coast.\r\n\r\nThe hotel\x92s 24 standard
and 4 superior ro........
The FreeBSD machine produces:
2.4.3 (#2, Sep 7 2006, 09:34:29)
[GCC 3.4.4 [FreeBSD] 20050518]
............
14 u'Built entirely of mahogany, Acajou seeks to introduce a new concept of
living in the midst of nature on the C\xf4te d\u2019Or beach which stretches
along the island\u2019s northern coast.\r\n\r\nThe hotel\u2019s 24 standard
and 4 superior rooms.......
So the Windows version seems to leave the \x92 as is, while the FreeBSD version
converts it to its correct value, \u2019.
This is already bad enough, as I expected the outcomes to be the same, although
given that the encoding of the database is wrong I did expect some problems.
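For reference, the two unicode results match what Python's own codecs produce when the same byte is decoded under two different readings of "latin1". This is only a sketch of the symptom, not a statement about which layer actually does the decoding on either machine; the variable names are mine, and the b'' literals need a newer Python than the 2.4 used above.

```python
# The bytes stored in the latin1 column around the Windows apostrophe.
raw = b"d\x92Or"

# ISO-8859-1 maps every byte to the code point with the same value,
# so 0x92 becomes U+0092 (a C1 control character).
as_iso_8859_1 = raw.decode("iso-8859-1")

# Windows-1252 assigns 0x92 to U+2019 RIGHT SINGLE QUOTATION MARK.
as_cp1252 = raw.decode("cp1252")

print(repr(as_iso_8859_1))
print(repr(as_cp1252))
```

The first result has the shape of the Windows output and the second the shape of the FreeBSD output.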
However, if I leave out use_unicode=True in the above script I get back plain
strings, and this time the difference is larger.
Windows:
2.4.4 (#71, Oct 18 2006, 08:34:43) [MSC v.1310 32 bit (Intel)]
1.2.1_p2
........
2 "C\xf4te d'Or\r\nPraslin"
........
Unix (FreeBSD):
2.4.3 (#2, Sep 7 2006, 09:34:29)
[GCC 3.4.4 [FreeBSD] 20050518]
1.2.1_p2
.......
2 "C\xc3\xb4te d'Or\r\nPraslin"
........
So here the returned string appears to have been automatically converted to UTF-8.
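The two byte strings differ exactly as the latin1 and UTF-8 encodings of the same text would. A minimal sketch with Python's codecs, again only illustrating the symptom (the variable names are mine):

```python
# The text as it should read, with o-circumflex (U+00F4).
text = u"C\xf4te d'Or"

# ISO-8859-1 uses a single byte, 0xF4, for the o-circumflex.
as_latin1_bytes = text.encode("iso-8859-1")

# UTF-8 uses two bytes, 0xC3 0xB4, for the same character.
as_utf8_bytes = text.encode("utf-8")

print(repr(as_latin1_bytes))
print(repr(as_utf8_bytes))
```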
My questions are:
1) Why the difference in the unicode version?
2) Why does the Unix version convert to UTF-8?
Since the database is common to both, it seems that either the underlying
libraries, the compiled extension, or Python itself causes these differences,
but which?
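One thing that could be compared on both machines is the set of character-set variables each connection ends up with. The helper below is a hypothetical sketch, not something I have run against this database; it assumes only a DB-API cursor and the MySQL statement SHOW VARIABLES, and the function name is mine.

```python
def connection_charsets(cursor):
    """Return the session's character_set_* variables as a dict.

    `cursor` is any DB-API cursor on an open MySQL connection.
    """
    # Each row is a (Variable_name, Value) pair.
    cursor.execute("SHOW VARIABLES LIKE 'character_set%'")
    return dict((name, value) for name, value in cursor.fetchall())
```

Calling connection_charsets(db.cursor()) from each box and comparing the results (character_set_client, character_set_connection and character_set_results in particular) would show whether the two clients negotiate different connection character sets.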