Inserting Unicode text with MySQLdb in Python 2.4-2.5?

K

Keith Hughitt

Hi all,

I ran into a problem recently when trying to add support for earlier
versions of Python (2.4 and 2.5) to some database related code which
uses MySQLdb, and was wondering if anyone has any suggestions.

With later versions of Python (2.6), inserting Unicode is very simple,
e.g.:

# -*- coding: utf-8 -*-
...
cursor.execute('''INSERT INTO `table` VALUES (0,
'Ångström'),...''')

When the same code is run on earlier versions, however, the results is
either garbled text (e.g. "Ã or "?" instead of "Å" in Python 2.5), or
an exception being thrown (Python 2.4):

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in
position 60: ordinal not in range(128)

So far I've tried a number of different things, including:

1. Using Unicode strings (e.g. u"\u212B")

2. Manually specifying the encoding using sys.setdefaultencoding
('utf-8')

3. Manually enabling Unicode support in MySQLdb
(use_unicode=False, charset = "utf8")

....but no combination of any of the above resulted in proper database
content.

To be certain that the issue was related to Python/MySQLdb and not
MySQL itself, I manually inserted the text and it worked just fine.
Furthermore, when working in a Python console, both print "Å" and
print u"\u212B" display the correct output.

Any ideas? The versions of the MySQLdb adapter tested were 1.2.1
(Python 2.4), and 1.2.2-10 (Python 2.5).

Thanks!
Keith
 
D

Diez B. Roggisch

Keith said:
Hi all,

I ran into a problem recently when trying to add support for earlier
versions of Python (2.4 and 2.5) to some database related code which
uses MySQLdb, and was wondering if anyone has any suggestions.

With later versions of Python (2.6), inserting Unicode is very simple,
e.g.:

# -*- coding: utf-8 -*-
...
cursor.execute('''INSERT INTO `table` VALUES (0,
'Ångström'),...''')

You are aware that the coding-declaration only affects unicode-literals (the
ones like u"i'm unicode")? So the above insert-statement is *not* unicode,
it's a byte-string in whatever encoding your editor happens to save the
file in.

And that's point two: make sure your editor reads and writes the file in the
same encoding you specified in the comment in the beginning.
When the same code is run on earlier versions, however, the results is
either garbled text (e.g. "Ã or "?" instead of "Å" in Python 2.5), or
an exception being thrown (Python 2.4):

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in
position 60: ordinal not in range(128)

Makes sense if the execute tries to encode to unicode first - as you didn't
give it a unicode-object.
So far I've tried a number of different things, including:

1. Using Unicode strings (e.g. u"\u212B")

2. Manually specifying the encoding using sys.setdefaultencoding
('utf-8')

3. Manually enabling Unicode support in MySQLdb
(use_unicode=False, charset = "utf8")

You *disabled* unicode here!!!!! Unicode is NOT utf-8!!!

http://www.joelonsoftware.com/articles/Unicode.html

...but no combination of any of the above resulted in proper database
content.

To be certain that the issue was related to Python/MySQLdb and not
MySQL itself, I manually inserted the text and it worked just fine.
Furthermore, when working in a Python console, both print "Ã…" and
print u"\u212B" display the correct output.

Any ideas? The versions of the MySQLdb adapter tested were 1.2.1
(Python 2.4), and 1.2.2-10 (Python 2.5).

Try the above, and better yet provide self-contained examples that show the
behavior.

Diez
 
K

Keith Hughitt

Hello,

Thanks for the suggestions and information Diez!

You are aware that the coding-declaration only affects unicode-literals (the
ones like u"i'm unicode")? So the above insert-statement is *not* unicode,
it's a byte-string in whatever encoding your editor happens to save the
file in.

Thanks. I didn't know that, but that is helpful.
And that's point two: make sure your editor reads and writes the file in the
same encoding you specified in the comment in the beginning.

That is taken care of: the files are opened/saved as UTF-8.
Makes sense if the execute tries to encode to unicode first - as you didn't
give it a unicode-object.

In Python 2.6 where I originally developed the code, it wasn't
necessary to explicitly specify the type of
some text: it was basically handled for you. It does make sense
though.
You *disabled* unicode here!!!!! Unicode is NOT utf-8!!!

Oops. It was enabled when I ran it, I just copied the above text from
somewhere else and forgot to change it. I am aware that Unicode does
not equal
utf-8, but utf-8 is a Unicode encoding, right?
http://www.joelonsoftware.com/articles/Unicode.html
Thanks!


Try the above, and better yet provide self-contained examples that show the
behavior.

Diez

Still no luck. I also tried using double-quotes instead of single-
quotes around the relevant strings (as suggested over email by
ThreaderSlash), however, that did
not work for me.

Here is a small example of what I'm trying to do. Notice that if you
run it in Python 2.5-2.6, everything works fine. It is only in Python
2.4 that the
below example doesn't work.

============= Begin Example ==================

#!/usr/bin/env python
#-*- coding:utf-8 -*-
import sys

def main():
import MySQLdb, getpass

admin = raw_input("Database admin: ")
pw = getpass.getpass("Password: ")
db = MySQLdb.connect(user=admin, passwd=pw)

cursor = db.cursor()
cursor.execute("CREATE DATABASE IF NOT EXISTS unicode_test;")

cursor.execute('''
CREATE TABLE `unicode_test`.`test` (
`id` SMALLINT unsigned NOT NULL AUTO_INCREMENT,
`name` VARCHAR(255) NOT NULL,
PRIMARY KEY (`id`), INDEX (`id`)
) DEFAULT CHARSET=utf8;''')

cursor.execute('''
INSERT INTO `unicode_test`.`test` VALUES
(NULL, 'Ångström');
''')

# Test 1
print "Just printing: %s" % 'Ångström'

# Test 2
cursor.execute("SELECT name FROM unicode_test.test;")
print "From database: %s" % cursor.fetchone()[0].decode('utf-8')

# Test 3 (Manual)
print 'To verify manually: mysql -u %s -p -e "SELECT name FROM
unicode_test.test"' % admin

if __name__ == '__main__':
sys.exit(main())

============= End Example ====================

Any suggestions?
Thanks!
Keith
 
J

John Nagle

Keith said:
Hi all,

I ran into a problem recently when trying to add support for earlier
versions of Python (2.4 and 2.5) to some database related code which
uses MySQLdb, and was wondering if anyone has any suggestions.

With later versions of Python (2.6), inserting Unicode is very simple,
e.g.:

# -*- coding: utf-8 -*-
...
cursor.execute('''INSERT INTO `table` VALUES (0,
'Ångström'),...''')

When the same code is run on earlier versions, however, the results is
either garbled text (e.g. "Ã or "?" instead of "Å" in Python 2.5), or
an exception being thrown (Python 2.4):

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in
position 60: ordinal not in range(128)

So far I've tried a number of different things, including:

1. Using Unicode strings (e.g. u"\u212B")

2. Manually specifying the encoding using sys.setdefaultencoding
('utf-8')

3. Manually enabling Unicode support in MySQLdb
(use_unicode=False, charset = "utf8")

No, that's backwards. Try:

db = MySQLdb.connect(host="localhost",
use_unicode = True, charset = "utf8",
user=username, passwd=password, db=database)

"use_unicode" means that you want MySQLdb to accept and return
Unicode strings. "charset="utf8" means you want MySQLdb to
negotiate with the server to use UTF8 on the socket connecting
it to the database. This works fine in Python 2.4 and 2.5.
Returned strings will be in Unicode.

At the database end, you have to make sure that 1) MySQL was
built with Unicode support (it usually is), 2) the database
fields of interest are in Unicode. I suggest

ALTER DATABASE dbname DEFAULT CHARACTER SET utf8;

before doing any CREATE TABLE operations. Then strings
will be UTF8 in the database.

Read this: http://dev.mysql.com/doc/refman/5.0/en/charset-unicode.html

It all works quite well.

John Nagle
 
K

Keith Hughitt

Hi John,

Thanks for the suggestions: I have finally been able to get it
working :)

In anyone else runs into the same problem, here is some example code
that works in Python 2.4+:

============= Begin Example ==================

#!/usr/bin/env python
#-*- coding:utf-8 -*-
import sys
def main():
import MySQLdb, getpass

admin = raw_input("Database admin: ")
pw = getpass.getpass("Password: ")
db = MySQLdb.connect(use_unicode=True, charset = "utf8",
user=admin, passwd=pw)

cursor = db.cursor()
try:
cursor.execute("DROP DATABASE IF EXISTS unicode_test;")
cursor.execute("CREATE DATABASE unicode_test DEFAULT CHARACTER
SET utf8;")
except:
print ""

cursor.execute('''
CREATE TABLE `unicode_test`.`test` (
`id` SMALLINT unsigned NOT NULL AUTO_INCREMENT,
`name` VARCHAR(255) NOT NULL,
PRIMARY KEY (`id`), INDEX (`id`)
) DEFAULT CHARSET=utf8;''')

cursor.execute("INSERT INTO `unicode_test`.`test` VALUES (NULL,
'%s');" % u"Ångström")

# Test 1
print "Just printing: %s" % 'Ångström'

# Test 2
cursor.execute("SELECT name FROM unicode_test.test;")
print "From database: %s" % cursor.fetchone()[0]

# Test 3 (Manual)
print 'To verify manually: mysql -u %s -p -e "SELECT name FROM
unicode_test.test"' % admin

if __name__ == '__main__':
sys.exit(main())

============= End Example ====================

Thanks all for taking the time to help!

Best,
Keith
 
Joined
Dec 7, 2012
Messages
1
Reaction score
0
unicode strings being inserted into the DB without encoding

Hi,

I'm using Zope 2.12.19, MySQL 5.5 and Python 2.6.8
When I run a Python file that reads from the DB, it displays unicode strings in the shell.
I do not think this should be the case as the default encoding has been set in sitecustmoize.py to be utf-8.
So, before the strings are displayed they should have been encoded using utf-8 and displayed as byte strings rather than unicode strings.

the problem is data is being stored in the database with u' prefixes.
Example : Table contains "u'testdata" instead of "testdata".

What is the mistake here ?

Ravi
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,968
Messages
2,570,149
Members
46,695
Latest member
StanleyDri

Latest Threads

Top