UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position

iMath

The following code is originally from http://zetcode.com/databases/mysqlpythontutorial/,
in the "Writing images" part.


import MySQLdb as mdb
import sys

try:
    fin = open("Chrome_Logo.svg.png",'rb')
    img = fin.read()
    fin.close()

except IOError as e:

    print ("Error %d: %s" % (e.args[0],e.args[1]))
    sys.exit(1)


try:
    conn = mdb.connect(host='localhost',user='testuser',
        passwd='test623', db='testdb')
    cursor = conn.cursor()
    cursor.execute("INSERT INTO Images SET Data='%s'" % \
        mdb.escape_string(img))

    conn.commit()

    cursor.close()
    conn.close()

except mdb.Error as e:

    print ("Error %d: %s" % (e.args[0],e.args[1]))
    sys.exit(1)


I ported it to Python 3, and also changed
    fin = open("chrome.png")
to
    fin = open("Chrome_Logo.png",'rb')
but when I run it, it gives the following error:

Traceback (most recent call last):
  File "E:\Python\py32\itest4.py", line 20, in <module>
    mdb.escape_string(img))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte

So how can I fix it?
 
Terry Reedy

import MySQLdb as mdb

Not part of the stdlib. 'MySQLdb' should be in the subject line to get
the attention of someone who is familiar with it. I am not.

cursor.execute("INSERT INTO Images SET Data='%s'" % \
mdb.escape_string(img))

From the name, I would expect that escape_string expects text. From the
error, it seems to specifically expect utf-8 encoded bytes. After
decoding, I expect that it does some sort of 'escaping'. An image does
not qualify as that sort of input. If escape_string takes an encoding
arg, latin1 *might* work.
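
To illustrate why latin1 is the usual candidate in a situation like this: latin-1 assigns a character to every byte value from 0 to 255, so decoding never fails and encoding gives back the original bytes, whereas UTF-8 rejects 0x89 as a first byte. The sketch below uses only built-in bytes/str methods. Whether escape_string accepts an encoding argument at all is not established in this thread.

```python
# The 8-byte PNG signature; its first byte, 0x89, is the one in the traceback.
png_header = b"\x89PNG\r\n\x1a\n"

# latin-1 maps each byte 0..255 to the code point with the same number,
# so decoding cannot fail and encoding restores the original bytes.
as_text = png_header.decode("latin-1")
round_trip = as_text.encode("latin-1")

# UTF-8 does not accept 0x89 as the start of a character.
try:
    png_header.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

A lossless round trip only shows that latin-1 will not raise; it does not make the image meaningful as text.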

Hans Mulder

cursor.execute("INSERT INTO Images SET Data='%s'" % \
mdb.escape_string(img))

You shouldn't call mdb.escape_string directly. Instead, you
should put placeholders in your SQL statement and let MySQLdb
figure out how to properly escape whatever needs escaping.

Somewhat confusingly, placeholders are written as %s in MySQLdb.
They differ from strings in not being enclosed in quotes.
The other difference is that you'd provide two arguments to
cursor.execute; the second of these is a tuple; in this case
a tuple with only one element:

cursor.execute("INSERT INTO Images SET Data=%s", (img,))
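
The placeholder idea can be tried without a MySQL server. The sketch below swaps in the stdlib sqlite3 module as a stand-in, purely so that it runs anywhere; sqlite3 writes its placeholders as ? where MySQLdb writes %s. The table layout and the fake image bytes are made up for illustration.

```python
import sqlite3

# Fake image data starting with the PNG signature (illustrative only).
img = b"\x89PNG\r\n\x1a\n" + bytes(range(16))

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE Images (Id INTEGER PRIMARY KEY, Data BLOB)")

# The SQL text contains only a placeholder; the bytes travel separately
# as the second argument, a one-element tuple.
cursor.execute("INSERT INTO Images (Data) VALUES (?)", (img,))
conn.commit()

# Read the row back to confirm the bytes were stored unchanged.
cursor.execute("SELECT Data FROM Images")
stored = cursor.fetchone()[0]
conn.close()
```

Because the data never becomes part of the SQL string, no manual escaping or decoding of the image is involved.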

Python 3 distinguishes between binary data and Unicode text.
Trying to apply string functions to images or other binary
data won't work.
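
A small illustration of that distinction, using only built-in types (the sample string is arbitrary):

```python
text = "caf\u00e9"            # str: a sequence of Unicode code points
data = text.encode("utf-8")   # bytes: a sequence of integers 0..255

# Mixing the two types is an error in Python 3, not a silent conversion.
try:
    data + text
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Going from one to the other always takes an explicit encode or decode.
back = data.decode("utf-8")
```

The four-character string becomes five bytes in UTF-8, which is one reason the two types cannot be treated as interchangeable.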

Maybe correcting this bytes/strings confusion and porting
to Python 3 in one go is too large a transformation. In
that case, your best bet would be to go back to Python 2
and fix all the bytes/string confusion there. When you've
got it working again, you may be ready to port to Python 3.


Hope this helps,

-- HansM
 
Steven D'Aprano

try:
    fin = open("Chrome_Logo.svg.png",'rb')
    img = fin.read()
    fin.close()
except IOError as e:
    print ("Error %d: %s" % (e.args[0],e.args[1]))
    sys.exit(1)

Every time a programmer catches an exception, only to merely print a
vague error message and then exit, God kills a kitten. Please don't do
that.

If all you are going to do is print an error message and then exit,
please don't bother. All you do is make debugging harder. When Python
detects an error, by default it prints a full traceback, which gives you
lots of information to track down the error. By catching that exception
as you do, you lose that information and make it harder to debug.
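
As an illustration of that advice, here is a sketch of the file-reading step with no try/except at all. The read_image helper is a hypothetical name, not something from the tutorial. A with block closes the file, and a missing file surfaces as the original exception with its full traceback.

```python
def read_image(path):
    """Return the raw bytes of the file at path.

    No try/except: if the file cannot be opened, the OSError propagates
    and Python prints the complete traceback by default.
    """
    with open(path, "rb") as fin:  # closed automatically, even on error
        return fin.read()
```

If a friendlier message is wanted as well, printing it inside an except block and then re-raising with a bare raise keeps the traceback.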

Moving on to the next thing:


[snip code]

Traceback (most recent call last):
  File "E:\Python\py32\itest4.py", line 20, in <module>
    mdb.escape_string(img))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte

so how to fix it ?

I suggest you start by reading the documentation for
MySQLdb.escape_string. What does it do? What does it expect? A byte
string or a unicode text string?

It seems very strange to me that you are reading a binary file, then
passing it to something which appears to be expecting a string. It looks
like what happens is that the PNG image starts with a 0x89 byte, and the
escape_string function tries to decode those bytes into Unicode text:

py> img = b"\x89\x00\x23\xf2"  # fake PNG binary data
py> img.decode('utf-8')  # I'm expecting text
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte

Without knowing more about escape_string, I can only make a wild guess.
Try this:

import base64
img = fin.read()  # read the binary data of the PNG file
data = base64.encodebytes(img)  # base64-encode the image; the result is ASCII-only bytes
cursor.execute("INSERT INTO Images SET Data='%s'" % \
    mdb.escape_string(data))


and see what that does.
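
For reference, here is what the base64 step produces in Python 3, independent of MySQLdb. Note that base64.encodebytes returns bytes rather than str; the bytes are ASCII-only, so they decode cleanly. The image data is a made-up stand-in.

```python
import base64

img = b"\x89PNG\r\n\x1a\n" + bytes(range(8))  # stand-in for real image data

encoded = base64.encodebytes(img)      # ASCII-only bytes, newline-terminated
as_text = encoded.decode("ascii")      # safe: base64 output is pure ASCII

# Decoding the base64 restores the original bytes exactly.
restored = base64.decodebytes(encoded)
```

The trade-off is that the stored value is roughly a third larger than the image and has to be decoded again after reading it back.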
 
iMath

On Thursday, December 6, 2012 at 7:07:35 PM UTC+8, Hans Mulder wrote:
cursor.execute("INSERT INTO Images SET Data=%s", (img,))
Thanks, but it still doesn't work.
