urllib2.unquote() vs unicode

Maciej Bliziński · Mar 18, 2008

I've been hit by a urllib2.unquote() issue. Consider the following
unit test:

import unittest
import urllib2

class UnquoteUnitTest(unittest.TestCase):

    def setUp(self):
        self.utxt = u'%C4%99'
        self.stxt = '%C4%99'

    def testEq(self):
        self.assertEqual(
            self.utxt,
            self.stxt)

    def testStrEq(self):
        self.assertEqual(
            str(self.utxt),
            str(self.stxt))

    def testUnicodeEq(self):
        self.assertEqual(
            unicode(self.utxt),
            unicode(self.stxt))

    def testUnquote(self):
        self.assertEqual(
            urllib2.unquote(self.utxt),
            urllib2.unquote(self.stxt))

    def testUnquoteStr(self):
        self.assertEqual(
            urllib2.unquote(str(self.utxt)),
            urllib2.unquote(str(self.stxt)))

    def testUnquoteUnicode(self):
        self.assertEqual(
            urllib2.unquote(unicode(self.utxt)),
            urllib2.unquote(unicode(self.stxt)))

if __name__ == '__main__':
    unittest.main()

The three test*Eq() tests confirm that the two strings are equal, both
as they are and when both are cast to str or to unicode. The tests that
call unquote() on utxt and stxt after casting both to str or both to
unicode also pass. However...

...E..
======================================================================
ERROR: testUnquote (__main__.UnquoteUnitTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "unquote.py", line 28, in testUnquote
    urllib2.unquote(self.stxt))
  File "/usr/lib/python2.4/unittest.py", line 332, in failUnlessEqual
    if not first == second:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)

----------------------------------------------------------------------
Ran 6 tests in 0.001s

FAILED (errors=1)

Why does this test fail while others are successful? Any ideas?

Regards,
Maciej

Gabriel Genellina · Mar 18, 2008

Both utxt and stxt consist exclusively of ASCII characters, so the
default ASCII codec can convert between str and unicode without trouble.
When both are converted to unicode, or both to str, and then unquoted,
the two results have the same type and compare without problem (even
if they can no longer be represented in ASCII at this stage).
In testUnquote, unquoting leaves you with one unicode object and one
byte string that contains non-ASCII bytes. To compare them, Python
first has to convert the str to unicode using the default ASCII codec,
and that conversion fails on byte 0xc4.
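
The failing step can be reproduced in isolation. The sketch below is not the code from the thread: it assumes Python 3, where urllib2 no longer exists, and uses urllib.parse.unquote_to_bytes() and urllib.parse.unquote() in its place. The variable names are made up for illustration, and it relies on the escapes %C4%99 being the UTF-8 encoding of one character, U+0119.

```python
from urllib.parse import unquote, unquote_to_bytes

quoted = '%C4%99'

# Unquoting to bytes yields the two raw bytes behind the escapes.
raw = unquote_to_bytes(quoted)

# Decoding those bytes as ASCII is the step that fails: 0xc4 is above 127.
try:
    raw.decode('ascii')
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = str(exc)

# Naming the real encoding (UTF-8 here) makes both sides text,
# so they can be compared without any implicit conversion.
text = raw.decode('utf-8')
same = (text == unquote(quoted))  # unquote() decodes as UTF-8 by default

print(ascii_error)
print(same)
```

In the Python 2 test the same idea would mean decoding or encoding explicitly, so that both arguments to assertEqual() have the same type before they are compared.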
