urllib2.unquote() vs unicode

Maciej Bliziński
      03-18-2008
I've been hit by a urllib2.unquote() issue. Consider the following
unit test:

import unittest
import urllib2

class UnquoteUnitTest(unittest.TestCase):

    def setUp(self):
        self.utxt = u'%C4%99'
        self.stxt = '%C4%99'

    def testEq(self):
        self.assertEqual(
                self.utxt,
                self.stxt)

    def testStrEq(self):
        self.assertEqual(
                str(self.utxt),
                str(self.stxt))

    def testUnicodeEq(self):
        self.assertEqual(
                unicode(self.utxt),
                unicode(self.stxt))

    def testUnquote(self):
        self.assertEqual(
                urllib2.unquote(self.utxt),
                urllib2.unquote(self.stxt))

    def testUnquoteStr(self):
        self.assertEqual(
                urllib2.unquote(str(self.utxt)),
                urllib2.unquote(str(self.stxt)))

    def testUnquoteUnicode(self):
        self.assertEqual(
                urllib2.unquote(unicode(self.utxt)),
                urllib2.unquote(unicode(self.stxt)))


if __name__ == '__main__':
    unittest.main()

The three equality tests (testEq, testStrEq and testUnicodeEq) confirm
that the two strings are equal, both as they are and when both are cast
to str or to unicode. The tests that call unquote() on utxt and stxt
after casting both to str or both to unicode also pass. However...


...E..
======================================================================
ERROR: testUnquote (__main__.UnquoteUnitTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "unquote.py", line 28, in testUnquote
    urllib2.unquote(self.stxt))
  File "/usr/lib/python2.4/unittest.py", line 332, in failUnlessEqual
    if not first == second:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position
0: ordinal not in range(128)

----------------------------------------------------------------------
Ran 6 tests in 0.001s

FAILED (errors=1)

Why does this test fail while others are successful? Any ideas?

Regards,
Maciej
 
Gabriel Genellina
      03-18-2008
On 18 Mar, 02:20, Maciej Bliziński wrote:
> Why does this test fail while others are successful? Any ideas?


Both utxt and stxt consist exclusively of ASCII characters, so the
default ASCII encoding works fine and the plain comparisons succeed.
When both are converted to unicode, or both are converted to str, and
then "unquoted", the resulting objects are again both unicode or both
str, and they compare without problem (even if they can't be
represented in ASCII at this stage).
In testUnquote, after "unquoting", you end up comparing a str that
contains non-ASCII bytes with a unicode object. To compare them, Python
must first convert both to the same type. It does that with the
default ASCII codec, which cannot decode byte 0xc4, hence the
UnicodeDecodeError.
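
To make the failing step visible, here is a minimal sketch that performs
the implicit conversion explicitly. It is an illustration, not code from
the original test: the helper decode_or_error and the name unquote_bytes
are hypothetical, the fallback import of urllib.parse.unquote_to_bytes is
only there so the snippet also runs on Python 3, and UTF-8 is an
assumption about how '%C4%99' was encoded.

```python
# Sketch only: reproduce explicitly the decode that the comparison does implicitly.
try:
    # Python 2: str in, byte string out (urllib2.unquote is this same function).
    from urllib import unquote as unquote_bytes
except ImportError:
    # Python 3 counterpart that also returns bytes.
    from urllib.parse import unquote_to_bytes as unquote_bytes

# Unquoting the percent-escapes yields two raw bytes: 0xC4 0x99.
raw = unquote_bytes('%C4%99')


def decode_or_error(data, encoding):
    """Hypothetical helper: return (text, None) or (None, error message)."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, str(exc)


# This is the conversion Python 2 attempts when a str meets a unicode object.
ascii_text, ascii_error = decode_or_error(raw, 'ascii')

# Decoding with the encoding the escapes were produced in (assumed UTF-8) works.
utf8_text, utf8_error = decode_or_error(raw, 'utf-8')

print(ascii_error)
print(repr(utf8_text))
```

If UTF-8 is indeed the right encoding, decoding the unquoted str
explicitly before comparing it with a unicode object avoids the implicit
ASCII decode.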

--
Gabriel Genellina
 