urllib.unquote + unicode

 
 
koara
      11-13-2007
Hello all,

I am using urllib.unquote_plus to unquote a string. Sometimes I get a
strange string to unquote, for example "spolu%u017E%E1ci.cz". The
problem here is that some application decided to quote a non-ASCII
character as %uxxxx directly, instead of encoding it and quoting
byte by byte.

Python (2.4.1) simply returns 'spolu%u017E\xe1ci.cz', which is likely
not what the application meant.

My question is: is this %u quoting a standard (i.e., urllib is in the
wrong), or is it not (i.e., the application is in the wrong and urllib
silently ignores the '%u0' - why?), and most importantly, is there a
simple workaround to get it working as expected?

Cheers!
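
For readers on a current interpreter, the behaviour described above can be reproduced with a minimal sketch. It assumes Python 3, where the function lives in urllib.parse, and it assumes Latin-1 as the byte encoding (a guess based on %E1 standing for "á"); neither assumption comes from the original post.

```python
from urllib.parse import unquote_plus

quoted = "spolu%u017E%E1ci.cz"

# %E1 is a valid two-hex-digit escape, so it is decoded (as Latin-1 here).
# "%u0" is not two hex digits, so the %u017E sequence passes through unchanged.
result = unquote_plus(quoted, encoding="latin-1")
print(ascii(result))  # 'spolu%u017E\xe1ci.cz'
```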

 
Gabriel Genellina
      11-14-2007
On Tue, 13 Nov 2007 13:14:18 -0300, koara wrote:

> I am using urllib.unquote_plus to unquote a string. Sometimes I get a
> strange string to unquote, for example "spolu%u017E%E1ci.cz". The
> problem here is that some application decided to quote a non-ASCII
> character as %uxxxx directly, instead of encoding it and quoting
> byte by byte.
>
> Python (2.4.1) simply returns 'spolu%u017E\xe1ci.cz', which is likely
> not what the application meant.
>
> My question is: is this %u quoting a standard (i.e., urllib is in the
> wrong),


Not that I know of (and that doesn't prove anything).

> or is it not (i.e., the application is in the wrong and urllib
> silently ignores the '%u0' - why?), and most importantly, is there a
> simple workaround to get it working as expected?


Try this (untested):

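from urllib import unquote_plus  # Python 2 module layout

def unquote_plus_u(source):
    # Decode the ordinary %XX escapes (and '+') first; %uXXXX is left untouched.
    result = unquote_plus(source)
    if '%u' in result:
        # Turn %uXXXX into \uXXXX and let the unicode_escape codec decode it.
        # Caveat: any literal backslash in the input is also treated as an escape.
        result = result.replace('%u', '\\u').decode('unicode_escape')
    return result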

--
Gabriel Genellina
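
The same idea can be sketched for Python 3, where str has no .decode method. This is only an illustration under stated assumptions, not a tested drop-in: it reuses the name unquote_plus_u from the reply above, it assumes Latin-1 for the plain %XX bytes (inferred from %E1 standing for "á"), and it swaps the unicode_escape trick for a regular expression so that backslashes in the input are left alone. The %uXXXX form matches what JavaScript's legacy escape() function emits for characters above U+00FF, which is one likely source of such strings.

```python
import re
from urllib.parse import unquote_plus

# Matches the non-standard %uXXXX form: "%u" followed by exactly four hex digits.
_PERCENT_U = re.compile(r"%u([0-9a-fA-F]{4})")


def unquote_plus_u(source, encoding="latin-1"):
    """Decode %uXXXX sequences, then the ordinary %XX escapes and '+'."""
    # Handle %uXXXX first, so that a quoted percent sign such as "%25u017E"
    # is not mistaken for a %u sequence after unquoting.
    expanded = _PERCENT_U.sub(lambda m: chr(int(m.group(1), 16)), source)
    # unquote_plus leaves the non-ASCII characters produced above untouched.
    return unquote_plus(expanded, encoding=encoding)


print(unquote_plus_u("spolu%u017E%E1ci.cz"))  # spolužáci.cz
```

Characters outside the Basic Multilingual Plane would arrive as two %uXXXX surrogate halves; this sketch does not recombine them.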

 