URL Character Decoding

 
 
Kirk McDonald
      01-30-2006
If you have a link such as, e.g.:

<a href="index.py?title=Main Menu">Main menu!</a>

The space will be translated to the character code '%20' when you later
retrieve the GET data. Not knowing if there was a library function that
would convert these back to their actual characters, I've written the
following:

import re

def sub_func(m):
    return chr(int(m.group()[1:], 16))

def parse_title(title):
    p = re.compile(r'%[0-9][0-9]')
    return re.sub(p, sub_func, title)

(I know I could probably use a lambda function instead of sub_func, but
I come to Python via C++ and am still not entirely used to them. This is
clearer to me, at least.)

I guess what I'm asking is: Is there a library function (in Python or
mod_python) that knows how to do this? Or, failing that, is there a
different regex I could use to get rid of the substitution function?

-Kirk McDonald
 
Kirk McDonald
      01-30-2006
Actually, I just noticed this doesn't really work at all. The URL
character codes are in hex, so not only does the regex not match what it
should, but sub_func fails miserably. See why I wanted a library function?

-Kirk McDonald
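
To illustrate the mismatch described above, here is a minimal sketch that runs the digits-only pattern from the first post against a few escapes. The sample strings and the name digits_only are made up for illustration; they are not from the original code.

```python
import re

# The pattern from the first post: only decimal digits are allowed after '%'.
digits_only = re.compile(r'%[0-9][0-9]')

# '%20' happens to contain only decimal digits, so it is found.
print(digits_only.findall('Main%20Menu'))      # ['%20']

# '%3F', '%3D' and '%2F' each contain a hex letter, so none of them match.
print(digits_only.findall('a%3Fb%3Dc%2Fd'))    # []
```

Escapes such as %3F (the question mark) are silently left in the string, which is why the function appears to work for spaces only.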
 
Kirk McDonald
      01-30-2006
Kirk McDonald wrote:
> Actually, I just noticed this doesn't really work at all. The URL
> character codes are in hex, so not only does the regex not match what it
> should, but sub_func fails miserably. See why I wanted a library function?
>
> -Kirk McDonald


Not to keep talking to myself, but looks like sub_func works fine, and
the regex just needs to be r'%[0-9a-fA-F][0-9a-fA-F]'. But even so.

-Kirk McDonald
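
For reference, a sketch of what the hand-rolled decoder might look like with the hex-aware pattern and a lambda in place of sub_func. The name PERCENT_ESCAPE and the capture group are choices made for this sketch, not part of the original code.

```python
import re

# '%' followed by exactly two hex digits, upper or lower case.
# The group captures the digits so the '%' does not need to be sliced off.
PERCENT_ESCAPE = re.compile(r'%([0-9a-fA-F]{2})')

def parse_title(title):
    # Replace each escape with the character whose code is the hex value.
    return PERCENT_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), title)

print(parse_title('Main%20Menu'))   # Main Menu
print(parse_title('a%3Fb%3dc'))     # a?b=c
```

This converts one escape at a time, so it does not turn '+' into a space and, on Python 3, it will not reassemble multi-byte UTF-8 sequences into a single character. A library function is still the safer choice.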
 
Paul McGuire
      01-30-2006
Kirk McDonald wrote:
> If you have a link such as, e.g.:
>
> <a href="index.py?title=Main Menu">Main menu!</a>
>
> The space will be translated to the character code '%20' when you later
> retrieve the GET data.
>
> I guess what I'm asking is: Is there a library function (in Python or
> mod_python) that knows how to do this? Or, failing that, is there a
> different regex I could use to get rid of the substitution function?
>
> -Kirk McDonald



>>> import urllib
>>> urllib.quote("index.py?title=Main Menu")
'index.py%3Ftitle%3DMain%20Menu'
>>> urllib.unquote("index.py%3Ftitle%3DMain%20Menu")
'index.py?title=Main Menu'
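
The session above uses the Python 2 urllib module. In Python 3 the same functions live in urllib.parse. The sketch below repeats the two calls and, as an addition not discussed in the thread, shows unquote_plus and parse_qs, which may be more convenient when the input is a form-encoded query string.

```python
from urllib.parse import quote, unquote, unquote_plus, parse_qs

# Same round trip as the Python 2 session.
print(quote("index.py?title=Main Menu"))          # index.py%3Ftitle%3DMain%20Menu
print(unquote("index.py%3Ftitle%3DMain%20Menu"))  # index.py?title=Main Menu

# Form-encoded data may use '+' for a space; unquote_plus handles that too.
print(unquote_plus("Main+Menu"))                  # Main Menu

# parse_qs splits a query string and decodes the values in one step.
print(parse_qs("title=Main%20Menu"))              # {'title': ['Main Menu']}
```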


 
Kirk McDonald
      01-30-2006
Perfect! Thanks.

-Kirk McDonald
 