Velocity Reviews - Computer Hardware Reviews

URL 'special character' replacements

 
 
Claude Henchoz
 
      01-09-2006
Hi guys

I have a huge list of URLs. These URLs all have ASCII codes for special
characters, like "%20" for a space or "%21" for an exclamation mark.

I've already googled quite some time, but I have not been able to find
any elegant way on how to replace these with their 'real' counterparts
(" " and "!").

Of course, I could just replace(), but that seems to be a lot of work.

Thanks for any help.

Cheers, Claude

 
Richie Hindle
 
      01-09-2006

[Claude]
> I have a huge list of URLs. These URLs all have ASCII codes for special
> characters, like "%20" for a space or "%21" for an exclamation mark.


You need urllib.unquote:

>>> import urllib
>>> help(urllib.unquote)

Help on function unquote in module urllib:

unquote(s)
unquote('abc%20def') -> 'abc def'.
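
The session above is Python 2. In Python 3 the same function lives in urllib.parse, so a minimal equivalent sketch (reusing the sample string from the help text) would be:

```python
# Python 3: unquote is in urllib.parse rather than in urllib itself.
from urllib.parse import unquote

decoded = unquote('abc%20def')
print(decoded)  # abc def
```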

--
Richie Hindle
Duncan Booth
 
      01-09-2006
Claude Henchoz wrote:

> I have a huge list of URLs. These URLs all have ASCII codes for special
> characters, like "%20" for a space or "%21" for an exclamation mark.
>
> I've already googled quite some time, but I have not been able to find
> any elegant way on how to replace these with their 'real' counterparts
> (" " and "!").
>
> Of course, I could just replace(), but that seems to be a lot of work.
>


urllib.unquote() or urllib.unquote_plus() as appropriate:

unquote( string)

Replace "%xx" escapes by their single-character equivalent.
Example: unquote('/%7Econnolly/') yields '/~connolly/'.


unquote_plus( string)

Like unquote(), but also replaces plus signs by spaces, as required for
unquoting HTML form values.
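
To see where the two functions differ, here is a small sketch in Python 3 syntax (where both are in urllib.parse). The sample string is made up for illustration; it mixes a literal plus sign with an escaped one (%2B).

```python
from urllib.parse import unquote, unquote_plus

# Hypothetical sample: a literal '+', an escaped space, an escaped '+'.
sample = 'a+b%20c%2Bd'

# unquote only decodes %xx escapes; the literal '+' is left alone.
plain = unquote(sample)

# unquote_plus first turns each literal '+' into a space, then decodes %xx.
form = unquote_plus(sample)

print(plain)  # a+b c+d
print(form)   # a b c+d
```

Which one is appropriate depends on where the strings came from: plus-as-space is a convention of HTML form data, not of URL paths in general.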

 
Fredrik Lundh
 
      01-09-2006
Claude Henchoz wrote:

> I have a huge list of URLs. These URLs all have ASCII codes for special
> characters, like "%20" for a space or "%21" for an exclamation mark.
>
> I've already googled quite some time, but I have not been able to find
> any elegant way on how to replace these with their 'real' counterparts
> (" " and "!").
>
> Of course, I could just replace(), but that seems to be a lot of work.


>>> import urllib
>>> urllib.unquote("http://docs.python.org/lib/module-urllib.html%20%21")

'http://docs.python.org/lib/module-urllib.html !'
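
Since the original question was about a whole list of URLs, the same call can simply be applied to each entry. A sketch in Python 3 syntax; the list contents and the names urls and decoded_urls are made up for illustration.

```python
from urllib.parse import unquote

# Hypothetical input list; in practice this might be read from a file.
urls = [
    'http://example.com/a%20b',
    'http://example.com/hello%21',
    'http://example.com/plain',
]

# Decode every entry; strings without escapes pass through unchanged.
decoded_urls = [unquote(u) for u in urls]

for u in decoded_urls:
    print(u)
```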

</F>



 
Tim N. van der Leeuw
 
      01-09-2006
My outline for a solution would be:

- Use StringIO or cStringIO to read the original URLs character by
character, and to build the result URLs character by character
- Any character other than '%' is written to the output unchanged
- When you read a '%', read the next 2 characters (they should be
hexadecimal digits!) and create a new string from them
- Values like '20' are hexadecimal, meaning integers in base 16.
Get the actual int value like this:
code_int = int(code_str, 16)
- Convert it to a character with: code_chr = chr(code_int)
- Write this character to the output cStringIO buffer
- When the whole URL is done, call getvalue() to get the string of the
new URL and close the cStringIO buffer.
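
The outline above could be sketched roughly as follows. This is Python 3 syntax, so io.StringIO stands in for StringIO/cStringIO. The function name manual_unquote and the choice to keep malformed escapes verbatim are assumptions made for the sketch, not part of the outline. It also treats every escape as a single character, so multi-byte UTF-8 sequences are not handled; urllib's unquote remains the better choice in practice.

```python
import io
import string

HEX_DIGITS = set(string.hexdigits)

def manual_unquote(url):
    """Decode %xx escapes by hand (hypothetical helper for illustration)."""
    src = io.StringIO(url)   # read the input character by character
    out = io.StringIO()      # build the result character by character
    while True:
        ch = src.read(1)
        if not ch:           # empty string means end of input
            break
        if ch != '%':
            out.write(ch)    # ordinary character: copy unchanged
            continue
        code_str = src.read(2)
        if len(code_str) == 2 and all(c in HEX_DIGITS for c in code_str):
            code_int = int(code_str, 16)   # base-16 value of the escape
            out.write(chr(code_int))
        else:
            # Assumption: a malformed escape is kept exactly as it was read.
            out.write('%' + code_str)
    result = out.getvalue()
    out.close()
    return result

print(manual_unquote('abc%20def%21'))  # abc def!
```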

Is that sufficiently comprehensible? Or still too convoluted for you?

(PS: I researched doing it the manual way, 'the hard way'. However,
there are plenty of libraries in Python for all sorts of internet
stuff. Perhaps urllib or urllib2 already has the functionality that you
need -- didn't look it up)

cheers,

--Tim

 
Brett g Porter
 
      01-09-2006
Claude Henchoz wrote:
> Hi guys
>
> I have a huge list of URLs. These URLs all have ASCII codes for special
> characters, like "%20" for a space or "%21" for an exclamation mark.
>
> I've already googled quite some time, but I have not been able to find
> any elegant way on how to replace these with their 'real' counterparts
> (" " and "!").
>
> Of course, I could just replace(), but that seems to be a lot of work.
>
> Thanks for any help.
>
> Cheers, Claude
>


The standard library module 'urllib' gives you two choices, depending on
the exact behavior you'd like:

http://www.python.org/doc/2.3.2/lib/module-urllib.html
unquote(string)
Replace "%xx" escapes by their single-character equivalent.

Example: unquote('/%7Econnolly/') yields '/~connolly/'.

unquote_plus(string)
Like unquote(), but also replaces plus signs by spaces, as required
for unquoting HTML form values.


--
// Today's Oblique Strategy ( Brian Eno/Peter Schmidt):
// Accretion
// Brett g Porter * (E-Mail Removed)

 
Claude Henchoz
 
      01-09-2006
Thanks guys, I like the urllib solution. Stupid me, looked at urllib
reference, but thought that "quote" and "unquote" deal with
&nbsp; style entities.
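
The two kinds of escaping are easy to confuse, so here is a short side-by-side sketch in Python 3 syntax. html.unescape is not mentioned in the thread; it is the Python 3 standard library function for HTML entities and is shown only for contrast. The sample strings are made up.

```python
import html
from urllib.parse import unquote

# Percent-encoding, as found in URLs: handled by unquote.
from_url = unquote('Tom%20%26%20Jerry')

# HTML character entities, as found in page markup: handled by html.unescape.
from_html = html.unescape('Tom &amp; Jerry')

print(from_url)   # Tom & Jerry
print(from_html)  # Tom & Jerry
```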

 