
Python's CGI and JavaScript's uriEncode: A disconnect.

 
 
Elf M. Sternberg
 
      07-01-2003
It's all Netscape's fault.

RFC 2396 (URI Specifications) specifies that a space shall be encoded
using %20 and the plus symbol is always safe. Netscape (and possibly
even earlier browsers like Mosaic) used the plus symbol '+' as a
substitute for the space in the last part of the URI, arguments to the
object referenced (you know, all the stuff after the question mark in
a URL).

The ECMA-262 "JavaScript" standard, now supported by both Netscape and
Internet Explorer, honors RFC 2396, translating spaces into their hex
equivalent %20 and leaving pluses alone.

The Python library cgi.FieldStorage decodes it backwards, expecting
pluses to be spaces and %2b to represent pluses. This behavior is
present even in Python 2.2, and arguably helps support older browsers.
But when web applications are heavily javascript-dependent, this can
cause major headaches.
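To make the disconnect concrete, here is a small sketch. It assumes a modern JavaScript engine (a current browser or Node.js) and uses the standard URLSearchParams class only as a stand-in for the server: like cgi.FieldStorage, it decodes query strings by the application/x-www-form-urlencoded rules, so a bare '+' comes back as a space. The helper name roundTrip is hypothetical.

```javascript
// Hypothetical helper: encode a value on the "client" with encodeURI,
// then decode it the way a form-urlencoded parser would.
// URLSearchParams stands in for the server-side parser here.
function roundTrip(value) {
  var query = 'q=' + encodeURI(value); // '+' is left as-is, ' ' becomes %20
  return new URLSearchParams(query).get('q'); // '+' is read back as a space
}

var sent = '1+1 = 2';
var received = roundTrip(sent);
// encodeURI(sent) === '1+1%20=%202'
// received        === '1 1 = 2'   (the plus has silently become a space)
```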

Other than overriding cgi.FieldStorage's parse_qsl, is there any way to
fix this disconnect?

Elf
 
 
 
 
 
Andrew Clover
 
      07-05-2003
Elf M. Sternberg wrote:

> Netscape (and possibly even earlier browsers like Mosaic) used the
> plus symbol '+' as a substitute for the space in the last part of
> the URI


This is correct in a query parameter: e.g. in ...?foo=abc+def, the plus
symbol stands for a space.

This is part of the specification for the media type
application/x-www-form-urlencoded, defined by HTML itself (section
17.13.4.1 of the 4.01 spec). It states that spaces should normally
be encoded as '+'; however, using '%20' is just as good and causes
less confusion, so that's what newer browsers (and I) do.
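Both spellings can be seen side by side with URLSearchParams, which implements the form-urlencoded serializer and parser in modern engines (an assumption about the runtime; browsers of this era had nothing like it). Serializing emits '+' for a space, and parsing accepts either '+' or '%20'.

```javascript
// Serializing with the form-urlencoded rules: a space becomes '+'.
var serialized = new URLSearchParams({ foo: 'abc def' }).toString();
// serialized === 'foo=abc+def'

// Parsing accepts both spellings of the space.
var fromPlus = new URLSearchParams('foo=abc+def').get('foo');  // 'abc def'
var fromHex = new URLSearchParams('foo=abc%20def').get('foo'); // 'abc def'
```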

Elsewhere, spaces should not be encoded as '+'.

The reasoning for this initial decision is unclear; presumably it was
intended to improve readability, but URIs with query parts are
generally not going to be very readable anyway.

> The ECMA-262 "JavaScript" standard, now supported by both Netscape and
> Internet Explorer, honors RFC 2396, translating spaces into their hex
> equivalent %20 and leaving pluses alone.


It depends on which function you are talking about. The 'escape' and
'encodeURI' built-in functions are not designed to encode single URI
query parameter values; they're designed to encode larger chunks of URI.
As such they do not need to encode plus characters.
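The difference is easy to check in any engine that has all three functions. The sample value below is made up: escape and encodeURI both pass '+' through untouched, and encodeURI additionally leaves '&' and '=' alone, so neither is safe for a single parameter value.

```javascript
var value = 'a+b c&d=e';

var viaEscape = escape(value);                // 'a+b%20c%26d%3De'  ('+' kept)
var viaEncodeURI = encodeURI(value);          // 'a+b%20c&d=e'      ('+', '&', '=' kept)
var viaComponent = encodeURIComponent(value); // 'a%2Bb%20c%26d%3De'
```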

The encodeURIComponent function *does*, and it is this function that you
should use if you want some JavaScript code to submit a query parameter.
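As a minimal sketch of that advice, a helper (hypothetical name buildQuery) can run every key and value through encodeURIComponent before joining them with '=' and '&'.

```javascript
// Build a query string from a plain object, encoding each key and value
// separately so that '+', '&', '=' and spaces survive the trip.
function buildQuery(params) {
  var parts = [];
  for (var key in params) {
    if (Object.prototype.hasOwnProperty.call(params, key)) {
      parts.push(encodeURIComponent(key) + '=' + encodeURIComponent(params[key]));
    }
  }
  return parts.join('&');
}

// buildQuery({ q: '1+1 = 2', lang: 'en' }) === 'q=1%2B1%20%3D%202&lang=en'
```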

The only drawback is that encodeURIComponent is relatively new, so you
won't find it on medium-old browsers like Netscape 4 and IE 5.0. (The
same goes for encodeURI; older browsers only have 'escape'.)

> The Python library cgi.FieldStorage decodes it backwards, expecting
> pluses to be spaces and %2b to represent pluses.


The Python library is correct per spec. If your scripts are not encoding
plus symbols in query parameters to %2B, they are at fault (and will go
equally wrong in any other language).
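The receiving side of that contract can be sketched in a couple of lines. The function below (hypothetical name formDecode) roughly approximates what a form-urlencoded parser such as Python's cgi module does with each value: turn '+' into a space, then percent-decode. A literal plus encoded as %2B survives; a bare plus is read as a space.

```javascript
// Approximation of form-urlencoded value decoding:
// '+' means space, then %XX escapes are decoded
// (decodeURIComponent assumes the bytes are UTF-8).
function formDecode(s) {
  return decodeURIComponent(s.replace(/\+/g, ' '));
}

formDecode(encodeURIComponent('1+1 = 2')); // '1+1 = 2'  (plus sent as %2B)
formDecode('1+1%20=%202');                  // '1 1 = 2'  (bare plus read as space)
```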

Possible solutions:

a. use encodeURIComponent() instead. This is best, but won't work
universally.
b. use escape(), then replace any pluses in its output with %2B. This
is OK, but won't handle Unicode properly or predictably. (Note: in IE,
encodeURI() also fails to handle Unicode predictably.)
c. roll your own encodeURIComponent function.
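Options (a) and (b) can be combined with a feature test, so newer browsers get encodeURIComponent and older ones fall back to escape. This is a sketch only; the function names are hypothetical, and the fallback keeps option (b)'s Unicode problem: escape produces Latin-1 bytes (e.g. %E9 for 'é') or nonstandard %uXXXX sequences rather than UTF-8.

```javascript
// Option (b): escape() leaves '+' alone, so encode it explicitly.
function escapeParam(value) {
  return escape(value).replace(/\+/g, '%2B');
}

// Prefer option (a) when available. typeof does not throw even on
// browsers where encodeURIComponent was never defined.
function encodeParam(value) {
  if (typeof encodeURIComponent === 'function') {
    return encodeURIComponent(value);
  }
  return escapeParam(value);
}

// escapeParam('a+b c')  === 'a%2Bb%20c'
// escapeParam('\u00E9') === '%E9'     (Latin-1 byte, not UTF-8 '%C3%A9')
// escapeParam('\u20AC') === '%u20AC'  (nonstandard escape)
```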

It's a bit off-topic for c.l.py, but here's a (c.)-style solution I've used
before:

// Percent-encode a string as UTF-8 for use as a single query parameter value.
function encPar(wide) {
  var narrow= encUtf8(wide);
  var enc= '';
  for (var i= 0; i<narrow.length; i++) {
    if (encPar_OK.indexOf(narrow.charAt(i))==-1)
      enc= enc+encHex2(narrow.charCodeAt(i));
    else
      enc= enc+narrow.charAt(i);
  }
  return enc;
}
// Characters passed through unencoded; '+' and ' ' are deliberately absent.
var encPar_OK= 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'+
  '0123456789*@-_./';

// Format a byte value (0-255) as a %XX escape.
function encHex2(v) {
  return '%'+encHex2_DIGITS.charAt(v>>>4)+encHex2_DIGITS.charAt(v&0xF);
}
var encHex2_DIGITS= '0123456789ABCDEF';

// Convert a UTF-16 string into a "narrow" string holding one UTF-8 byte
// per character. Unpaired surrogates are dropped.
function encUtf8(wide) {
  var c, s;
  var enc= '';
  var i= 0;
  while(i<wide.length) {
    c= wide.charCodeAt(i++);
    // handle UTF-16 surrogates
    if (c>=0xDC00 && c<0xE000) continue;      // stray low surrogate
    if (c>=0xD800 && c<0xDC00) {
      if (i>=wide.length) continue;           // high surrogate at end
      s= wide.charCodeAt(i++);
      if (s<0xDC00 || s>=0xE000) { i--; continue; } // not a pair: reprocess s
      c= ((c-0xD800)<<10)+(s-0xDC00)+0x10000;
    }
    // output value as 1-4 UTF-8 bytes
    if (c<0x80) enc+=
      String.fromCharCode(c);
    else if (c<0x800) enc+=
      String.fromCharCode(0xC0+(c>>6),0x80+(c&0x3F));
    else if (c<0x10000) enc+=
      String.fromCharCode(0xE0+(c>>12),0x80+(c>>6&0x3F),0x80+(c&0x3F));
    else enc+=
      String.fromCharCode(0xF0+(c>>18),0x80+(c>>12&0x3F),
                          0x80+(c>>6&0x3F),0x80+(c&0x3F));
  }
  return enc;
}
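On a modern engine the UTF-8 step of option (c) no longer has to be written by hand. The sketch below assumes a runtime with the standard TextEncoder class (current browsers, Node.js 11+) and uses the same safe-character set as encPar_OK; encParUtf8 and SAFE_CHARS are hypothetical names. One behavioral difference: TextEncoder replaces an unpaired surrogate with U+FFFD instead of dropping it as encUtf8 does.

```javascript
// Characters passed through unencoded (same set as encPar_OK).
var SAFE_CHARS =
  'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789*@-_./';

// Percent-encode a string as UTF-8 for use as one query parameter value.
function encParUtf8(str) {
  var bytes = new TextEncoder().encode(str); // Uint8Array of UTF-8 bytes
  var out = '';
  for (var i = 0; i < bytes.length; i++) {
    var b = bytes[i];
    var ch = String.fromCharCode(b);
    if (b < 0x80 && SAFE_CHARS.indexOf(ch) !== -1) {
      out += ch; // safe ASCII character, copy as-is
    } else {
      out += '%' + (b < 0x10 ? '0' : '') + b.toString(16).toUpperCase();
    }
  }
  return out;
}

// encParUtf8('a+b c')        === 'a%2Bb%20c'
// encParUtf8('\u20AC')       === '%E2%82%AC'
// encParUtf8('\uD83D\uDE00') === '%F0%9F%98%80'
```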

If that's of any use.

Kind of sucks having to do this, eh?

--
Andrew Clover
http://www.doxdesk.com/
 