Velocity Reviews - Computer Hardware Reviews


Unicode in application/x-www-form-urlencoded?

 
 
Leif K-Brooks
 
      11-28-2004
What is the proper encoding for a browser to use to encode non-ASCII
characters? HTML 4.01 doesn't seem to say anything about it, but XForms
1.0 specifies that browsers should use UTF-8.

Of the browsers I've tested, Firefox 1.0 seems to use UTF-8 whereas
Konqueror 3.2.2 seems to use latin-1 or a question mark if the character
can't be represented that way. Does anyone know what IE does? Is there
anything besides user-agent sniffing that Web authors can do to accept
Unicode characters in our forums?
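
To make the difference concrete, here is a minimal sketch (not taken from any browser's source) that reproduces the two behaviours described above using Python 3's `urllib.parse`. The helper name `encode_field` is hypothetical, and treating "a question mark" as the result of a replace-on-error encode is an assumption about what Konqueror does internally.

```python
from urllib.parse import quote_plus

def encode_field(value: str, charset: str) -> str:
    """Percent-encode a form value using the given character encoding."""
    # errors="replace" substitutes "?" for any character the charset
    # cannot represent (assumed here to mimic the fallback described above).
    return quote_plus(value, encoding=charset, errors="replace")

utf8_form = encode_field("é", "utf-8")      # "%C3%A9" (two octets)
latin1_form = encode_field("é", "latin-1")  # "%E9" (one octet)
fallback = encode_field("€", "latin-1")     # "%3F", an escaped "?"
```

The same character therefore arrives at the server as different octets depending on the encoding the browser picked, which is why the server needs to know which one was used.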
 
 
 
 
 
Toby Inkster
 
      11-28-2004
Leif K-Brooks wrote:

> What is the proper encoding for a browser to use to encode non-ASCII
> characters? HTML 4.01 doesn't seem to say anything about it, but XForms
> 1.0 specifies that browsers should use UTF-8.


The HTML 2.0 spec says:
| The form field names and values are escaped: space characters are
| replaced by `+', and then reserved characters are escaped as per [URL]
http://www.w3.org/MarkUp/html-spec/html-spec_8.html#SEC8.2.1

RFC 1738 (the document referenced as [URL]) says:
| Octets must be encoded if they have no corresponding graphic
| character within the US-ASCII coded character set
http://www.ietf.org/rfc/rfc1738.txt

Clearly non-ASCII characters have "no corresponding graphic character
within the US-ASCII coded character set", so they "must be encoded".
Encoded exactly how, who knows?

The default charset for text media types in HTTP/1.1 is ISO-8859-1, so
perhaps that?
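
For the ASCII part of the rule quoted above (spaces become `+`, reserved characters are escaped), a small sketch using Python 3's standard `urllib.parse` shows what the output looks like. The field names and values are made up for illustration.

```python
from urllib.parse import quote_plus, urlencode

# A single value: the space becomes "+", "&" and "=" are percent-escaped.
escaped = quote_plus("a b&c=d")  # "a+b%26c%3Dd"

# A whole form body: names and values are escaped, then joined with "=" and "&".
body = urlencode({"q": "fish & chips", "page": "2"})  # "q=fish+%26+chips&page=2"
```

None of this settles which encoding turns non-ASCII characters into octets first; it only covers the escaping step that follows.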

--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact

 
 
 
 
 
Leif K-Brooks
 
      11-28-2004
Toby Inkster wrote:
> Leif K-Brooks wrote:
>
>
>>What is the proper encoding for a browser to use to encode non-ASCII
>>characters

>
> RFC 1738 (the document referenced as [URL]) says:
> | Octets must be encoded if they have no corresponding graphic
> | character within the US-ASCII coded character set
> http://www.ietf.org/rfc/rfc1738.txt
>
> Clearly non-ASCII characters have "no corresponding graphic character
> within the US-ASCII coded character set", so they "must be encoded".
> Encoded exactly how, who knows?


True, although it mentions octets (8-bit bytes), not characters; I
think what it means by "encoded" is e.g. changing a 0xFF byte into the
string "%FF". Many octets of a UTF-8-encoded string would need to be
escaped that way, but that still doesn't answer whether UTF-8 or some
other character encoding should be used to produce the octets in the
first place.
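
A quick sketch of that octet-level step, assuming Python 3's `urllib.parse.quote_from_bytes` as a stand-in for what a browser does: it works on bytes only, so the character encoding has to be chosen before it is called.

```python
from urllib.parse import quote_from_bytes

# Escaping operates on octets: a 0xFF byte becomes the string "%FF".
single = quote_from_bytes(b"\xff")       # "%FF"

# The character U+00FF encoded as UTF-8 is two octets, both of which
# fall outside US-ASCII and so both get escaped.
utf8_octets = "ÿ".encode("utf-8")        # b"\xc3\xbf"
escaped = quote_from_bytes(utf8_octets)  # "%C3%BF"
```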

After a bit more testing, though, it seems that most browsers simply
submit the form in the same encoding as the page containing it. The
differences between browsers in my original testing were caused by my
stupidly failing to specify an encoding in the Content-Type.
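
On the receiving side this means the server should decode with the same encoding it declared for the page. A minimal sketch, assuming Python 3's `urllib.parse.parse_qs` and a made-up form body, shows what happens when the two disagree:

```python
from urllib.parse import parse_qs

# Hypothetical body from a form on a page that was served as UTF-8.
body = "name=caf%C3%A9"

right = parse_qs(body, encoding="utf-8")    # {"name": ["café"]}
wrong = parse_qs(body, encoding="latin-1")  # {"name": ["cafÃ©"]}, mojibake
```

The body itself carries no indication of which encoding was used, so the two ends have to agree by convention.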

Thanks a lot for the help.
 
 
Courtney
 
      11-29-2004

Leif K-Brooks wrote:
> What is the proper encoding for a browser to use to encode non-ASCII
> characters? HTML 4.01 doesn't seem to say anything about it, but XForms
> 1.0 specifies that browsers should use UTF-8.
>
> Of the browsers I've tested, Firefox 1.0 seems to use UTF-8 whereas
> Konqueror 3.2.2 seems to use latin-1 or a question mark if the character
> can't be represented that way. Does anyone know what IE does? Is there
> anything besides user-agent sniffing that Web authors can do to accept
> Unicode characters in our forums?


Why not specify when you create the page?

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Untitled Document</title>
</head>
<body>
</body>
</html>

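This should cause the browser to treat the page as UTF-8, and to submit
form data in UTF-8, regardless of the browser's default encoding. Note
that a charset in the HTTP Content-Type header, if the server sends one,
takes precedence over the meta element.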
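
If you control the server, the same declaration can go in the HTTP header as well. Below is a rough sketch of a WSGI application in Python 3 that serves a form as UTF-8 and decodes the submission as UTF-8. The names `PAGE` and `app`, the `comment` field, and the `accept-charset` attribute on the form are all assumptions added for illustration, not something from this thread.

```python
from urllib.parse import parse_qs

# Hypothetical page: one text field, with the charset requested on the form too.
PAGE = """<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html><head><title>Form</title></head>
<body><form method="post" action="" accept-charset="utf-8">
<input type="text" name="comment"><input type="submit">
</form></body></html>"""

def app(environ, start_response):
    """Serve the form on GET; echo the decoded comment on POST."""
    if environ["REQUEST_METHOD"] == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        # A well-formed urlencoded body is pure ASCII; the real
        # characters are hidden behind the %XX escapes.
        raw = environ["wsgi.input"].read(length).decode("ascii")
        # Decode the escapes with the encoding the page was served in.
        fields = parse_qs(raw, encoding="utf-8")
        text = "You sent: " + fields.get("comment", [""])[0]
        start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
        return [text.encode("utf-8")]
    # Declare the charset in the header so the browser knows what to use.
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [PAGE.encode("utf-8")]
```

To try it locally, something like `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` from the standard library should be enough.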

courtney sends...


 