Differences in UTF-8 html form inputs

 
 
Realbot
      01-08-2005
Hi,

I'm having some problems with a web application of mine.
To make things clearer, here is an HTML input form that demonstrates the problem.
It submits two strings, one with GET and one with POST, and it uses HTML::Mason.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Test utf</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<form name="formutfget" method="GET">
Enter text (get):<br>
<input type="text" name="textget" size="20" maxlength="30">
</form>
<form name="formutfpost" method="POST">
Enter text (post):<br>
<input type="text" name="textpost" size="20" maxlength="30">
</form>
Value of GET: <% $textget %><br>
Hex of GET: <% $hexget %><br>
Value of POST: <% $textpost %><br>
Hex of POST: <% $hexpost %><br>
</body>
</html>
<%args>
$textget => ''
$textpost => ''
$hexget => ''
$hexpost => ''
</%args>
<%init>
$hexget = unpack('H*', $textget);
$hexpost = unpack('H*', $textpost);
</%init>

The strange thing is that running this form under these environments
Debian Woody - perl 5.6.1 - Mozilla 1.4.3/Firefox 1.0
Debian Sid - perl 5.8.4 - Mozilla 1.4.3/Firefox 1.0
using as input the string "Δωδεκανήσων" (I don't know what it means btw...), I get as output

Value of GET: Δωδεκανήσων
Hex of GET: 26233931363b26233936393b26233934383b26233934393b26233935343b26233934353b26233935373b26233934323b26233936333b26233936393b26233935373b
Value of POST: Δωδεκανήσων
Hex of POST: 26233931363b26233936393b26233934383b26233934393b26233935343b26233934353b26233935373b26233934323b26233936333b26233936393b26233935373b

while in OpenBSD - perl 5.8.0 - Mozilla 1.4.3/Firefox 1.0 with the same input string I get

Value of GET: Δωδεκανήσων
Hex of GET: ce94cf89ceb4ceb5cebaceb1cebdceaecf83cf89cebd
Value of POST: Δωδεκανήσων
Hex of POST: ce94cf89ceb4ceb5cebaceb1cebdceaecf83cf89cebd

So, it seems that in the former case I get escaped Unicode characters (numeric character references such as &#916;) and in the latter raw UTF-8 bytes.
I thought that it could be a 5.6 vs 5.8 difference, but as you can see, even under Debian Sid with perl 5.8.4 I got the same escaped characters.
Could it be an OpenBSD peculiarity? I've Googled but with no luck; maybe someone can shed some light on it...
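
To compare the two hex dumps directly, here is a minimal sketch using only core Perl and the Encode module. It takes just the first two characters of each dump above. The variable names are illustrative and not part of the original component.

```perl
use strict;
use warnings;
use Encode qw(decode);

# First two characters of each hex dump, copied from the outputs above.
my $debian_hex  = '26233931363b26233936393b';  # Debian result
my $openbsd_hex = 'ce94cf89';                  # OpenBSD result

# pack('H*') is the inverse of the unpack('H*') used in the component.
my $debian_bytes  = pack('H*', $debian_hex);
my $openbsd_bytes = pack('H*', $openbsd_hex);

# The Debian bytes are plain ASCII: HTML numeric character references.
my @code_points_from_ncr = $debian_bytes =~ /&#(\d+);/g;

# The OpenBSD bytes are UTF-8: decode to characters, then take code points.
my @code_points_from_utf8 =
    map { ord($_) } split //, decode('UTF-8', $openbsd_bytes);

print "NCR:   ", join(' ', map { sprintf('U+%04X', $_) } @code_points_from_ncr), "\n";
print "UTF-8: ", join(' ', map { sprintf('U+%04X', $_) } @code_points_from_utf8), "\n";
# Both lines list U+0394 U+03C9 (capital delta, small omega).
```

Both dumps describe the same characters; only the transport encoding differs, which suggests the difference arises before the data reaches Perl.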

Thanks!

 
Chris Mattern
      01-08-2005
Realbot wrote:


> using as input the string "Δωδεκανήσων" (I don't know what it means
> btw...),


"Dodecahedron"--i.e., a solid shape with 12 faces. If you're a gamer
who owns "funny dice", your 12-sided dice are dodecahedrons (or, if
you prefer, dodecahedra).
--
Christopher Mattern

"Which one you figure tracked us?"
"The ugly one, sir."
"...Could you be more specific?"
 
Alan J. Flavell
      01-10-2005
On Sat, 8 Jan 2005, Realbot wrote:

> I'm having some problems with a web application of mine.


Forms submission including characters outside of us-ascii is
non-trivial, and isn't in itself a Perl problem.

OT: commentary of mine at
http://ppewww.ph.gla.ac.uk/~flavell/...form-i18n.html

Until one can get that part sorted out to one's satisfaction, any
fiddling around that one might do in one's Perl script would be a bit
pointless, IMHO. And discussion of the web part would be more at home
on comp.infosystems.www.authoring.cgi (beware the automoderation bot).

> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">


If we assume that the page itself is really coded in utf-8 (note that
in the event of a dispute, the server's actual HTTP Content-type
header wins over anything that you might secrete in a meta
http-equiv), then you can expect current browsers to submit
utf-8-encoded form data. But not-quite-so-new browsers - even some
which support utf-8 display - get utf-8 forms submission sadly wrong.

> <form name="formutfget" method="GET">


In -theory- the method GET supports nothing better than the us-ascii
character coding. But see my commentary for further discussion.
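
As an illustration of why a GET URL can stay within us-ascii even when the form data is not, here is a rough sketch of percent-encoding UTF-8 bytes into a query string and decoding them again. It assumes the browser really does submit UTF-8, and it uses the RFC 3986 unreserved set as the "safe" characters, which is a simplification of what browsers do for form data. The parameter name textget comes from the form above; everything else is illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Capital delta and small omega, written as escapes so the source stays ASCII.
my $text = "\x{0394}\x{03C9}";

# Step 1: characters -> UTF-8 bytes.
my $bytes = encode('UTF-8', $text);

# Step 2: every byte outside the unreserved set becomes %XX.
(my $escaped = $bytes) =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord($1))/ge;
print "textget=$escaped\n";    # textget=%CE%94%CF%89

# Server side: undo the percent-encoding, then decode the UTF-8 bytes.
(my $raw = $escaped) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
my $roundtrip = decode('UTF-8', $raw);
print "round trip ok\n" if $roundtrip eq $text;
```

The URL itself carries only ASCII. What it cannot carry is any indication of which character encoding the %XX bytes are in, and that is where browsers and servers can disagree.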

> The strange thing is that running this form under these environments

[...]

> So, it seems that in the former I get escaped unicode character and
> in the latter UTF-8 ones.


It looks as if somebody is trying to ape the misbegotten behaviour of
MSIE.

In a practical sense there isn't one right answer - there are several
compromises, depending on which browsers support what. But none of
the details here are features of the Perl programming language,
AFAICS.

good luck
 
Realbot
      01-10-2005
Alan J. Flavell wrote:
> On Sat, 8 Jan 2005, Realbot wrote:
>
> Forms submission including characters outside of us-ascii is
> non-trivial, and isn't in itself a Perl problem.
>
> OT: commentary of mine at
> http://ppewww.ph.gla.ac.uk/~flavell/...form-i18n.html


I read it avidly before posting, very well written.

> Until one can get that part sorted out to one's satisfaction, any
> fiddling around that one might do in one's Perl script would be a bit
> pointless, IMHO. And discussion of the web part would be more at home
> on comp.infosystems.www.authoring.cgi (beware the automoderation bot).
>
>
>> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

>
>
> If we assume that the page itself is really coded in utf-8 (note that
> in the event of a dispute, the server's actual HTTP Content-type
> header wins over anything that you might secrete in a meta
> http-equiv), then you can expect current browsers to submit
> utf-8-encoded form data.


I found out that this was exactly the problem. Apache as installed on all the Debian versions is configured with

AddDefaultCharset on

which makes Apache send a default charset (ISO-8859-1) in the HTTP Content-Type header, so the browser ignores the encoding given in the META tag and always uses that default.
In the Apache installation under OpenBSD the directive was not present, so the META tag took effect and the form data arrived as UTF-8.
When I removed that nasty directive, everything worked on Debian too...
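
The precedence rule at work here can be sketched in a few lines of Perl. This is only an illustration: effective_charset is a hypothetical helper, not part of Apache, Mason, or any browser, and the regular expressions are deliberately loose. The header value in the first call is what Apache would be expected to send with AddDefaultCharset on.

```perl
use strict;
use warnings;

# Hypothetical helper: guess which charset a browser will assume for a page,
# given the HTTP Content-Type header value and the HTML source.
sub effective_charset {
    my ($http_content_type, $html) = @_;

    # A charset parameter in the real HTTP header takes precedence.
    if (defined $http_content_type
        && $http_content_type =~ /;\s*charset\s*=\s*"?([\w.:-]+)/i) {
        return lc $1;
    }

    # Otherwise fall back to a META http-equiv declaration, if present.
    if ($html =~ /<meta[^>]+charset\s*=\s*"?([\w.:-]+)/i) {
        return lc $1;
    }

    return;    # nothing declared
}

my $page = '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">';

# Debian case: the header carries a charset, so the META tag loses.
print effective_charset('text/html; charset=iso-8859-1', $page), "\n";  # iso-8859-1

# OpenBSD case: no charset in the header, so the META tag is used.
print effective_charset('text/html', $page), "\n";                      # utf-8
```

A browser that believes the page is ISO-8859-1 has no way to send Greek letters in that encoding, which would explain the &#NNN; references seen on Debian.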

> In a practical sense there isn't one right answer - there are several
> compromises, depending on which browsers support what. But none of
> the details here are features of the Perl programming language,
> AFAICS.


Now I know!

Thanks a lot.

 