
CGI query string encoding issue...

 
 
howa
 
      03-04-2009
Hello, consider my simple cgi program below:

#=======
#!/usr/bin/perl
use strict;

use CGI;
my $q = new CGI;
my $s = $q->param("s");
print $q->header( -type => "text/html" );

print utf8::valid ($s);
#=======

Then I call, e.g.

http://www.example.com/cgi-bin/test.cgi?s=abc (prints 1, OK)
http://www.example.com/cgi-bin/test.cgi?s=中文 (also prints 1, but my
parameter s is in BIG5 traditional Chinese encoding, not UTF-8!)

So now I am really confused by the encoding stuff... Can anyone
modify my program above so that it does not print 1 if my $s contains
non-UTF8 characters?

Thanks.

 
 
 
 
 
howa
 
      03-05-2009
Hi,

On Mar 4, 11:41 pm, Chris Mattern <sys...@sumire.gwu.edu> wrote:
> Not really, because there's no way to look at an arbitrary bit string and
> know that it's BIG5, or utf8, or whatever. You said parameter s is a BIG5
> string in the second case--but it's also a utf8 string: "^[$BCfJ8^[(B".
>



A simpler example, using a percent-encoded string for the URL:
http://www.example.com/cgi-bin/test.cgi?s=中

http://www.example.com/cgi-bin/test.cgi?s=%A4%A4 (BIG-5, 0xa440 to
0xc67e, http://en.wikipedia.org/wiki/Big5)
http://www.example.com/cgi-bin/test.cgi?s=%E4%B8%AD (UTF-8, see
variable $valid_utf8_regexp at http://cpansearch.perl.org/src/MARKF...b/Test/utf8.pm)


As you can see, the BIG5 character %A4%A4 is definitely not valid UTF-8,
but utf8::valid() still returns 1.
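
For what it's worth, the utf8 docs describe utf8::valid() as an internal check that the scalar's representation is consistent, and a plain byte string always passes it. One way to get the check asked for here is to attempt a strict decode with the Encode module and see whether it fails. A minimal sketch, assuming $q->param() hands back raw, already percent-decoded bytes (CGI.pm's default as far as I know); is_valid_utf8 and uri_unescape_bytes are hypothetical helper names, and the hand-rolled percent-decoding only stands in for what the CGI layer would normally do:

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: returns 1 if $bytes is well-formed UTF-8, else 0.
sub is_valid_utf8 {
    my ($bytes) = @_;
    return 0 unless defined $bytes;
    # decode() may modify its source when a CHECK value is given,
    # so hand it a scratch copy.
    my $copy = $bytes;
    return eval { decode('UTF-8', $copy, FB_CROAK); 1 } ? 1 : 0;
}

# Hypothetical helper: undo percent-encoding, yielding raw bytes.
sub uri_unescape_bytes {
    my ($s) = @_;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex $1)/ge;
    return $s;
}

for my $param ('abc', '%E4%B8%AD', '%A4%A4') {
    my $ok = is_valid_utf8(uri_unescape_bytes($param));
    printf "%-10s %s\n", $param, $ok ? 'valid UTF-8' : 'not valid UTF-8';
}
# abc        valid UTF-8
# %E4%B8%AD  valid UTF-8
# %A4%A4     not valid UTF-8
```

Note that this only says the bytes are well-formed UTF-8, not that the sender meant UTF-8; as Chris pointed out, some byte strings are legal in more than one encoding.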

 
 
 
 
 
Gunnar Hjalmarsson
 
      03-05-2009
howa wrote:
> Hello, consider my simple cgi program below:
>
> #=======
> #!/usr/bin/perl
> use strict;
>
> use CGI;
> my $q = new CGI;
> my $s = $q->param("s");
> print $q->header( -type => "text/html" );
>
> print utf8::valid ($s);
> #=======
>
> Then I call, e.g.
>
> http://www.example.com/cgi-bin/test.cgi?s=abc (prints 1, OK)
> http://www.example.com/cgi-bin/test.cgi?s=中文 (also prints 1, but my
> parameter s is in BIG5 traditional Chinese encoding, not UTF-8!)


I'm not sure about the meaning of utf8::valid(), but the docs
recommend the use of utf8::is_utf8().

Does the below code make sense to you?

$ cat test.pl
use Encode;
$big5_uriencoded = '%A4%A4';
( $big5_bytes = $big5_uriencoded ) =~ s/%(..)/chr(hex $1)/eg;
print '$big5_bytes ', utf8::is_utf8($big5_bytes) ? 'is' : 'is not',
" in UTF-8 internally.\n";
$string = decode('Big5', $big5_bytes);
print '$string ', utf8::is_utf8($string) ? 'is' : 'is not',
" in UTF-8 internally.\n\n";

$ perl test.pl
$big5_bytes is not in UTF-8 internally.
$string is in UTF-8 internally.

I believe it tells us that it's not possible to encode $big5_bytes
directly to UTF-8, while that's possible with $string.

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
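
To make the conversion step in the test above concrete: utf8::is_utf8() reports only Perl's internal flag, so the practical route from Big5 input to UTF-8 output is to decode the bytes into a character string and then encode that string. A minimal sketch, assuming the input really is Big5 (the variable names are only illustrative):

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $big5_bytes = "\xA4\xA4";                   # octets from ?s=%A4%A4
my $string     = decode('Big5', $big5_bytes);  # Perl character string
my $utf8_bytes = encode('UTF-8', $string);     # octets suitable for output

printf "code point:  U+%04X\n", ord $string;
print  "UTF-8 bytes: ", unpack('H*', $utf8_bytes), "\n";
# code point:  U+4E2D
# UTF-8 bytes: e4b8ad
```

The result matches the %E4%B8%AD form of the same character quoted earlier in the thread.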
 
 
Eric Pozharski
 
      03-06-2009
On 2009-03-05, Gunnar Hjalmarsson wrote:
*SKIP*
> I'm not sure about the meaning of utf8::valid(), but the docs
> recommend the use of utf8::is_utf8().


Those two just perform different tests (or are supposed to). But see
below (there are some "smart defaults" along the way):

perl -wle '
#use encoding 'utf8';
@x = ( qq|\x{DF}\x{0100}|, q|a|, qq|\x{DF}|, qq|\x{0100}| );
foreach my $y (@x) {
printf qq|valid (%i) - is (%i) - |, utf8::valid($y), utf8::is_utf8($y);
print $y;
utf8::encode($y);
printf qq|valid (%i) + is (%i) + |, utf8::valid($y), utf8::is_utf8($y);
print $y;
utf8::decode($y);
printf qq|valid (%i) / is (%i) / |, utf8::valid($y), utf8::is_utf8($y);
print $y;
}
'
Wide character in print at -e line 6.
valid (1) - is (1) - ßĀ
valid (1) + is (0) + ßĀ
Wide character in print at -e line 12.
valid (1) / is (1) / ßĀ
valid (1) - is (0) - a
valid (1) + is (0) + a
valid (1) / is (0) / a
valid (1) - is (0) - �
valid (1) + is (0) + ß
valid (1) / is (1) / �
Wide character in print at -e line 6.
valid (1) - is (1) - Ā
valid (1) + is (0) + Ā
Wide character in print at -e line 12.
valid (1) / is (1) / Ā

While with C<use encoding> uncommented (output only):

valid (1) - is (1) - ßĀ
valid (1) + is (0) + ÃÄ
valid (1) / is (1) / ßĀ
valid (1) - is (1) - a
valid (1) + is (0) + a
valid (1) / is (0) / a
valid (1) - is (1) - �
valid (1) + is (0) + �
valid (1) / is (1) / �
valid (1) - is (1) - Ā
valid (1) + is (0) + Ä
valid (1) / is (1) / Ā

*CUT*

p.s. I'm not sure how all that would go out of slrn.

p.p.s. Would some kind perlist look at the B<utf8::valid> code, please?

--
Torvalds' goal for Linux is very simple: World Domination
Stallman's goal for GNU is even simpler: Freedom
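
To put the difference between the two tests in one place: going by the utf8 docs, utf8::is_utf8() reports only whether the scalar's internal UTF-8 flag is on, while utf8::valid() reports whether the internal representation is consistent, which holds for any byte string. Neither looks at which encoding the data came from. A small sketch using the single character U+00DF (describe and @report are hypothetical names of mine):

```perl
use strict;
use warnings;

# Summarise length, flag state, and consistency without printing the string.
sub describe {
    my ($label, $s) = @_;
    return sprintf '%-8s length=%d is_utf8=%d valid=%d',
        $label, length($s),
        (utf8::is_utf8($s) ? 1 : 0),
        (utf8::valid($s)   ? 1 : 0);
}

my $s = "\xDF";                          # one character, stored as one byte
my @report;
push @report, describe('bytes', $s);     # flag off
utf8::upgrade($s);                       # same character, new internal storage
push @report, describe('upgraded', $s);  # flag on, length unchanged
utf8::encode($s);                        # now two UTF-8 octets, flag off
push @report, describe('encoded', $s);
print "$_\n" for @report;
# bytes    length=1 is_utf8=0 valid=1
# upgraded length=1 is_utf8=1 valid=1
# encoded  length=2 is_utf8=0 valid=1
```

utf8::valid() is 1 in every state, which fits the columns of 1s in the output above.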
 