# Is it ok to change $ENV{'QUERY_STRING'} before "use CGI;" is called..?

Raymundo
Guest
Posts: n/a

 03-04-2007
Dear,

When a web browser sends a GET request, I can get "keywords" or
"param"eters using the CGI module, as you know:

$q = new CGI;
$name = $q->param('name');

However, when the browser's request includes multi-byte characters, they
can be encoded using UTF-8 or EUC-KR (in Korea, for example) according
to an option in the browser. ("Send URL in UTF-8" in IE,
"network.standard-url.encode-utf8" in FF, etc.)

At first, I tried to check the value which I got from $q->param() like
this:

$name = $q->param("name");
$name = check_and_convert($name);
....

sub check_and_convert {
# this subroutine guesses the encoding of the parameter using Encode::Guess
# if not UTF-8, it converts the parameter to a UTF-8 encoded string and
# returns it
}

But there are so many parameters, and so much code using them, that I
found it almost impossible, or at least very inconvenient, to check
every time a parameter is fetched.

Second, I tried to "check and convert" the $ENV{QUERY_STRING} value before
a CGI object is created:

# convert QUERY_STRING to UTF-8 here
$ENV{QUERY_STRING} = check_and_convert($ENV{QUERY_STRING});
# then create CGI object
$q = new CGI;
# I can get name=XXX and XXX is encoded in UTF-8
$name = $q->param("name");

In this case, I think, I don't need to check each parameter in any of
the code that follows... All the values are now UTF-8 encoded.

As far as I have tested, it looks successful. But I'm not sure that
such an approach is good(?) and safe. (I think it's somewhat tricky to
change an environment variable in a script..)

Is there any other environment variable or anything else that I should
check before "new CGI;" is called? Can I be sure that I'll not lose
any information when I change QUERY_STRING?

Any advice would be appreciated. I'm sorry I'm not good at English.
Raymundo at South Korea.

Guest
Posts: n/a

 03-04-2007
Raymundo <(E-Mail Removed)> wrote in comp.lang.perl.misc:
> Dear,
>
> When a web browser sends a GET request, I can get "keywords" or
> "param"eters using CGI module, as you know:
> $q = new CGI;
> $name = $q->param('name');
>
> However, when browser's request includes multi-byte characters, they
> can be encoded using UTF-8 or EUC-KR(in Korea, for example) according
> to the option in the browser. ("Send URL in UTF-8" in IE,
> "network.standard-url.encode-utf8" in FF, etc.)
>
> At first, I tried to check the value which I got from $q->param() like
> this:
>
> $name = $q->param("name");
> $name = check_and_convert($name);
> ...
>
> sub check_and_convert {
> # this subroutine guesses the encoding of parameter using
> Encode::Guess
> # if not UTF-8, it converts the parameter to UTF-8 encoded string and
> return it
> }
>
>
> But there are so many parameters and also so many codes using them. I
> found that it's almost impossible, or so inconvenient to check
> whenever the parameters are fetched.
>
>
> Second, I tried to "check and convert" $ENV{QUERY_STRING} value before
> a CGI object is created:
>
> # convert QUERY_STRING to UTF-8 here
> $ENV{QUERY_STRING} = check_and_convert($ENV{QUERY_STRING});
> # then create CGI object
> $q = new CGI;
> # I can get name=XXX and XXX is encoded in UTF-8
> $name = $q->param("name");
>
> In this case, I think, I don't need to check each parameter in any
> other following codes... All the values are now UTF-8 encoded.
>
> As far as I had tested, it looked successful. But I'm not sure that
> such approach is good(?) and safe. (I think it's somewhat tricky to
> change the environment variable in script..)
>
> Is there any other environment variable or anything else that I should
> check before "new CGI;" is called? Can I be sure that I'll not lose
> any information when I change QUERY_STRING?
>
> Any advice would be appreciated. I'm sorry I'm not good at English.
> Raymundo at South Korea.

You should avoid changing the environment like that. Use the interface
that CGI provides. The ->Vars method gives you a hash that contains
the parameter values keyed by their names. Convert it as follows
(untested):

my $param = $q->Vars;
$_ = check_and_convert($_) for values %$param;

This supposes that check_and_convert() leaves null bytes alone. If that
isn't sure, use

$_ = join( "\0", map check_and_convert( $_), split /\0/, $_) for
values %$param;

See perldoc CGI for the significance of null bytes in the values.

Either way you will convert all values in one go. Use the converted
hash instead of the ->param method for parameter access.

Anno

Ben Morrow
Guest
Posts: n/a

 03-05-2007
Quoth "Raymundo" <(E-Mail Removed)>:
> Dear,
>
> When a web browser sends a GET request, I can get "keywords" or
> "param"eters using CGI module, as you know:
> $q = new CGI;
> $name = $q->param('name');
>
> However, when browser's request includes multi-byte characters, they
> can be encoded using UTF-8 or EUC-KR(in Korea, for example) according
> to the option in the browser. ("Send URL in UTF-8" in IE,
> "network.standard-url.encode-utf8" in FF, etc.)
>
> At first, I tried to check the value which I got from $q->param() like
> this:
>
> $name = $q->param("name");
> $name = check_and_convert($name);
> ...
>
> sub check_and_convert {
> # this subroutine guesses the encoding of parameter using
> Encode::Guess
> # if not UTF-8, it converts the parameter to UTF-8 encoded string and
> return it
> }

I would not recommend using Encode::Guess. It isn't safe.

For a (detailed) explanation of details of I18N form submission, see
http://xrl.us/u68e . Executive summary: serve forms as 'text/html;
charset=utf-8' and assume the results are in UTF-8.

You should decode *after* getting the values from CGI->param.

Ben

--
'Deserve [death]? I daresay he did. Many live that deserve death. And
some die that deserve life. Can you give it to them? Then do not be too
eager to deal out death in judgement. For even the very wise cannot see
all ends.'
(E-Mail Removed)

Raymundo
Guest
Posts: n/a

 03-06-2007
Thank you, Anno and Ben.

Anno's suggestion:

my $param = $q->Vars;
$_ = check_and_convert( $_) for values %$param;

works well with GET requests. But it causes a problem with POST requests:
check_and_convert affects the contents of the POST request. (If I comment
out the check_and_convert line, the script works well.)

I'm interested only in GET requests, because a POST request includes a
"charset=" field in its header and I can convert, if needed, the
encoding of the contents. So I'm planning to add an if clause:

if ($q->request_method() eq "GET") {
    my $param = $q->Vars;
    $_ = check_and_convert( $_) for values %$param;
}

Ben, would you please tell me why Encode::Guess isn't safe? Does it
have a security problem?

Anyway,

> For a (detailed) explanation of details of I18N form submission, see
> http://xrl.us/u68e. Executive summary: serve forms as 'text/html;
> charset=utf-8' and assume the results are in UTF-8.

The script does so when it prints forms and receives POST data from
the forms, which seemed to be doing well.

The problem is related to GET requests, that is, when the URL includes
multi-byte characters. W3C recommends that multi-byte chars in a URL
should be %-encoded.
(http://www.w3.org/TR/REC-html40/interact/forms.html#form-content-type)
But I still want to support the case when visitors type the URL with
their own fingers (they would not like to type "%EC%90..") and when
another web page links to my page without using a %-encoded string.

....

Returning to my first post in this thread... Is it such a bad idea to
change the environment variable QUERY_STRING? It solves every problem
change may affect only the script and its child processes, and the
script doesn't fork any child process.

Raymundo at South Korea.

Ben Morrow
Guest
Posts: n/a

 03-06-2007

Quoth "Raymundo" <(E-Mail Removed)>:
>
> Ben, would you please tell me why Encode::Guess isn't safe? Does it
> have a security problem?

Not security, per se; it's just that it's impossible to reliably
distinguish between (say) UTF-8 and ISO8859-1 that just happens to look
like UTF-8.
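
To make the ambiguity concrete, here is a minimal sketch (not from the thread) using the core Encode module. The two-byte sequence is an illustrative choice: it decodes cleanly both as UTF-8 and as ISO-8859-1, so nothing in the bytes themselves says which one the sender meant.

```perl
use strict;
use warnings;
use Encode ();

# Illustrative bytes: 0xC3 0xA9 is one character in UTF-8 (e-acute),
# but two characters in ISO-8859-1 (A-tilde, copyright sign).
my $bytes = "\xC3\xA9";

# Strict UTF-8 decoding succeeds. FB_CROAK would die on malformed input;
# LEAVE_SRC keeps $bytes untouched.
my $as_utf8 = Encode::decode('UTF-8', $bytes,
    Encode::FB_CROAK() | Encode::LEAVE_SRC());

# ISO-8859-1 decoding succeeds too, because every byte maps to a character.
my $as_latin1 = Encode::decode('ISO-8859-1', $bytes);

printf "UTF-8 reading:      %d character(s)\n", length $as_utf8;
printf "ISO-8859-1 reading: %d character(s)\n", length $as_latin1;
```

Both decodes return without error, which is why a guesser can only rank the candidates, never prove one of them.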

> > For a (detailed) explanation of details of I18N form submission, see
> > http://xrl.us/u68e. Executive summary: serve forms as 'text/html;
> > charset=utf-8' and assume the results are in UTF-8.

Also, if you read the page linked, you will see that many browsers do...
rather stupid things when the user enters text into a form that is not
representable in the encoding of the page. Since UTF-8 can represent
everything, it doesn't have that problem.

> The script does so when it prints forms and receives POST data from
> the forms, which seemed to be doing well.
>
> The problem is related to GET request, that is, when URL includes
> multi-byte characters. W3C recommends that multi-byte chars in URL
> should be %-encoded.
> (http://www.w3.org/TR/REC-html40/interact/forms.html#form-content-type)
> But I still want to support when
> visitors type URL using their fingers (they would not like to type
> "%EC%90..") and when other webpage gives a link to my page not using
> %-encoded string.

Well... a not-url-encoded URL is invalid. At least Firefox appears to
automatically translate (say) a URL typed into the address bar into its
correct URL-escaped form before submitting it to the server; I don't
know what IE or Konq/Safari or Opera do.

> Returning to my first post in this thread... Is it such a bad idea to
> change the environment variable QUERY_STRING? It solves every problem
> change may affect only the script and its child processes, and the
> script doesn't fork any child process.

If you're using CGI.pm to process QUERY_STRING, then you should stick to
that. Messing about is just asking for trouble. What is the problem with
decoding the submitted values afterwards? (It can still be one line or
so of code, if you do it right. See Anno's example.)
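
As a rough sketch of that one-pass idea (an illustration, not code from the thread), the sub below converts every value of a hash shaped like the one `->Vars` returns, where multi-valued parameters are packed into a single string separated by null bytes. `convert_vars` and the converter passed to it are hypothetical names; a plain hash stands in for the CGI object so the example runs without CGI.pm.

```perl
use strict;
use warnings;

# Hypothetical helper: apply $convert to each individual value in a
# Vars-style hash, keeping the "\0" separators between multiple values.
sub convert_vars {
    my ($vars, $convert) = @_;
    for my $value (values %$vars) {    # values() aliases the hash entries
        next unless defined $value;
        # limit -1 keeps trailing empty fields, so nothing is dropped
        $value = join "\0", map { $convert->($_) } split /\0/, $value, -1;
    }
    return $vars;
}

# Stand-in data and a stand-in converter (upper-casing instead of a real
# encoding conversion, just to make the effect visible).
my %vars = (name => 'raymundo', tags => "perl\0cgi");
convert_vars(\%vars, sub { uc $_[0] });
print join(', ', split /\0/, $vars{tags}), "\n";
```

Splitting before converting means the converter never sees the separator byte, so it does not matter whether the real routine leaves null bytes alone.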

Ben

--
I must not fear. Fear is the mind-killer. I will face my fear and
I will let it pass through me. When the fear is gone there will be
nothing. Only I will remain.
(E-Mail Removed) Frank Herbert, 'Dune'

Raymundo
Guest
Posts: n/a

 03-06-2007
Oops.. I wrote a reply. It took about 3 hours. (It's too difficult for
me to write in English.) I posted it an hour ago but I can't see it
even now. I'm afraid it's lost :'(

In fact, the Perl script that I'm modifying is not my own code. It is
UseModWiki (http://www.usemod.com/cgi-bin/wiki.pl) and I've been
modifying it to use it for my personal homepage. (But I'm just a
novice in Perl, so it's not easy.)

In wiki site, the URL of each page consists of script URL and "the
title of that page", like ".../wiki.pl?Perl". I'm a Korean and my wiki
has many pages whose names are in Korean.

> Well... a not-url-encoded URL is invalid. At least Firefox appears to
> automatically translate (say) a URL typed into the address bar into its
> correct URL-escaped form before submitting it to the server; I don't
> know what IE or Konq/Safari or Opera do.

As you said, multi-byte characters in a URL are invalid. I know it :'( So
a url-encoded URL is the answer. However, see the following URLs:
1: .../wiki.pl?Linux <- Everyone can know it is the page about "Linux"
2: .../wiki.pl?%EB%A6%AC%EB%88%85%EC%8A%A4 <- Can anyone guess what
the title of this page is?? :-/ It's "Linux" in Korean
3: .../wiki.pl?리눅스 <- (If you can't see the Korean chars, plz see
http://gypark.pe.kr/upload/linux_in_korean.gif ) Everyone who is able
to read Korean can know it is the page about Linux. (I'll type
"LINUX(ko)" for this word from now on)

URL 2 is valid, but its appearance is so.... :-/ And I must give up
the big advantage of a wiki, "the URL represents the content".

URL 3 is said to be invalid. But I still want to support it. That is,
when someone types that URL in the address bar of a browser, or
someone clicks a link to URL 3 on another site, I want my wiki.pl
script to show the proper page, "LINUX(ko)".

Fortunately, web browsers like FF, IE, and Safari convert the URL into
%-encoded form before they submit it, as you said. Therefore, I think,
the main issue is not that the URL contains multi-byte chars, because the
server will receive a %-encoded request. The problem is that, as I
said in my first article, the %-encoded form of "LINUX(ko)" is not
unique. It can be "%EB%A6%AC%EB%88%85%EC%8A%A4" (UTF-8 sequence) or
"%B8%AE%B4%AA%BD%BA" (EUC-KR, in Korea). The browsers choose which
encoding to use according to an option in them. (for FF,
"network.standard-url.encode-utf8" in "about:config") The server can't
choose it and can't even know explicitly which one was chosen, which is
the reason that wiki.pl should "guess".

> > Returing to my first post in this thread... Is it so bad idea to
> > change the environment variable QUERY_STRING? It solves every problem
> > change may affect only the script and its child processes, and the
> > script doesn't fork any child process.

>
> If you're using CGI.pm to process QUERY_STRING, then you should stick to
> that. Messing about is just asking for trouble. What is the problem with
> decoding the submitted values afterwards? (It can still be one line or
> so of code, if you do it right. See Anno's example.)

"The problem with decoding the submitted values afterward" is...
(the following comes from my testing results. It may be fixable, but I'm
not so expert in Perl)

1) There are hundreds of lines that call "->param()". I don't think
it's a good idea to insert so many "guess_and_convert()" calls after
those lines.

1-1) In fact, those lines actually call the "GetParam()" subroutine, and
GetParam() calls ->param in it. So it can be a solution to insert
guess_and_convert() in GetParam(). However, GetParam() fetches the
value of a parameter not only from a GET request but also from a POST
request and even from saved files. For now, I'm not sure it's ok to
modify GetParam(). In addition, it seems to be inefficient to call the
convert routine every time a single parameter is fetched.

2) Concerning Anno's example, it looks good because it calls the convert
routine only once. However, it shows some problem while processing
POST requests, like file uploading, receiving trackback, etc. I tried
to debug but failed to find why. I think it is the second best way to
apply that code with an additional if-clause:
if ($q->request_method() eq "GET")

3) In the original code, there are some lines that access
$ENV{QUERY_STRING} directly, without calling CGI functions. I need to
apply "guess_and_convert" to those lines.

So I cling to Q_S like this. As far as I know: (please correct me
if I am wrong)
1) Q_S is related to only GET request. (All the forms in wiki.pl calls
"wiki.pl" without any appending URL query when it submits)

2) Q_S may be in the form of "keywords" or
"param1=value1&param2=value2...". guess_and_convert() will not change
the important characters like "&", "=", "+". It will not change any
other ASCII characters. It will just change the multi-byte chars.
Because those characters have been already encoded by browser, this
change is just the change of the number and the sequence of the "%HH"
runs. There is, I think, no problem when CGI object is created and
initialized using Q_S.

3) Changing Q_S affects only the running script and its child
processes.

4) After I began to test my approach, no problem shown until now. (Of
course, this can't be the proof that it will never make a problem. So

5) Most of all, I expect that I don't need to care about it when the
rest of the code is updated. (at least until the behavior of the
browsers or of the CGI module changes dramatically)
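
The conversion described in point 2 could look roughly like the sketch below. It is an illustration under stated assumptions, not the code actually used in wiki.pl: instead of Encode::Guess it uses a simpler rule (a run that is strictly valid UTF-8 is kept, anything else is assumed to be EUC-KR), and the names `to_utf8_bytes`, `reencode_run` and `convert_query_string` are made up for the example. Only escapes of bytes 0x80 and above, or raw high bytes, are touched, so ASCII escapes such as "%3b" and the separators "&", "=", "+" pass through unchanged.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: return UTF-8 bytes for a run of non-ASCII bytes.
# Assumption: anything that is not valid UTF-8 is treated as EUC-KR.
sub to_utf8_bytes {
    my ($bytes) = @_;
    my $is_utf8 = eval {
        Encode::decode('UTF-8', $bytes,
            Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $bytes if $is_utf8;
    return Encode::encode('UTF-8', Encode::decode('euc-kr', $bytes));
}

# Re-encode one run of "%HH" escapes and/or raw high bytes as UTF-8 "%HH".
sub reencode_run {
    my ($run) = @_;
    $run =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;    # undo the %-escapes
    return join '',
        map { sprintf('%%%02X', ord($_)) } split //, to_utf8_bytes($run);
}

# Convert only runs of high bytes; ASCII (escaped or not) is left alone.
sub convert_query_string {
    my ($qs) = @_;
    $qs =~ s/((?:%[89A-Fa-f][0-9A-Fa-f]|[\x80-\xFF])+)/reencode_run($1)/ge;
    return $qs;
}

print convert_query_string('%B8%AE%B4%AA%BD%BA'), "\n";
```

If this route is taken, the call would have to run before the first CGI object is created, for example `$ENV{QUERY_STRING} = convert_query_string($ENV{QUERY_STRING}) if defined $ENV{QUERY_STRING};`. The weak spot is the same as with any guessing: EUC-KR bytes that happen to form valid UTF-8 would be left unconverted.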

If anyone gives me concrete examples of a problem that may appear
when I convert the encoding of Q_S, I'll give up my way immediately...

Raymundo

Ben Morrow
Guest
Posts: n/a

 03-06-2007

Quoth "Raymundo" <(E-Mail Removed)>:
> In fact, the Perl script that I'm modifying is not my own code. It is
> UseModWiki (http://www.usemod.com/cgi-bin/wiki.pl) and I've been
> modifying it to use it for my personal homepage. (But I'm just a
> novice in Perl so it's not easy

It would have been helpful if you'd mentioned this at the start.

> In wiki site, the URL of each page consists of script URL and "the
> title of that page", like ".../wiki.pl?Perl". I'm a Korean and my wiki
> has many pages whose names are in Korean.
>
> > Well... a not-url-encoded URL is invalid. At least Firefox appears to
> > automatically translate (say) a URL typed into the address bar into its
> > correct URL-escaped form before submitting it to the server; I don't
> > know what IE or Konq/Safari or Opera do.

>
> As you said, multi-byte characters in URL is invalid. I know it :'( So
> url-encoded URL is the answer. However, see the following URLs:
> 1: .../wiki.pl?Linux <- Everyone can know it is the page about "Linux"
> 2: .../wiki.pl?%EB%A6%AC%EB%88%85%EC%8A%A4 <- Can anyone guess what
> the title of this page is?? :-/ It's "Linux" in Korean

[ I've stripped the top-bit-set characters: my newsreader appears to
have mangled them ]
> 3: .../wiki.pl? <- (If you can't see the Korean chars, plz see
> http://gypark.pe.kr/upload/linux_in_korean.gif ) Everyone who are able
> to read Korean can know it is the page about Linux. (I'll type
> "LINUX(ko)" for this word from now on)
>
> URL 2 is valid, but its appearance is so.... :-/ And I must give up
> the big advantage of wiki, "URL represent the content"
>
> URL 3 is said to be invalid. But I still want to support it. That is,
> when someone types that URL in the address bar of a browser, or
> someone clicks the link to URL 3 in other site,

Is it common practice for people to write links to URLs with multibyte
chars in them? Since the actual link itself is not user-visible (the
text of the link is, but that's quite different) there's no reason not
to encode it correctly, is there? Of course, if it *is* common practice,
you may well want to handle it (if you can), regardless of its
incorrectness.

> I want my wiki.pl script show the proper page, "LINUX(ko)".

Firstly, let me say that I entirely sympathise with this desire. It
is a major failing in the design of URLs that they are so unfriendly to
people whose native language is not English.

That said, I do not think you can win here. At least my copy of FF
will convert .../wiki.pl?KOREAN_CHARS into %-encodings *in the address
bar* before it submits the URL. IE6 appears to do the opposite: that is,
AFAICT it both displays the URL as typed in the address bar and actually
submits a multi-byte URL to the server. Your Q_S munging will need to be
quite subtle, to handle cases like .../wiki.pl?foo%3bbar, and correctly
distinguish them from .../wiki.pl?foo;bar, which presumably means
something quite different.

> Fortunately, web browsers like FF, IE, and Safari convert the URL into
> %-encoded form before they submit it, as you said. Therefore, I think,
> it's not main issue that URL contains multi-bytes chars, because the
> server will receive %-encoded request. The problem is that, as I'd
> said in my first article, the %-encoded form of "LINUX(ko)" is not
> unique. It can be "%EB%A6%AC%EB%88%85%EC%8A%A4" (UTF-8 sequence) or
> "%B8%AE%B4%AA%BD%BA" (EUC-KR, in Korea) The browsers choose which
> encoding to use according to the option in them. (for FF,
> "network.standard-url.encode-utf8" in "about:config") Server can't
> choose it and even can't know what is chosen explictily, which is the
> reason that wiki.pl should "guess".

OK, so you're in an impossible situation and you're trying to do the
best you can. Encode::Guess may be your best option here.

> > > Returing to my first post in this thread... Is it so bad idea to
> > > change the environment variable QUERY_STRING? It solves every problem
> > > change may affect only the script and its child processes, and the
> > > script doesn't fork any child process.

> >
> > If you're using CGI.pm to process QUERY_STRING, then you should stick to
> > that. Messing about is just asking for trouble. What is the problem with
> > decoding the submitted values afterwards? (It can still be one line or
> > so of code, if you do it right. See Anno's example.)

>
> "The problem with decoding the submitted values afterward" is...
> (following are come from my testing results. it may be fixed but I'm
> not so expert in Perl)
>
> 1) There are hundreds of lines that call "->param()". I don't think
> it's good idea to insert so many "guess_and_convert()" after those
> lines.
>
> 1-1) In fact, those lines actually call "GetParam()" subroutine and
> GetParam() calles ->param in it. So it can be a solution to insert
> guess_and_convert() in GetParam(). However, GetParam() fetches the
> value of a parameter not only from GET request but also from POST
> request and even from saved files. For now, I'm not sure it's ok to
> modify GetParam(). In addition, it seems to be inefficient to call
> convert routine every time a single parameter is fetched.

I would say the Right Answer in this case is to write your own GetParam
sub which calls the original GetParam, and then applies your
Encode::Guess logic. If the script isn't changing the values of the
parameters, only accessing them, you can avoid the multiple guessing by
using the Memoize module on your sub.
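
A minimal sketch of that wrapper idea follows. `GetParam` and `guess_and_convert` here are stand-ins written for the example (the wiki's real GetParam reads from CGI and saved files), and `GetParamConverted` is a hypothetical name; only the `memoize` call is the real interface of the core Memoize module.

```perl
use strict;
use warnings;
use Memoize;

my $guess_calls = 0;

# Stand-in for the wiki's GetParam(): looks a name up in a fixed hash.
my %fake_request = (title => 'Linux');
sub GetParam {
    my ($name, $default) = @_;
    return exists $fake_request{$name} ? $fake_request{$name} : $default;
}

# Stand-in for the guessing routine; it only counts how often it runs.
sub guess_and_convert {
    my ($value) = @_;
    $guess_calls++;
    return $value;
}

# Wrapper: fetch through the original sub, then convert the result.
sub GetParamConverted {
    my ($name, $default) = @_;
    return guess_and_convert(GetParam($name, $default));
}

# Cache results per argument list, so repeated lookups of the same
# parameter do not run the guessing again.
memoize('GetParamConverted');

GetParamConverted('title', '') for 1 .. 3;
print "guess_and_convert ran $guess_calls time(s)\n";
```

The caching is only safe under the condition stated above: if the script changes a parameter after the first lookup, the memoized wrapper would keep returning the old value.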

> 2) Concerning Anno's example, it looks good because it calls convert
> routine only once. However, it shows some problem while processing
> POST request, like file uploading, receiving trackback, etc. I tried
> to debug but failed to find why. I think it is the second best way to
> apply that code with additional if-clause: if ($q->request_method() eq
> "GET")

What sort of problems? If your guessing routine is guessing incorrectly
for some of your real data, this indicates it's not safe to use it
anyway.

> 3) In the original code, there are some lines that access
> $ENV{QUERY_STRING} directly, without calling CGI functions. I need to
> apply "guess_and_convert" to those lines.

Well, that's just evil. My standard recommendation at this point
would be to throw out whatever it is you're using and find something
that's decently written.

> So I cling to Q_S like this. As far as I know: (please correct me
> if I am wrong)
> 1) Q_S is related to only GET request. (All the forms in wiki.pl calls
> "wiki.pl" without any appending URL query when it submits)

You may be correct in this case that your wiki.pl only uses a query
string for GET requests. It is certainly possible to POST to a URL with
a query string.

> 2) Q_S may be in the form of "keywords" or
> "param1=value1&param2=value2...". guess_and_convert() will not change
> the important characters like "&", "=", "+". It will not change any
> other ASCII characters. It will just change the multi-byte chars.
> Because those characters have been already encoded by browser, this
> change is just the change of the number and the sequence of the "%HH"
> runs. There is, I think, no problem when CGI object is created and
> initialized using Q_S.

Err... OK. You must make sure you alter Q_S *before* any CGI.pm calls
are made, though.

> 3) Changing Q_S affects only the running script and it's child
> process.

I don't know what happens under mod_perl, if you ever move your script
to that environment. Under standard CGI, this is certainly true.

It seems to me that you are trying to take a piece of rather
badly-written code you don't really understand, and alter it to do
something that isn't really possible anyway. Given that you're in that
much of a mess, a simple edit of $ENV{QUERY_STRING} may well be the best
way out.

Ben

--
All persons, living or dead, are entirely coincidental.
(E-Mail Removed) Kurt Vonnegut

Raymundo
Guest
Posts: n/a

 03-07-2007
> > 3: .../wiki.pl? <- (If you can't see the Korean chars, plz see
> > http://gypark.pe.kr/upload/linux_in_korean.gif) Everyone who are able
> > to read Korean can know it is the page about Linux. (I'll type
> > "LINUX(ko)" for this word from now on)
> >
> > URL 2 is valid, but its appearance is so.... :-/ And I must give up
> > the big advantage of wiki, "URL represent the content"
> >
> > URL 3 is said to be invalid. But I still want to support it. That is,
> > when someone types that URL in the address bar of a browser, or
> > someone clicks the link to URL 3 in other site,
>
> Is it common practice for people to write links to URLs with multibyte
> chars in them? Since the actual link itself is not user-visible (the
> text of the link is, but that's quite different) there's no reason not
> to encode it correctly, is there? Of course, if it *is* common practice,
> you may well want to handle it (if you can), regardless of its
> incorrectness.

Do you mean this case?

[a href="actual link itself"] text of the link [/a]

(I replaced the "less than" and "greater than" signs with brackets, so
that any smart(?) news-reader doesn't process it as a real link)

Yes, you're right. In that case the URL is hidden from the user, so it
doesn't matter that the URL is "...%EB%A6". And this is very typical in
plain html documents.

However, many recent CGI tools, like blogs (MovableType, TatterTools,
etc.) and almost all wikis (as far as I know), provide a feature of
"auto-linking" (say). Someone posts an article in plain text to his/her
blog, then the blog tool looks for URL patterns in the text, converts
them to "a href" links, and prints them in its html output. In this
case, the "text of the link" is equal to the "actual link".
Another example is that wikis provide the concept of "interwiki" links
for convenient linking. That is, when I submit the text:

UseMod:UseModWiki
Google:UseModWiki (even though google is not a wiki..)

in the html output they are converted automatically to the following
links, respectively:

[a href="http://www.usemod.com/cgi-bin/wiki.pl?UseModWiki"]UseMod:UseModWiki[/a]
[a href="http://www.google.com/search?q=UseModWiki"]Google:UseModWiki[/a]

(The mapping table, between an interwiki name like "Google:" and the
real URL like "http://www.google.com/search?q=", is stored in a file on
the server)

In this case, someone may want to put a link to my page in his wiki.
Then "Raymundo:LINUX(ko)" is much (x 100) easier for him and more
understandable to other visitors than
"Raymundo:%EB%A6%AC%EB%88%85%EC%8A%A4".

I've already modified my wiki so that it encodes the actual link when
it processes interwiki links. But it's impossible to force this on every
developer of every wiki in the world. Anyway, this type of link can be
common practice nowadays, in my opinion.

> > I want my wiki.pl script show the proper page, "LINUX(ko)".
>
> Firstly, let me say that I entirely sympathise with this desire. It
> is a major failing in the design of URLs that they are so unfriendly to
> people whose native language is not English.
>
> That said, I do not think you can win here. At least my copy of FF
> will convert .../wiki.pl?KOREAN_CHARS into %-encodings *in the address
> bar* before it submits the URL. IE6 appears to do the opposite: that is,
> AFAICT it both displays the URL as typed in the address bar and actually
> submits a multi-byte URL to the server. Your Q_S munging will need to be
> quite subtle, to handle cases like .../wiki.pl?foo%3bbar, and correctly
> distinguish them from .../wiki.pl?foo;bar, which presumably means
> something quite different.

I agree IE6 acts differently (and strangely).
This is the access_log of the apache server when a request URL includes
"wiki/LINUX(ko)":

"GET /wiki/\xb8\xae\xb4\xaa\xbd\xba" <- IE, EUC-KR
"GET /wiki/%B8%AE%B4%AA%BD%BA" <- FF, EUC-KR
"GET /wiki/%EB%A6%AC%EB%88%85%EC%8A%A4" <- IE and FF, UTF-8

I don't know why IE's requests take different forms depending on the
encoding. It does url-encode if its option is set to use UTF-8 requests,
but it doesn't if the option is unchecked.

But as far as I have tested, my wiki.pl showed no difference between
a request coming from FF and one coming from IE. I'll consider what you
mentioned with the example of ";" and "%3b" and test more.

> > 2) Concerning Anno's example, it looks good because it calls convert
> > routine only once. However, it shows some problem while processing
> > POST request, like file uploading, receiving trackback, etc. I tried
> > to debug but failed to find why. I think it is the second best way to
> > apply that code with additional if-clause: if ($q->request_method() eq
> > "GET")

>
> What sort of problems? If your guessing routine is guessing incorrectly
> for some of you real data, this indicates it's not safe to use it
> anyway.

I agree and I tried to find the exact problem and the reason of it.

I'll describe here what I found until now:

At first, Anno's code was to change the values of the CGI->Vars hash:

$q = new CGI;
# convert
my $param = $q->Vars;
$_ = check_and_convert($_) for values %$param;

file. I added it myself about two years ago, getting codes from
examples in WWW.

print $q->start_form('post',"$ScriptName", 'multipart/form-data') . "\n";
print "<input type='hidden' name='upload' value='1'>" . "\n";
print $q->filefield("upload_file","",60,80) . "\n"; # <-- file selection field
print "&nbsp;&nbsp;" . "\n";
print $q->submit('Upload') . "\n";
print $q->endform;

The user is supposed to click the "open" button, choose a file in a file
selection window, and click the "Upload" button to submit. To save the
file on the server, the following code is used:

$file = $q->upload('upload_file');
open(FILE, ">file_in_local_disk_of_server");
binmode FILE;
while (<$file>) {
    print FILE $_; # read from client's file and write to server's disk
}
close(FILE);

I put "die;" in as a check:

$file = $q->upload('upload_file');
die "[$file]"; # here
open(FILE, ">file_in_local_disk_of_server");

\text.txt]". But when Vars is converted, script dies printing "[]".
That means $file lost the information that it's a file handle. How can I
keep it as a valid file handle? Even without converting, I found that any
write access to $file causes the same problem.

my $param = $q->Vars;
$$param{'upload_file'} .= ""; # no other string appended, but it loses the file handle

or even

$$param{'upload_file'} = $$param{'upload_file'}; # it also loses the file handle!!! :-O

So there is nothing that check_and_convert() can do. Modifying "->Vars"
itself causes the problem. If I have to choose this approach anyway, I
can do it like this:

my $param = $q->Vars;
foreach (keys %$param) {
    $$param{$_} = guess_and_convert($$param{$_}) if ($_ ne "upload_file"); # don't try to assign $$param{'upload_file'}
}

But there is no guarantee that all other parameters are ordinary strings.

> > So I cling to Q_S like this. As far as I know: (please correct me
> > if I am wrong)
> > 1) Q_S is related to only GET request. (All the forms in wiki.pl calls
> > "wiki.pl" without any appending URL query when it submits)
>
> You may be correct in this case that your wiki.pl only uses a query
> string for GET requests. It is certainly possible to POST to a URL with
> a query string.

Yes, I have to consider it in the future. And I still believe it
doesn't matter, because the "query string" in a URL is anyway just a
string which can't carry any invisible information (like $file above).

> > 2) Q_S may be in the form of "keywords" or
> > "param1=value1&param2=value2...". guess_and_convert() will not change
> > the important characters like "&", "=", "+". It will not change any
> > other ASCII characters. It will just change the multi-byte chars.
> > Because those characters have been already encoded by browser, this
> > change is just the change of the number and the sequence of the "%HH"
> > runs. There is, I think, no problem when CGI object is created and
> > initialized using Q_S.

>
> Err... OK. You must make sure you alter Q_S *before* any CGI.pm calls
> are made, though.

I agree.

> > 3) Changing Q_S affects only the running script and it's child
> > process.

>
> I don't know what happens under mod_perl, if you ever move your script
> to that environment. Under standard CGI, this is certainly true.

That's the type of answer I want! I've never thought of mod_perl or
anything like it. (Actually I have no idea of what it is.)

> It seems to me that you are trying to take a piece of rather
> badly-written code you don't really understand, and alter it to do
> something that isn't really possible anyway. Given that you're in that
> much of a mess, a simple edit of $ENV{QUERY_STRING} may well be the best
> way out.
>
> Ben
>

I plan to check and test more things and choose what to do.

I thank you for your constant help. Have a nice day!

Raymundo at South Korea.
