Re: CGI and Unicode

Jeremy Yallop
Jim Hefferon wrote:
> I have been struggling with getting Unicode out of Python's cgi
> module. I have a small script illustrating the problem at the bottom
> but first I need to explain.


> But when I ask what is the type of the variable that I get from
> the cgi module, it comes out as StringType, not UnicodeType. My
> browser is Galeon on the latest Debian and I've also tested it
> with IE on NT.
> What am I missing?

The problem, I think, is the lack of consistency amongst browsers in
indicating the encoding of the submitted data. For instance, when
responding to the form in your script, Opera includes a "Content-type"
header containing:


whereas the "Content-type" header sent by Mozilla (and I suspect most
other browsers[0]) doesn't indicate the charset:


If all browsers always included the charset, then the cgi module could
reliably detect the data encoding and store the parameters as Unicode
strings when appropriate. As it stands, there's usually insufficient
information for cgi to detect when Unicode is being sent or what the
encoding is. If /you/ can determine by other means that the submitted
data is UTF-8 encoded (which is probably the case if the form was part
of a UTF-8 encoded document), there's nothing stopping you from
decoding it yourself (using codecs.utf_8_decode, which returns a
(string, length) tuple, or unicode(string, 'utf-8'), for example).
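
A minimal sketch of that manual decoding step, assuming you have already decided the form was submitted as UTF-8. The helper name decode_field and the Latin-1 fallback are hypothetical choices for illustration, not something the cgi module provides; raw.decode('utf-8') is the method form of unicode(string, 'utf-8').

```python
def decode_field(raw, encoding='utf-8', fallback='latin-1'):
    """Decode a raw byte string from a form field into a Unicode string.

    `encoding` is your own assumption about the submitting page;
    `fallback` (hypothetical choice) is used only when the bytes are
    not valid in that encoding.
    """
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this cannot fail,
        # but the text will be wrong if the real encoding was different.
        return raw.decode(fallback)


# Example: the UTF-8 bytes for "cafe" with an acute accent on the e.
value = decode_field(b'caf\xc3\xa9')
```

Whether a silent fallback is a good idea depends on the application; raising the error instead makes a wrong guess about the encoding visible.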

Oh, one last thing (which you probably know, but just in case...): you
can access the submitted headers through the environment variables of
the CGI process.

import os
# Dump every CGI environment variable; CONTENT_TYPE holds the
# request's Content-type header.
for key, value in os.environ.items():
    print('<p>%30s : %s</p>' % (key, value))
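
If the browser does send a charset, it can be picked out of the CONTENT_TYPE variable. This is only a sketch: the function name charset_from_content_type is made up for illustration, and it uses the standard library's email.message.Message to parse the header value rather than anything in cgi.

```python
import os
from email.message import Message


def charset_from_content_type(content_type):
    """Return the lowercased charset parameter of a Content-Type value.

    Returns None when the value is empty or carries no charset.
    """
    if not content_type:
        return None
    msg = Message()
    msg['Content-Type'] = content_type
    return msg.get_content_charset()


# CONTENT_TYPE is set by the web server for requests that have a body.
declared = charset_from_content_type(os.environ.get('CONTENT_TYPE', ''))
```

A result of None means the browser gave no charset, in which case you are back to deciding the encoding by other means.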

Hope this helps,


[0] A quick skim of RFC 1867 seems to indicate that the charset clause
isn't standard.