How to use 8bit character sets?

 
 
copx
      06-12-2005
For some reason Python (on Windows) doesn't use the system's default
character set and that's a serious problem for me.
I need to process German textfiles (containing umlauts and other characters
outside 7-bit ASCII) and generally work with strings which need to be processed
using the local encoding (I need to display the text using a Tk-based GUI
for example). The only solution I managed to find was converting between
unicode and latin-1 all the time (the textfiles aren't unicode, the output
of the program isn't supposed to be unicode either). Everything worked fine
until I tried to run the program on a Windows 9x machine. It seems that
Python on Win9x doesn't really support unicode (IIRC Win9x doesn't have real
unicode support so that's not surprising).
Is it possible to tell Python to use an 8bit charset (latin-1 in my case)
for textfile and string processing by default?

copx


 
Chris Curvey
      06-13-2005
Check out sitecustomize.py.

http://diveintopython.org/xml_processing/unicode.html

 
copx
      06-13-2005

Chris Curvey wrote:
> Check out sitecustomize.py.
>
> http://diveintopython.org/xml_processing/unicode.html


Thanks, but I'm looking for a way to do this at the application level (i.e. I
want my app to run in an unmodified interpreter environment).

copx



 
John Machin
      06-13-2005
copx wrote:
> For some reason Python (on Windows) doesn't use the system's default
> character set and that's a serious problem for me.
> I need to process German textfiles (containing umlauts and other characters
> outside 7-bit ASCII) and generally work with strings which need to be processed
> using the local encoding (I need to display the text using a Tk-based GUI
> for example). The only solution I managed to find was converting between
> unicode and latin-1 all the time (the textfiles aren't unicode, the output
> of the program isn't supposed to be unicode either). Everything worked fine
> until I tried to run the program on a Windows 9x machine. It seems that
> Python on Win9x doesn't really support unicode (IIRC Win9x doesn't have real
> unicode support so that's not surprising).
> Is it possible to tell Python to use an 8bit charset (latin-1 in my case)
> for textfile and string processing by default?
>
> copx



1. Your description of your problem is extremely vague. If you were to
supply a minimal script that "works" [on what platform?? what version of
Python??], with a description of what you understand by "works", and
what happens differently when you run that script on a Win9x box [for
what value(s) of x?? what version of Python??], we might be able to help
you. N.B. somewhere near the top of the script you should have something
like:

import sys
print "Python version:", sys.version
print "platform:", sys.platform
print "default encoding:", sys.getdefaultencoding()
try:
    # sys.getwindowsversion() exists only on Windows builds of Python
    print "Windows version:", sys.getwindowsversion()
except AttributeError:
    print "sys.getwindowsversion not available"

2. You should read this:

http://www.catb.org/~esr/faqs/smart-questions.html

3. You should not rely on a crutch like a default encoding, especially
one obtained by a kludge like sitecustomize.py. If your app expects to
receive data in encoding x and send data in encoding y, these facts are
properties of the application and the data, NOT the box you are running
on. If you had a requirement to read MacCyrillic from a Classic Mac and
write KOI8 for consumption on a Windows PC, you should be able to do it
on a SPARC Solaris box in Timbuktu or Walla Walla, Wa., without having
to fiddle with site-wide configuration.

4. AFAIK, support for Unicode is provided by Python with no assistance
from the operating system. The multitudinous deficiencies in Win9x
should have no bearing on the problem. Have you tried to run your
program on a Win2K or WinXP box?
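
Point 3 above can be sketched as follows: the encodings are named by the application and applied where bytes enter and leave it, so the interpreter's default encoding never comes into play. This is a minimal sketch written for a current Python 3 interpreter, not code from the thread. The names read_text, write_text, roundtrip_demo and the two constants are hypothetical, and latin-1 on both sides is an assumption taken from the original question.

```python
import io
import os
import tempfile

INPUT_ENCODING = "latin-1"   # assumption: encoding of the German text files
OUTPUT_ENCODING = "latin-1"  # assumption: encoding the program must emit


def read_text(path, encoding=INPUT_ENCODING):
    """Decode the file's bytes into a text string using an explicit encoding."""
    with io.open(path, "r", encoding=encoding) as handle:
        return handle.read()


def write_text(path, text, encoding=OUTPUT_ENCODING):
    """Encode the text string back to bytes using an explicit encoding."""
    with io.open(path, "w", encoding=encoding) as handle:
        handle.write(text)


def roundtrip_demo():
    """Write Latin-1 bytes, read them as text, process them, write them back."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # 0xFC is lowercase u-umlaut in Latin-1.
        with io.open(path, "wb") as raw:
            raw.write(b"M\xfcnchen")
        text = read_text(path)          # text string, independent of any default
        write_text(path, text.upper())  # processing happens on text, not bytes
        with io.open(path, "rb") as raw:
            return text, raw.read()
    finally:
        os.remove(path)
```

Because both encodings are passed explicitly, the same script should behave identically on any machine, whatever its site-wide configuration. In the Python 2.x of this thread, codecs.open(path, mode, encoding) played the role that io.open plays here.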

HTH,

John
 
Martin v. Löwis
      06-13-2005
copx wrote:
> For some reason Python (on Windows) doesn't use the system's default
> character set and that's a serious problem for me.


I very much doubt this statement: Python does "use" the system's default
character set on Windows. What makes you think it doesn't?

> Is it possible to tell Python to use an 8bit charset (latin-1 in my case)
> for textfile and string processing by default?


That is the default.

Regards,
Martin
 
John Roth
      06-13-2005
Martin v. Löwis wrote:
> copx wrote:
>> For some reason Python (on Windows) doesn't use the system's default
>> character set and that's a serious problem for me.

>
> I very much doubt this statement: Python does "use" the system's default
> character set on Windows. What makes you think it doesn't?
>
>> Is it possible to tell Python to use an 8bit charset (latin-1 in my case)
>> for textfile and string processing by default?

>
> That is the default.


As far as I can tell, there are actually two defaults, which tends
to confuse things. One is used whenever a unicode to 8-bit
conversion is needed on output to stdout, stderr or similar;
that's usually Latin-1 (or whatever the installation has set up).
The other is used whenever the unicode to 8-bit conversion
doesn't have a context - that's usually 7-bit ASCII.
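
One way to see which values are actually in play on a given machine is to ask the interpreter directly. The sketch below only reads settings and changes none of them. report_encodings is a hypothetical helper name, and the results depend on the platform, the Python version, and how the script is launched, so no expected output is shown.

```python
import locale
import sys


def report_encodings():
    """Collect the encodings the interpreter reports, without changing any."""
    return {
        # Used for implicit text/bytes conversions that have no other context.
        "default": sys.getdefaultencoding(),
        # Used when text is written to stdout; may be None if stdout is replaced.
        "stdout": getattr(sys.stdout, "encoding", None),
        # What the operating system's locale settings suggest.
        "locale": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for name, value in sorted(report_encodings().items()):
        print("%-8s %s" % (name, value))
```

Running this on each machine involved is a cheap way to tell whether the machines really differ in the way being assumed.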

John Roth

>
> Regards,
> Martin


 
Martin v. Löwis
      06-13-2005
John Roth wrote:
>> That is the default.

>
>
> As far as I can tell, there are actually two defaults, which tends
> to confuse things.


Notice that there are two defaults already in the operating system:
Windows has the notion of the "ANSI code page" and the "OEM code
page", which are used in different contexts.
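
To illustrate why the two code pages matter, the sketch below encodes the same character under each one. It assumes cp1252 as the ANSI code page and cp850 as the OEM code page, a common pairing on Western European Windows installations. Neither value comes from the thread, and encode_both and misread_as_oem are hypothetical helper names.

```python
ANSI_CODEC = "cp1252"  # assumption: ANSI code page (GUI programs, most files)
OEM_CODEC = "cp850"    # assumption: OEM code page (console windows)


def encode_both(text):
    """Return the byte form of text under each of the two code pages."""
    return text.encode(ANSI_CODEC), text.encode(OEM_CODEC)


def misread_as_oem(ansi_bytes):
    """Decode ANSI-encoded bytes as if they were OEM, as a console might."""
    return ansi_bytes.decode(OEM_CODEC)


ansi, oem = encode_both(u"\u00fc")  # lowercase u-umlaut
# ansi is b"\xfc" and oem is b"\x81": one character, two different bytes.
```

This is why text that looks right in a Tk window can show the wrong glyphs when the same bytes are printed in a console window.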

> One is used whenever a unicode to 8-bit
> conversion is needed on output to stdout, stderr or similar;
> that's usually Latin-1 (or whatever the installation has set up.)


You mean, in Python? No, this is not how it works. On output
of 8-bit strings to stdout, no conversion is ever performed:
the byte strings are written to stdout as-is.
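
The "written as-is" behaviour can be shown with a byte stream. In the Python 2 of this thread an ordinary str was a byte string. In Python 3 the nearest equivalent is a bytes object written to a binary stream such as sys.stdout.buffer. The sketch uses io.BytesIO as a stand-in for that stream so the result can be inspected, and write_raw and demo are hypothetical names.

```python
import io


def write_raw(stream, data):
    """Write a byte string to a binary stream; no codec is consulted."""
    stream.write(data)


def demo():
    sink = io.BytesIO()  # stand-in for a byte-oriented stdout
    latin1_bytes = u"\u00fcber".encode("latin-1")
    write_raw(sink, latin1_bytes)
    # The stream holds exactly the bytes it was given.
    return latin1_bytes, sink.getvalue()
```

Whether those bytes then display as the intended characters depends on the code page of whatever reads them, not on Python.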

> The other is used whenever the unicode to 8-bit conversion
> doesn't have a context - that's usually Ascii-7.


Again, you seem to be talking about Unicode conversions -
it's not clear that the OP is actually interested in
Unicode conversion in the first place.

Regards,
Martin
 
John Roth
      06-13-2005
Martin v. Löwis wrote:
> John Roth wrote:
>>> That is the default.

>>
>>
>> As far as I can tell, there are actually two defaults, which tends
>> to confuse things.

>
> Notice that there are two defaults already in the operating system:
> Windows has the notion of the "ANSI code page" and the "OEM code
> page", which are used in different contexts.
>
>> One is used whenever a unicode to 8-bit
>> conversion is needed on output to stdout, stderr or similar;
>> that's usually Latin-1 (or whatever the installation has set up.)

>
> You mean, in Python? No, this is not how it works. On output
> of 8-bit strings to stdout, no conversion is ever performed:
> the byte strings are written to stdout as-is.


That's true, but I was talking about outputting unicode strings,
not 8-bit strings. As you say below, the OP may not have
been talking about that.

>> The other is used whenever the unicode to 8-bit conversion
>> doesn't have a context - that's usually Ascii-7.

>
> Again, you seem to be talking about Unicode conversions -
> it's not clear that the OP is actually interested in
> Unicode conversion in the first place.
>
> Regards,
> Martin


John Roth

 