Unicode conversion in 'print'

 
 
Ricardo Bugalho
      01-13-2005
Hello,
I'm using Python 2.3.4 and I noticed that, when stdout is a terminal, the
'print' statement converts Unicode strings into the encoding defined by
the locales instead of the one returned by sys.getdefaultencoding().
However, I can't find any references to it. Does anyone know where it's
described?

Example:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys, locale

print 'Python encoding:', sys.getdefaultencoding()
print 'System encoding:', locale.getpreferredencoding()
print 'Test string: ', u'Olá mundo'


If stdout is a terminal, it works fine:
$ python x.py
Python encoding: ascii
System encoding: UTF-8
Test string: Olá mundo

If I redirect the output to a file, it raises a UnicodeEncodeError exception:
$ python x.py > x.txt
Traceback (most recent call last):
  File "x.py", line 8, in ?
    print 'Test string: ', u'Olá mundo'
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 2: ordinal not in range(128)


--
Ricardo

 
Serge Orlov
      01-13-2005
Ricardo Bugalho wrote:
> Hello,
> I'm using Python 2.3.4 and I noticed that, when stdout is a terminal,
> the 'print' statement converts Unicode strings into the encoding
> defined by the locales instead of the one returned by
> sys.getdefaultencoding().


Sure. It uses the encoding of your console. Here is an explanation of why it
uses the locale to get the encoding of the console:
http://www.python.org/moin/PrintFails

> However, I can't find any references to it. Does anyone know where it's
> described?


I've just written about it here:
http://www.python.org/moin/DefaultEncoding

>
> Example:
>
> #!/usr/bin/env python
> # -*- coding: utf-8 -*-
>
> import sys, locale
>
> print 'Python encoding:', sys.getdefaultencoding()
> print 'System encoding:', locale.getpreferredencoding()
> print 'Test string: ', u'Olá mundo'
>
> If stdout is a terminal, works fine
> $ python x.py
> Python encoding: ascii
> System encoding: UTF-8
> Test string: Olá mundo
>
> If I redirect the output to a file, raises an UnicodeEncodeError exception
> $ python x.py > x.txt
> Traceback (most recent call last):
> File "x.py", line 8, in ?
> print 'Test string: ', u'Olá mundo'
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 2: ordinal not in range(128)


http://www.python.org/moin/ShellRedirectionFails

Feel free to reply here if something is not clear; corrections to the wiki
are also welcome.

Serge.
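
For anyone hitting the same redirect error, here is a minimal sketch of the
usual workaround: encode the string explicitly instead of relying on the
implicit conversion. It assumes a current Python 3 interpreter rather than
the 2.3.4 used in this thread, and encode_for_output is a made-up helper
name, not something taken from the wiki pages above.

```python
# -*- coding: utf-8 -*-
# Sketch only: encode_for_output is a hypothetical helper for illustration.

def encode_for_output(text, encoding="utf-8", errors="strict"):
    """Encode text explicitly instead of relying on an implicit default."""
    return text.encode(encoding, errors)

# Same string as in the example above, written with an escape for the accent.
text = u"Ol\xe1 mundo"

# The implicit conversion in the traceback amounts to an 'ascii' encode,
# which cannot represent u'\xe1' and therefore fails.
try:
    encode_for_output(text, "ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Choosing the encoding yourself works no matter where stdout points.
utf8_bytes = encode_for_output(text, "utf-8")
```

In Python 2 the resulting byte string can be handed to 'print' directly; in
Python 3 it would go to sys.stdout.buffer.write() instead.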

 
Ricardo Bugalho
      01-14-2005
Hi,
thanks for the information. But what I was really looking for was
information on when and why Python started doing it (previously, it always
used sys.getdefaultencoding()) and why it was done only for 'print' when
stdout is a terminal instead of always.

On Thu, 13 Jan 2005 14:33:20 -0800, Serge Orlov wrote:

> Sure. It uses the encoding of your console. Here is an explanation of why it
> uses the locale to get the encoding of the console:
> http://www.python.org/moin/PrintFails
>

--
Ricardo

 
 
Serge Orlov
      01-14-2005
Ricardo Bugalho wrote:
> Hi,
> thanks for the information. But what I was really looking for was
> information on when and why Python started doing it (previously, it
> always used sys.getdefaultencoding())


I don't have access to any version other than 2.2 at the moment, but I
believe it happened between 2.2 and 2.3 for Windows and UNIX terminals.
On other, unsupported terminals I suspect sys.getdefaultencoding() is
still used. The reason for the change is proper support of Unicode
input/output.


> and why it was done only for 'print' when
> stdout is a terminal instead of always.


The real question is why sys.getdefaultencoding() should ever be used
for printing at all. If you leave sys.getdefaultencoding() at Python's
default value ('ascii'), you won't need to worry about it <wink>.
sys.getdefaultencoding() is a temporary measure for big projects to
use within one Python version.

Serge.

 
 
Martin v. Löwis
      01-14-2005
Ricardo Bugalho wrote:
> thanks for the information. But what I was really looking for was
> information on when and why Python started doing it (previously, it always
> used sys.getdefaultencoding()) and why it was done only for 'print' when
> stdout is a terminal instead of always.


It has done that since 2.2, in response to many complaints that you could
not print a Unicode string in interactive mode unless the string
contained only ASCII characters. It does that only if sys.stdout is
a real terminal, because otherwise it is not possible to determine
what the encoding of sys.stdout is.

Regards,
Martin
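
The rule described in the reply above can be sketched roughly as follows.
This only illustrates the decision as stated in this thread, not the
interpreter's actual code: pick_output_encoding and FakeStream are
hypothetical names, and the 'ascii' fallback stands in for whatever
sys.getdefaultencoding() returns.

```python
# Sketch only: all names here are made up for illustration.

def pick_output_encoding(stream, fallback="ascii"):
    """Use the stream's own encoding only when it is a real terminal."""
    is_tty = getattr(stream, "isatty", lambda: False)()
    encoding = getattr(stream, "encoding", None)
    if is_tty and encoding:
        return encoding
    # Redirected to a file or pipe: the encoding cannot be determined,
    # so fall back to the default.
    return fallback


class FakeStream(object):
    """Stand-in for sys.stdout so both cases can be tried without a terminal."""

    def __init__(self, tty, encoding):
        self._tty = tty
        self.encoding = encoding

    def isatty(self):
        return self._tty


terminal = FakeStream(True, "UTF-8")    # like "python x.py"
redirected = FakeStream(False, None)    # like "python x.py > x.txt"

terminal_choice = pick_output_encoding(terminal)
redirected_choice = pick_output_encoding(redirected)
```

Passing the real sys.stdout to the helper shows which branch applies to the
current session.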
 