Velocity Reviews - Computer Hardware Reviews

UTF-8 output problems

Michael B. Trausch
03-10-2007
I am having a slight problem with UTF-8 output with Python. I have the
following program:

x = 0

while x < 0x4000:
    print u"This is Unicode code point %d (0x%x): %s" % (x, x, unichr(x))
    x += 1

This program works perfectly when run directly:

mbt@pepper:~/tmp$ python test.py
This is Unicode code point 0 (0x0):
This is Unicode code point 1 (0x1):
This is Unicode code point 2 (0x2):
This is Unicode code point 3 (0x3):
This is Unicode code point 4 (0x4):
This is Unicode code point 5 (0x5):
This is Unicode code point 6 (0x6):
This is Unicode code point 7 (0x7):
This is Unicode code point 8 (0x8):
This is Unicode code point 9 (0x9):
This is Unicode code point 10 (0xa):
(... continued)

However, when I attempt to redirect the output to a file:

mbt@pepper:~/tmp$ python test.py >f
Traceback (most recent call last):
  File "test.py", line 6, in <module>
    print u"This is Unicode code point %d (0x%x): %s" % (x, x, unichr(x))
UnicodeEncodeError: 'ascii' codec can't encode character u'\x80' in position 39: ordinal not in range(128)

This is slightly confusing to me. The output goes all the way to the
end of the program when it is not redirected. Why is Python treating
the situation differently when the output is redirected? This failure
occurs for all redirection, by the way: >, >>, 1>2, pipes, and so forth.

Any ideas?

— Mike


Marc 'BlackJack' Rintsch
03-10-2007
Michael B. Trausch wrote:

> However, when I attempt to redirect the output to a file:
>
> mbt@pepper:~/tmp$ python test.py >f
> Traceback (most recent call last):
>   File "test.py", line 6, in <module>
>     print u"This is Unicode code point %d (0x%x): %s" % (x, x, unichr(x))
> UnicodeEncodeError: 'ascii' codec can't encode character u'\x80' in position 39: ordinal not in range(128)
>
> This is slightly confusing to me. The output goes all the way to the
> end of the program when it is not redirected. Why is Python treating
> the situation differently when the output is redirected?


If you print to a terminal, `sys.stdout` is connected to that terminal, and
there are ways to find out that it is a terminal (`os.isatty()`) and
which encoding the terminal expects, at least in most cases. But there
is no way to tell what encoding a file or pipe should have, so Python
refuses to guess.

If an encoding could be determined, the `sys.stdout.encoding` attribute is
set to its name; otherwise it is `None`.
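A minimal sketch of the check described above. `describe_stream` is a hypothetical helper, not part of any library. It assumes a real file object (one with a `fileno()` method), because `os.isatty()` works on an OS-level file descriptor. The code runs on Python 2 and, unchanged, on Python 3, although on Python 3 a redirected stream usually reports the locale's encoding rather than `None`.

```python
import os


def describe_stream(stream):
    """Return (is_terminal, encoding) for a real file object.

    Hypothetical helper for illustration only. os.isatty() needs an
    OS-level file descriptor, so `stream` must support fileno().
    """
    # True only when the descriptor refers to a terminal device.
    is_terminal = os.isatty(stream.fileno())
    # On Python 2 this is None for a redirected stdout; getattr() also
    # covers file-like objects that have no `encoding` attribute at all.
    encoding = getattr(stream, "encoding", None)
    return is_terminal, encoding
```

Called from a script run on Python 2 as `python check.py > f`, `describe_stream(sys.stdout)` should return `(False, None)`. That is exactly the situation in which `print` falls back to the ASCII codec and raises the `UnicodeEncodeError` above.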

Ciao,
Marc 'BlackJack' Rintsch
 
Laurent Pointal
03-10-2007
Michael B. Trausch wrote:

> This is slightly confusing to me. The output goes all the way to the
> end of the program when it is not redirected. Why is Python treating
> the situation differently when the output is redirected? This failure
> occurs for all redirection, by the way: >, >>, 1>2, pipes, and so forth.
>
> Any ideas?


In addition to Marc's reply: you can open a file with a specific encoding
(see the codecs.open() function) and use print >> f, ... to write to that file.
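A minimal sketch of this suggestion, assuming UTF-8 is the encoding wanted in the output file. `write_code_points` is a hypothetical name. The sketch calls `f.write()` instead of `print >> f, ...` (which is Python 2 only syntax) so the same code also runs on Python 3, where the `try`/`except NameError` maps `unichr` to `chr`.

```python
import codecs

try:
    unichr  # Python 2: returns a unicode character
except NameError:
    unichr = chr  # Python 3: chr() already returns a text character


def write_code_points(path, limit=0x4000):
    """Write one line per code point below `limit` to `path` as UTF-8.

    codecs.open() returns a file object that encodes each unicode
    string on write, so the ASCII default used for a redirected
    sys.stdout never comes into play.
    """
    f = codecs.open(path, "w", encoding="utf-8")
    try:
        for x in range(limit):
            f.write(u"This is Unicode code point %d (0x%x): %s\n"
                    % (x, x, unichr(x)))
    finally:
        f.close()
```

Because the file object does the encoding itself, the result is the same whether or not the script's own stdout is redirected.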

See you,

Laurent.
 