Unicode blues in Python3

 
 
nn
Guest
Posts: n/a
 
      03-24-2010


Steven D'Aprano wrote:
> On Tue, 23 Mar 2010 11:46:33 -0700, nn wrote:
>
> > Actually what I want is to write a particular byte to standard output,
> > and I want this to work regardless of where that output gets sent to.

>
> What do you mean "work"?
>
> Do you mean "display a particular glyph" or something else?
>
> In bash:
>
> $ echo -e "\0101" # octal 101 = decimal 65
> A
> $ echo -e "\0375" # decimal 253
> �
>
> but if I change the terminal encoding, I get this:
>
> $ echo -e "\0375"
> ý
>
> Or this:
>
> $ echo -e "\0375"
> ²
>
> depending on which encoding I use.
>
> I think your question is malformed. You need to work out what behaviour
> you actually want, before you can ask for help on how to get it.
>
>
>
> --
> Steven


Yes, sorry, that was a bit ambiguous. I don't really care what the glyph
is. The program reading my output reads 8-bit values, expects the binary
value 0xFD as a control character, and lets everything else through as is.
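
The bash demonstration quoted above can be reproduced in Python as a rough sketch. The helper name glyph_for and the three encoding names are assumptions for illustration; the quoted post does not say which terminal encodings were used, these are simply ones that give a replacement character, ý and ² for that byte.

```python
# The single byte from the bash example: octal 375 == decimal 253 == 0xFD.
RAW = bytes([0o375])


def glyph_for(data: bytes, encoding: str) -> str:
    """Decode data the way a terminal set to `encoding` would interpret it.

    Hypothetical helper: bytes that are invalid in the encoding become
    U+FFFD (the replacement character) instead of raising an error.
    """
    return data.decode(encoding, errors="replace")


# Print code points rather than glyphs so the output does not itself
# depend on the encoding of the terminal running this script.
for name in ("utf-8", "latin-1", "cp437"):
    ch = glyph_for(RAW, name)
    print(f"{name:8} U+{ord(ch):04X}")
```

The byte never changes; only the interpretation does, which is why "write byte 0xFD" and "display a particular glyph" are different requests.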
 
 
 
 
 
Antoine Pitrou
Guest
Posts: n/a
 
      03-24-2010
On Tue, 23 Mar 2010 10:33:33 -0700, nn wrote:

> I know that unicode is the way to go in Python 3.1, but it is getting in
> my way right now in my Unix scripts. How do I write a chr(253) to a
> file?
>
> #nntst2.py
> import sys,codecs
> mychar=chr(253)
> print(sys.stdout.encoding)
> print(mychar)


print() writes to the text (unicode) layer of sys.stdout.
If you want to access the binary (bytes) layer, you must use
sys.stdout.buffer. So:

sys.stdout.buffer.write(chr(253).encode('latin1'))

or:

sys.stdout.buffer.write(bytes([253]))

See http://docs.python.org/py3k/library/...tIOBase.buffer
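
A slightly fuller sketch of the same idea, for anyone mixing print() with raw writes. The function name write_raw and its stream parameter are made up for this example; the only real API used is sys.stdout.buffer and the usual write()/flush() methods. Flushing the text layer first keeps earlier print() output ahead of the raw bytes.

```python
import io
import sys


def write_raw(data: bytes, stream=None) -> int:
    """Write bytes unchanged to a binary stream.

    Hypothetical helper. With no stream given it targets the bytes
    layer underneath sys.stdout.
    """
    if stream is None:
        sys.stdout.flush()          # push out pending print() text first
        stream = sys.stdout.buffer  # binary layer under the text wrapper
    written = stream.write(data)
    stream.flush()
    return written


# Demonstration against an in-memory binary stream instead of a terminal.
sink = io.BytesIO()
write_raw(bytes([253]), sink)
write_raw(chr(253).encode("latin1"), sink)
print(sink.getvalue())

# For comparison: the same character encoded as UTF-8 becomes two bytes,
# which is not what a reader expecting a single 0xFD byte wants.
print(chr(253).encode("utf-8"))
```

Calling write_raw(bytes([253])) with no second argument sends the single byte to standard output, whatever sys.stdout.encoding happens to be.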


 
 
 
 
 
Michael Torrie
Guest
Posts: n/a
 
      03-24-2010
Steven D'Aprano wrote:
> I think your question is malformed. You need to work out what behaviour
> you actually want, before you can ask for help on how to get it.


It may or may not be malformed, but I understand the question. So let
me translate for you: how can he write arbitrary bytes (0x00 through
0xff) to stdout without having them mangled by encodings? It's a very
simple question, really. Looks like Antoine Pitrou has answered this
question quite nicely as well.
 
 
nn
Guest
Posts: n/a
 
      03-24-2010


Antoine Pitrou wrote:
> On Tue, 23 Mar 2010 10:33:33 -0700, nn wrote:
>
> > I know that unicode is the way to go in Python 3.1, but it is getting in
> > my way right now in my Unix scripts. How do I write a chr(253) to a
> > file?
> >
> > #nntst2.py
> > import sys,codecs
> > mychar=chr(253)
> > print(sys.stdout.encoding)
> > print(mychar)

>
> print() writes to the text (unicode) layer of sys.stdout.
> If you want to access the binary (bytes) layer, you must use
> sys.stdout.buffer. So:
>
> sys.stdout.buffer.write(chr(253).encode('latin1'))
>
> or:
>
> sys.stdout.buffer.write(bytes([253]))
>
> See http://docs.python.org/py3k/library/...tIOBase.buffer


Just what I needed! Now I have full control of the output.

Thanks Antoine. The new io stack is still a bit of a mystery to me.

Thanks everybody else, and sorry for confusing the issue. Latin1 just
happens to be very convenient for manipulating bytes, and it is what I
thought of initially to handle my mix of textual and non-textual data.
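
For the record, a small sketch of why Latin1 is convenient here: it maps each of the 256 byte values to the code point with the same number, so bytes survive a decode/encode round trip unchanged. The sample payload and the names CONTROL and fields are invented for the example.

```python
# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# Latin-1 decodes byte N to code point N, so nothing is lost or remapped.
as_text = all_bytes.decode("latin-1")
round_trip_ok = as_text.encode("latin-1") == all_bytes
same_numbers = [ord(c) for c in as_text] == list(range(256))
print(round_trip_ok, same_numbers)

# Python 2 style processing: treat the stream as text with one "funny"
# character, then turn it back into bytes at the output boundary.
CONTROL = chr(0xFD)               # control character as a str
payload = b"abc\xfddef"           # made-up sample data
fields = payload.decode("latin-1").split(CONTROL)
rebuilt = CONTROL.join(fields).encode("latin-1")
print(fields, rebuilt == payload)

# The limitation: anything above U+00FF cannot be encoded as Latin-1.
try:
    "\u20ac".encode("latin-1")    # euro sign
    euro_fits = True
except UnicodeEncodeError:
    euro_fits = False
print(euro_fits)
```

This only stays safe as long as the embedded text never contains characters outside the Latin1 range.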
 
 
John Nagle
Guest
Posts: n/a
 
      03-24-2010
nn wrote:

> To be more informative: I am writing text and binary data
> together. That is, I am embedding text from another source into a stream
> that uses non-ASCII characters as "control" characters. In Python2 I
> was processing it mostly as text containing a few "funny" characters.


OK. Then you need to be writing arrays of bytes, not strings.
Encoding is your problem. This has nothing to do with Unicode.

John Nagle
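
One way to act on that advice, as a sketch only: encode each piece of text explicitly and assemble the output as bytes. The function name frame, the separator layout and the default encoding are assumptions; the thread only says that 0xFD is a control byte and that everything else passes through.

```python
CONTROL = b"\xfd"  # the control byte mentioned earlier in the thread


def frame(chunks, encoding="latin-1"):
    """Join text chunks into one bytes object separated by CONTROL.

    Hypothetical layout for illustration. Each chunk is encoded
    explicitly, and a chunk whose encoded form already contains the
    control byte is rejected rather than silently corrupting the stream.
    """
    encoded = []
    for chunk in chunks:
        data = chunk.encode(encoding)
        if CONTROL in data:
            raise ValueError("payload contains the control byte 0xFD")
        encoded.append(data)
    return CONTROL.join(encoded)


message = frame(["hello", "world"])
print(message)
```

The result can then go to sys.stdout.buffer.write(message), so no text-layer encoding is applied on the way out.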
 
 
 
 