Unicode blues in Python3

 
 
nn
Guest
Posts: n/a
 
      03-23-2010
I know that unicode is the way to go in Python 3.1, but it is getting
in my way right now in my Unix scripts. How do I write a chr(253) to a
file?

#nntst2.py
import sys,codecs
mychar=chr(253)
print(sys.stdout.encoding)
print(mychar)

> ./nntst2.py

ISO8859-1


> ./nntst2.py >nnout2

Traceback (most recent call last):
File "./nntst2.py", line 5, in <module>
print(mychar)
UnicodeEncodeError: 'ascii' codec can't encode character '\xfd' in
position 0: ordinal not in range(128)

> cat nnout2

ascii

...Oh great!

OK, let's try this:
#nntst3.py
import sys,codecs
mychar=chr(253)
print(sys.stdout.encoding)
print(mychar.encode('latin1'))

> ./nntst3.py

ISO8859-1
b'\xfd'

> ./nntst3.py >nnout3


> cat nnout3

ascii
b'\xfd'

...Eh... not what I want really.

#nntst4.py
import sys,codecs
mychar=chr(253)
print(sys.stdout.encoding)
sys.stdout=codecs.getwriter("latin1")(sys.stdout)
print(mychar)

> ./nntst4.py

ISO8859-1
Traceback (most recent call last):
File "./nntst4.py", line 6, in <module>
print(mychar)
File "Python-3.1.2/Lib/codecs.py", line 356, in write
self.stream.write(data)
TypeError: must be str, not bytes

...OK, this is not working either.

Is there any way to write a value 253 to standard output?
 
Rami Chowdhury
Guest
Posts: n/a
 
      03-23-2010
On Tuesday 23 March 2010 10:33:33 nn wrote:
> I know that unicode is the way to go in Python 3.1, but it is getting
> in my way right now in my Unix scripts. How do I write a chr(253) to a
> file?
>
> #nntst2.py
> import sys,codecs
> mychar=chr(253)
> print(sys.stdout.encoding)
> print(mychar)


The following code works for me:

$ cat nnout5.py
#!/usr/bin/python3.1

import sys
mychar = chr(253)
sys.stdout.write(mychar)
$ echo $(cat nnout)


Can I ask why you're using print() in the first place, rather than writing
directly to a file? Python 3.x, AFAIK, distinguishes between text and binary
files and will let you specify the encoding you want for strings you write.
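To illustrate what "specify the encoding" means here, a minimal sketch (the helper names and the temporary file location are made up for the example): the string is written through a text-mode file opened with encoding='latin1', and the file is then read back in binary mode to see which byte actually landed on disk.

```python
import os
import tempfile

def write_text(path, text, encoding):
    # Text mode: Python encodes each str with the given encoding on write.
    with open(path, "w", encoding=encoding) as f:
        f.write(text)

def read_bytes(path):
    # Binary mode: no decoding happens, so we get the raw bytes back.
    with open(path, "rb") as f:
        return f.read()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "nnout")  # hypothetical file name
    write_text(path, chr(253), "latin1")
    raw = read_bytes(path)

print(raw)  # b'\xfd'
```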

Hope that helps,
Rami


----
Rami Chowdhury
"Ninety percent of everything is crap." -- Sturgeon's Law
408-597-7068 (US) / 07875-841-046 (UK) / 01819-245544 (BD)
 
nn
Guest
Posts: n/a
 
      03-23-2010


Rami Chowdhury wrote:
> On Tuesday 23 March 2010 10:33:33 nn wrote:
> > I know that unicode is the way to go in Python 3.1, but it is getting
> > in my way right now in my Unix scripts. How do I write a chr(253) to a
> > file?
> >
> > #nntst2.py
> > import sys,codecs
> > mychar=chr(253)
> > print(sys.stdout.encoding)
> > print(mychar)

>
> The following code works for me:
>
> $ cat nnout5.py
> #!/usr/bin/python3.1
>
> import sys
> mychar = chr(253)
> sys.stdout.write(mychar)
> $ echo $(cat nnout)
>
>
> Can I ask why you're using print() in the first place, rather than writing
> directly to a file? Python 3.x, AFAIK, distinguishes between text and binary
> files and will let you specify the encoding you want for strings you write.
>
> Hope that helps,
> Rami


#nntst5.py
import sys
mychar=chr(253)
sys.stdout.write(mychar)

> ./nntst5.py >nnout5

Traceback (most recent call last):
File "./nntst5.py", line 4, in <module>
sys.stdout.write(mychar)
UnicodeEncodeError: 'ascii' codec can't encode character '\xfd' in
position 0: ordinal not in range(128)

So sys.stdout.write() fails in the same way as print().

I use print so I can do tests and debug runs to the screen or pipe it
to some other tool and then configure the production bash script to
write the final output to a file of my choosing.
 
Gary Herron
Guest
Posts: n/a
 
      03-23-2010
nn wrote:
> I know that unicode is the way to go in Python 3.1, but it is getting
> in my way right now in my Unix scripts. How do I write a chr(253) to a
> file?
>


Python 3 makes a distinction between bytes and string (i.e., unicode)
types, and you are still thinking in the Python 2 mode that does *NOT*
make such a distinction. What you appear to want is to write a
particular byte to a file -- so use the bytes type and a file opened in
binary mode:

>>> b=bytes([253])
>>> f = open("abc", 'wb')
>>> f.write(b)

1
>>> f.close()


On Unix (at least), the "od" program can verify that the contents are correct:
> od abc -d

0000000 253
0000001
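The same approach should carry over to standard output. A minimal sketch, assuming sys.stdout is the usual text stream that exposes its underlying binary stream as sys.stdout.buffer; the emit_byte helper is a made-up name, and the demo writes to an in-memory stream so the result can be inspected:

```python
import io
import sys

def emit_byte(value, stream=None):
    # Default to the binary layer underneath sys.stdout (assumed to exist).
    if stream is None:
        sys.stdout.flush()          # keep ordering with earlier print() calls
        stream = sys.stdout.buffer
    written = stream.write(bytes([value]))
    stream.flush()
    return written

# Demo against an in-memory binary stream instead of the real stdout.
demo = io.BytesIO()
count = emit_byte(253, demo)
captured = demo.getvalue()
```

In a script, emit_byte(253) with no stream argument would send the single byte 0xFD to stdout whether it is a terminal, a pipe or a redirected file, because no text encoding is involved.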


Hope that helps.

Gary Herron






 
 
nn
Guest
Posts: n/a
 
      03-23-2010


Gary Herron wrote:
> nn wrote:
> > I know that unicode is the way to go in Python 3.1, but it is getting
> > in my way right now in my Unix scripts. How do I write a chr(253) to a
> > file?
> >

>
> Python3 make a distinction between bytes and string(i.e., unicode)
> types, and you are still thinking in the Python2 mode that does *NOT*
> make such a distinction. What you appear to want is to write a
> particular byte to a file -- so use the bytes type and a file open in
> binary mode:
>
> >>> b=bytes([253])
> >>> f = open("abc", 'wb')
> >>> f.write(b)

> 1
> >>> f.close()

>
> On unix (at least), the "od" program can verify the contents is correct:
> > od abc -d

> 0000000 253
> 0000001
>
>
> Hope that helps.
>
> Gary Herron


Actually what I want is to write a particular byte to standard output,
and I want this to work regardless of where that output gets sent to.
I am aware that I could do
open('nnout','w',encoding='latin1').write(mychar) but I am porting a
python2 program and don't want to rewrite everything that uses that
script.
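One possible way to keep the existing print() calls, sketched under the assumption that the output really is Latin-1 text: wrap the binary layer of stdout in an io.TextIOWrapper with a fixed encoding, so the encoding no longer depends on where the output goes. The function name is made up, and the demo uses an in-memory stream in place of sys.stdout.buffer.

```python
import io

def latin1_text_stream(binary_stream):
    # Text layer that always encodes as Latin-1, regardless of locale.
    # newline="\n" disables newline translation on write.
    return io.TextIOWrapper(binary_stream, encoding="latin1",
                            newline="\n", line_buffering=True)

# Demo: an in-memory binary stream stands in for sys.stdout.buffer.
demo = io.BytesIO()
out = latin1_text_stream(demo)
print(chr(253), file=out)
out.flush()
result = demo.getvalue()

# In a script the replacement would look like this (not run here):
#   import sys
#   sys.stdout = latin1_text_stream(sys.stdout.buffer)
```

The earlier codecs.getwriter attempt failed because it wrapped the text stream sys.stdout, which only accepts str; wrapping the binary stream avoids that.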
 
 
Stefan Behnel
Guest
Posts: n/a
 
      03-23-2010
nn, 23.03.2010 19:46:
> Actually what I want is to write a particular byte to standard output,
> and I want this to work regardless of where that output gets sent to.
> I am aware that I could do
> open('nnout','w',encoding='latin1').write(mychar) but I am porting a
> python2 program and don't want to rewrite everything that uses that
> script.


Are you writing text or binary data to stdout?

Stefan

 
 
nn
Guest
Posts: n/a
 
      03-23-2010


Stefan Behnel wrote:
> nn, 23.03.2010 19:46:
> > Actually what I want is to write a particular byte to standard output,
> > and I want this to work regardless of where that output gets sent to.
> > I am aware that I could do
> > open('nnout','w',encoding='latin1').write(mychar) but I am porting a
> > python2 program and don't want to rewrite everything that uses that
> > script.

>
> Are you writing text or binary data to stdout?
>
> Stefan


latin1 charset text.
 
 
Martin v. Loewis
Guest
Posts: n/a
 
      03-23-2010
nn wrote:
>
> Stefan Behnel wrote:
>> nn, 23.03.2010 19:46:
>>> Actually what I want is to write a particular byte to standard output,
>>> and I want this to work regardless of where that output gets sent to.
>>> I am aware that I could do
>>> open('nnout','w',encoding='latin1').write(mychar) but I am porting a
>>> python2 program and don't want to rewrite everything that uses that
>>> script.

>> Are you writing text or binary data to stdout?
>>
>> Stefan

>
> latin1 charset text.


Are you sure about that? If you carefully reconsider, could you come to
the conclusion that you are not writing text at all, but binary data?

If it really was text that you were writing, why would you need to use
U+00FD (LATIN SMALL LETTER Y WITH ACUTE)? To my knowledge, that
character is rarely used in practice. The fact that you are trying to
write it strongly suggests that what you are writing is not actually
text.

Also, your formulation suggests the same:

"Is there any way to write a value 253 to standard output?"

If you were really writing text, you'd ask

"Is there any way to write 'ý' to standard output?"

Regards,
Martin
 
 
Steven D'Aprano
Guest
Posts: n/a
 
      03-24-2010
On Tue, 23 Mar 2010 11:46:33 -0700, nn wrote:

> Actually what I want is to write a particular byte to standard output,
> and I want this to work regardless of where that output gets sent to.


What do you mean "work"?

Do you mean "display a particular glyph" or something else?

In bash:

$ echo -e "\0101" # octal 101 = decimal 65
A
$ echo -e "\0375" # decimal 253


but if I change the terminal encoding, I get this:

$ echo -e "\0375"
ý

Or this:

$ echo -e "\0375"
²

depending on which encoding I use.
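The same effect can be reproduced in Python. This is only a sketch: the post does not say which terminal encodings were used, so Latin-1, code page 437 and UTF-8 are assumptions chosen because they give the glyphs shown above.

```python
raw = bytes([253])  # the single byte 0xFD

as_latin1 = raw.decode("latin1")  # U+00FD, y with acute
as_cp437 = raw.decode("cp437")    # U+00B2, superscript two

# On its own, 0xFD is not valid UTF-8, so decoding fails.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(ascii(as_latin1), ascii(as_cp437), valid_utf8)
```

The byte never changes; only the interpretation applied by the reader does.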

I think your question is malformed. You need to work out what behaviour
you actually want, before you can ask for help on how to get it.



--
Steven
 
 
nn
Guest
Posts: n/a
 
      03-24-2010


Martin v. Loewis wrote:
> nn wrote:
> >
> > Stefan Behnel wrote:
> >> nn, 23.03.2010 19:46:
> >>> Actually what I want is to write a particular byte to standard output,
> >>> and I want this to work regardless of where that output gets sent to.
> >>> I am aware that I could do
> >>> open('nnout','w',encoding='latin1').write(mychar) but I am porting a
> >>> python2 program and don't want to rewrite everything that uses that
> >>> script.
> >> Are you writing text or binary data to stdout?
> >>
> >> Stefan

> >
> > latin1 charset text.

>
> Are you sure about that? If you carefully reconsider, could you come to
> the conclusion that you are not writing text at all, but binary data?
>
> If it really was text that you write, why do you need to use
> U+00FD (LATIN SMALL LETTER Y WITH ACUTE). To my knowledge, that
> character is really infrequently used in practice. So that you try to
> write it strongly suggests that it is not actually text what you are
> writing.
>
> Also, your formulation suggests the same:
>
> "Is there any way to write a value 253 to standard output?"
>
> If you would really be writing text, you'd ask
>
>
> "Is there any way to write 'ý' to standard output?"
>
> Regards,
> Martin


To be more precise, I am writing both text and binary data together.
That is, I am embedding text from another source into a stream that
uses non-ASCII characters as "control" characters. In Python 2 I was
processing it mostly as text containing a few "funny" characters.
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Re: is the same betweent python3 and python3.2? Andrew Berg Python 0 06-16-2012 11:11 AM
WinXP, Python3.1.2,dir-listing to XML - problem with unicode file names kai_nerda Python 0 04-03-2010 02:40 AM
python3 Unicode is slow Dale Gerdemann Python 1 10-25-2009 01:11 PM
unicode wrap unicode object? ygao Python 6 04-08-2006 09:54 AM
Unicode + jsp + mysql + tomcat = unicode still not displaying Robert Mark Bram Java 0 09-28-2003 05:37 AM



Advertisments