Velocity Reviews

nn 03-23-2010 05:33 PM

Unicode blues in Python3
 
I know that unicode is the way to go in Python 3.1, but it is getting
in my way right now in my Unix scripts. How do I write a chr(253) to a
file?

#nntst2.py
import sys,codecs
mychar=chr(253)
print(sys.stdout.encoding)
print(mychar)

> ./nntst2.py

ISO8859-1
ý

> ./nntst2.py >nnout2

Traceback (most recent call last):
File "./nntst2.py", line 5, in <module>
print(mychar)
UnicodeEncodeError: 'ascii' codec can't encode character '\xfd' in
position 0: ordinal not in range(128)

> cat nnout2

ascii

...Oh great!

OK, let's try this:
#nntst3.py
import sys,codecs
mychar=chr(253)
print(sys.stdout.encoding)
print(mychar.encode('latin1'))

> ./nntst3.py

ISO8859-1
b'\xfd'

> ./nntst3.py >nnout3


> cat nnout3

ascii
b'\xfd'

...Eh... not what I want really.

#nntst4.py
import sys,codecs
mychar=chr(253)
print(sys.stdout.encoding)
sys.stdout=codecs.getwriter("latin1")(sys.stdout)
print(mychar)

> ./nntst4.py

ISO8859-1
Traceback (most recent call last):
File "./nntst4.py", line 6, in <module>
print(mychar)
File "Python-3.1.2/Lib/codecs.py", line 356, in write
self.stream.write(data)
TypeError: must be str, not bytes

...OK, this is not working either.

Is there any way to write a value 253 to standard output?

Rami Chowdhury 03-23-2010 06:00 PM

Re: Unicode blues in Python3
 
On Tuesday 23 March 2010 10:33:33 nn wrote:
> I know that unicode is the way to go in Python 3.1, but it is getting
> in my way right now in my Unix scripts. How do I write a chr(253) to a
> file?
>
> #nntst2.py
> import sys,codecs
> mychar=chr(253)
> print(sys.stdout.encoding)
> print(mychar)


The following code works for me:

$ cat nnout5.py
#!/usr/bin/python3.1

import sys
mychar = chr(253)
sys.stdout.write(mychar)
$ echo $(cat nnout)


Can I ask why you're using print() in the first place, rather than writing
directly to a file? Python 3.x, AFAIK, distinguishes between text and binary
files and will let you specify the encoding you want for strings you write.
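A minimal sketch of that idea, assuming the text should end up as Latin-1 on disk. The helper name `write_latin1` is hypothetical, and the file name `nnout` is only borrowed from elsewhere in this thread. Opening the file in text mode with an explicit encoding makes the result independent of the locale:

```python
import os
import tempfile


def write_latin1(path, text):
    """Write text to path, encoding it as Latin-1 regardless of the locale."""
    # newline="" switches off newline translation so the bytes on disk are exact.
    with open(path, "w", encoding="latin-1", newline="") as f:
        f.write(text)


# Demonstration in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "nnout")
    write_latin1(path, chr(253))
    with open(path, "rb") as f:
        data = f.read()
    print(data)  # the file holds the single byte 0xFD
```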

Hope that helps,
Rami

----
Rami Chowdhury
"Ninety percent of everything is crap." -- Sturgeon's Law

nn 03-23-2010 06:09 PM

Re: Unicode blues in Python3
 


Rami Chowdhury wrote:
> On Tuesday 23 March 2010 10:33:33 nn wrote:
> > I know that unicode is the way to go in Python 3.1, but it is getting
> > in my way right now in my Unix scripts. How do I write a chr(253) to a
> > file?
> >
> > #nntst2.py
> > import sys,codecs
> > mychar=chr(253)
> > print(sys.stdout.encoding)
> > print(mychar)

>
> The following code works for me:
>
> $ cat nnout5.py
> #!/usr/bin/python3.1
>
> import sys
> mychar = chr(253)
> sys.stdout.write(mychar)
> $ echo $(cat nnout)
>
>
> Can I ask why you're using print() in the first place, rather than writing
> directly to a file? Python 3.x, AFAIK, distinguishes between text and binary
> files and will let you specify the encoding you want for strings you write.
>
> Hope that helps,
> Rami

#nntst5.py
import sys
mychar=chr(253)
sys.stdout.write(mychar)

> ./nntst5.py >nnout5

Traceback (most recent call last):
File "./nntst5.py", line 4, in <module>
sys.stdout.write(mychar)
UnicodeEncodeError: 'ascii' codec can't encode character '\xfd' in
position 0: ordinal not in range(128)

equivalent to print.

I use print so I can do tests and debug runs to the screen or pipe it
to some other tool and then configure the production bash script to
write the final output to a file of my choosing.
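One option that leaves the script itself untouched, offered here as a sketch rather than something proposed in the thread, is the `PYTHONIOENCODING` environment variable, which overrides the encoding Python chooses for its standard streams, including when output is redirected. In a bash script that would look like `PYTHONIOENCODING=latin-1 ./nntst2.py >nnout2`. The helper name `run_with_io_encoding` below is made up for illustration; it runs a snippet in a child process with stdout piped, which is the situation where the default fell back to ASCII above:

```python
import os
import subprocess
import sys


def run_with_io_encoding(code, encoding):
    """Run a Python snippet in a child process with stdout piped,
    forcing the standard streams to the given encoding."""
    env = dict(os.environ, PYTHONIOENCODING=encoding)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout  # raw bytes exactly as the child wrote them


raw = run_with_io_encoding("print(chr(253))", "latin-1")
print(raw)  # byte 0xFD followed by the platform's line ending
```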

Gary Herron 03-23-2010 06:11 PM

Re: Unicode blues in Python3
 
nn wrote:
> I know that unicode is the way to go in Python 3.1, but it is getting
> in my way right now in my Unix scripts. How do I write a chr(253) to a
> file?
>


Python 3 makes a distinction between bytes and string (i.e., unicode)
types, and you are still thinking in the Python 2 mode, which does *NOT*
make such a distinction. What you appear to want is to write a
particular byte to a file -- so use the bytes type and a file opened in
binary mode:

>>> b=bytes([253])
>>> f = open("abc", 'wb')
>>> f.write(b)

1
>>> f.close()


On unix (at least), the "od" program can verify that the contents are correct:
> od abc -d

0000000 253
0000001
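The same bytes-oriented approach carries over to standard output: the text stream `sys.stdout` exposes its underlying binary stream as `sys.stdout.buffer`. A small sketch, with `write_raw` as a hypothetical helper and an in-memory stream standing in for stdout so the result can be inspected:

```python
import io


def write_raw(text_stream, data):
    """Write bytes straight to the binary layer beneath a text stream."""
    text_stream.flush()              # keep already-written text ahead of the bytes
    text_stream.buffer.write(data)
    text_stream.buffer.flush()


# Stand-in for sys.stdout: an ASCII text wrapper over an in-memory buffer.
raw = io.BytesIO()
fake_stdout = io.TextIOWrapper(raw, encoding="ascii")
fake_stdout.write("value: ")
write_raw(fake_stdout, bytes([253]))
print(raw.getvalue())

# With the real stream (after "import sys"): write_raw(sys.stdout, bytes([253]))
```

Because the bytes bypass the text layer, the stream's encoding (ASCII here) never gets a chance to reject the value 253.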


Hope that helps.

Gary Herron






nn 03-23-2010 06:46 PM

Re: Unicode blues in Python3
 


Gary Herron wrote:
> nn wrote:
> > I know that unicode is the way to go in Python 3.1, but it is getting
> > in my way right now in my Unix scripts. How do I write a chr(253) to a
> > file?
> >

>
> Python 3 makes a distinction between bytes and string (i.e., unicode)
> types, and you are still thinking in the Python 2 mode, which does *NOT*
> make such a distinction. What you appear to want is to write a
> particular byte to a file -- so use the bytes type and a file opened in
> binary mode:
>
> >>> b=bytes([253])
> >>> f = open("abc", 'wb')
> >>> f.write(b)

> 1
> >>> f.close()

>
> On unix (at least), the "od" program can verify that the contents are correct:
> > od abc -d

> 0000000 253
> 0000001
>
>
> Hope that helps.
>
> Gary Herron


Actually what I want is to write a particular byte to standard output,
and I want this to work regardless of where that output gets sent to.
I am aware that I could do
open('nnout','w',encoding='latin1').write(mychar) but I am porting a
python2 program and don't want to rewrite everything that uses that
script.
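For the "keep using print(), whatever stdout is connected to" requirement, one possible approach (an assumption on my part, not something confirmed in this thread) is to rewrap the binary layer of stdout once, near the top of the script, with a text wrapper that always encodes as Latin-1. The name `as_latin1` is hypothetical, and an in-memory buffer stands in for a redirected stdout:

```python
import io


def as_latin1(text_stream):
    """Return a new text stream over the same byte buffer, encoding as Latin-1."""
    text_stream.flush()
    return io.TextIOWrapper(
        text_stream.buffer, encoding="latin-1", line_buffering=True
    )


# In-memory stand-in for a redirected stdout that defaulted to ASCII.
raw = io.BytesIO()
ascii_out = io.TextIOWrapper(raw, encoding="ascii")
latin1_out = as_latin1(ascii_out)
print(chr(253), file=latin1_out)   # would raise UnicodeEncodeError on ascii_out
print(raw.getvalue())

# In a real script: sys.stdout = as_latin1(sys.stdout)
```

Python 3.7 and later also provide `sys.stdout.reconfigure(encoding="latin-1")`, which does the same job without creating a second wrapper.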

Stefan Behnel 03-23-2010 07:57 PM

Re: Unicode blues in Python3
 
nn, 23.03.2010 19:46:
> Actually what I want is to write a particular byte to standard output,
> and I want this to work regardless of where that output gets sent to.
> I am aware that I could do
> open('nnout','w',encoding='latin1').write(mychar) but I am porting a
> python2 program and don't want to rewrite everything that uses that
> script.


Are you writing text or binary data to stdout?

Stefan


nn 03-23-2010 08:36 PM

Re: Unicode blues in Python3
 


Stefan Behnel wrote:
> nn, 23.03.2010 19:46:
> > Actually what I want is to write a particular byte to standard output,
> > and I want this to work regardless of where that output gets sent to.
> > I am aware that I could do
> > open('nnout','w',encoding='latin1').write(mychar) but I am porting a
> > python2 program and don't want to rewrite everything that uses that
> > script.

>
> Are you writing text or binary data to stdout?
>
> Stefan


latin1 charset text.

Martin v. Loewis 03-23-2010 10:42 PM

Re: Unicode blues in Python3
 
nn wrote:
>
> Stefan Behnel wrote:
>> nn, 23.03.2010 19:46:
>>> Actually what I want is to write a particular byte to standard output,
>>> and I want this to work regardless of where that output gets sent to.
>>> I am aware that I could do
>>> open('nnout','w',encoding='latin1').write(mychar) but I am porting a
>>> python2 program and don't want to rewrite everything that uses that
>>> script.

>> Are you writing text or binary data to stdout?
>>
>> Stefan

>
> latin1 charset text.


Are you sure about that? If you carefully reconsider, could you come to
the conclusion that you are not writing text at all, but binary data?

If it really was text that you write, why do you need to use
U+00FD (LATIN SMALL LETTER Y WITH ACUTE)? To my knowledge, that
character is really infrequently used in practice. The fact that you
try to write it strongly suggests that what you are writing is not
actually text.
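To make the text-versus-bytes distinction concrete, here is a short sketch showing what `chr(253)` is as a character and how it turns into bytes under different encodings. The variable names are arbitrary:

```python
import unicodedata

ch = chr(253)
print(hex(ord(ch)), unicodedata.name(ch))

latin1_bytes = ch.encode("latin-1")   # one byte, value 253
utf8_bytes = ch.encode("utf-8")       # two bytes
print(latin1_bytes, utf8_bytes)

# ASCII has no representation for this character at all.
try:
    ch.encode("ascii")
except UnicodeEncodeError as exc:
    ascii_error = str(exc)
    print(ascii_error)
```

Only under Latin-1 does the character happen to become the single byte 253, which is why the choice of output encoding matters so much here.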

Also, your formulation suggests the same:

"Is there any way to write a value 253 to standard output?"

If you were really writing text, you'd ask

"Is there any way to write 'ý' to standard output?"

Regards,
Martin

Steven D'Aprano 03-24-2010 04:41 AM

Re: Unicode blues in Python3
 
On Tue, 23 Mar 2010 11:46:33 -0700, nn wrote:

> Actually what I want is to write a particular byte to standard output,
> and I want this to work regardless of where that output gets sent to.


What do you mean "work"?

Do you mean "display a particular glyph" or something else?

In bash:

$ echo -e "\0101" # octal 101 = decimal 65
A
$ echo -e "\0375" # decimal 253


but if I change the terminal encoding, I get this:

$ echo -e "\0375"
ý

Or this:

$ echo -e "\0375"
²

depending on which encoding I use.
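The same effect can be reproduced in Python by decoding one byte under several encodings. This is only a sketch: the post does not say which terminal encodings were used, so Latin-1, code page 437 and UTF-8 below are assumptions chosen because they match the glyphs shown:

```python
raw = b"\xfd"  # the byte with value 253 (octal 375)

as_latin1 = raw.decode("latin-1")
as_cp437 = raw.decode("cp437")
# A lone 0xFD is not valid UTF-8, so ask for a replacement character
# instead of an exception.
as_utf8 = raw.decode("utf-8", errors="replace")

for name, text in [("latin-1", as_latin1), ("cp437", as_cp437), ("utf-8", as_utf8)]:
    print(name, ascii(text))
```

The byte never changes; only the interpretation applied by whatever reads it does.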

I think your question is malformed. You need to work out what behaviour
you actually want, before you can ask for help on how to get it.



--
Steven

nn 03-24-2010 01:03 PM

Re: Unicode blues in Python3
 


Martin v. Loewis wrote:
> nn wrote:
> >
> > Stefan Behnel wrote:
> >> nn, 23.03.2010 19:46:
> >>> Actually what I want is to write a particular byte to standard output,
> >>> and I want this to work regardless of where that output gets sent to.
> >>> I am aware that I could do
> >>> open('nnout','w',encoding='latin1').write(mychar) but I am porting a
> >>> python2 program and don't want to rewrite everything that uses that
> >>> script.
> >> Are you writing text or binary data to stdout?
> >>
> >> Stefan

> >
> > latin1 charset text.

>
> Are you sure about that? If you carefully reconsider, could you come to
> the conclusion that you are not writing text at all, but binary data?
>
> If it really was text that you write, why do you need to use
> U+00FD (LATIN SMALL LETTER Y WITH ACUTE)? To my knowledge, that
> character is really infrequently used in practice. The fact that you
> try to write it strongly suggests that what you are writing is not
> actually text.
>
> Also, your formulation suggests the same:
>
> "Is there any way to write a value 253 to standard output?"
>
> If you were really writing text, you'd ask
>
> "Is there any way to write 'ý' to standard output?"
>
> Regards,
> Martin


To be more informative: I am writing text and binary data
together. That is, I am embedding text from another source into a stream
that uses non-ASCII characters as "control" characters. In Python 2 I
was processing it mostly as text containing a few "funny" characters.
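One way such a mixed stream could be handled in Python 3 is to encode the text explicitly and do all output at the bytes level. The sketch below assumes the embedded text is Latin-1 and that a control character is a single byte such as 253; the thread does not describe the actual format, and `frame`, `write_frames` and `CONTROL` are hypothetical names:

```python
import io

CONTROL = 0xFD  # assumed control value, taken from the chr(253) example


def frame(text, control=CONTROL):
    """Encode text as Latin-1 and append one raw control byte."""
    return text.encode("latin-1") + bytes([control])


def write_frames(binary_stream, texts):
    """Write each text item, followed by the control byte, to a binary stream."""
    for text in texts:
        binary_stream.write(frame(text))


out = io.BytesIO()   # stand-in for sys.stdout.buffer
write_frames(out, ["caf\xe9", "abc"])
print(out.getvalue())
```

Keeping the control bytes out of the `str` layer means the text parts can later switch encoding without disturbing the control values.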

