Printing characters outside of the ASCII range

 
 
Ian Kelly
      11-09-2012
On Fri, Nov 9, 2012 at 2:46 PM, danielk <(E-Mail Removed)> wrote:
> D:\home\python>pytest.py
> Traceback (most recent call last):
> File "D:\home\python\pytest.py", line 1, in <module>
> print(chr(253).decode('latin1'))
> AttributeError: 'str' object has no attribute 'decode'
>
> Do I need to import something?


Ramit should have written "encode", not "decode". But the above still
would not work, because chr(253) gives you the character at *Unicode*
code point 253, not the character with CP437 ordinal 253 that your
terminal can actually print. The Unicode equivalents of those
characters are:

>>> list(map(ord, bytes([252, 253, 254]).decode('cp437')))

[8319, 178, 9632]

So these are what you would need to encode to CP437 for printing.

>>> print(chr(8319))

ⁿ
>>> print(chr(178))

²
>>> print(chr(9632))

■

That's probably not the way you want to go about printing them,
though, unless you mean to be inserting them manually. Is the data
you get from your database a string, or a bytes object? If the
former, just do:

print(data.encode('cp437'))

If the latter, then it should be printable as is, unless it is in some
other encoding than CP437.
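
A minimal sketch of the two cases distinguished above (data arriving as a string versus as a bytes object), assuming the data really is CP437. The helper name `to_text` is hypothetical and does not come from the thread.

```python
# Hypothetical helper: normalise database data to str, assuming CP437 bytes.
def to_text(data, encoding="cp437"):
    """Return data as str, decoding it first if it is a bytes-like object."""
    if isinstance(data, (bytes, bytearray)):
        return bytes(data).decode(encoding)
    return data


raw = bytes([97, 254, 98])   # b'a\xfeb', as it might come from a database
text = to_text(raw)          # byte 254 becomes U+25A0 (BLACK SQUARE)

print(ascii(text))                 # ASCII-only view of the decoded string
print(text.encode("cp437") == raw)  # encoding again restores the original bytes
```

Keeping everything as str inside the program and converting only at the boundary avoids mixing up code point 254 with byte 254.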
 
wxjmfauth@gmail.com
      11-10-2012
On Friday, November 9, 2012 at 18:17:54 UTC+1, danielk wrote:
> I'm converting an application to Python 3. The app works fine on Python 2.
>
> Simply put, this simple one-liner:
>
> print(chr(254))
>
> errors out with:
>
> Traceback (most recent call last):
> File "D:\home\python\tst.py", line 1, in <module>
> print(chr(254))
> File "C:\Python33\lib\encodings\cp437.py", line 19, in encode
> return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\xfe' in position 0: character maps to <undefined>
>
> I'm using this character as a delimiter in my application.
>
> What do I have to do to convert this string so that it does not error out?


-----

There is nothing wrong in having the character with
the code point 0xfe in the cp437 coding scheme as
a delimiter.

If it is coming from a byte string, you should
decode it properly

>>> b'=\xfe=\xfe='.decode('cp437')

'=■=■='

or you can use directly the unicode equivalent

>>> '=\u25a0=\u25a0='

'=■=■='

That's for "input". For "output" see:
http://groups.google.com/group/comp....f2f7f5a4962e8#
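
A minimal sketch of the "input" side, assuming records arrive as CP437 bytes with byte 0xFE as the field delimiter. The sample record and the names `DELIM`, `record` and `fields` are illustrative, not taken from the thread.

```python
DELIM = "\u25a0"   # BLACK SQUARE, which CP437 stores as byte 0xFE

# chr(254) is U+00FE (LATIN SMALL LETTER THORN), which CP437 cannot represent;
# this is the failure shown in the original traceback.
try:
    chr(254).encode("cp437")
    thorn_encodable = True
except UnicodeEncodeError:
    thorn_encodable = False

# Decode once, then work with the Unicode delimiter.
record = b"alpha\xfebeta\xfegamma".decode("cp437")
fields = record.split(DELIM)

# Encoding the rejoined fields gives back the original byte string.
encoded = DELIM.join(fields).encode("cp437")

print(thorn_encodable)
print(fields)
print(encoded)
```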


The choice of that character as a delimiter is not wrong.
It's a little bit unfortunate, because it falls high in
the "unicode table".

>>> import fourbiunicode as fu
>>> fu.UnicodeBlock('\u25a0')

'Geometric Shapes'
>>>
>>> fu.UnicodeBlock(b'\xfe'.decode('cp437'))

'Geometric Shapes'

(Another form of explanation)
jmf
 
danielk
      11-11-2012
On Friday, November 9, 2012 5:11:12 PM UTC-5, Ian wrote:
> On Fri, Nov 9, 2012 at 2:46 PM, danielk <(E-Mail Removed)> wrote:
> > D:\home\python>pytest.py
> > Traceback (most recent call last):
> > File "D:\home\python\pytest.py", line 1, in <module>
> > print(chr(253).decode('latin1'))
> > AttributeError: 'str' object has no attribute 'decode'
> >
> > Do I need to import something?
>
> Ramit should have written "encode", not "decode". But the above still
> would not work, because chr(253) gives you the character at *Unicode*
> code point 253, not the character with CP437 ordinal 253 that your
> terminal can actually print. The Unicode equivalents of those
> characters are:
>
> >>> list(map(ord, bytes([252, 253, 254]).decode('cp437')))
> [8319, 178, 9632]
>
> So these are what you would need to encode to CP437 for printing.
>
> >>> print(chr(8319))
> ⁿ
> >>> print(chr(178))
> ²
> >>> print(chr(9632))
> ■
>
> That's probably not the way you want to go about printing them,
> though, unless you mean to be inserting them manually. Is the data
> you get from your database a string, or a bytes object? If the
> former, just do:
>
> print(data.encode('cp437'))
>
> If the latter, then it should be printable as is, unless it is in some
> other encoding than CP437.


Ian's solution gives me what I need (thanks Ian!). But I notice a difference between '__str__' and '__repr__'.

class Pytest(str):
    def __init__(self, data = None):
        if data is None: data = ""
        self.data = data

    def __repr__(self):
        return (self.data).encode('cp437')

>>> import pytest
>>> p = pytest.Pytest("abc" + chr(178) + "def")
>>> print(p)

abc²def
>>> print(p.data)

abc²def
>>> print(type(p.data))

<class 'str'>

If I change '__repr__' to '__str__' then I get:

>>> import pytest
>>> p = pytest.Pytest("abc" + chr(178) + "def")
>>> print(p)

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: __str__ returned non-string (type bytes)

Why is '__str__' behaving differently than '__repr__'? I'd like to be able to use '__str__' because the result is not executable code, it's just a string of the record contents.

The documentation for the 'encode' method says: "Return an encoded version of the string as a bytes object." Yet when I displayed the type, it said it was <class 'str'>, which I'm taking to be 'type string', or can a 'string' also be 'a string of bytes'?

I'm trying to get my head around all this codecs/unicode stuff. I haven't had to deal with it until now but I'm determined to not let it get the best of me.

My goals are:

a) display a 'raw' database record with the delimiters intact, and
b) allow the client to create a string that represents a database record. So, if they know the record format then they should be able to create a database object like it does above, but with the chr(25x) characters. I will handle the conversion of the chr(25x) characters internally.
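
On the '__str__' versus '__repr__' question: in Python 3, print() calls str() on its argument, and a str subclass inherits str.__str__, so a '__repr__' defined on the subclass is never consulted by print(p). Both methods are required to return str, which is why a '__repr__' returning bytes only fails once repr() is actually called. The sketch below illustrates this, then shows one possible layout that keeps text as str and converts to CP437 only on request. The class names `Broken` and `Record` are hypothetical, and CP437 is assumed as the target encoding.

```python
class Broken(str):
    """Mirrors the class in the post: __repr__ wrongly returns bytes."""

    def __repr__(self):
        return self.encode("cp437")


b = Broken("abc")
via_str = str(b)            # the path print() takes; __repr__ is not involved
try:
    repr(b)                 # only now is the faulty __repr__ called
    repr_failed = False
except TypeError:
    repr_failed = True


class Record:
    """Hypothetical record wrapper: str inside, bytes only at the edge."""

    def __init__(self, data=""):
        self.data = data

    def __str__(self):
        # __str__ must return str, so the text is returned unencoded.
        return self.data

    def __bytes__(self):
        # bytes(record) gives the CP437 form, e.g. for a byte stream.
        return self.data.encode("cp437")


rec = Record("abc" + chr(178) + "def")
print(ascii(str(rec)))
print(bytes(rec))
```

In this layout type(rec.data) is str, while bytes(rec) is the bytes object that 'encode' returns.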
 
Thomas Rachel
      11-11-2012
On 09.11.2012 18:17, danielk wrote:

> I'm using this character as a delimiter in my application.


Then you probably use the *byte* 254 as opposed to the *character* 254.

So it might be better to either switch to byte strings, or output the
representation of the string instead of itself.

So do print(repr(chr(254))) or, for byte strings, print(bytes([254])).
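
A short sketch of the byte-versus-character distinction, under the assumption that the delimiter is really byte 254. Note that in Python 3 repr() leaves printable non-ASCII characters unescaped, so ascii() is used here when an ASCII-only rendering is wanted.

```python
char = chr(254)        # the character U+00FE
byte = bytes([254])    # the single byte 0xFE

print(ascii(char))     # escaped, ASCII-only form of the character
print(byte)            # bytes objects always print in escaped form

# The same byte means different characters in different encodings.
as_cp437 = byte.decode("cp437")      # U+25A0 BLACK SQUARE
as_latin1 = byte.decode("latin-1")   # U+00FE, identical to chr(254)
```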


Thomas
 
diccon.tesson@gmail.com
      4 Weeks Ago
You're handling Pick multi-value fields, aren't you?
Just hit the same issue, thanks all here for various solutions.
Interfacing with OpenQM / Scarlet DME here.
 
Mark Lawrence
      4 Weeks Ago
On 19/03/2014 13:11, (E-Mail Removed) wrote:
> You're handling Pick multi-value fields, aren't you?
> Just hit the same issue, thanks all here for various solutions.
> Interfacing with OpenQM / Scarlet DME here.
>


The context is conspicuous by its absence. In future would you please
be kind enough to provide some.

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

---
This email is free from viruses and malware because avast! Antivirus protection is active.
http://www.avast.com


 
Zachary Ware
      4 Weeks Ago
On 19/03/2014 13:11, (E-Mail Removed) wrote:
> You're handling Pick multi-value fields, aren't you?
> Just hit the same issue, thanks all here for various solutions.
> Interfacing with OpenQM / Scarlet DME here.


For future posts, please be sure to quote what you're replying to.
Google Groups makes things easy to find and reply to, but this is a
mailing list. When we receive a mail with just a subject line and a
cryptic message, we're likely to think it spam and ignore future mail
from that sender. It's also a bit less than ideal to reply to years
old threads.

On Wed, Mar 19, 2014 at 9:19 AM, Mark Lawrence <(E-Mail Removed)> wrote:
> The context is conspicuous by its absence. In future would you please be
> kind enough to provide some.


In a fit of curiosity, I went looking:
https://mail.python.org/pipermail/py...er/634803.html
I'm almost surprised it wasn't any older than that.

Ironically, on my way down the November 2012 archive page, I noticed a
long thread about "Obnoxious postings from Google Groups".

--
Zach
 
Mark Lawrence
      4 Weeks Ago
On 19/03/2014 14:43, Zachary Ware wrote:
> Ironically, on my way down the November 2012 archive page, I noticed a
> long thread about "Obnoxious postings from Google Groups".
>


Thankfully the number of grotty postings from gg has dropped
considerably. Sadly our resident unicode expert quite deliberately
continues to use it in a manner which is designed to annoy.

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence



 