Problem with Unicode char in Python 3.3.0

 
 
Franck Ditter
 
      01-06-2013
Hi !
I work on MacOS-X Lion and IDLE/Python 3.3.0
I can't get the treble key (U1D11E) !

>>> "\U1D11E"

SyntaxError: (unicode error) 'unicodeescape' codec can't
decode bytes in position 0-6: end of string in escape sequence

How can I display musical keys ?

Thanks,

franck
 
Peter Otten
 
      01-06-2013
Franck Ditter wrote:

> I work on MacOS-X Lion and IDLE/Python 3.3.0
> I can't get the treble key (U1D11E) !
>
>>>> "\U1D11E"

> SyntaxError: (unicode error) 'unicodeescape' codec can't
> decode bytes in position 0-6: end of string in escape sequence
>
> How can I display musical keys ?


Try
>>> "\U0001D11E"

'π„ž'


 
marduk
 
      01-06-2013


On Sun, Jan 6, 2013, at 11:43 AM, Franck Ditter wrote:
> Hi !
> I work on MacOS-X Lion and IDLE/Python 3.3.0
> I can't get the treble key (U1D11E) !
>
> >>> "\U1D11E"

> SyntaxError: (unicode error) 'unicodeescape' codec can't
> decode bytes in position 0-6: end of string in escape sequence
>


You probably meant:

>>> '\U0001d11e'



For that syntax you must use either '\uXXXX' or '\UXXXXXXXX' (i.e.
specify exactly 4 or exactly 8 hex digits).

http://docs.python.org/2/howto/unico...on-source-code
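
A minimal sketch of the escape forms described above, assuming Python 3.3 or later (where a non-BMP character is stored as a single code point). The variable names are illustrative only, and the \N{...} spelling is an extra form not mentioned in the thread. Nothing here prints the character itself, so the output encoding does not matter.

```python
import unicodedata

# Three equivalent spellings of U+1D11E.
# \U takes exactly 8 hex digits; \u takes exactly 4 (BMP only).
clef_escape = "\U0001D11E"
clef_chr = chr(0x1D11E)
clef_named = "\N{MUSICAL SYMBOL G CLEF}"

assert clef_escape == clef_chr == clef_named

# Assumption: Python 3.3+, so the string holds one code point.
print(len(clef_escape))
print(hex(ord(clef_escape)))
print(unicodedata.name(clef_escape))
```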

 
Franck Ditter
 
      01-07-2013
marduk wrote:

> On Sun, Jan 6, 2013, at 11:43 AM, Franck Ditter wrote:
> > Hi !
> > I work on MacOS-X Lion and IDLE/Python 3.3.0
> > I can't get the treble key (U1D11E) !
> >
> > >>> "\U1D11E"

> > SyntaxError: (unicode error) 'unicodeescape' codec can't
> > decode bytes in position 0-6: end of string in escape sequence
> >

>
> You probably meant:
>
> >>> '\U0001d11e'

>
>
> For that syntax you must use either '\uXXXX' or '\UXXXXXXXX' (i.e.
> specify either 4 or 8 hex digits).
>
> http://docs.python.org/2/howto/unico...on-source-code


<<< print('\U0001d11e')
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    print('\U0001d11e')
UnicodeEncodeError: 'UCS-2' codec can't encode character '\U0001d11e'
in position 0: Non-BMP character not supported in Tk
 
Chris Angelico
 
      01-07-2013
On Mon, Jan 7, 2013 at 11:57 PM, Franck Ditter wrote:
> <<< print('\U0001d11e')
> Traceback (most recent call last):
> File "<pyshell#1>", line 1, in <module>
> print('\U0001d11e')
> UnicodeEncodeError: 'UCS-2' codec can't encode character '\U0001d11e'
> in position 0: Non-BMP character not supported in Tk


That's a different issue; IDLE can't handle non-BMP characters. Try it
from the terminal if you can - on my Linux systems (Debians and
Ubuntus with GNOME and gnome-terminal), the terminal is set to UTF-8
and quite happily accepts the full Unicode range. On Windows, that may
well not be the case, though.

ChrisA
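
One possible workaround for a display that only handles the BMP, such as the Tk-based IDLE shell discussed here, is to substitute non-BMP characters before printing. This is only a sketch: the helper name replace_non_bmp is hypothetical, and it assumes Python 3.3 or later, where each non-BMP character is a single code point.

```python
def replace_non_bmp(text, replacement="\ufffd"):
    """Return text with every code point above U+FFFF replaced."""
    return "".join(ch if ord(ch) <= 0xFFFF else replacement for ch in text)


sample = "clef: \U0001D11E, note: \u266a"

# ascii() keeps the demonstration printable on any console.
print(ascii(replace_non_bmp(sample)))
```

The substitution loses information, so it suits display only; the original string should be kept for any further processing.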
 
Terry Reedy
 
      01-07-2013
On 1/7/2013 7:57 AM, Franck Ditter wrote:

> <<< print('\U0001d11e')
> Traceback (most recent call last):
> File "<pyshell#1>", line 1, in <module>
> print('\U0001d11e')
> UnicodeEncodeError: 'UCS-2' codec can't encode character '\U0001d11e'
> in position 0: Non-BMP character not supported in Tk


The message comes from printing to a tk text widget (the IDLE shell),
not from creating the 1 char string. c = '\U0001d11e' works fine. When
you have problems with creating and printing unicode, *separate*
creating from printing to see where the problem is. (I do not know if
the brand new tcl/tk 8.6 is any better.)

The windows console also chokes, but with a different message.

>>> c='\U0001d11e'
>>> print(c)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Programs\Python33\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001d11e'
in position 0: character maps to <undefined>

Yes, this is very annoying, especially in Win 7.

--
Terry Jan Reedy
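
The advice to separate creating from printing can be sketched as follows. The helper try_encode is hypothetical, and cp437 is used only because it is the code page shown in the traceback above; encoding explicitly stands in for what print() does when it writes to a console with that encoding.

```python
def try_encode(text, encoding):
    """Return (True, bytes) if text encodes, else (False, error message)."""
    try:
        return True, text.encode(encoding)
    except UnicodeEncodeError as exc:
        return False, str(exc)


c = "\U0001d11e"  # creating the string succeeds on its own
print(len(c))

# The failure only appears when encoding for a limited target.
for enc in ("utf-8", "cp437"):
    ok, result = try_encode(c, enc)
    print(enc, ok, result)

# Fallback: escape whatever the target encoding cannot represent.
fallback = c.encode("cp437", errors="backslashreplace").decode("cp437")
print(fallback)
```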

 
Terry Reedy
 
      01-08-2013
On 1/7/2013 8:12 AM, Terry Reedy wrote:
> On 1/7/2013 7:57 AM, Franck Ditter wrote:
>
>> <<< print('\U0001d11e')
>> Traceback (most recent call last):
>> File "<pyshell#1>", line 1, in <module>
>> print('\U0001d11e')
>> UnicodeEncodeError: 'UCS-2' codec can't encode character '\U0001d11e'
>> in position 0: Non-BMP character not supported in Tk

>
> The message comes from printing to a tk text widget (the IDLE shell),
> not from creating the 1 char string. c = '\U0001d11e' works fine. When
> you have problems with creating and printing unicode, *separate*
> creating from printing to see where the problem is. (I do not know if
> the brand new tcl/tk 8.6 is any better.)
>
> The windows console also chokes, but with a different message.
>
> >>> c='\U0001d11e'
> >>> print(c)

> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "C:\Programs\Python33\lib\encodings\cp437.py", line 19, in encode
>     return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\U0001d11e'
> in position 0: character maps to <undefined>
>
> Yes, this is very annoying, especially in Win 7.


The above is in 3.3, in which '\U0001d11e' is actually translated to a
length 1 string. In 3.2 and earlier, on narrow builds (as on Windows),
that literal is translated to a length 2 string: a surrogate pair (two
surrogate code points, both in the BMP). On printing, the pair of
surrogates got translated to the square box used for all characters for
which the font does not have a glyph. 𝄞 When cut and pasted, it shows in
this mail composer as a weird music sign with peculiar behavior.
3 -s, 3 spaces, paste, 3 spaces, 3 -s, but it may disappear.
--- 𝄞 ---
So 3.3 is the first Windows version to get the UnicodeEncodeError on
printing.

--
Terry Jan Reedy
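
For reference, the surrogate pair that a narrow build stores for a non-BMP character follows the standard UTF-16 arithmetic. The function name to_surrogate_pair below is hypothetical; this is a sketch of that arithmetic, not code from any Python version.

```python
def to_surrogate_pair(code_point):
    """Split a non-BMP code point into its UTF-16 (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


high, low = to_surrogate_pair(0x1D11E)
print(hex(high), hex(low))  # 0xd834 0xdd1e

# Cross-check against Python's own UTF-16 encoder.
expected = bytes([high >> 8, high & 0xFF, low >> 8, low & 0xFF])
assert "\U0001D11E".encode("utf-16-be") == expected
```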


 