Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Perl > Perl Misc > [perl-python] unicode study with unicodedata module


[perl-python] unicode study with unicodedata module

 
 
Xah Lee
      03-15-2005
python has a nice unicodedata module for looking up the properties of unicode characters.

#-*- coding: utf-8 -*-
# python

from unicodedata import *

# each unicode char has a unique name.
# one can use the “lookup” func to find it

mychar=lookup('greek cApital letter sIgma')
# note letter case doesn't matter
print mychar.encode('utf-8')

m=lookup('CJK UNIFIED IDEOGRAPH-5929')
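# here, however, case must match exactly (presumably because CJK
# ideograph names are generated from the code point, not stored in the name table).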
print m.encode('utf-8')

# to find a char's name, use the “name” function
print name(u'天')
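For readers on Python 3 (the code above is Python 2: `print` statements and explicit `.encode('utf-8')`), the same lookups can be sketched as below. `roundtrip` is a hypothetical helper, not part of unicodedata; it just checks that `lookup` and `name` are inverses. Note that `name` raises ValueError for a char without a name unless you pass a default, and `lookup` raises KeyError for an unknown name.

```python
import unicodedata

# lookup(): name -> char. Case is ignored for names stored in the database.
sigma = unicodedata.lookup('greek cApital letter sIgma')
print(hex(ord(sigma)))  # 0x3a3

# CJK ideograph names have the form 'CJK UNIFIED IDEOGRAPH-XXXX' (hex code point).
tian = unicodedata.lookup('CJK UNIFIED IDEOGRAPH-5929')

# name(): char -> name; a default avoids ValueError for unnamed chars.
print(unicodedata.name(tian))               # CJK UNIFIED IDEOGRAPH-5929
print(unicodedata.name('\n', '<no name>'))  # <no name>


def roundtrip(ch):
    """True if lookup(name(ch)) gives ch back; False if ch has no name."""
    n = unicodedata.name(ch, None)
    return n is not None and unicodedata.lookup(n) == ch
```

For example, `roundtrip('A')` is True, while `roundtrip('\n')` is False because control characters have no name here.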

basically, in unicode each char has a number of attributes (called
properties) besides its name. These attributes provide the info needed
to form letters and words, and to process text (sorting,
capitalization, etc.) in the various human scripts. For example, the
Latin alphabet has upper-case and lower-case forms. Korean letters are
stacked together into syllable blocks. Many symbols correspond to
numbers, and there are also combining forms, used for example to put a
bar over any letter or character. Also, some writing systems run right
to left. To form these symbols for display, or to process them in
computing, this info is needed for each char.
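Each example in the paragraph above corresponds to a property you can query. A Python 3 sketch (the characters are just illustrative choices): general category for case, canonical decomposition for Hangul stacking, numeric value, combining class, and bidirectional class.

```python
import unicodedata

# Case: Latin letters have upper- and lower-case forms (categories Lu and Ll).
cases = (unicodedata.category('A'), unicodedata.category('a'))
print(cases)  # ('Lu', 'Ll')

# Stacking: the Hangul syllable U+D55C (HAN) decomposes (NFD) into three jamo.
jamo = unicodedata.normalize('NFD', '\ud55c')
print([unicodedata.name(j) for j in jamo])

# Numbers: many symbols carry a numeric value.
half = unicodedata.numeric('\u00bd')  # VULGAR FRACTION ONE HALF
print(half)  # 0.5

# Combining forms: COMBINING MACRON (U+0304) puts a bar over the preceding letter.
macron_class = unicodedata.combining('\u0304')      # nonzero means "combining"
a_macron = unicodedata.normalize('NFC', 'a\u0304')  # composes to U+0101
print(macron_class, hex(ord(a_macron)))             # 230 0x101

# Direction: Hebrew letters are right-to-left ('R'), Latin letters left-to-right ('L').
dirs = (unicodedata.bidirectional('\u05d0'), unicodedata.bidirectional('a'))
print(dirs)  # ('R', 'L')
```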

the rest of the functions in unicodedata return these attributes.
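To see several of these functions side by side, here is a Python 3 sketch that gathers them into one dict per char. `describe` is a hypothetical helper for illustration, not something unicodedata provides.

```python
import unicodedata


def describe(ch):
    """Collect several unicodedata properties of one char into a dict (hypothetical helper)."""
    return {
        'name': unicodedata.name(ch, None),              # None if the char has no name
        'category': unicodedata.category(ch),            # e.g. 'Lu', 'Nd', 'Lo'
        'bidirectional': unicodedata.bidirectional(ch),  # e.g. 'L', 'R', 'ON'
        'combining': unicodedata.combining(ch),          # canonical combining class
        'east_asian_width': unicodedata.east_asian_width(ch),
        'mirrored': unicodedata.mirrored(ch),            # 1 for chars like '('
        'decomposition': unicodedata.decomposition(ch),  # '' if there is none
        'numeric': unicodedata.numeric(ch, None),        # None if not numeric
    }


# ascii() keeps the output printable on any console
for ch in 'A7(\u00e9\u5929':
    print(ascii(ch), describe(ch))
```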

see unicodedata doc:
http://python.org/doc/2.4/lib/module-unicodedata.html

Official word on unicode character properties:
http://www.unicode.org/uni2book/ch04.pdf

--
i don't know what's the state of Perl's unicode. Is there something
similar?

--
this post is archived at
http://xahlee.org/perl-python/unicodedata_module.html

Xah
(E-Mail Removed)
http://xahlee.org/PageTwo_dir/more.html

 
Brian McCauley
      03-15-2005
Xah Lee wrote:

> i don't know what's the state of Perl's unicode.


perldoc perlunicode

 
Xah Lee
      03-16-2005
here's a snippet of code that prints a range of unicode chars, along
with their ordinal in hex, and name.

chars without a name are skipped. (some of these are unassigned code
points; others, such as control characters, have no name that
unicodedata reports.)

On Microsoft Windows the encoding might need to be changed to utf-16.

Change the range to see different unicode chars.

# -*- coding: utf-8 -*-

from unicodedata import *

l = []
for i in range(0x0000, 0x0fff):
    l.append(unichr(i))   # unichr() builds the char from its code point

for x in l:
    # name() returns the default '-' for chars that have no name
    if name(x, '-') != '-':
        print x.encode('utf-8'), '|', "%04x" % ord(x), '|', name(x)
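A Python 3 sketch of the same table: `chr` replaces Python 2's `unichr`, and no explicit `.encode` is needed when the terminal handles the characters. `named_chars` is a hypothetical helper that returns the rows instead of printing them, so the range logic can be reused; the demo range is limited to printable ASCII so it runs on any console, and you can widen it as with the original.

```python
import unicodedata


def named_chars(start, stop):
    """Return (char, hex code point, name) for each named char in range(start, stop).

    Chars with no name (control characters, unassigned code points) are skipped.
    """
    rows = []
    for i in range(start, stop):
        ch = chr(i)
        n = unicodedata.name(ch, None)
        if n is not None:
            rows.append((ch, '%04x' % i, n))
    return rows


for ch, code, n in named_chars(0x0020, 0x007f):
    print(ch, '|', code, '|', n)
```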
--
http://xahlee.org/perl-python/unicodedata_module.html

does anyone want to supply a Perl version?

Xah
(E-Mail Removed)
http://xahlee.org/PageTwo_dir/more.html



Brian McCauley wrote:
> Xah Lee wrote:
>
> > i don't know what's the state of Perl's unicode.

>
> perldoc perlunicode


 
Xah Lee
      03-16-2005
**** google incorporated for editing my subject name without
permission.

and **** google incorporated for editing my message content without
permission.

http://xahlee.org/UnixResource_dir/w...e_license.html

Xah
(E-Mail Removed)
http://xahlee.org/PageTwo_dir/more.html

 

Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
[perl-python] unicode study with unicodedata module | Xah Lee | Python | 5 | 03-16-2005 10:08 AM
unicodedata . normalize (NFD - NFC) inconsistency | Christos TZOTZIOY Georgiou | Python | 3 | 11-10-2004 08:48 AM
unicodedata name for \u000a | Ken Beesley | Python | 7 | 08-22-2004 04:00 PM
Re: unicodedata name for \u000a | Ken Beesley | Python | 1 | 08-22-2004 09:52 AM
Unicode 4.0 updates to unicodedata? | David Opstad | Python | 1 | 09-19-2003 04:52 AM


