Velocity Reviews - Computer Hardware Reviews

ordering Japanese text

 
 
mab2001@gmail.com
      05-04-2006
Hi,

I am using Sadahiro Tomoyuki's Lingua::JA::Sort::JIS module to sort
Japanese names of stores. I have come close to achieving the order my
client has asked for but am having a little difficulty matching their
request exactly. The problem seems to be collating kana glyphs with
manyogana glyphs. (Please excuse me if I am misusing any terms - this
is my first introduction to Japanese.)

Here is an example of 13 store names ordered with
Lingua::JA::Sort::JIS::msort:

1. $B0K@*C0(B JR$B5~ETE9(B
2. $B%"%Z%C%/%9(B $BJ!;3(B
3. $B%"%_%e%W%i%6(B $B</;yEg(B
4. $B%*%/%N(B $B00@n(B
5. $B$5$/$iLnI42_E9(B $B@gBf(B
6. $B$5$D$^20(B $B</;yEg(B
7. $B%9%?%s%9(B $BJF;R(B
8. $B$=$4$&(B $B?@8ME9(B
9. $B$=$4$&(B $B@iMUE9(B
10. $B$=$4$&(B $BBg5\E9(B
11. $B$=$4$&(B $B2#IME9(B
12. $B%@%$%"%b%s%I%7%F%#%"%k%k(B $B3`86(B
13. $B%K%e!<%:(B $B7'K\(B

My client tells me that entry 1 should actually come after the third
entry and before the fourth. From this description of manyogana, I'm
thinking they're saying that collation of the glyph 伊 should be based
on its katakana adaptation イ which makes sense:

http://en.wikipedia.org/wiki/Manyogana

Note I'm basing many of my statements on staring at and comparing these
glyphs online and so I might be far off.

So my questions are:

1. Is my client correct in their ordering?
2. I believe I've tried all the combinations of collation levels and
kanji classes in the Lingua::JA::Sort::JIS jcmp function but have not
achieved the desired ordering. Have I perhaps missed the correct
combination?
3. Is the solution to first convert the manyogana characters to
katakana and then do the msort? If so does anyone know of a Perl module
to do this or a nice reference that I could use more programmatically
than the image on the link above?
4. Can anyone think of any other glyphs or classes of Japanese glyphs
similar to manyogana that I should be worried about?

Thanks for any help you can give me!

Best,
Mike

 
mab2001@gmail.com
      05-06-2006
After a discussion of this on the perl-i18n mailing list, I've come to
understand the problem a bit more. In Japanese, text ordering is based
on phonetization. But as in English, there are multiple pronunciations
of a particular piece of text. Moreover, the "more correct"
pronunciation among the possibilities is influenced by the context of
the text. So in other words, the problem is intractable if all you have
is the text alone and inefficient even if you have more information
(because of the myriad factors that influence pronunciation).

The solution that I am using then is to store with each piece of
kana/kanji text, a kana-only phonetization of that text. I then rely on
the content editors to know the context of the text and supply an
accurate phonetization in kana. (In other words, I'm putting the
responsibility on someone else!) There does exist a determinate
ordering of the kana-only text and so this becomes a tractable problem.

Mike

 
      05-07-2006
In comp.lang.perl.misc wrote:

: The solution that I am using then is to store with each piece of
: kana/kanji text, a kana-only phonetization of that text. I then rely on
: the content editors to know the context of the text and supply an
: accurate phonetization in kana. (In other words, I'm putting the
: responsibility on someone else!) There does exist a determinate
: ordering of the kana-only text and so this becomes a tractable problem.

This is indeed best practice; have a look at the Sharp Zaurus and other Japanese
PIMs, which regularly offer a "pronunciation" field next to the "name in written
form" field; the former is kana, the latter is kanji.
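
If the reading field is typed in by editors, it may be worth checking that it really is kana-only before it is used as a sort key. A rough sketch follows; is_kana_reading is a hypothetical helper name, and the rule it applies (hiragana, katakana, the long-vowel mark and whitespace, with at least one kana character) is an assumption about what such a field should allow.

```perl
use strict;
use warnings;

# Returns 1 if $reading contains at least one kana character and nothing
# but hiragana, katakana, the long-vowel mark (U+30FC) and whitespace.
# Hypothetical helper, not part of any module.
sub is_kana_reading {
    my ($reading) = @_;
    return 0 unless defined $reading;
    return 0 unless $reading =~ /[\p{Hiragana}\p{Katakana}]/;
    return $reading =~ /\A[\p{Hiragana}\p{Katakana}\x{30FC}\s]+\z/ ? 1 : 0;
}

# Code points are written as escapes so the sketch does not depend on the
# encoding of the source file.
my @samples = (
    [ "\x{3044}\x{305B}\x{305F}\x{3093}", 'hiragana reading (i-se-ta-n)' ],
    [ "\x{30CB}\x{30E5}\x{30FC}\x{30BA}", 'katakana reading with long-vowel mark' ],
    [ "\x{4F0A}\x{52E2}\x{4E39}",         'kanji, not a reading' ],
    [ 'JR',                               'Latin letters' ],
);

for my $sample (@samples) {
    my ($text, $label) = @$sample;
    printf "%-40s %s\n", $label, is_kana_reading($text) ? 'ok' : 'rejected';
}
```

The first two samples are accepted and the last two are rejected. A check like this only catches the wrong script; whether the reading is the right one for the name still depends on the editor.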

Oliver.


--
Dr. Oliver Corff e-mail:
 