Re: Isn't java.lang.Character.html#{ isLetterFromLang(int codePoint,String ISOLangDef) missing from the spec?

 
 
Arne Vajhøj
      12-05-2010
On 04-12-2010 20:34, (E-Mail Removed) wrote:
>> The concept will be fundamentally broken if one language
>> has more than one alphabet (I don't know if such case exist,
>> but it could).

> ~
> Well, there are plenty of languages using more than one alphabet. Japanese comes to mind:
> ~
> http://en.wikipedia.org/wiki/Japanese_writing_system
> ~
> It actually uses 4 writing systems: Kanji, Hiragana, Katakana, Rōmaji


Then the function is unimplementable.

>> And the benefits are very limited given the practice
>> of writing names as they are in their native language
>> even though the letters are not used in the language
>> of the text.

> ~
> If all you take into account are nominal entries (POS bearing a name) in language,
> I don't think that the benefits are -very limited-, since in every language
> those are a very small part of all it goes on

Names and specialized terms and phrases are in a lot of text.

Arne

 
Tom Anderson
      12-05-2010
On Sat, 4 Dec 2010, Arne Vajhøj wrote:

> On 04-12-2010 20:34, (E-Mail Removed) wrote:
>>> The concept will be fundamentally broken if one language
>>> has more than one alphabet (I don't know if such case exist,
>>> but it could).

>> ~
>> Well, there are plenty of languages using more than one alphabet.
>> Japanese comes to mind:
>> ~
>> http://en.wikipedia.org/wiki/Japanese_writing_system
>> ~
>> It actually uses 4 writing systems: Kanji, Hiragana, Katakana,
>> Rōmaji

>
> Then the function is unimplementable.


Why? Why doesn't the function simply return true for glyphs from any of
those groups and "ja"?

I note that in particular, it might return true for the fullwidth romaji,
but not the standard width ones.
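
For illustration only, a minimal sketch of that behaviour. It assumes Java 9+ (for `Map.of`) and uses the JDK's `Character.UnicodeScript` and `Character.UnicodeBlock` as stand-ins for "writing system". The class name `LangLetters` and the language-to-script table are hypothetical, not part of any JDK API.

```java
import java.lang.Character.UnicodeBlock;
import java.lang.Character.UnicodeScript;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

final class LangLetters {
    // Hypothetical, hand-written table; real data would have to come from a curated source.
    private static final Map<String, Set<UnicodeScript>> SCRIPTS_BY_LANG = Map.of(
            "ja", EnumSet.of(UnicodeScript.HAN, UnicodeScript.HIRAGANA, UnicodeScript.KATAKANA),
            "en", EnumSet.of(UnicodeScript.LATIN));

    static boolean isLetterFromLang(int codePoint, String isoLang) {
        Set<UnicodeScript> scripts = SCRIPTS_BY_LANG.get(isoLang);
        if (scripts == null || !Character.isLetter(codePoint)) {
            return false; // unknown language, or not a letter at all
        }
        if (scripts.contains(UnicodeScript.of(codePoint))) {
            return true;
        }
        // Assumed special case for "ja": accept Latin letters only in their fullwidth form.
        return "ja".equals(isoLang)
                && UnicodeScript.of(codePoint) == UnicodeScript.LATIN
                && UnicodeBlock.of(codePoint) == UnicodeBlock.HALFWIDTH_AND_FULLWIDTH_FORMS;
    }
}
```

With this table, `isLetterFromLang(0xFF21, "ja")` (fullwidth A) is true while `isLetterFromLang(0x41, "ja")` (plain A) is false.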

tom

--
Now I am thoroughly confused. -- Colin Brace sums up RT3090 support
in Linux
 
Arne Vajhøj
      12-05-2010
On 05-12-2010 07:07, Tom Anderson wrote:
> On Sat, 4 Dec 2010, Arne Vajhøj wrote:
>> On 04-12-2010 20:34, (E-Mail Removed) wrote:
>>>> The concept will be fundamentally broken if one language
>>>> has more than one alphabet (I don't know if such case exist,
>>>> but it could).
>>> ~
>>> Well, there are plenty of languages using more than one alphabet.
>>> Japanese comes to mind:
>>> ~
>>> http://en.wikipedia.org/wiki/Japanese_writing_system
>>> ~
>>> It actually uses 4 writing systems: Kanji, Hiragana, Katakana,
>>> Rōmaji

>>
>> Then the function is unimplementable.

>
> Why? Why doesn't the function simply return true for glyphs from any of
> those groups and "ja"?
>
> I note that in particular, it might return true for the fullwidth
> romaji, but not the standard width ones.


I guess it could.

But then what does the result mean?

It does not mean that the code point is valid in any
text in that language.

Arne

 
Tom Anderson
      12-05-2010
On Sun, 5 Dec 2010, Arne Vajhøj wrote:

> On 05-12-2010 07:07, Tom Anderson wrote:
>> On Sat, 4 Dec 2010, Arne Vajhøj wrote:
>>> On 04-12-2010 20:34, (E-Mail Removed) wrote:
>>>>> The concept will be fundamentally broken if one language
>>>>> has more than one alphabet (I don't know if such case exist,
>>>>> but it could).
>>>> ~
>>>> Well, there are plenty of languages using more than one alphabet.
>>>> Japanese comes to mind:
>>>> ~
>>>> http://en.wikipedia.org/wiki/Japanese_writing_system
>>>> ~
>>>> It actually uses 4 writing systems: Kanji, Hiragana, Katakana,
>>>> Rōmaji
>>>
>>> Then the function is unimplementable.

>>
>> Why? Why doesn't the function simply return true for glyphs from any of
>> those groups and "ja"?
>>
>> I note that in particular, it might return true for the fullwidth
>> romaji, but not the standard width ones.

>
> I guess it could.
>
> But then what does the result mean?


It means that the character could be part of a text in that language.

> It does not mean that the code point is valid in any text in that
> language.


Surely that's exactly what it means?

tom

--
The coolest thing to do with your data will be thought of by someone
else. -- Rufus Pollock
 
Arne Vajhøj
      12-05-2010
On 05-12-2010 10:10, Tom Anderson wrote:
> On Sun, 5 Dec 2010, Arne Vajhøj wrote:
>> On 05-12-2010 07:07, Tom Anderson wrote:
>>> On Sat, 4 Dec 2010, Arne Vajhøj wrote:
>>>> On 04-12-2010 20:34, (E-Mail Removed) wrote:
>>>>>> The concept will be fundamentally broken if one language
>>>>>> has more than one alphabet (I don't know if such case exist,
>>>>>> but it could).
>>>>> ~
>>>>> Well, there are plenty of languages using more than one alphabet.
>>>>> Japanese comes to mind:
>>>>> ~
>>>>> http://en.wikipedia.org/wiki/Japanese_writing_system
>>>>> ~
>>>>> It actually uses 4 writing systems: Kanji, Hiragana, Katakana,
>>>>> Rōmaji
>>>>
>>>> Then the function is unimplementable.
>>>
>>> Why? Why doesn't the function simply return true for glyphs from any of
>>> those groups and "ja"?
>>>
>>> I note that in particular, it might return true for the fullwidth
>>> romaji, but not the standard width ones.

>>
>> I guess it could.
>>
>> But then what does the result mean?

>
> It means that the character could be part of a text in that language.
>
>> It does not mean that the code point is valid in any text in that
>> language.

>
> Surely that's exactly what it means?


No.

The difference is between any and some.

With that semantics I find the function useless.

isLetterFromAlphabet may make more sense, if Alphabet is
sufficiently well defined.
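
As a rough sketch of what such a function could look like, assuming the Unicode script property (exposed in the JDK as `Character.UnicodeScript`) is an acceptable definition of "alphabet". The class and method names here are hypothetical.

```java
import java.lang.Character.UnicodeScript;

final class ScriptLetters {
    // True when the code point is a letter AND Unicode assigns it to the given script.
    // "Alphabet" is approximated by UnicodeScript; that is an assumption, not a settled definition.
    static boolean isLetterFromScript(int codePoint, UnicodeScript script) {
        return Character.isLetter(codePoint) && UnicodeScript.of(codePoint) == script;
    }
}
```

Here the answer does not depend on any language at all, only on the code point and the script.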

Arne

 
Tom Anderson
      12-06-2010
On Sun, 5 Dec 2010, Arne Vajhøj wrote:

> On 05-12-2010 10:10, Tom Anderson wrote:
>> On Sun, 5 Dec 2010, Arne Vajhøj wrote:
>>> On 05-12-2010 07:07, Tom Anderson wrote:
>>>> On Sat, 4 Dec 2010, Arne Vajhøj wrote:
>>>>> On 04-12-2010 20:34, (E-Mail Removed) wrote:
>>>>>>> The concept will be fundamentally broken if one language
>>>>>>> has more than one alphabet (I don't know if such case exist,
>>>>>>> but it could).
>>>>>> ~
>>>>>> Well, there are plenty of languages using more than one alphabet.
>>>>>> Japanese comes to mind:
>>>>>> ~
>>>>>> http://en.wikipedia.org/wiki/Japanese_writing_system
>>>>>> ~
>>>>>> It actually uses 4 writing systems: Kanji, Hiragana, Katakana,
>>>>>> Rōmaji
>>>>>
>>>>> Then the function is unimplementable.
>>>>
>>>> Why? Why doesn't the function simply return true for glyphs from any of
>>>> those groups and "ja"?
>>>>
>>>> I note that in particular, it might return true for the fullwidth
>>>> romaji, but not the standard width ones.
>>>
>>> I guess it could.
>>>
>>> But then what does the result mean?

>>
>> It means that the character could be part of a text in that language.
>>
>>> It does not mean that the code point is valid in any text in that
>>> language.

>>
>> Surely that's exactly what it means?

>
> No.
>
> The difference is between any and some.
>
> With that semantics I find the function useless.


I'm sorry to hear that. I don't.

Perhaps the function should return a result from an enum -
NOT_IN_THIS_LANGUAGE, USED_IN_THIS_LANGUAGE,
EXCLUSIVELY_USED_IN_THIS_LANGUAGE.
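
Something along these lines, as a sketch only. It assumes Java 9+ and a hand-written script-to-language table, which is hypothetical and deliberately tiny. "Exclusively" is taken to mean that the table lists no other language for the character's script.

```java
import java.lang.Character.UnicodeScript;
import java.util.Map;
import java.util.Set;

enum Usage { NOT_IN_THIS_LANGUAGE, USED_IN_THIS_LANGUAGE, EXCLUSIVELY_USED_IN_THIS_LANGUAGE }

final class UsageClassifier {
    // Hypothetical sample data, keyed by Unicode script.
    private static final Map<UnicodeScript, Set<String>> LANGS_BY_SCRIPT = Map.of(
            UnicodeScript.HIRAGANA, Set.of("ja"),
            UnicodeScript.KATAKANA, Set.of("ja"),
            UnicodeScript.HAN, Set.of("ja", "zh", "ko"),
            UnicodeScript.LATIN, Set.of("en", "hu"));

    static Usage classify(int codePoint, String isoLang) {
        if (!Character.isLetter(codePoint)) {
            return Usage.NOT_IN_THIS_LANGUAGE;
        }
        Set<String> langs = LANGS_BY_SCRIPT.getOrDefault(UnicodeScript.of(codePoint), Set.of());
        if (!langs.contains(isoLang)) {
            return Usage.NOT_IN_THIS_LANGUAGE;
        }
        // Exclusive only if no other language in the table shares this script.
        return langs.size() == 1
                ? Usage.EXCLUSIVELY_USED_IN_THIS_LANGUAGE
                : Usage.USED_IN_THIS_LANGUAGE;
    }
}
```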

> isLetterFromAlphabet may make more sense, if Alphabet is sufficiently well
> defined.


That could certainly be handy too. Since you could fairly easily
construct a many-to-many alphabet -> language mapping, you could implement
the original function on top of it.

Even better might be a function Set<Language>
languagesWhichUseThisCharacter(). Or perhaps, applying your idea,
Set<Script> scriptsWhichUseThisCharacter, with Script having a
Set<Language> languagesWrittenInThisScript().
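
A minimal sketch of the set-returning variant, with the original boolean function layered on top of it. It assumes Java 9+, uses `Character.UnicodeScript` in place of a `Script` type and plain ISO code strings in place of a `Language` type. The table and the class name are hypothetical.

```java
import java.lang.Character.UnicodeScript;
import java.util.Map;
import java.util.Set;

final class ScriptLanguageIndex {
    // Hypothetical script -> languages mapping; a real one would be many-to-many and much larger.
    private static final Map<UnicodeScript, Set<String>> LANGUAGES_WRITTEN_IN_SCRIPT = Map.of(
            UnicodeScript.HIRAGANA, Set.of("ja"),
            UnicodeScript.KATAKANA, Set.of("ja"),
            UnicodeScript.HAN, Set.of("ja", "zh", "ko"),
            UnicodeScript.LATIN, Set.of("en", "hu"));

    static Set<String> languagesWhichUseThisCharacter(int codePoint) {
        if (!Character.isLetter(codePoint)) {
            return Set.of();
        }
        return LANGUAGES_WRITTEN_IN_SCRIPT.getOrDefault(UnicodeScript.of(codePoint), Set.of());
    }

    // The original function, implemented on top of the mapping.
    static boolean isLetterFromLang(int codePoint, String isoLang) {
        return languagesWhichUseThisCharacter(codePoint).contains(isoLang);
    }
}
```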

tom

--
non, scarecrow, forensics, rituals, bacteria, scientific instruments, ..
 
Arne Vajhøj
      12-07-2010
On 06-12-2010 07:48, Tom Anderson wrote:
> On Sun, 5 Dec 2010, Arne Vajhøj wrote:
>
>> On 05-12-2010 10:10, Tom Anderson wrote:
>>> On Sun, 5 Dec 2010, Arne Vajhøj wrote:
>>>> On 05-12-2010 07:07, Tom Anderson wrote:
>>>>> On Sat, 4 Dec 2010, Arne Vajhøj wrote:
>>>>>> On 04-12-2010 20:34, (E-Mail Removed) wrote:
>>>>>>>> The concept will be fundamentally broken if one language
>>>>>>>> has more than one alphabet (I don't know if such case exist,
>>>>>>>> but it could).
>>>>>>> ~
>>>>>>> Well, there are plenty of languages using more than one alphabet.
>>>>>>> Japanese comes to mind:
>>>>>>> ~
>>>>>>> http://en.wikipedia.org/wiki/Japanese_writing_system
>>>>>>> ~
>>>>>>> It actually uses 4 writing systems: Kanji, Hiragana, Katakana,
>>>>>>> Rōmaji
>>>>>>
>>>>>> Then the function is unimplementable.
>>>>>
>>>>> Why? Why doesn't the function simply return true for glyphs from
>>>>> any of
>>>>> those groups and "ja"?
>>>>>
>>>>> I note that in particular, it might return true for the fullwidth
>>>>> romaji, but not the standard width ones.
>>>>
>>>> I guess it could.
>>>>
>>>> But then what does the result mean?
>>>
>>> It means that the character could be part of a text in that language.
>>>
>>>> It does not mean that the code point is valid in any text in that
>>>> language.
>>>
>>> Surely that's exactly what it means?

>>
>> No.
>>
>> The difference is between any and some.
>>
>> With that semantics I find the function useless.

>
> I'm sorry to hear that. I don't.
>
> Perhaps the function should return a result from an enum -
> NOT_IN_THIS_LANGUAGE, USED_IN_THIS_LANGUAGE,
> EXCLUSIVELY_USED_IN_THIS_LANGUAGE.


Which is not related at all to the problem I am describing!?!?

>> isLetterFromAlphabet may make more sense, if Alphabet is sufficiently
>> well defined.

>
> That could certainly be handy too. Since you could fairly easily
> construct a many-to-many alphabet -> language mapping, you could
> implement the original function on top of it.


You could.

But I can still not see the value of it.

Arne
 
Tom Anderson
      12-07-2010
On Mon, 6 Dec 2010, Arne Vajhøj wrote:

> On 06-12-2010 07:48, Tom Anderson wrote:
>> On Sun, 5 Dec 2010, Arne Vajhøj wrote:
>>
>>> On 05-12-2010 10:10, Tom Anderson wrote:
>>>> On Sun, 5 Dec 2010, Arne Vajhøj wrote:
>>>>> On 05-12-2010 07:07, Tom Anderson wrote:
>>>>>> On Sat, 4 Dec 2010, Arne Vajhøj wrote:
>>>>>>> On 04-12-2010 20:34, (E-Mail Removed) wrote:
>>>>>>>>> The concept will be fundamentally broken if one language
>>>>>>>>> has more than one alphabet (I don't know if such case exist,
>>>>>>>>> but it could).
>>>>>>>> ~
>>>>>>>> Well, there are plenty of languages using more than one alphabet.
>>>>>>>> Japanese comes to mind:
>>>>>>>> ~
>>>>>>>> http://en.wikipedia.org/wiki/Japanese_writing_system
>>>>>>>> ~
>>>>>>>> It actually uses 4 writing systems: Kanji, Hiragana, Katakana,
>>>>>>>> Rōmaji
>>>>>>>
>>>>>>> Then the function is unimplementable.
>>>>>>
>>>>>> Why? Why doesn't the function simply return true for glyphs from
>>>>>> any of
>>>>>> those groups and "ja"?
>>>>>>
>>>>>> I note that in particular, it might return true for the fullwidth
>>>>>> romaji, but not the standard width ones.
>>>>>
>>>>> I guess it could.
>>>>>
>>>>> But then what does the result mean?
>>>>
>>>> It means that the character could be part of a text in that language.
>>>>
>>>>> It does not mean that the code point is valid in any text in that
>>>>> language.
>>>>
>>>> Surely that's exactly what it means?
>>>
>>> No.
>>>
>>> The difference is between any and some.
>>>
>>> With that semantics I find the function useless.

>>
>> I'm sorry to hear that. I don't.
>>
>> Perhaps the function should return a result from an enum -
>> NOT_IN_THIS_LANGUAGE, USED_IN_THIS_LANGUAGE,
>> EXCLUSIVELY_USED_IN_THIS_LANGUAGE.

>
> Which is not related at all to the problem I am describing!?!?


Then at least one of us has misunderstood the other. Could you restate
your problem?

>>> isLetterFromAlphabet may make more sense, if Alphabet is sufficiently
>>> well defined.

>>
>> That could certainly be handy too. Since you could fairly easily
>> construct a many-to-many alphabet -> language mapping, you could
>> implement the original function on top of it.

>
> You could.
>
> But I can still not see the value of it.


You could take some text and produce a set of languages it could possibly
be from. You wouldn't be able to tell many European languages apart, but
you could tell runs of typical Japanese, Chinese, Hindi, Arabic, Urdu, etc
apart.
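
Roughly like this, as a sketch under stated assumptions: Java 9+, a hypothetical script-to-language table, and "could possibly be from" read as the intersection of the candidate languages of every letter in the text. Non-letters are ignored.

```java
import java.lang.Character.UnicodeScript;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class LanguageGuesser {
    // Hypothetical sample table; not real linguistic data.
    private static final Map<UnicodeScript, Set<String>> LANGS_BY_SCRIPT = Map.of(
            UnicodeScript.HIRAGANA, Set.of("ja"),
            UnicodeScript.KATAKANA, Set.of("ja"),
            UnicodeScript.HAN, Set.of("ja", "zh", "ko"),
            UnicodeScript.DEVANAGARI, Set.of("hi"),
            UnicodeScript.ARABIC, Set.of("ar", "ur"),
            UnicodeScript.LATIN, Set.of("en", "hu"));

    static Set<String> possibleLanguages(String text) {
        Set<String> candidates = null; // null means "no letter seen yet"
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (!Character.isLetter(cp)) {
                continue;
            }
            Set<String> langs = LANGS_BY_SCRIPT.getOrDefault(UnicodeScript.of(cp), Set.of());
            if (candidates == null) {
                candidates = new TreeSet<>(langs);
            } else {
                candidates.retainAll(langs); // keep only languages that explain every letter so far
            }
        }
        return candidates == null ? new TreeSet<>() : candidates;
    }
}
```

A run of Han ideographs alone yields several candidates, and a text mixing Latin and Han letters yields none, so the result is only meaningful for single-language runs.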

tom

--
a moratorium on the future
 
Lew
      12-07-2010
Tom Anderson wrote:
> You could take some text and produce a set of languages it could
> possibly be from. You wouldn't be able to tell many European languages
> apart, but you could tell runs of typical Japanese, Chinese, Hindi,
> Arabic, Urdu, etc apart.


You can tell about some things regarding runs of "typical" characters, but
cannot reliably rate an entire document. Suppose this post were about Asian
art (or má jiàng), and I mention the "four gentlemen", 四君子.
<http://en.wikipedia.org/wiki/Four_Gentlemen>

That run of ideograms is the same in Chinese, Japanese, Korean and Vietnamese.
Which one is it? Is this post in English or one of those four languages?

The run of characters "má jiàng" - what language is that?

--
Lew
 
Tom Anderson
      12-07-2010
On Tue, 7 Dec 2010, Lew wrote:

> Tom Anderson wrote:
>> You could take some text and produce a set of languages it could
>> possibly be from. You wouldn't be able to tell many European languages
>> apart, but you could tell runs of typical Japanese, Chinese, Hindi,
>> Arabic, Urdu, etc apart.

>
> You can tell about some things regarding runs of "typical" characters, but
> cannot reliably rate an entire document. Suppose this post were about Asian
> art (or má jiàng), and I mention the "four gentlemen", 四君子.
> <http://en.wikipedia.org/wiki/Four_Gentlemen>
>
> That run of ideograms is the same in Chinese, Japanese, Korean and
> Vietnamese. Which one is it? Is this post in English or one of those four
> languages?


I'd conclude that it couldn't be in any one language.

If you were interested in mixed-language text, you could do set covering
on the results (hoping that the number of different
sets-of-possible-languages is small enough that nobody notices you're
solving an NP-hard problem), to find out possible sets of languages that
could be in the mix. I'd hope you'd end up with {English, Chinese},
{English, Japanese}, and so on.
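
A brute-force sketch of that step, purely illustrative. It assumes each run of text has already been reduced to its set of candidate languages, treats each language as "covering" the runs it could explain, and enumerates every smallest set of languages that covers all runs. The class name is hypothetical, and the exponential search is only workable for a handful of languages.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class LanguageMix {
    // Returns all minimum-size language sets such that every run shares at least one language with the set.
    static List<Set<String>> smallestMixes(List<Set<String>> runCandidates) {
        TreeSet<String> all = new TreeSet<>();
        for (Set<String> run : runCandidates) {
            all.addAll(run);
        }
        List<String> universe = new ArrayList<>(all);
        int n = universe.size();
        if (n > 20) {
            throw new IllegalArgumentException("too many languages for brute force");
        }
        List<Set<String>> best = new ArrayList<>();
        int bestSize = Integer.MAX_VALUE;
        for (int mask = 0; mask < (1 << n); mask++) {
            int size = Integer.bitCount(mask);
            if (size > bestSize) {
                continue; // cannot beat the best size found so far
            }
            Set<String> chosen = new TreeSet<>();
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    chosen.add(universe.get(i));
                }
            }
            boolean coversAllRuns = true;
            for (Set<String> run : runCandidates) {
                if (Collections.disjoint(run, chosen)) {
                    coversAllRuns = false;
                    break;
                }
            }
            if (!coversAllRuns) {
                continue;
            }
            if (size < bestSize) {
                best.clear();
                bestSize = size;
            }
            best.add(chosen);
        }
        return best;
    }
}
```

Given one run with candidates {en} and another with candidates {ja, zh}, this returns the two mixes {en, ja} and {en, zh}.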

> The run of characters "má jiàng" - what language is that?


Looks like Hungarian to me.

tom

--
09F911029D74E35BD84156C5635688C0 -- AACS Licensing Administrator
 