Velocity Reviews

Lawrence D'Oliveiro 02-04-2011 05:59 AM

Why No Supplemental Characters In Character Literals?
 
Why was it decreed in the language spec that characters beyond U+FFFF are
not allowed in character literals, when they are allowed everywhere else (in
string literals, in the program text, in character and string values etc)?

Lew 02-04-2011 06:34 AM

Re: Why No Supplemental Characters In Character Literals?
 
On 02/04/2011 12:59 AM, Lawrence D'Oliveiro wrote:
> Why was it decreed in the language spec that characters beyond U+FFFF are
> not allowed in character literals, when they are allowed everywhere else (in
> string literals, in the program text, in character and string values etc)?


Because a 'char' type holds only 16 bits.

--
Lew
Ceci n'est pas une fenêtre.
..___________.
|###] | [###|
|##/ | *\##|
|#/ * | \#|
|#----|----#|
|| | * ||
|o * | o|
|_____|_____|
|===========|

Lawrence D'Oliveiro 02-04-2011 06:59 AM

Re: Why No Supplemental Characters In Character Literals?
 
In message <iig6j2$dul$2@news.albasani.net>, Lew wrote:

> On 02/04/2011 12:59 AM, Lawrence D'Oliveiro wrote:
>
>> Why was it decreed in the language spec that characters beyond U+FFFF are
>> not allowed in character literals, when they are allowed everywhere else
>> (in string literals, in the program text, in character and string values
>> etc)?

>
> Because a 'char' type holds only 16 bits.


No it doesn’t. Otherwise you wouldn’t be allowed supplementary characters in
character and string values. Which you are.

Mike Schilling 02-04-2011 08:22 AM

Re: Why No Supplemental Characters In Character Literals?
 


"Lawrence D'Oliveiro" <ldo@geek-central.gen.new_zealand> wrote in message
news:iig84e$uqu$1@lust.ihug.co.nz...
> In message <iig6j2$dul$2@news.albasani.net>, Lew wrote:
>
>> On 02/04/2011 12:59 AM, Lawrence D'Oliveiro wrote:
>>
>>> Why was it decreed in the language spec that characters beyond U+FFFF
>>> are
>>> not allowed in character literals, when they are allowed everywhere else
>>> (in string literals, in the program text, in character and string values
>>> etc)?

>>
>> Because a 'char' type holds only 16 bits.

>
> No it doesn’t. Otherwise you wouldn’t be allowed supplementary characters
> in
> character and string values. Which you are.


Yes, it does (contain 16 bits). It was defined to do so before there were
supplemental characters, and there was no way to extend it without breaking
compatibility with some older programs.

You can't put a supplementary character in a char. You can put them in
strings, but only encoded as UTF-16, i.e. into two 16-bit chars.
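
A minimal sketch of that point, assuming U+1D11E (MUSICAL SYMBOL G CLEF) as the example supplementary character; the class and method names are made up for illustration. It shows the code point being encoded into two 16-bit chars:

```java
public class SupplementaryInChar {
    // U+1D11E, a code point beyond U+FFFF (chosen only as an example).
    static final int G_CLEF = 0x1D11E;

    // Returns the UTF-16 encoding of a code point as an array of chars.
    static char[] encode(int codePoint) {
        return Character.toChars(codePoint);
    }

    public static void main(String[] args) {
        char[] units = encode(G_CLEF);

        // One code point, two chars: a high and a low surrogate.
        System.out.println(units.length);                                  // 2
        System.out.printf("%04X %04X%n", (int) units[0], (int) units[1]);  // D834 DD1E

        // The same two chars make up the String.
        String s = new String(units);
        System.out.println(s.length());                                    // 2

        // A character literal holding this code point is rejected by the
        // compiler, because a char literal must fit in a single char.
    }
}
```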


Lew 02-04-2011 12:49 PM

Re: Why No Supplemental Characters In Character Literals?
 
Lawrence D'Oliveiro wrote:
>>>> Why was it decreed in the language spec that characters beyond U+FFFF are
>>>> not allowed in character literals, when they are allowed everywhere else
>>>> (in string literals, in the program text, in character and string values
>>>> etc)?


It takes TWO 'char' values to represent a supplemental character. 'char' !=
"character".

READ the documentation.

Lew wrote:
>>> Because a 'char' type holds only 16 bits.


Lawrence D'Oliveiro wrote:
>> No it doesn’t. Otherwise you wouldn’t be allowed supplementary characters in
>> character and string values. Which you are.


I have an idea for you to try - check the documentation.
<http://java.sun.com/docs/books/jls/third_edition/html/typesValues.html#4.2.1>

and you see in §4.2: "... char, whose values are 16-bit unsigned integers ..."
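
The 16-bit width can also be checked from code. This is only a sketch using constants from java.lang.Character; the class name and helper method are made up for illustration:

```java
public class CharWidth {
    // Increments the largest char value and returns the result as an int.
    static int wrapAfterMax() {
        char c = Character.MAX_VALUE; // 0xFFFF
        c++;                          // wraps around, as for any 16-bit unsigned value
        return c;
    }

    public static void main(String[] args) {
        System.out.println(Character.SIZE);             // 16
        System.out.println((int) Character.MAX_VALUE);  // 65535
        System.out.println(Character.MAX_CODE_POINT);   // 1114111, i.e. 0x10FFFF
        System.out.println(wrapAfterMax());             // 0
    }
}
```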

Mike Schilling wrote:
> Yes, it does (contain 16 bits.) It was defined to do so before there were
> supplemental characters, and there was no way to extend it without breaking
> compatibility with some older programs.
>
> You can't put a supplementary character in a char. You can put them in
> strings, but only encoded as UTF-16, i.e. into two 16-bit chars.


As the tutorials and JLS tell you, should you deign to read the documentation.
(It's not a bad idea to do so.)

Joshua Cranmer 02-04-2011 01:04 PM

Re: Why No Supplemental Characters In Character Literals?
 
On 02/04/2011 01:59 AM, Lawrence D'Oliveiro wrote:
> In message<iig6j2$dul$2@news.albasani.net>, Lew wrote:
>
>> On 02/04/2011 12:59 AM, Lawrence D'Oliveiro wrote:
>>
>>> Why was it decreed in the language spec that characters beyond U+FFFF are
>>> not allowed in character literals, when they are allowed everywhere else
>>> (in string literals, in the program text, in character and string values
>>> etc)?

>>
>> Because a 'char' type holds only 16 bits.

>
> No it doesn’t. Otherwise you wouldn’t be allowed supplementary characters in
> character and string values. Which you are.


The JLS clearly states that a char is an unsigned 16-bit value. Non-BMP
Unicode characters cannot fit in a single unsigned 16-bit value. Other
literals, such as string literals, can contain non-BMP characters
because a String is not an individual 16-bit value but an array of
them, and can thus safely hold a surrogate pair.
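
As a rough illustration, assuming U+1D11E as the non-BMP character and a made-up class name, a string literal can spell the character as two UTF-16 escapes, and the pair can be decoded back into one code point:

```java
public class SurrogatePairInString {
    // String literal holding U+1D11E, written as its two UTF-16 code units.
    static final String CLEF = "\uD834\uDD1E";

    // Decodes the first two chars of s, which must form a surrogate pair.
    static int decode(String s) {
        char high = s.charAt(0);
        char low = s.charAt(1);
        if (!Character.isHighSurrogate(high) || !Character.isLowSurrogate(low)) {
            throw new IllegalArgumentException("not a surrogate pair");
        }
        return Character.toCodePoint(high, low);
    }

    public static void main(String[] args) {
        int codePoint = decode(CLEF);
        System.out.println(Integer.toHexString(codePoint));                 // 1d11e
        System.out.println(Character.isSupplementaryCodePoint(codePoint));  // true
    }
}
```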

--
Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth

Arne Vajhøj 02-04-2011 03:49 PM

Re: Why No Supplemental Characters In Character Literals?
 
On 04-02-2011 01:59, Lawrence D'Oliveiro wrote:
> In message<iig6j2$dul$2@news.albasani.net>, Lew wrote:
>> On 02/04/2011 12:59 AM, Lawrence D'Oliveiro wrote:
>>> Why was it decreed in the language spec that characters beyond U+FFFF are
>>> not allowed in character literals, when they are allowed everywhere else
>>> (in string literals, in the program text, in character and string values
>>> etc)?

>>
>> Because a 'char' type holds only 16 bits.

>
> No it doesn’t. Otherwise you wouldn’t be allowed supplementary characters in
> character and string values. Which you are.


It is very clearly specified that a Java char is 16 bit.

You can't have the codepoints above U+FFFF in a char.

You can have them in a string but then they actually take
two chars in that string.

It is rather messy.

If you look at the Java docs for String class you will see:

charAt & codePointAt
length & codePointCount

which is not a nice API.

But since codepoints above U+FFFF were added after the String
class was defined, the options on how to handle them were
pretty limited.
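
A small sketch of how the two pairs of methods differ, assuming a test string that contains one supplementary character (U+1D11E) between two ASCII letters; the class and helper names are made up for illustration:

```java
public class CharVsCodePoint {
    // "a", then U+1D11E (which occupies two chars), then "b".
    static final String TEXT = "a\uD834\uDD1Eb";

    // Counts code points by stepping over one or two chars at a time.
    static int countCodePoints(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i += Character.charCount(s.codePointAt(i))) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // length counts chars; codePointCount counts characters.
        System.out.println(TEXT.length());                             // 4
        System.out.println(TEXT.codePointCount(0, TEXT.length()));     // 3

        // charAt returns one surrogate; codePointAt joins the pair.
        System.out.println(Integer.toHexString(TEXT.charAt(1)));       // d834
        System.out.println(Integer.toHexString(TEXT.codePointAt(1)));  // 1d11e

        System.out.println(countCodePoints(TEXT));                     // 3
    }
}
```

Note that both charAt and codePointAt take an index measured in chars, not in code points, which is part of what makes the API awkward.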

Arne


Mike Schilling 02-04-2011 05:10 PM

Re: Why No Supplemental Characters In Character Literals?
 


"Arne Vajhøj" <arne@vajhoej.dk> wrote in message
news:4d4c2019$0$23753$14726298@news.sunsite.dk...
> On 04-02-2011 01:59, Lawrence D'Oliveiro wrote:
>> In message<iig6j2$dul$2@news.albasani.net>, Lew wrote:
>>> On 02/04/2011 12:59 AM, Lawrence D'Oliveiro wrote:
>>>> Why was it decreed in the language spec that characters beyond U+FFFF
>>>> are
>>>> not allowed in character literals, when they are allowed everywhere
>>>> else
>>>> (in string literals, in the program text, in character and string
>>>> values
>>>> etc)?
>>>
>>> Because a 'char' type holds only 16 bits.

>>
>> No it doesn’t. Otherwise you wouldn’t be allowed supplementary characters
>> in
>> character and string values. Which you are.

>
> It is very clearly specified that a Java char is 16 bit.
>
> You can't have the codepoints above U+FFFF in a char.
>
> You can have them in a string but then they actually take
> two chars in that string.
>
> It is rather messy.
>
> If you look at the Java docs for String class you will see:
>
> charAt & codePointAt
> length & codePointCount
>
> which is not a nice API.
>
> But since codepoints above U+FFFF were added after the String
> class was defined, the options on how to handle them were
> pretty limited.


The sticky issue is, I think, that chars were defined as 16-bit. If that
had been left undefined, they could have been extended to 24 bits, which
would make things nice and regular again.


Arne Vajhøj 02-04-2011 05:33 PM

Re: Why No Supplemental Characters In Character Literals?
 
On 04-02-2011 12:10, Mike Schilling wrote:
>
>
> "Arne Vajhøj" <arne@vajhoej.dk> wrote in message
> news:4d4c2019$0$23753$14726298@news.sunsite.dk...
>> On 04-02-2011 01:59, Lawrence D'Oliveiro wrote:
>>> In message<iig6j2$dul$2@news.albasani.net>, Lew wrote:
>>>> On 02/04/2011 12:59 AM, Lawrence D'Oliveiro wrote:
>>>>> Why was it decreed in the language spec that characters beyond
>>>>> U+FFFF are
>>>>> not allowed in character literals, when they are allowed everywhere
>>>>> else
>>>>> (in string literals, in the program text, in character and string
>>>>> values
>>>>> etc)?
>>>>
>>>> Because a 'char' type holds only 16 bits.
>>>
>>> No it doesn’t. Otherwise you wouldn’t be allowed supplementary
>>> characters in
>>> character and string values. Which you are.

>>
>> It is very clearly specified that a Java char is 16 bit.
>>
>> You can't have the codepoints above U+FFFF in a char.
>>
>> You can have them in a string but then they actually take
>> two chars in that string.
>>
>> It is rather messy.
>>
>> If you look at the Java docs for String class you will see:
>>
>> charAt & codePointAt
>> length & codePointCount
>>
>> which is not a nice API.
>>
>> But since codepoints above U+FFFF were added after the String
>> class was defined, the options on how to handle them were
>> pretty limited.

>
> The sticky issue is, I think, that chars were defined as 16-bit. If that
> had been left undefined, they could have been extended to 24 bits, which
> would make things nice and regular again.


Yes.

But having specific bit lengths for all types was a huge jump
forward compared to C89 regarding predictability of what code
would do.

Arne


Daniele Futtorovic 02-04-2011 05:37 PM

Re: Why No Supplemental Characters In Character Literals?
 
On 04/02/2011 16:49, Arne Vajhøj allegedly wrote:
> It is very clearly specified that a Java char is 16 bit.
>
> You can't have the codepoints above U+FFFF in a char.
>
> You can have them in a string but then they actually take
> two chars in that string.
>
> It is rather messy.
>
> If you look at the Java docs for String class you will see:
>
> charAt & codePointAt
> length & codePointCount
>
> which is not a nice API.
>
> But since codepoints above U+FFFF were added after the String
> class was defined, the options on how to handle them were
> pretty limited.


They've added supplementary character support to String, StringBuilder,
StringBuffer.

Pity they haven't touched upon java.lang.CharSequence. Probably out of
concerns about compatibility.

Anyone got an idea how supplementary character support could be
integrated with CharSequence, or more generally, with an interface
describing a sequence of code points? Creating a sub-interface, e.g.
UnicodeSequence with int codePointAt(int), etc. doesn't seem like it'd
do the trick, since a UnicodeSequence /is-not/ a CharSequence (char
charAt(int) doesn't make sense for a UnicodeSequence). Adding a new
interface would mean you don't get the interoperability with all the
parts of the API that use CharSequences... The only option would seem
to be to refactor CharSequence and all the classes that use or implement it.
Which means no backwards-compatibility.
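
One possible workaround, sketched here purely as an assumption and not something that exists in the JDK, is an adapter instead of a sub-interface: a separate code point view wrapped around any CharSequence. The names CodePointSequence and CodePointView are hypothetical. It sidesteps the is-a problem, but APIs that accept CharSequence still know nothing about it:

```java
public class CodePointSequenceDemo {
    // Hypothetical interface: indexes count code points, not chars.
    interface CodePointSequence {
        int codePointCount();
        int codePointAt(int index);
    }

    // Adapter that presents an existing CharSequence as a code point view.
    static final class CodePointView implements CodePointSequence {
        private final CharSequence chars;

        CodePointView(CharSequence chars) {
            this.chars = chars;
        }

        @Override
        public int codePointCount() {
            return Character.codePointCount(chars, 0, chars.length());
        }

        @Override
        public int codePointAt(int index) {
            // Linear time per call: walks from the start to find the char offset.
            int charIndex = Character.offsetByCodePoints(chars, 0, index);
            return Character.codePointAt(chars, charIndex);
        }
    }

    public static void main(String[] args) {
        // "a", U+1D11E, "b": four chars, three code points.
        CodePointSequence view = new CodePointView(new StringBuilder("a\uD834\uDD1Eb"));
        System.out.println(view.codePointCount());                      // 3
        System.out.println(Integer.toHexString(view.codePointAt(1)));   // 1d11e
        System.out.println((char) view.codePointAt(2));                 // b
    }
}
```

The linear-time lookup is the price of keeping the underlying storage as UTF-16 chars.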

Bloody mess this is.

--
DF.


