Re: number of bytes for each (uni)code point while using utf-8 as encoding ...

 
 
Jason Bailey
      07-12-2012
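There's an incorrect assumption here. CharBuffer.get returns a char. A
char is a 16-bit UTF-16 code unit, not a code point, and it does not
map to a fixed number of bytes: a single char takes 1 to 3 bytes in
UTF-8, depending on its value. Code points outside the Basic
Multilingual Plane need 2 chars (a surrogate pair) in UTF-16.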

If you want to determine whether the value of your char fits in one
byte or needs two, just do a comparison

boolean is2bytes = (MptChrBfr.get() >> 8) > 0;

Here you're taking the char and bit shifting it right by 8 bits. If
any bits are still set, the value does not fit in a single byte. Note
that this tests the char's numeric value, not its UTF-8 length: in
UTF-8 only chars below 0x80 are encoded as a single byte.
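
For UTF-8 specifically, the byte count follows from the value range of the code point. Here is a minimal sketch of that rule; the class and method names (Utf8Lengths, utf8LengthOfCodePoint, utf8LengthOfText) are made up for illustration, and the input is assumed to be well-formed UTF-16 with no unpaired surrogates.

```java
import java.nio.charset.StandardCharsets;

class Utf8Lengths {
    // Bytes UTF-8 uses for a single Unicode code point, by value range.
    static int utf8LengthOfCodePoint(int codePoint) {
        if (codePoint < 0x80) return 1;      // ASCII
        if (codePoint < 0x800) return 2;
        if (codePoint < 0x10000) return 3;   // rest of the BMP
        return 4;                            // supplementary planes
    }

    // Sum of the per-code-point lengths over the whole text.
    static int utf8LengthOfText(CharSequence text) {
        int total = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = Character.codePointAt(text, i);
            total += utf8LengthOfCodePoint(cp);
            i += Character.charCount(cp);    // 1 char, or 2 for a surrogate pair
        }
        return total;
    }

    public static void main(String[] args) {
        // 'a', e-acute, euro sign, and one emoji written as a surrogate pair
        String s = "a\u00e9\u20ac\uD83D\uDE00";
        System.out.println(utf8LengthOfText(s));                        // 10
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);  // 10
    }
}
```

The sum should match the size of a UTF-8 file only if the file has no byte order mark and decodes without errors.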

If you want to know whether the char you received is part of a bigger
code point, the Character class now has a number of supporting methods.

Character.isHighSurrogate(MptChrBfr.get());

would tell you if it is the leading half of a surrogate pair.
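
As a rough sketch of how that check could be used while draining a CharBuffer, the loop below joins a high surrogate with the low surrogate that follows it. CodePointWalker and codePoints are hypothetical names, not part of the JDK; an unpaired surrogate is simply passed through as its own value.

```java
import java.nio.CharBuffer;
import java.util.ArrayList;
import java.util.List;

class CodePointWalker {
    // Reads chars from the buffer and combines surrogate pairs into code points.
    static List<Integer> codePoints(CharBuffer buffer) {
        List<Integer> result = new ArrayList<>();
        while (buffer.hasRemaining()) {
            char first = buffer.get();
            // Peek at the next char with an absolute get, which does not
            // advance the buffer's position.
            if (Character.isHighSurrogate(first)
                    && buffer.hasRemaining()
                    && Character.isLowSurrogate(buffer.get(buffer.position()))) {
                char second = buffer.get();
                result.add(Character.toCodePoint(first, second));
            } else {
                result.add((int) first);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        CharBuffer buffer = CharBuffer.wrap("a\uD83D\uDE00b");
        for (int cp : codePoints(buffer)) {
            System.out.println(String.format("U+%04X", cp));
        }
        // prints U+0061, U+1F600, U+0062 on separate lines
    }
}
```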

I'd look at the new methods on the Character and String classes. Dealing
with chars is a bit cumbersome. Just load everything into a String:
getBytes with an explicit charset gives you the number of bytes it takes
up, and if you want to know the number of code points, use
String.codePointCount.
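
Something along these lines, as a small sketch (StringCounts and its two helpers are illustrative names; the String and StandardCharsets calls are standard JDK methods):

```java
import java.nio.charset.StandardCharsets;

class StringCounts {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of Unicode code points; a surrogate pair counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String s = "h\u00e9llo \uD83D\uDE00"; // e-acute plus one emoji
        System.out.println(s.length());       // 8 chars (UTF-16 code units)
        System.out.println(codePoints(s));    // 7 code points
        System.out.println(utf8Bytes(s));     // 11 bytes in UTF-8
    }
}
```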

-jason


On 7/10/2012 6:21 AM, lbrt chx _ gemale wrote:

<snip>
> for (int j = 0; (j< MptChrBfr.length()); ++j){
> MptChrBfr.get();
> }
> ...
> ~
> each time you get() a unicode point from the buffer, you will get from 1 to 4 bytes and the sum of all "lengths" should equal the file length in bytes, right?
> ~
> I am using the (new) nio in java 7 and I wonder if sun made changes which make hard getting lenghts of bytes a unicode point needs
> ~
> How can you get the number of bytes you "get()"?
> ~
> thank you
> lbrtchx
> comp.lang.java.programmer: number of bytes for each (uni)code point while using utf-8 as encoding ...


 