Velocity Reviews - Computer Hardware Reviews


convert Java unicode escape to utf8

 
 
Jeff Higgins
 
      07-06-2007
Hi,
How can I convert a String containing a
Java Unicode escape sequence to a String
containing the equivalent UTF8 representation?

For instance "\u4f55" -> "e4bd95"

Thanks,
Jeff Higgins
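For the literal question (a String holding the character U+4F55, turned into the hex digits of its UTF-8 bytes), the standard library can do the encoding. Here is a minimal sketch. It assumes Java 7 or later for StandardCharsets; on older JDKs, getBytes("UTF-8") does the same job but throws a checked exception. The class and method names are made up for illustration:

```java
import java.nio.charset.StandardCharsets;

public class Utf8Hex {
    // Encode the string's chars as UTF-8, then render each byte as two hex digits.
    static String toUtf8Hex(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            // b & 0xff undoes sign extension; %02x pads to two lowercase digits
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // The compiler turns the escape into the single char U+4F55.
        System.out.println(toUtf8Hex("\u4f55")); // prints e4bd95
    }
}
```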


 
SadRed
 
      07-06-2007
On Jul 6, 1:03 pm, "Jeff Higgins" <(E-Mail Removed)> wrote:
> Hi,
> How can I convert a String containing a
> Java Unicode escape sequence to a String
> containing the equivalent UTF8 representation?
>
> For instance "\u4f55" -> "e4bd95"
>
> Thanks,
> Jeff Higgins


See the Unicode Standard documentation.
This might be handy for UTF-8 encoding:
http://homepage1.nifty.com/algafield/core0.html

 
bugbear
 
      07-06-2007
Jeff Higgins wrote:
> Hi,
> How can I convert a String containing a
> Java Unicode escape sequence to a String
> containing the equivalent UTF8 representation?
>
> For instance "\u4f55" -> "e4bd95"


Do you mean a string containing the hex representation
of the string's UTF-8 encoded bytes?

Or do you mean a byte array containing the UTF-8 bytes?

In Java, a String contains chars, which are UTF-16
code units.

So a string literal like "\u4f55" never contains a "unicode
escape sequence" at run time; it merely contains a character.
It is the compiler that turns the escape sequence in your
source code into the actual character.
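To make the point concrete, here is a small sketch (class and field names are illustrative only). The compiler resolves the escape in the first literal. The doubled backslash in the second keeps six ordinary ASCII characters, which is the situation a program faces when it reads escapes from a file:

```java
public class EscapeDemo {
    // The compiler translates this escape: the string holds ONE char, U+4F55.
    static final String COMPILED = "\u4f55";
    // An escaped backslash is not an escape start, so this holds six ASCII chars.
    static final String LITERAL = "\\u4f55";

    public static void main(String[] args) {
        System.out.println(COMPILED.length());                        // prints 1
        System.out.println(LITERAL.length());                         // prints 6
        System.out.println(Integer.toHexString(COMPILED.charAt(0)));  // prints 4f55
    }
}
```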

BugBear
 
bugbear
 
      07-06-2007
bugbear wrote:
> Jeff Higgins wrote:
>> Hi,
>> How can I convert a String containing a
>> Java Unicode escape sequence to a String
>> containing the equivalent UTF8 representation?
>>
>> For instance "\u4f55" -> "e4bd95"

>
> You mean a string containing the hex representation
> for the UTF-8 bytes encoding of the string?
>
> Or do you mean a byte array containing utf-8 bytes?


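import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;

String str = "\u4f55";
ByteArrayOutputStream baos = new ByteArrayOutputStream();
Charset cs1 = Charset.forName("UTF-8");
OutputStreamWriter osw = new OutputStreamWriter(baos, cs1);
osw.write(str);
osw.close(); // flush the writer's internal buffer into baos
byte[] want = baos.toByteArray();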

(neither compiled nor tested)
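The same bytes can also be obtained through java.nio without a Writer. This sketch assumes only the JDK's Charset API. Charset.encode hands back a ByteBuffer that is already set up for reading, so the usual wrap/flip dance shrinks to copying out the remaining bytes. The class and method names are hypothetical:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

public class NioEncode {
    // Hypothetical helper: encodes str as UTF-8 using java.nio instead of a Writer.
    static byte[] utf8Bytes(String str) {
        Charset cs1 = Charset.forName("UTF-8");
        // Charset.encode returns a buffer ready for reading:
        // position 0, limit just past the last encoded byte, so no flip() is needed.
        ByteBuffer bb = cs1.encode(CharBuffer.wrap(str));
        // The backing array may be larger than the data, so copy only remaining().
        byte[] want = new byte[bb.remaining()];
        bb.get(want);
        return want;
    }

    public static void main(String[] args) {
        byte[] want = utf8Bytes("\u4f55");
        System.out.println(want.length); // prints 3
    }
}
```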

BugBear
 
Roedy Green
 
      07-06-2007
On Fri, 6 Jul 2007 00:03:52 -0400, "Jeff Higgins"
<(E-Mail Removed)> wrote, quoted or indirectly quoted someone who
said :

>
>For instance "\u4f55" -> "e4bd95"


If by \u4f55 you mean a single 16-bit char, you just have to
write it to a Writer specifying UTF-8 as your encoding. See
http://mindprod.com/applets/fileio.html for sample code.

If by \u4f55 you mean six 8-bit ASCII characters, native2ascii
will convert it to other encodings. See
http://mindprod.com/jgloss/native2asciiexe.html and
http://mindprod.com/jgloss/encoding.html
for details.
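For the second case (six ASCII characters, such as those read from a file), the escapes have to be decoded before any encoding step. The sketch below is a simplified version of that decoding in Java, roughly the job native2ascii -reverse does on whole files. The unescape helper is hypothetical. It handles only a backslash, a single 'u' and exactly four hex digits, not every form the Java compiler accepts. Its result can then go through a UTF-8 Writer as described above:

```java
public class Unescape {
    // Hypothetical helper: replaces each backslash + 'u' + four hex digits
    // with the char those digits name; all other text is copied unchanged.
    static String unescape(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            if (isEscapeAt(text, i)) {
                // The four hex digits give one UTF-16 char.
                out.append((char) Integer.parseInt(text.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(text.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    // True if text holds a backslash, a 'u' and four hex digits starting at index i.
    private static boolean isEscapeAt(String text, int i) {
        if (i + 6 > text.length() || text.charAt(i) != '\\' || text.charAt(i + 1) != 'u') {
            return false;
        }
        for (int k = i + 2; k < i + 6; k++) {
            if (Character.digit(text.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String ascii = "\\u4f55";             // six ASCII chars, as if read from a file
        String decoded = unescape(ascii);     // now one char, U+4F55
        System.out.println(decoded.length()); // prints 1
    }
}
```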

--
Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com
 
Roedy Green
 
      07-06-2007
On Fri, 6 Jul 2007 00:03:52 -0400, "Jeff Higgins"
<(E-Mail Removed)> wrote, quoted or indirectly quoted someone who
said :

>How can I convert a String containing a
>Java Unicode escape sequence to a String
>containing the equivalent UTF8 representation?
>
>For instance "\u4f55" -> "e4bd95"


If for some reason you wanted to roll your own utility, the code for
UTF-8 reading and writing is at http://mindprod.com/jgloss/utf.html

The code is primarily to help you understand the format.
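For anyone who does want to roll their own, here is a compact sketch of the encoding direction only. It is written from the bit layout in RFC 3629 rather than taken from Roedy's page. It takes one Unicode scalar value (an int, not a char, so characters outside the BMP fit) and rejects surrogates and out-of-range values:

```java
public class Utf8Encoder {
    // Encodes one Unicode scalar value (U+0000..U+10FFFF, excluding surrogates)
    // using the UTF-8 bit layout from RFC 3629.
    static byte[] encode(int cp) {
        if (cp < 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw new IllegalArgumentException("not a Unicode scalar value: " + cp);
        }
        if (cp < 0x80) {            // 0xxxxxxx
            return new byte[] { (byte) cp };
        } else if (cp < 0x800) {    // 110xxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),
                (byte) (0x80 | (cp & 0x3F)) };
        } else if (cp < 0x10000) {  // 1110xxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F)) };
        } else {                    // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xF0 | (cp >> 18)),
                (byte) (0x80 | ((cp >> 12) & 0x3F)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F)) };
        }
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (byte b : encode(0x4f55)) {
            hex.append(String.format("%02x", b & 0xff));
        }
        System.out.println(hex); // prints e4bd95
    }
}
```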
--
Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com
 
Jeff Higgins
 
      07-07-2007
Jeff Higgins wrote:
> Hi,
> How can I convert a String containing a
> Java Unicode escape sequence to a String
> containing the equivalent UTF8 representation?
>
> For instance "\u4f55" -> "e4bd95"
>
> Thanks,
> Jeff Higgins
>


OK,
thanks, everyone, for the generous responses:
SadRed, for the pointer to the UTF-8 definition. I found it
kind of hard to follow at first, but now that I've found some
code to follow along with, it's making more sense.
Bugbear, for the NIO example. As you can see, I struggle with
basic IO, and now I need to understand wrapping and flipping.
And Roedy, whose excellent mindprod site has been a continuing
source of enlightenment. Thanks.

Anyway,
for anyone else who read my OP and could only shake their
head in amazement at its utter incomprehensibility, here is
what I had \really\ hoped to accomplish.

How to encode a Unicode scalar value in UTF8?

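public class Encode
{
    public static void main(String[] args)
    {
        int[] intArray = {0x4f55};
        // encode(int[]) is the method described in the article linked below
        byte[] byteArray = encode(intArray);
        for (byte b : byteArray)
        {
            // print each byte as two lowercase hex digits
            System.out.print(Integer.toString((b & 0xff) + 0x100, 16).substring(1));
        }
    }
}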

prints e4bd95

where encode(int[]) is a method described at:
<http://developers.sun.com/dev/gadc/technicalpublications/articles/utf8.html>
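Since encode(int[]) lives in the external article, here is a self-contained version that compiles on its own. The encode below is a hypothetical stand-in, not the article's method. It leans on the String(int[], int, int) constructor to turn code points into a String, then on getBytes to produce UTF-8 (StandardCharsets assumes Java 7 or later):

```java
import java.nio.charset.StandardCharsets;

public class Encode {
    // Hypothetical stand-in for the article's encode(int[]):
    // build a String from the code points and let the JDK encode it as UTF-8.
    static byte[] encode(int[] codePoints) {
        String s = new String(codePoints, 0, codePoints.length);
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        int[] intArray = {0x4f55};
        byte[] byteArray = encode(intArray);
        for (byte b : byteArray) {
            // print each byte as two lowercase hex digits
            System.out.print(Integer.toString((b & 0xff) + 0x100, 16).substring(1));
        }
        System.out.println(); // the line printed is e4bd95
    }
}
```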



 
Hendrik Maryns
 
      07-11-2007

Jeff Higgins wrote:
> System.out.print(Integer.toString((b & 0xff) + 0x100,
> 16).substring(1));


OK, I figured out what the & 0xff is for, but would you mind explaining
why you add 0x100?
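Briefly: Integer.toString(x, 16) drops leading zeros, so a byte such as 0x05 would print as one digit and the output would no longer split cleanly into byte pairs. Adding 0x100 to a value in 0..0xff always yields a three-digit hex number from 100 to 1ff, and substring(1) throws away the leading 1. Here is a small illustrative comparison (the class and method names are made up):

```java
public class HexPad {
    // Without padding: bytes below 0x10 come out as a single hex digit.
    static String plain(byte b) {
        return Integer.toString(b & 0xff, 16);
    }

    // Adding 0x100 forces a three-digit result "1xx" (0x100..0x1ff);
    // substring(1) drops the leading 1, so every byte gives exactly two digits.
    static String padded(byte b) {
        return Integer.toString((b & 0xff) + 0x100, 16).substring(1);
    }

    public static void main(String[] args) {
        byte small = 0x05;
        byte large = (byte) 0xe4;   // stored as -28; & 0xff recovers 228
        System.out.println(plain(small) + " " + padded(small)); // prints 5 05
        System.out.println(plain(large) + " " + padded(large)); // prints e4 e4
    }
}
```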

H.
--
Hendrik Maryns
http://tcl.sfs.uni-tuebingen.de/~hendrik/
==================
http://aouw.org
Ask smart questions, get good answers:
http://www.catb.org/~esr/faqs/smart-questions.html
 
Jeff Higgins
 
      07-11-2007

Hendrik Maryns wrote:
> Jeff Higgins wrote:
>> System.out.print(Integer.toString((b & 0xff) + 0x100,
>> 16).substring(1));
>
> Ok, I found out what the & 0xff is for, but mind explaining me why you
> do + 0x100?
>


Well, quite frankly, because Roedy Green told me to, or rather showed
the technique \somewhere\ on his mindprod site. I can't find it now.

Boiled down, the code that produced the result follows.
I have no idea how it works, except that it seems to produce the desired
result.
Now you've made me twiddle bits until I understand.

Thanks,
JH

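public class Test
{
    public static void main(String[] args)
    {
        int in = 0x4f55;                          // a code point in U+0800..U+FFFF
        byte[] out = new byte[3];
        out[0] = (byte)(in >> 12 | 0xE0);         // 1110xxxx: top 4 bits
        out[1] = (byte)(in >> 6 & 0x3F | 0x80);   // 10xxxxxx: middle 6 bits
        out[2] = (byte)(in & 0x3F | 0x80);        // 10xxxxxx: low 6 bits
        for (byte b : out)
        {
            // parenthesize the mask: + binds tighter than &
            System.out.print(Integer.toString((b & 0xff) + 0x100, 16).substring(1));
        }
    }
}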
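A bit-by-bit look at what the three assignments in Jeff's Test class do for 0x4f55 may help. The sketch below (bin is an illustrative helper name) pulls out the 4 + 6 + 6 bit groups, ORs in the UTF-8 marker bits and prints each byte in binary. Java's >> binds tighter than &, which binds tighter than |, so an unparenthesized expression like in >> 6 & 0x3F | 0x80 means ((in >> 6) & 0x3F) | 0x80. This three-byte pattern only covers code points from U+0800 to U+FFFF:

```java
public class BitSplit {
    // Pads the binary form of v to the given width, e.g. bin(5, 4) gives "0101".
    static String bin(int v, int width) {
        String s = Integer.toBinaryString(v);
        while (s.length() < width) {
            s = "0" + s;
        }
        return s;
    }

    public static void main(String[] args) {
        int in = 0x4f55;                 // bits: 0100 111101 010101
        int hi  = in >> 12;              // top 4 bits:    0100
        int mid = (in >> 6) & 0x3F;      // middle 6 bits: 111101
        int lo  = in & 0x3F;             // low 6 bits:    010101

        // Each group is ORed with its UTF-8 marker bits: 1110 / 10 / 10.
        System.out.println(bin(0xE0 | hi, 8));  // prints 11100100 (0xe4)
        System.out.println(bin(0x80 | mid, 8)); // prints 10111101 (0xbd)
        System.out.println(bin(0x80 | lo, 8));  // prints 10010101 (0x95)
    }
}
```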


 
Jeff Higgins
 
      07-11-2007

Jeff Higgins wrote:
> Hendrik Maryns wrote:
>> Ok, I found out what the & 0xff is for, but mind explaining me why you
>> do + 0x100?
>>

>
> Well, quite frankly because Roedy Green told me to. Or rather showed
> the technique \somewhere\ on his mindprod site. I can't find it now.
>


OK,
I wish I could find it on the mindprod site, but I can't.
It must have served another purpose.
This works for these three bytes (each is 0x80 or above, so it prints
as two hex digits):

System.out.println(Integer.toString((b & 0xff),16));
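Here is a caution on the padding, with an illustrative sketch (class and method names are made up). Written without the inner parentheses, as b & 0xff + 0x100, the expression parses as b & (0xff + 0x100), because + binds tighter than &. That happens to give the right digits for bytes 0x80 and above, since sign extension sets bit 8. For a small byte such as 0x05, however, it produces a single digit, and substring(1) leaves an empty string. String.format with %02x is another way to get two padded digits:

```java
public class Precedence {
    // Without inner parentheses, + binds tighter than &,
    // so this computes b & (0xff + 0x100), i.e. b & 0x1ff.
    static String unparenthesized(byte b) {
        return Integer.toString((b & 0xff + 0x100), 16).substring(1);
    }

    // Intended form: mask to 0..255 first, then add 0x100 for padding.
    static String parenthesized(byte b) {
        return Integer.toString((b & 0xff) + 0x100, 16).substring(1);
    }

    // Equivalent padding with a format string.
    static String formatted(byte b) {
        return String.format("%02x", b & 0xff);
    }

    public static void main(String[] args) {
        byte[] bytes = { (byte) 0xe4, 0x05 };
        for (byte b : bytes) {
            System.out.println("[" + unparenthesized(b) + "] [" + parenthesized(b) + "] [" + formatted(b) + "]");
        }
    }
}
```

For 0xe4 all three methods return "e4". For 0x05 the unparenthesized one returns an empty string, while the other two return "05".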




 