Unicode escapes and String literals?

 
 
Lew
 
      12-14-2012
markspace wrote:
> David Lamb wrote:
>> Cute. But presupposing that the OP isn't the idiot some people seem to
>> have assumed, I suspect he meant something more like
>>
>> String line = someBufferedFile.readLine();
>> ... change all \u escapes into unicode in "line" ... [1]


That was not obvious to me, hence my question as to what he did mean.

> Maybe. But your code above is obvious, imo. Either Knute had a brain
> fart and forgot about \\ to escape a slash, or he ran into some other
> problem.
>
> My point was that there's a very simple pre-compiler for Java. It
> translates all \u-escapes into characters before the compiler proper
> sees it. There's no difference to the Java compiler between "fed" and
> "\u0066\u0065\u0064". It literally can't tell the difference.


That was also the point of my SSCCE.
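
A minimal sketch of that point, for anyone who wants to try it. The class and constant names are made up for illustration. It compares a literal written with Unicode escapes against the plain literal, and against a literal whose backslashes are doubled so that the escape text survives into the runtime String.

```java
public class EscapeDemo {
    // Written two ways in source, but identical once the escapes are translated.
    static final String PLAIN = "fed";
    static final String ESCAPED = "\u0066\u0065\u0064";

    // Doubled backslashes are ordinary string escapes, so this String really
    // contains a backslash, the letter u, and four digits, three times over.
    static final String RAW = "\\u0066\\u0065\\u0064";

    public static void main(String[] args) {
        System.out.println(PLAIN.equals(ESCAPED)); // true
        System.out.println(ESCAPED.length());      // 3
        System.out.println(RAW.length());          // 18
        System.out.println(RAW);                   // the 18 raw characters, untranslated
    }
}
```

The third constant is the situation the OP ends up in when the text arrives at run time, for example from a file.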

> That's an important distinction.


--
Lew
 
 
 
 
 
Roedy Green
 
      12-14-2012
On Thu, 13 Dec 2012 09:31:18 -0800, Knute Johnson
<(E-Mail Removed)> wrote, quoted or indirectly quoted someone
who said :

>I just had a great revelation as I was putting together my SSCCE for the
>question I was going to ask. So it has changed my question. How do I
>do the conversion of unicode escape sequences to a String that are done
>by string literals?
>
>String s = "\u0066\u0065\u0064";
>
>becomes "fed" but if you create a String with \u0066\u0065\u0064 in it
>without using the literal it stays \u0066\u0065\u0064. Is there a built
>in mechanism in Java for doing that translation to a String?


have a look at native2ascii

IIRC it uses sequences like that in its ASCII representation which you
can then convert to any encoding you like.

see http://mindprod.com/jgloss/encoding.html#NATIVE2ASCII

A little finite state machine should handle that fairly easily.
If you find that difficult, I would write one for you.
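
As a rough sketch of what such a state machine might look like (the class and method names here are hypothetical, and this is not the code from mindprod.com): it assumes exactly one "u" and exactly four hex digits per escape, and it copies anything malformed through unchanged.

```java
public class UnicodeUnescaper {
    private enum State { TEXT, BACKSLASH, HEX }

    /** Replaces each backslash-u-XXXX sequence with the UTF-16 code unit it names. */
    public static String unescape(String in) {
        StringBuilder out = new StringBuilder(in.length());
        StringBuilder pending = new StringBuilder(6); // raw chars of a possible escape
        State state = State.TEXT;
        int value = 0;

        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            switch (state) {
                case TEXT:
                    if (c == '\\') {
                        pending.append(c);
                        state = State.BACKSLASH;
                    } else {
                        out.append(c);
                    }
                    break;
                case BACKSLASH:
                    if (c == 'u') {
                        pending.append(c);
                        value = 0;
                        state = State.HEX;
                    } else {
                        // Some other backslash sequence: pass both chars through.
                        out.append(pending).append(c);
                        pending.setLength(0);
                        state = State.TEXT;
                    }
                    break;
                case HEX: {
                    int digit = Character.digit(c, 16);
                    if (digit < 0) {
                        // Malformed escape: emit what was buffered, re-read c as text.
                        out.append(pending);
                        pending.setLength(0);
                        state = State.TEXT;
                        i--;
                    } else {
                        pending.append(c);
                        value = value * 16 + digit;
                        if (pending.length() == 6) { // backslash + u + 4 digits
                            out.append((char) value);
                            pending.setLength(0);
                            state = State.TEXT;
                        }
                    }
                    break;
                }
            }
        }
        out.append(pending); // unfinished escape at end of input, kept as-is
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("\\u0066\\u0065\\u0064")); // fed
    }
}
```

Note that a doubled backslash is passed through untouched, so text that was deliberately escaped stays as it was.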

--
Roedy Green Canadian Mind Products http://mindprod.com
Students who hire or con others to do their homework are as foolish
as couch potatoes who hire others to go to the gym for them.
 
 
 
 
 
Lew
 
      12-14-2012
rossum wrote:
> Lew wrote:
>>>>> if you create a String with \u0066\u0065\u0064 in it

>>
>>Exactly how?

>
> StringBuilder sb = new StringBuilder(18);
> sb.append('\\');
> sb.append("u0066");
> sb.append('\\');
> sb.append("u0065");
> sb.append('\\');
> sb.append("u0064");
>
> String ss = sb.toString();
> System.out.println(ss);
>
> Produces: \u0066\u0065\u0064
>
> Which still leaves the question why?


This has been explained to death upthread already.

Those are not Unicode escapes, that's why.

You have created a String (not a literal) that comprises backslashes, the letter "u" and
various digits. That happens at runtime.

There is no way for the pre-compiler to see those and convert them.

That code sequence is exactly equivalent to this one:

StringBuilder sb = new StringBuilder(\u0031\u0038);
sb.append('\u005c\u005c\u0027)\u003b
sb.append("\u0075\u0030\u0030\u0036\u0036");
sb.append('\u005c\u005c\u0027)\u003b
sb.append("u006\u0035\u0022);
sb.append('\u005c\u005c\u0027)\u003b
sb.append(\u0022\u00750064");

Unicode escape sequence processing is a pre-compiler operation, not a compiler
operation and not a run-time operation.

To do what you want you have to parse the string and convert it yourself.
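
One way that parsing could be sketched, assuming a regular expression is acceptable. The class name, method name, and pattern are illustrative only. Like the compiler, it accepts one or more "u" characters after the backslash, but it does not try to honour doubled backslashes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapeDecoder {
    // A backslash, one or more 'u' characters, then exactly four hex digits.
    private static final Pattern ESCAPE =
            Pattern.compile("\\\\u+([0-9a-fA-F]{4})");

    public static String decode(String s) {
        Matcher m = ESCAPE.matcher(s);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Turn the four hex digits into the UTF-16 code unit they name.
            char ch = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement guards against a decoded '$' or backslash.
            m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(ch)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("\\u0066\\u0065\\u0064")); // fed
    }
}
```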

--
Lew
 
 
Arne Vajhøj
 
      12-15-2012
On 12/14/2012 5:28 AM, Roedy Green wrote:
> On Thu, 13 Dec 2012 09:31:18 -0800, Knute Johnson
> <(E-Mail Removed)> wrote, quoted or indirectly quoted someone
> who said :
>
>> I just had a great revelation as I was putting together my SSCCE for the
>> question I was going to ask. So it has changed my question. How do I
>> do the conversion of unicode escape sequences to a String that are done
>> by string literals?
>>
>> String s = "\u0066\u0065\u0064";
>>
>> becomes "fed" but if you create a String with \u0066\u0065\u0064 in it
>> without using the literal it stays \u0066\u0065\u0064. Is there a built
>> in mechanism in Java for doing that translation to a String?

>
> have a look at native2ascii
>
> IIRC it uses sequences like that in its ASCII representation which you
> can then convert to any encoding you like.
>
> see http://mindprod.com/jgloss/encoding.html#NATIVE2ASCII


First: it does not do what Knute asked for. It actually
generates the escape sequences that Knute is trying to
convert from.

Second: even if it had done what Knute asked for, then:
- create a file with the String
- use Runtime exec (or ProcessBuilder) to run native2ascii
- read a new String from the new file
seems like the least efficient solution possible.

> A little finite state machine should handle that fairly easily.
> If you find that difficult, I would write one for you.


Based on the above: hmmmmmmm.

Arne


 
 
Roedy Green
 
      12-17-2012
On Thu, 13 Dec 2012 09:31:18 -0800, Knute Johnson
<(E-Mail Removed)> wrote, quoted or indirectly quoted someone
who said :

>I just had a great revelation as I was putting together my SSCCE for the
>question I was going to ask. So it has changed my question. How do I
>do the conversion of unicode escape sequences to a String that are done
>by string literals?


The code you want exists inside Quoter.

see FromJavaStringLiteral
and ToJavaStringLiteral classes.

Source is available from http://mindprod.com/products.html#QUOTER
You can play with it as an applet at
http://mindprod.com/applet/quoter.html
--
Roedy Green Canadian Mind Products http://mindprod.com
Students who hire or con others to do their homework are as foolish
as couch potatoes who hire others to go to the gym for them.
 