Unicode escapes and String literals?

 
 
Knute Johnson
 
      12-13-2012
I just had a great revelation as I was putting together my SSCCE for the
question I was going to ask. So it has changed my question. How do I
do the conversion of unicode escape sequences to a String that are done
by string literals?

String s = "\u0066\u0065\u0064";

becomes "fed" but if you create a String with \u0066\u0065\u0064 in it
without using the literal it stays \u0066\u0065\u0064. Is there a built
in mechanism in Java for doing that translation to a String?

--

Knute Johnson
 
 
 
 
 
Thomas Richter
 
      12-13-2012
On 13.12.2012 18:31, Knute Johnson wrote:
> I just had a great revelation as I was putting together my SSCCE for the
> question I was going to ask. So it has changed my question. How do I do
> the conversion of unicode escape sequences to a String that are done by
> string literals?
>
> String s = "\u0066\u0065\u0064";
>
> becomes "fed" but if you create a String with \u0066\u0065\u0064 in it
> without using the literal it stays \u0066\u0065\u0064. Is there a built
> in mechanism in Java for doing that translation to a String?


Yes. It's called "compiler". The same part of the compiler that
translates a "\t" in a string literal to the TAB control character also
replaces the unicode sequences in the string literal to the
corresponding unicode encoding.

Greetings,
Thomas
 
 
 
 
 
Knute Johnson
 
      12-13-2012
On 12/13/2012 9:51 AM, Thomas Richter wrote:
> On 13.12.2012 18:31, Knute Johnson wrote:
>> I just had a great revelation as I was putting together my SSCCE for the
>> question I was going to ask. So it has changed my question. How do I do
>> the conversion of unicode escape sequences to a String that are done by
>> string literals?
>>
>> String s = "\u0066\u0065\u0064";
>>
>> becomes "fed" but if you create a String with \u0066\u0065\u0064 in it
>> without using the literal it stays \u0066\u0065\u0064. Is there a built
>> in mechanism in Java for doing that translation to a String?

>
> Yes. It's called "compiler". The same part of the compiler that
> translates a "\t" in a string literal to the TAB control character also
> replaces the unicode sequences in the string literal to the
> corresponding unicode encoding.
>
> Greetings,
> Thomas


I want to be able to do it to a String not to a string literal.

--

Knute Johnson
 
 
Lew
 
      12-13-2012
Knute Johnson wrote:
> Thomas Richter wrote:
>> Knute Johnson wrote:
>>> I just had a great revelation as I was putting together my SSCCE for the
>>> question I was going to ask. So it has changed my question. How do I do
>>> the conversion of unicode [sic] escape sequences to a String that are done by
>>> string literals?


They aren't done by String literals.

>>> String s = "\u0066\u0065\u0064";
>>> becomes "fed" but if you create a String with \u0066\u0065\u0064 in it


Exactly how?

>>> without using the literal it stays \u0066\u0065\u0064. Is there a built
>>> in mechanism in Java for doing that translation to a String?


No.

>> Yes. It's called "compiler". The same part of the compiler that


That's not exactly correct, and it certainly is not the same part that translates '\t'.

>> translates a "\t" in a string literal to the TAB control character also
>> replaces the unicode sequences in the string literal to the
>> corresponding unicode encoding.


Nope.

> I want to be able to do it to a String not to a string literal.


You want to do what, exactly? I'm not clear on what you're trying to accomplish.

'\u' sequences are translated before compilation proper, not during it. Their presence is exactly equivalent
to typing the corresponding Unicode character directly.

You can embed them in identifiers, directives, anywhere the corresponding character can go.

Not just literals.

For that matter, you can use them in numeric literals.

<sscce>
package temp;

/**
 * ShowUnicodeEscapes.
 */
public class ShowUnicodeEscapes {

    static final \u0069nt COUN\u0054 = \u0030\u003b

    /**
     * main.
     *
     * @param args String array of arguments.
     */
    public static void main(String[] args) {
        System.out.println("COUNT = \u0022+ COUNT);
    }
}
</sscce>
 
 
Daniel Pitts
 
      12-13-2012
On 12/13/12 9:31 AM, Knute Johnson wrote:
> I just had a great revelation as I was putting together my SSCCE for the
> question I was going to ask. So it has changed my question. How do I
> do the conversion of unicode escape sequences to a String that are done
> by string literals?
>
> String s = "\u0066\u0065\u0064";
>
> becomes "fed" but if you create a String with \u0066\u0065\u0064 in it
> without using the literal it stays \u0066\u0065\u0064. Is there a built
> in mechanism in Java for doing that translation to a String?
>


Do you mean that you have a String whose value is "\\u0066\\u0065\\u0064",
and you want to pass that String to a method which will return "fed"?

meaning

String foo = "\\u0066\\u0065\\u0064";

System.out.println(foo); // prints \u0066\u0065\u0064
System.out.println(magicFunction(foo)); // prints fed

There might be such a function in the Apache Commons library, but I don't
think there is one in the standard API. I could be wrong though.
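
For what it's worth, a hand-rolled magicFunction is not much code. Below is a minimal sketch, assuming the input only contains well-formed escapes of the form backslash, u, four hex digits. The names UnicodeUnescaper and unescape are made up for illustration and are not part of the JDK. The sketch ignores the repeated-u form (\uuuu0066) and a doubled backslash before the u, both of which the Java language itself treats specially.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; the name and behavior are assumptions, not a JDK API. */
final class UnicodeUnescaper {

    // Matches a backslash, the letter 'u', and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    /** Replaces each six-character escape with the UTF-16 code unit it names. */
    static String unescape(String input) {
        Matcher m = ESCAPE.matcher(input);
        StringBuilder out = new StringBuilder(input.length());
        int last = 0;
        while (m.find()) {
            // Copy the text between the previous escape and this one unchanged.
            out.append(input, last, m.start());
            out.append((char) Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String foo = "\\u0066\\u0065\\u0064";
        System.out.println(foo);            // prints the 18 raw characters
        System.out.println(unescape(foo));  // prints fed
    }
}
```

Anything that is not a complete escape is passed through untouched, so ordinary text survives the call unchanged.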

 
 
Daniel Pitts
 
      12-13-2012
On 12/13/12 11:46 AM, Daniel Pitts wrote:
> On 12/13/12 9:31 AM, Knute Johnson wrote:
>> I just had a great revelation as I was putting together my SSCCE for the
>> question I was going to ask. So it has changed my question. How do I
>> do the conversion of unicode escape sequences to a String that are done
>> by string literals?
>>
>> String s = "\u0066\u0065\u0064";
>>
>> becomes "fed" but if you create a String with \u0066\u0065\u0064 in it
>> without using the literal it stays \u0066\u0065\u0064. Is there a built
>> in mechanism in Java for doing that translation to a String?
>>

>
> Do you mean, you have a String, whose value is "\\u0066\\u0065\\u0064",
> you want to pass that String to a method which will return fed.
>
> meaning
>
> String foo = "\\u0066\\u0065\\u0064";
>
> System.out.println(foo); // prints \u0066\u0065\u0064
> System.out.println(magicFunction(foo)); // prints fed
>
> There might be such a function in Apache Commons library, but I don't
> think there is one in the standard API. I could be wrong though.


Two minutes of googling and reading a Stack Overflow post gave me this link:

<http://commons.apache.org/lang/api/org/apache/commons/lang3/StringEscapeUtils.html#unescapeJava%28java.lang.String%29>

 
 
markspace
 
      12-13-2012
On 12/13/2012 10:47 AM, Knute Johnson wrote:
>
> I want to be able to do it to a String not to a string literal.
>


Daniel showed one way to interpret your request. Here's another. Pay
special attention to the bits outside the quotes. This program prints
"fed".


public class EscapeTest {
    public static void main(String[] args) {
        String \u0066\u0065\u0064 = "\u0066\u0065\u0064";
        System.out.println( fed );
    }
}


 
 
David Lamb
 
      12-13-2012
On 13/12/2012 3:58 PM, markspace wrote:
> On 12/13/2012 10:47 AM, Knute Johnson wrote:
>>
>> I want to be able to do it to a String not to a string literal.
>>

>
> Daniel showed one way to interpret your request. Here's another. Pay
> special attention to the bits outside the quotes. This program prints
> "fed".
>
>
> public class EscapeTest {
> public static void main(String[] args) {
> String \u0066\u0065\u0064 = "\u0066\u0065\u0064";
> System.out.println( fed );
> }
> }


Cute. But presupposing that the OP isn't the idiot some people seem to
have assumed, I suspect he meant something more like

String line = someBufferedFile.readLine();
... change all \u escapes into unicode in "line" ... [1]

where by "\u escapes" he means the 6-character substrings one usually
types in string literals. The OP needs to look into "code points" and
the corresponding code-point-to-Character conversions at
http://docs.oracle.com/javase/7/docs...Character.html

[1] which, for the pedantic, really means "create a new string(buffer)
from line"
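
A rough sketch of that read-a-line-and-convert idea, assuming the escapes arrive as literal text in the input. LineUnescaper, unescapeLine and parseHex4 are hypothetical names, and a StringReader stands in for the file. Each escape yields one UTF-16 code unit, so a character outside the Basic Multilingual Plane arrives as two escapes (a surrogate pair) that together form a single code point.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

/** Hypothetical helper for illustration only; not a JDK or library class. */
final class LineUnescaper {

    /** Builds a new string in which each backslash-u-hhhh run becomes one char. */
    static String unescapeLine(String line) {
        StringBuilder out = new StringBuilder(line.length());
        int i = 0;
        while (i < line.length()) {
            int unit = line.startsWith("\\u", i) ? parseHex4(line, i + 2) : -1;
            if (unit >= 0) {
                out.append((char) unit);   // one UTF-16 code unit per escape
                i += 6;
            } else {
                out.append(line.charAt(i)); // not a complete escape: copy as-is
                i++;
            }
        }
        return out.toString();
    }

    /** Returns the value of four hex digits starting at 'from', or -1 if absent. */
    private static int parseHex4(String s, int from) {
        if (from + 4 > s.length()) {
            return -1;
        }
        int value = 0;
        for (int k = from; k < from + 4; k++) {
            int digit = Character.digit(s.charAt(k), 16);
            if (digit < 0) {
                return -1;
            }
            value = value * 16 + digit;
        }
        return value;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a file: one line whose two escapes form a surrogate pair.
        BufferedReader reader =
                new BufferedReader(new StringReader("smile: \\uD83D\\uDE00"));
        String text = unescapeLine(reader.readLine());
        System.out.println(text.length());                          // 9 code units
        System.out.println(text.codePointCount(0, text.length()));  // 8 code points
    }
}
```

The difference between the two counts is the code unit versus code point distinction that the Character documentation covers.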

 
 
markspace
 
      12-13-2012
On 12/13/2012 1:21 PM, David Lamb wrote:
>
> Cute. But presupposing that the OP isn't the idiot some people seem to
> have assumed, I suspect he meant something more like
>
> String line = someBufferedFile.readLine();
> ... change all \u escapes into unicode in "line" ... [1]



Maybe. But your code above is obvious, imo. Either Knute had a brain
fart and forgot about \\ to escape a backslash, or he ran into some other
problem.

My point was that there's a very simple pre-compiler for Java. It
translates all \u-escapes into characters before the compiler proper
sees them. There's no difference to the Java compiler between "fed" and
"\u0066\u0065\u0064". It literally can't tell the difference.

That's an important distinction.
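
A small illustration of that distinction, as a sketch rather than anything authoritative (the class and field names are made up). Because the escapes are translated before the literal is tokenized, the two literals are the same compile-time constant and refer to the same interned String, whereas doubling the backslash keeps the escape text as data.

```java
/** Hypothetical demo class; the names are assumptions for illustration. */
final class LiteralEquivalence {
    static final String PLAIN = "fed";
    static final String ESCAPED = "\u0066\u0065\u0064";   // same three characters as PLAIN
    static final String RAW = "\\u0066\\u0065\\u0064";    // backslash doubled: escape kept as text

    public static void main(String[] args) {
        // Identical literals are interned, so even a reference comparison succeeds.
        System.out.println(PLAIN == ESCAPED);    // true
        System.out.println(RAW.length());        // 18
        System.out.println(RAW.equals(PLAIN));   // false
    }
}
```

RAW is the kind of String the original question is about: it holds six characters per escape and needs a run-time conversion to become "fed".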


 
 
David Lamb
 
      12-13-2012
On 13/12/2012 5:00 PM, markspace wrote:
> My point was that there's a very simple pre-compiler for Java. It
> translates all \u-escapes into characters before the compiler proper
> sees it. There's no difference to the Java compiler between "fed" and
> "\u0066\u0065\u0064". It literally can't tell the difference.


I should probably have found a different point in the thread to hang my
comment, since you're perfectly correct.
 
 
 
 