Velocity Reviews

Knute Johnson 12-13-2012 05:31 PM

Unicode escapes and String literals?
 
I just had a great revelation as I was putting together my SSCCE for the
question I was going to ask. So it has changed my question. How do I
do the conversion of unicode escape sequences to a String that are done
by string literals?

String s = "\u0066\u0065\u0064";

becomes "fed" but if you create a String with \u0066\u0065\u0064 in it
without using the literal it stays \u0066\u0065\u0064. Is there a built
in mechanism in Java for doing that translation to a String?

--

Knute Johnson

Thomas Richter 12-13-2012 05:51 PM

Re: Unicode escapes and String literals?
 
On 13.12.2012 18:31, Knute Johnson wrote:
> I just had a great revelation as I was putting together my SSCCE for the
> question I was going to ask. So it has changed my question. How do I do
> the conversion of unicode escape sequences to a String that are done by
> string literals?
>
> String s = "\u0066\u0065\u0064";
>
> becomes "fed" but if you create a String with \u0066\u0065\u0064 in it
> without using the literal it stays \u0066\u0065\u0064. Is there a built
> in mechanism in Java for doing that translation to a String?


Yes. It's called "compiler". The same part of the compiler that
translates a "\t" in a string literal to the TAB control character also
replaces the unicode sequences in the string literal to the
corresponding unicode encoding.

Greetings,
Thomas

Knute Johnson 12-13-2012 06:47 PM

Re: Unicode escapes and String literals?
 
On 12/13/2012 9:51 AM, Thomas Richter wrote:
> On 13.12.2012 18:31, Knute Johnson wrote:
>> I just had a great revelation as I was putting together my SSCCE for the
>> question I was going to ask. So it has changed my question. How do I do
>> the conversion of unicode escape sequences to a String that are done by
>> string literals?
>>
>> String s = "\u0066\u0065\u0064";
>>
>> becomes "fed" but if you create a String with \u0066\u0065\u0064 in it
>> without using the literal it stays \u0066\u0065\u0064. Is there a built
>> in mechanism in Java for doing that translation to a String?

>
> Yes. It's called "compiler". The same part of the compiler that
> translates a "\t" in a string literal to the TAB control character also
> replaces the unicode sequences in the string literal to the
> corresponding unicode encoding.
>
> Greetings,
> Thomas


I want to be able to do it to a String not to a string literal.

--

Knute Johnson

Lew 12-13-2012 07:41 PM

Re: Unicode escapes and String literals?
 
Knute Johnson wrote:
> Thomas Richter wrote:
>> Knute Johnson wrote:
>>> I just had a great revelation as I was putting together my SSCCE for the
>>> question I was going to ask. So it has changed my question. How do I do
>>> the conversion of unicode [sic] escape sequences to a String that are done by
>>> string literals?


They aren't done by String literals.

>>> String s = "\u0066\u0065\u0064";
>>> becomes "fed" but if you create a String with \u0066\u0065\u0064 in it


Exactly how?

>>> without using the literal it stays \u0066\u0065\u0064. Is there a built
>>> in mechanism in Java for doing that translation to a String?


No.

>> Yes. It's called "compiler". The same part of the compiler that


That's not exactly correct, and it certainly is not the same part that translates '\t'.

>> translates a "\t" in a string literal to the TAB control character also
>> replaces the unicode sequences in the string literal to the
>> corresponding unicode encoding.


Nope.

> I want to be able to do it to a String not to a string literal.


You want to do what, exactly? I'm not clear on what you're trying to accomplish.

'\u' sequences are pre-compile, not during compile. Their presence is exactly equivalent
to typing the corresponding Unicode character directly.

You can embed them in identifiers, directives, anywhere the corresponding character can go.

Not just literals.

For that matter, you can use them in numeric literals.

<sscce>
package temp;

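/**
 * ShowUnicodeEscapes.
 */
public class ShowUnicodeEscapes {

    // After escape translation this line reads: static final int COUNT = 0;
    static final \u0069nt COUN\u0054 = \u0030\u003b

    /**
     * main.
     *
     * @param args String array of arguments.
     */
    public static void main(String[] args) {
        // The escape for the double quote closes the string literal here.
        System.out.println("COUNT = \u0022+ COUNT);
    }
}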
</sscce>

Daniel Pitts 12-13-2012 07:46 PM

Re: Unicode escapes and String literals?
 
On 12/13/12 9:31 AM, Knute Johnson wrote:
> I just had a great revelation as I was putting together my SSCCE for the
> question I was going to ask. So it has changed my question. How do I
> do the conversion of unicode escape sequences to a String that are done
> by string literals?
>
> String s = "\u0066\u0065\u0064";
>
> becomes "fed" but if you create a String with \u0066\u0065\u0064 in it
> without using the literal it stays \u0066\u0065\u0064. Is there a built
> in mechanism in Java for doing that translation to a String?
>


Do you mean you have a String whose value is "\\u0066\\u0065\\u0064", and
you want to pass that String to a method which will return "fed"?

meaning

String foo = "\\u0066\\u0065\\u0064";

System.out.println(foo); // prints \u0066\u0065\u0064
System.out.println(magicFunction(foo)); // prints fed

There might be such a function in Apache Commons library, but I don't
think there is one in the standard API. I could be wrong though.
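
A minimal sketch of what such a magicFunction might look like using only the standard library. The class name UnicodeUnescaper and the method name unescape are made up for illustration. It assumes the input contains only well-formed escapes (a backslash, one or more 'u' characters, then four hex digits) and does not try to handle a doubled backslash that is meant to stay literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any standard or third-party API.
final class UnicodeUnescaper {

    // Regex: a literal backslash, one or more 'u', then exactly four hex digits.
    private static final Pattern ESCAPE =
            Pattern.compile("\\\\u+([0-9a-fA-F]{4})");

    // Returns a new String with each escape replaced by the UTF-16 code unit
    // it names. Anything that does not match the pattern is copied unchanged.
    static String unescape(String input) {
        Matcher m = ESCAPE.matcher(input);
        StringBuilder out = new StringBuilder(input.length());
        int last = 0;
        while (m.find()) {
            out.append(input, last, m.start());
            out.append((char) Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // Doubled backslashes keep the compiler from translating the escapes,
        // so foo holds 18 raw characters at run time.
        String foo = "\\u0066\\u0065\\u0064";
        System.out.println(foo);
        System.out.println(unescape(foo)); // prints fed
    }
}
```

Because each escape names one UTF-16 code unit, a surrogate pair written as two consecutive escapes comes out as two chars that together form one supplementary character.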


Daniel Pitts 12-13-2012 07:49 PM

Re: Unicode escapes and String literals?
 
On 12/13/12 11:46 AM, Daniel Pitts wrote:
> On 12/13/12 9:31 AM, Knute Johnson wrote:
>> I just had a great revelation as I was putting together my SSCCE for the
>> question I was going to ask. So it has changed my question. How do I
>> do the conversion of unicode escape sequences to a String that are done
>> by string literals?
>>
>> String s = "\u0066\u0065\u0064";
>>
>> becomes "fed" but if you create a String with \u0066\u0065\u0064 in it
>> without using the literal it stays \u0066\u0065\u0064. Is there a built
>> in mechanism in Java for doing that translation to a String?
>>

>
> Do you mean you have a String whose value is "\\u0066\\u0065\\u0064", and
> you want to pass that String to a method which will return "fed"?
>
> meaning
>
> String foo = "\\u0066\\u0065\\u0064";
>
> System.out.println(foo); // prints \u0066\u0065\u0064
> System.out.println(magicFunction(foo)); // prints fed
>
> There might be such a function in Apache Commons library, but I don't
> think there is one in the standard API. I could be wrong though.


Two minutes of googling and reading a stack-overflow post gave me this link:

<http://commons.apache.org/lang/api/org/apache/commons/lang3/StringEscapeUtils.html#unescapeJava%28java.lang.String%29>


markspace 12-13-2012 08:58 PM

Re: Unicode escapes and String literals?
 
On 12/13/2012 10:47 AM, Knute Johnson wrote:
>
> I want to be able to do it to a String not to a string literal.
>


Daniel showed one way to interpret your request. Here's another. Pay
special attention to the bits outside the quotes. This program prints
"fed".


public class EscapeTest {
    public static void main(String[] args) {
        String \u0066\u0065\u0064 = "\u0066\u0065\u0064";
        System.out.println( fed );
    }
}



David Lamb 12-13-2012 09:21 PM

Re: Unicode escapes and String literals?
 
On 13/12/2012 3:58 PM, markspace wrote:
> On 12/13/2012 10:47 AM, Knute Johnson wrote:
>>
>> I want to be able to do it to a String not to a string literal.
>>

>
> Daniel showed one way to interpret your request. Here's another. Pay
> special attention to the bits outside the quotes. This program prints
> "fed".
>
>
> public class EscapeTest {
> public static void main(String[] args) {
> String \u0066\u0065\u0064 = "\u0066\u0065\u0064";
> System.out.println( fed );
> }
> }


Cute. But presupposing that the OP isn't the idiot some people seem to
have assumed, I suspect he meant something more like

String line = someBufferedFile.readLine();
... change all \u escapes into unicode in "line" ... [1]

where by "\u escapes" he means the 6-character substrings one usually
types in string literals. The OP needs to look into "code points" and
the corresponding codepoint to Character conversions at
http://docs.oracle.com/javase/7/docs...Character.html

[1] which, for the pedantic, really means "create a new string(buffer)
from line"
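
One way that read-and-convert loop could be sketched, assuming the text arrives through a BufferedReader and that each escape is exactly six characters. The names LineUnescaper, hex4, decodeLine and decodeAll are invented for this example. It leans on Character.digit to read the hex digits and on StringBuilder.appendCodePoint to append the result; malformed escapes are copied through untouched.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical helper for illustration only.
final class LineUnescaper {

    // Parses four hex digits starting at index from; returns -1 if there are
    // fewer than four characters left or any of them is not a hex digit.
    static int hex4(String s, int from) {
        if (from + 4 > s.length()) {
            return -1;
        }
        int value = 0;
        for (int k = from; k < from + 4; k++) {
            int d = Character.digit(s.charAt(k), 16);
            if (d < 0) {
                return -1;
            }
            value = value * 16 + d;
        }
        return value;
    }

    // Builds a new String from line, replacing each six-character escape
    // (backslash, 'u', four hex digits) with the character it names.
    static String decodeLine(String line) {
        StringBuilder out = new StringBuilder(line.length());
        int i = 0;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (c == '\\' && i + 1 < line.length() && line.charAt(i + 1) == 'u') {
                int value = hex4(line, i + 2);
                if (value >= 0) {
                    out.appendCodePoint(value);
                    i += 6;
                    continue;
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    // Decodes every line from the reader, joining them with '\n'.
    static String decodeAll(BufferedReader reader) throws IOException {
        StringBuilder out = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            out.append(decodeLine(line)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for the file; the doubled backslashes put
        // raw escape text into the input.
        String raw = "\\u0066\\u0065\\u0064\nplain text";
        System.out.print(decodeAll(new BufferedReader(new StringReader(raw))));
    }
}
```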


markspace 12-13-2012 10:00 PM

Re: Unicode escapes and String literals?
 
On 12/13/2012 1:21 PM, David Lamb wrote:
>
> Cute. But presupposing that the OP isn't the idiot some people seem to
> have assumed, I suspect he meant something more like
>
> String line = someBufferedFile.readLine();
> ... change all \u escapes into unicode in "line" ... [1]



Maybe. But your code above is obvious, imo. Either Knute had a brain
fart and forgot about \\ to escape a slash, or he ran into some other
problem.

My point was that there's a very simple pre-compiler for Java. It
translates all \u-escapes into characters before the compiler proper
sees it. There's no difference to the Java compiler between "fed" and
"\u0066\u0065\u0064". It literally can't tell the difference.

That's an important distinction.
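
A small sketch that makes the distinction observable. The class name EscapeIdentity and its method names are made up for the example. It relies on the fact that string literals with the same contents refer to the same interned String, so comparing the two literals with == is a fair test of whether the compiler sees them as the same thing.

```java
// Hypothetical demo class for illustration only.
final class EscapeIdentity {

    // The escapes are translated before the compiler tokenizes the source,
    // so both operands are the same three-character literal and therefore
    // the same interned String instance.
    static boolean sameInternedLiteral() {
        return "\u0066\u0065\u0064" == "fed";
    }

    // Three characters: the escapes are already gone by the time the
    // literal is parsed.
    static int escapedLength() {
        return "\u0066\u0065\u0064".length();
    }

    // Eighteen characters: each doubled backslash is an ordinary string
    // escape for one backslash, so the text stays as raw escape sequences.
    static int doubledBackslashLength() {
        return "\\u0066\\u0065\\u0064".length();
    }

    public static void main(String[] args) {
        System.out.println(sameInternedLiteral());     // prints true
        System.out.println(escapedLength());           // prints 3
        System.out.println(doubledBackslashLength());  // prints 18
    }
}
```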



David Lamb 12-13-2012 10:17 PM

Re: Unicode escapes and String literals?
 
On 13/12/2012 5:00 PM, markspace wrote:
> My point was that there's a very simple pre-compiler for Java. It
> translates all \u-escapes into characters before the compiler proper
> sees it. There's no difference to the Java compiler between "fed" and
> "\u0066\u0065\u0064". It literally can't tell the difference.


I should probably have found a different point in the thread to hang my
comment, since you're perfectly correct.


All times are GMT.