On Fri, 01 Sep 2006 01:09:40 -0400, Dale King wrote:
>> "Greg" <> wrote in message
>> news: oups.com...
>>> My Java program reads in (from an external source) text that contains
>>> the same sort of Unicode character escape sequences as Java source
>>> code. For example, one such string might be:
>>>
>>> "En Espa\u00f1ol"
>>>
>>> Naturally, I would like to convert the six-character subsequence
>>> "\u00f1" into the single character (code point U+00F1) that those
>>> characters actually represent:
>>>
>>> "En Espaņol"
>
> It's a bit more complicated than that because you will also need to
> support things like \\ to actually insert a backslash and perhaps
> support things like \n.
If he is defining a new specification for escaped input, that would be
nice but not necessary: "\" can be escaped as "\u005C", and a newline
as "\u000A". In Java source code the literal "\u005C" is malformed (the
escape becomes a backslash that escapes the closing quote), and so is
"\u000A" (the escape becomes a line break inside the literal), because
the compiler translates Unicode escapes before it tokenizes string
literals; one writes "\\" and "\n" instead. Both escapes are permitted
in properties files, though. On the other hand, neither the Java
compiler nor Properties.load() recognizes the C escape sequences "\v"
and "\a" for VT and BEL.
I think Arne's response (which used a regular expression) was too
complicated, and the response you are replying to was poorly thought
out (because strings are immutable in Java). Here's a possible
solution:
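    String unescape(String s) {
        int i = 0, len = s.length();
        char c;
        StringBuffer sb = new StringBuffer(len); // output is never longer than input
        while (i < len) {
            c = s.charAt(i++);
            if (c == '\\' && i < len) {
                c = s.charAt(i++);
                if (c == 'u') {
                    // exactly four hex digits must follow the 'u'
                    if (i + 4 > len)
                        throw new IllegalArgumentException("truncated escape at index " + (i - 2));
                    c = (char) Integer.parseInt(s.substring(i, i + 4), 16);
                    i += 4;
                } // add other cases here as desired...
            } // otherwise fall through: \ escapes itself and quotes any character
              // but u; a trailing lone \ is copied unchanged
            sb.append(c);
        }
        return sb.toString();
    }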
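The routine above already turns \\ into a backslash through its fall-through; other escapes, such as the \n Dale mentioned, would go where the "add other cases here" comment sits. A hypothetical extension (the class name Unescaper and the chosen set of escapes are my own, not any standard API) might use a switch, which also makes room for the \v and \a that Java itself lacks. It also parses the four hex digits by hand, because Integer.parseInt would accept a leading sign. Rejecting unknown escapes instead of passing them through would be an equally reasonable design.

```java
public class Unescaper {
    // Decodes backslash-u escapes plus a few C-style escapes in one pass.
    // The set of escapes accepted here is a hypothetical choice.
    public static String unescape(String s) {
        int len = s.length();
        StringBuffer sb = new StringBuffer(len); // StringBuffer: works before Java 5
        int i = 0;
        while (i < len) {
            char c = s.charAt(i++);
            if (c == '\\' && i < len) {
                c = s.charAt(i++);
                switch (c) {
                case 'u':
                    c = hex4(s, i);
                    i += 4;
                    break;
                case 'n': c = '\n'; break;
                case 't': c = '\t'; break;
                case 'r': c = '\r'; break;
                case 'f': c = '\f'; break;
                case 'b': c = '\b'; break;
                case 'v': c = (char) 0x0B; break; // VT: Java has no char escape for it
                case 'a': c = (char) 0x07; break; // BEL: likewise
                default: break; // any other character (including \) stands for itself
                }
            }
            sb.append(c); // a trailing lone backslash is copied unchanged
        }
        return sb.toString();
    }

    // Reads exactly four hex digits starting at index i. Unlike Integer.parseInt,
    // this rejects a leading '-' or '+' sign.
    private static char hex4(String s, int i) {
        if (i + 4 > s.length())
            throw new IllegalArgumentException("truncated escape at index " + (i - 2));
        int code = 0;
        for (int k = i; k < i + 4; k++) {
            int d = Character.digit(s.charAt(k), 16);
            if (d < 0)
                throw new IllegalArgumentException("bad hex digit at index " + k);
            code = code * 16 + d;
        }
        return (char) code;
    }
}
```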
Unlike Arne's solution, it examines each character in the string only
once, and it doesn't require the java.util.regex package (which was not
introduced until Java 1.4). I also think it's easier to read for anyone
trying to verify that it does exactly what's expected and no more.
(What would Arne's solution do to "\u005Cu0020\u0020"? Is that the
correct result?)
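For comparison, here is what the single-pass routine above makes of that input. The sketch copies it into a hypothetical class SinglePassDemo (made static so main() can call it). The backslash produced by \u005C is appended to the output and never rescanned, so the result is a backslash, the five characters u0020, and a space. That mirrors the Java Language Specification's rule that a character produced by a Unicode escape does not take part in further Unicode escapes.

```java
public class SinglePassDemo {
    // Copy of the unescape() routine above, made static for this demo.
    static String unescape(String s) {
        int i = 0, len = s.length();
        char c;
        StringBuffer sb = new StringBuffer(len);
        while (i < len) {
            c = s.charAt(i++);
            if (c == '\\' && i < len) {
                c = s.charAt(i++);
                if (c == 'u') {
                    c = (char) Integer.parseInt(s.substring(i, i + 4), 16);
                    i += 4;
                }
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Doubled backslashes stop javac from decoding the escapes itself.
        String out = unescape("\\u005Cu0020\\u0020");
        // Seven characters: a backslash, then u0020, then one space.
        System.out.println("[" + out + "] length " + out.length());
    }
}
```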
--
PGP key posted on website ...
http://www.lmert.com/people/davidl/