Velocity Reviews - Computer Hardware Reviews

Identifiers - UnicodeEscapeSequence

 
 
Asen Bozhilov
 
      02-15-2010
The documentation permits `\UnicodeEscapeSequence` to be used in
IdentifierName. But it also says:

| Unicode escape sequences are also permitted in identifiers, where they
| contribute a single character to the identifier, as computed by the CV
| of the UnicodeEscapeSequence. The \ preceding the UnicodeEscapeSequence
| does not contribute a character to the identifier. A
| UnicodeEscapeSequence cannot be used to put a character into an
| identifier that would otherwise be illegal. In other words, if a
| \UnicodeEscapeSequence sequence were replaced by its
| UnicodeEscapeSequence's CV, the result must still be a valid Identifier
| that has the exact same sequence of characters as the original
| Identifier.

As I understand it, if I type:

var \\u0069\\u0066; //var if;

`if` is a ReservedWord, so the example above should throw a SyntaxError.

try {
    eval('var \\u0069\\u0066;'); //var if;
} catch (e) {
    window.alert(e instanceof SyntaxError);
}

Firefox 3.5.7 - No error
IE6 - true
Chrome 4.0 - No error
Opera 9.64 - No error
Safari 4.0 - No error
Rhino 1.7R2 - No error
DMDScript 1.02 - true

try {
    eval('var \\u0030;'); //var 0;
} catch (e) {
    window.alert(e instanceof SyntaxError);
}

Firefox 3.5.7 - true
IE6 - true
Chrome 4.0 - true
Opera 9.64 - No error
Safari 4.0 - true
Rhino 1.7R2 - No error
DMDScript 1.02 - No error
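The tests above report through `window.alert`, so they only run in a browser. Below is a console-friendly sketch of the same check, assuming a non-browser host such as Node.js. It swaps `eval` for the `Function` constructor, which parses the source text without running it; `parseOutcome` is a hypothetical helper name. Current engines are not guaranteed to reproduce the 2010 results listed above.

```javascript
// Hypothetical helper: report whether `source` parses as a function body.
// The Function constructor throws a SyntaxError at construction time if
// the source does not parse, without executing any of it.
function parseOutcome(source) {
  try {
    new Function(source);
    return "ok";
  } catch (e) {
    return e instanceof SyntaxError ? "SyntaxError" : e.name;
  }
}

// The backslashes are doubled because these are string literals; the
// parser itself receives single backslashes, e.g. var \u0069\u0066;
const cases = [
  "var \\u0069\\u0066;", // var if;
  "var \\u0030;",        // var 0;
];

for (const src of cases) {
  console.log(src, "->", parseOutcome(src));
}
```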

My question is: what is the proper behavior according to the
specification? I think `var \\u0069\\u0066;` should throw a
SyntaxError.

Thanks.

 
Scott Sauyet
 
      02-15-2010
On Feb 15, 2:38 pm, Asen Bozhilov <asen.bozhi...@gmail.com> wrote:
> My question is: what is the proper behavior according to the
> specification? I think `var \\u0069\\u0066;` should throw a
> SyntaxError.


I don't know the spec well enough to answer. But I'm wondering if you
would expect an error from this as well:

window["if"] = 10;

I can't see why either should throw an error. The only reason to
disallow the reserved word as an identifier name is to make it
unambiguous to the ES engine what is meant by the term. There is no such
provision for keywords to be listed via unicode escapes, is there? If
not, then there is no ambiguity about what "\\u0069\\u0066" should
represent.
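Scott's bracket-notation example is a different case from a declaration: the property name comes from a string value, and string literals may contain any character, escaped or not. A minimal sketch, using an ordinary object (a stand-in assumption) in place of `window` so it runs outside a browser:

```javascript
// An ordinary object stands in for `window` so this runs outside a browser.
const obj = {};

// Bracket notation takes the property name from a string value, which is
// not checked against the Identifier production, so "if" is accepted.
obj["if"] = 10;

// Inside a string literal, \u0069\u0066 denotes the same two characters.
console.log("\u0069\u0066" === "if"); // true
console.log(obj["\u0069\u0066"]);     // 10
```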

-- Scott
 
Lasse Reichstein Nielsen
 
      02-15-2010
Asen Bozhilov writes:

> The documentation permits `\UnicodeEscapeSequence` to be used in
> IdentifierName. But it also says:
>
> | Unicode escape sequences are also permitted in identifiers,
> | where they contribute a single character to the
> | identifier, as computed by the CV of the
> | UnicodeEscapeSequence.


This is the important part. It allows unicode escapes in identifiers.
There is no similar statement for any of the reserved words, so
unicode escapes cannot be used in a keyword.

> | The \ preceding the UnicodeEscapeSequence does not contribute a
> | character to the identifier. A UnicodeEscapeSequence cannot be
> | used to put a character into an identifier that would otherwise be
> | illegal. In other words, if a \UnicodeEscapeSequence sequence were
> | replaced by its UnicodeEscapeSequence's CV, the result must still
> | be a valid Identifier that has the exact same sequence of characters
> | as the original Identifier.
>
> As I understand it, if I type:
>
> var \\u0069\\u0066; //var if;


(I assume it should be single backslashes when not in a string.)

> `if` is a ReservedWord, so the example above should throw a SyntaxError.


No. While 'if' is a keyword, it is only the sequence U+0069 U+0066
that is recognized as the 'if' keyword. Unicode escapes are not allowed
as parts of keywords. The above, correctly, declares a variable called
'if' - because "\u0069\u0066" matches the production of an identifier
and it doesn't match the production of any reserved word.

The inputs, "if" and "i\u0066" are different sequences of characters.
They are parsed differently. The latter is parsed as an identifier.
An identifier is represented as a sequence of code points. It just
happens that "i\u0066", "\u0069f" and "\u0069\u0066" all parse to
identifiers represented by U+0069 U+0066, and "if" does not.
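Lasse's point, that several source spellings denote the same code points, can be checked by computing the character value (CV) of each escape by hand. A minimal sketch, assuming only the four-hex-digit `\uXXXX` form that ES3 defines; `decodeEscapes` is a hypothetical name:

```javascript
// Replace every \uXXXX escape in raw identifier source text with the
// character it denotes (its CV). Other characters are copied unchanged.
function decodeEscapes(raw) {
  return raw.replace(/\\u([0-9A-Fa-f]{4})/g, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

// Three different source spellings of the same identifier...
const spellings = ["i\\u0066", "\\u0069f", "\\u0069\\u0066"];
for (const raw of spellings) {
  // ...each decodes to the two code points U+0069 U+0066.
  console.log(raw, "->", decodeEscapes(raw));
}

// The raw source text still differs from the plain keyword text.
console.log(spellings.every((raw) => raw !== "if")); // true
```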

....
> My question is: what is the proper behavior according to the
> specification?


Yes.

> I think `var \\u0069\\u0066;` should throw a
> SyntaxError.


The operative part of the ECMA262 standard is in section 7.6, which
you quote. It allows escape sequences in identifiers. No such
allowance is given for keywords or other reserved words, so anything
containing a unicode escape is not a keyword.
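Lasse's reading can be modeled as a token classifier that still forbids escapes from introducing illegal characters, but recognizes keywords only from their literal source text. Under that reading, any token containing an escape falls through to the identifier rule. This is a deliberately simplified sketch: `classifyLenient` is a hypothetical name, identifier characters are restricted to ASCII, and the keyword list is only ES3's `Keyword` production.

```javascript
// ES3 Keyword production (section 7.5.2).
const KEYWORDS = new Set([
  "break", "case", "catch", "continue", "default", "delete", "do",
  "else", "finally", "for", "function", "if", "in", "instanceof",
  "new", "return", "switch", "this", "throw", "try", "typeof",
  "var", "void", "while", "with",
]);

// Replace each \uXXXX escape with the character it denotes (its CV).
function decodeEscapes(raw) {
  return raw.replace(/\\u([0-9A-Fa-f]{4})/g, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

// ASCII-only approximation of a valid IdentifierName after decoding.
const IDENTIFIER_CHARS = /^[A-Za-z$_][A-Za-z0-9$_]*$/;

// Lenient reading: an escape may not smuggle in an illegal character,
// but keyword recognition looks only at the raw source text.
function classifyLenient(raw) {
  if (KEYWORDS.has(raw)) return "Keyword";
  if (IDENTIFIER_CHARS.test(decodeEscapes(raw))) return "Identifier";
  return "invalid";
}

console.log(classifyLenient("if"));             // Keyword
console.log(classifyLenient("\\u0069\\u0066")); // Identifier
console.log(classifyLenient("\\u0030"));        // invalid
```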

/L
--
Lasse Reichstein Holst Nielsen
'Javascript frameworks is a disruptive technology'

 
Thomas 'PointedEars' Lahn
 
      02-16-2010
Lasse Reichstein Nielsen wrote:

> Asen Bozhilov writes:
>> The documentation permits `\UnicodeEscapeSequence` to be used in
>> IdentifierName. But it also says:
>>
>> | Unicode escape sequences are also permitted in identifiers,
>> | where they contribute a single character to the
>> | identifier, as computed by the CV of the
>> | UnicodeEscapeSequence.

>
> This is the important part. It allows unicode escapes in identifiers.


But none that would not be allowed if the character was included verbatim.

> There is no similar statement for any of the reserved words, so
> unicode escapes cannot be used in a keyword.


You have got it backwards.

>> | The \ preceding the UnicodeEscapeSequence does not contribute a
>> | character to the identifier. A UnicodeEscapeSequence cannot be
>> | used to put a character into an identifier that would otherwise be
>> | illegal. In other words, if a \UnicodeEscapeSequence sequence were
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> | replaced by its UnicodeEscapeSequence's CV, the result must still
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> | be a valid Identifier that has the exact same sequence of characters
     ^^^^^^^^^^^^^^^^^^^^^
>> | as the original Identifier.


I do not think it can be worded more clearly.

>> As I understand it, if I type:
>>
>> var \\u0069\\u0066; //var if;

>
> (I assume it should be single backslashes when not in a string.)


Why, the double backslashes are legal, too. However, the resulting value
would still not be an /Identifier/, barring language extensions.

>> `if` is a ReservedWord, so the example above should throw a SyntaxError.

>
> No.


True, but the program ought to be syntactically in error nonetheless.

> While 'if' is a keyword, it is only the sequence U+0069 U+0066
> that is recognized as the 'if' keyword. Unicode escapes are not allowed
> as parts of keywords. The above, correctly, declares a variable called
> 'if' - because "\u0069\u0066" matches the production of an identifier
> and it doesn't match the production of any reserved word.
> The inputs, "if" and "i\u0066" are different sequences of characters.
> They are parsed differently. The latter is parsed as an identifier.


Your logic is flawed, because escape sequences are converted into the
corresponding Unicode characters (the character is the Computed Value)
*before* the tokenization process takes place that follows from applying
the syntactical grammar:

| 5.1.4
|
| [...]
| When a stream of characters is to be parsed as an ECMAScript program, it
| is first converted to a stream of input elements by repeated application
| of the lexical grammar; this stream of input elements is then parsed by
| a single application of the syntactic grammar. The program is
| syntactically in error if the tokens in the stream of input elements
| cannot be parsed as a single instance of the goal nonterminal /Program/,
| with no tokens left over.

/UnicodeEscapeSequence/ is a goal symbol of the lexical grammar as is
/Keyword/; /IfStatement/ is a goal symbol of the syntactic grammar.

As a result, first application of the lexical grammar ought to cause

var \u0069\u0066

to become

var if

and second application of the lexical grammar ought to cause `if' to be
parsed as a /Keyword/:

| Keyword :: one of
| [...] if [...]

Then, application of the syntactic grammar ought to cause

var if

to be recognized as theoretically producible by

VariableStatement :
var VariableDeclarationList ;

VariableDeclarationList :
VariableDeclaration

VariableDeclaration :
Identifier Initialiser_opt

which ought to fail because the token `if' has been determined a /Keyword/
before, not an /Identifier/, and no other productions of the syntactic
grammar would be applicable.

Therefore, the program ought to be considered syntactically in error. That
it might not be could only be attributed to a proprietary extension. Hence
the clarification as quoted above:

| A UnicodeEscapeSequence cannot be used to put a character into an
| identifier that would otherwise be illegal. [...]
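PointedEars's reading differs from Lasse's in one step: every escape is replaced by its CV first, and only the decoded result is checked against the reserved words, so `\u0069\u0066` is rejected exactly as `if` would be. A simplified sketch, assuming ASCII-only identifier characters and using ES3's `Keyword` list plus the `null`, `true` and `false` literals (the FutureReservedWord list is omitted for brevity). `isValidIdentifier` is a hypothetical name, not part of any library.

```javascript
// ES3 ReservedWord minus FutureReservedWord: the Keyword production
// plus the null and boolean literals.
const RESERVED = new Set([
  "break", "case", "catch", "continue", "default", "delete", "do",
  "else", "finally", "for", "function", "if", "in", "instanceof",
  "new", "return", "switch", "this", "throw", "try", "typeof",
  "var", "void", "while", "with", "null", "true", "false",
]);

// Replace each \uXXXX escape with the character it denotes (its CV).
function decodeEscapes(raw) {
  return raw.replace(/\\u([0-9A-Fa-f]{4})/g, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

// ASCII-only approximation of IdentifierName.
const IDENTIFIER_CHARS = /^[A-Za-z$_][A-Za-z0-9$_]*$/;

// Strict reading of 7.6: once every escape is replaced by its CV, the
// result must still be an IdentifierName that is not a ReservedWord.
function isValidIdentifier(raw) {
  const decoded = decodeEscapes(raw);
  return IDENTIFIER_CHARS.test(decoded) && !RESERVED.has(decoded);
}

console.log(isValidIdentifier("\\u0069\\u0066")); // false: decodes to "if"
console.log(isValidIdentifier("\\u0069\\u0067")); // true: decodes to "ig"
console.log(isValidIdentifier("\\u0030"));        // false: starts with "0"
```

Later editions of the standard (ES2015 onward) settle the question in this direction: an Identifier whose escape-decoded name equals a reserved word is an early SyntaxError.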


PointedEars
--
Danny Goodman's books are out of date and teach practices that are
positively harmful for cross-browser scripting.
-- Richard Cornford, cljs, <cife6q$253$1$> (2004)
 
Thomas 'PointedEars' Lahn
 
      02-16-2010
Thomas 'PointedEars' Lahn wrote:

> Lasse Reichstein Nielsen wrote:
>> Asen Bozhilov writes:
>>> As I understand it, if I type:
>>>
>>> var \\u0069\\u0066; //var if;

>>
>> (I assume it should be single backslashes when not in a string.)

>
> Why, the double backslashes are legal, too.


Ignore that, I went too far here.

| IdentifierStart ::
| UnicodeLetter
| $
| _
| \ UnicodeEscapeSequence
|
| [...]
| UnicodeEscapeSequence ::
| u HexDigit HexDigit HexDigit HexDigit
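The productions quoted above require exactly one backslash followed by `u` and four hex digits, which is why source text with doubled backslashes cannot form an identifier. A small ASCII-only sketch: the pattern `IDENTIFIER_START` is a hypothetical transcription that uses `[A-Za-z]` as a stand-in for UnicodeLetter.

```javascript
// IdentifierStart :: UnicodeLetter | $ | _ | \ UnicodeEscapeSequence
// UnicodeEscapeSequence :: u HexDigit HexDigit HexDigit HexDigit
// [A-Za-z] is an ASCII-only stand-in for UnicodeLetter.
const IDENTIFIER_START = /^(?:[A-Za-z$_]|\\u[0-9A-Fa-f]{4})/;

// String.raw keeps backslashes exactly as typed, like source text.
console.log(IDENTIFIER_START.test(String.raw`\u0069\u0066`));   // true
console.log(IDENTIFIER_START.test(String.raw`\\u0069\\u0066`)); // false
```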


PointedEars
 
Asen Bozhilov
 
      02-16-2010
Lasse Reichstein Nielsen wrote:
> Asen Bozhilov writes:


> > var \\u0069\\u0066; //var if;

>
> (I assume it should be single backslashes when not in a string.)
>
> > `if` is a ReservedWord, so the example above should throw a SyntaxError.


Yes, it should be:

var \u0069\u0066;

I used double backslashes because I copied the code from the string
passed to `eval'. That was my mistake.

> No. While 'if' is a keyword, it is only the sequence U+0069 U+0066
> that is recognized as the 'if' keyword. Unicode escapes are not allowed
> as parts of keywords. The above, correctly, declares a variable called
> 'if' - because "\u0069\u0066" matches the production of an identifier
> and it doesn't match the production of any reserved word.


I agree with this reading of the specification.

| 6 Source Text
| [...]
| In string literals, regular expression literals and identifiers,
| any character (code point) may also be expressed as a
| Unicode escape sequence consisting of six characters,
| namely \u plus four hexadecimal digits.

You are correct, and the following example proves it.

try {
    \u0069\u0066 (true);
} catch (e) {
    window.alert(e instanceof ReferenceError); //true
}

`\u0069\u0066 (true);` is evaluated as an `ExpressionStatement` that
ends with an explicit semicolon, rather than as an `IfStatement` whose
body is the `EmptyStatement` `;`.
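Written directly in a script, the escaped call has a catch: an engine that rejects `\u0069\u0066` as an identifier fails while parsing the whole script, before the `try` block ever runs. Routing the statement through `eval`, as in the earlier tests, keeps a parse-time SyntaxError catchable. A sketch with a hypothetical `outcome` helper; which error it reports depends on the engine, so no particular result is claimed here.

```javascript
// Hypothetical helper: evaluate `source` and report how it ended.
// Indirect eval, (0, eval), runs the code as global code.
function outcome(source) {
  try {
    (0, eval)(source);
    return "no error";
  } catch (e) {
    // ReferenceError: parsed as a call to an undeclared function `if`.
    // SyntaxError: the escaped keyword was rejected while parsing.
    return e.name;
  }
}

// Doubled backslashes: the parser receives  \u0069\u0066 (true);
console.log(outcome("\\u0069\\u0066 (true);"));
```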

> The operative part of the ECMA262 standard is in section 7.6, which
> you quote. It allows escape sequences in identifiers. No such
> allowance is given for keywords or other reserved words, so anything
> containing a unicode escape is not a keyword.


What confuses me is this:

| A UnicodeEscapeSequence cannot be
| used to put a character into an identifier that
| would otherwise be illegal. In other words, if a
| \UnicodeEscapeSequence sequence were replaced by
| its UnicodeEscapeSequence's CV,
| the result must still be a
| valid Identifier

As I understand it:

If I replace

var \u0069\u0066;

with the character values (CV), I get:

var if;

And the grammar for `Identifier` doesn't allow an identifier named
`if`:

Identifier ::
IdentifierName but not ReservedWord

because `if` is a keyword and is part of `7.5.1 Reserved Words`.

Thanks for the comment, but why doesn't the specification say anything
explicit about this case?
 
Lasse Reichstein Nielsen
 
      02-16-2010
Thomas 'PointedEars' Lahn writes:

....
>>> | illegal. In other words, if a \UnicodeEscapeSequence sequence were
>                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>> | replaced by its UnicodeEscapeSequence's CV, the result must still
>     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>> | be a valid Identifier that has the exact same sequence of characters
>     ^^^^^^^^^^^^^^^^^^^^^
>>> | as the original Identifier.
>
> I do not think it can be worded more clearly.


I must admit that, on second thought, I tend to agree with that
interpretation.
However, it seems that IE is the only browser that agrees. All of
Opera, Firefox, Chrome and Safari accept \u0069\u0066 as an identifier.

/L
--
Lasse Reichstein Holst Nielsen
'Javascript frameworks is a disruptive technology'

 