Are ”extended characters” safe in identifiers?

 
 
Jukka K. Korpela
      05-16-2011
The syntax of ECMAScript has allowed “extended characters” in
identifiers since 3rd edition (1999). This means, among other things,
allowing any Unicode letters, like Greek, Arabic, and Cyrillic letters
as well as e.g. Chinese ideographs. As far as I can see, this has been
supported in web browsers for a long time (e.g., ever since IE 5.5).

So is it really safe to use them, writing, say

var π = Math.PI;
var ผลบวก = 0;
function Götterdämmerung()

or are there some pitfalls? Various coding conventions as well as
practical editing issues (you can’t be sure of always being able to edit
your code on a Unicode-enabled editor) aside, is there still some real
technical reason to stick to the A–Z, a–z, 0–9, ”$”, ”_” repertoire?
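For concreteness, here is a minimal sketch of the three declarations in a form that runs in a current engine (Node.js or a modern browser is assumed). The function body and the helper `looksLikeIdentifier` are hypothetical additions. The regular expression only approximates the ECMAScript 2015+ identifier grammar, which is defined in terms of the Unicode ID_Start and ID_Continue properties (the 3rd edition listed the categories Lu, Ll, Lt, Lm, Lo and Nl instead). It also ignores `\u` escapes and reserved words.

```javascript
// The identifiers from the question, made runnable. The function body is a
// hypothetical placeholder; the question showed only the signature.
var π = Math.PI;
var ผลบวก = 0; // Thai for "sum"
function Götterdämmerung() {
  return "twilight of the gods";
}

ผลบวก += π * 2;
console.log(ผลบวก === 2 * Math.PI); // true
console.log(Götterdämmerung());     // twilight of the gods

// Approximation of the ES2015+ IdentifierName grammar: an ID_Start character,
// "$" or "_", followed by ID_Continue characters, "$", ZWNJ or ZWJ.
// Ignores \uXXXX escapes and reserved words. Needs ES2018 property escapes.
const IDENTIFIER = /^[\p{ID_Start}$_][\p{ID_Continue}$\u200C\u200D]*$/u;

function looksLikeIdentifier(name) {
  return IDENTIFIER.test(name);
}

console.log(["π", "ผลบวก", "Götterdämmerung", "€uro", "2pi"].map(looksLikeIdentifier));
// [ true, true, true, false, false ]
```

Under these assumptions all three names are accepted. A name such as `€uro` is rejected because € is a currency symbol, not a letter.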

--
Yucca, http://www.cs.tut.fi/~jkorpela/
 
Martin Honnen
      05-16-2011
Jukka K. Korpela wrote:

> The syntax of ECMAScript has allowed “extended characters” in identifiers since 3rd edition (1999). This means, among other things, allowing any Unicode letters, like Greek, Arabic, and Cyrillic letters as well as e.g. Chinese ideographs. As far as I can see, this has been supported in web browsers for a long time (e.g., ever since IE 5.5).
>
> So is it really safe to use them, writing, say
>
> var π = Math.PI;
> var ผลบวก = 0;
> function Götterdämmerung()
>
> or are there some pitfalls? Various coding conventions as well as practical editing issues (you can’t be sure of always being able to edit your code on a Unicode-enabled editor) aside, is there still some real technical reason to stick to the A–Z, a–z, 0–9, ”$”, ”_” repertoire?


I think it is technically safe, but I don't see people doing that,
either in Javascript or in other languages like C#, which also allow
more than ASCII letters in identifiers. But the reason is probably
coding conventions, editing issues and keeping code readable and
understandable internationally. And partly, perhaps, ignorance of the
fact that more than ASCII can be used.


--

Martin Honnen
http://msmvps.com/blogs/martin_honnen/
 
Thomas 'PointedEars' Lahn
      05-17-2011
Jukka K. Korpela wrote:

> The syntax of ECMAScript has allowed “extended characters” in
> identifiers since 3rd edition (1999). This means, among other things,
> allowing any Unicode letters, like Greek, Arabic, and Cyrillic letters
> as well as e.g. Chinese ideographs. As far as I can see, this has been
> supported in web browsers for a long time (e.g., ever since IE 5.5).
>
> So is it really safe to use them, writing, say
>
> var π = Math.PI;
> var ผลบวก = 0;
> function Götterdämmerung()
>
> or are there some pitfalls? Various coding conventions as well as
> practical editing issues (you can’t be sure of always being able to edit
> your code on a Unicode-enabled editor) aside, is there still some real
> technical reason to stick to the A–Z, a–z, 0–9, ”$”, ”_” repertoire?


Perhaps misconfigured Web servers that still declare ISO-8859-1 by default
are one reason why few people use characters beyond U+007F or U+00FF.
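To illustrate how a wrong charset declaration bites, here is a sketch that assumes a script saved as UTF-8 but served as ISO-8859-1. It also assumes an engine that provides `TextEncoder` (Node.js or a modern browser). The helpers `utf8Bytes`, `decodeLatin1` and `parses` are hypothetical. π is stored as the two bytes 0xCF 0x80. Read as ISO-8859-1, they become "Ï" followed by the control character U+0080, and the declaration no longer parses.

```javascript
// Assumed scenario: a script saved as UTF-8 but served with
// "charset=ISO-8859-1", so the engine sees each byte as one character.

// The bytes an editor would write for the source text.
function utf8Bytes(text) {
  return new TextEncoder().encode(text);
}

// ISO-8859-1 maps every byte 0x00..0xFF to the code point with the same value.
function decodeLatin1(bytes) {
  let out = "";
  for (const b of bytes) out += String.fromCharCode(b);
  return out;
}

// Compiles (but does not run) a snippet; false means SyntaxError.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

const original = "var π = Math.PI;";
const misread = decodeLatin1(utf8Bytes(original));

console.log(parses(original));                          // true
console.log(misread === "var \u00CF\u0080 = Math.PI;"); // true
console.log(parses(misread));                           // false
```

Browsers actually treat an ISO-8859-1 label as windows-1252, where 0x80 is "€" (U+20AC). That is not an identifier character either, so the outcome is the same SyntaxError.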

As for practical editing issues, it is not only the editor, but also the
input method that needs to be available and to allow for easy typing. At
least on my current X.org keyboard setup it is considerably harder to type
`π' than `pi' (except in GNOME applications where I could type C-S-u
$HEXCODEPOINT; but that would still be four keypresses more). (I really
don't seem to need the THORN letters, so I could define GREEK LETTER … PI
for M-P instead. But not all people can do this, and even if they could
they may not want to.)
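ECMAScript also accepts a `\uXXXX` escape anywhere an identifier character may appear, so a name like π can be entered in plain ASCII when the input method makes it awkward. The following is a short sketch. `codePointHex` is a hypothetical helper that prints the hex code point one would type after C-S-u.

```javascript
// Declares the identifier π using only ASCII characters in the source.
var \u03c0 = Math.PI;

// The escaped and the literal spelling denote the same name.
console.log(π === Math.PI); // true

// Hypothetical helper: the code point to type after C-S-u, in U+XXXX form.
function codePointHex(ch) {
  return "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
}

console.log(codePointHex("π")); // U+03C0
console.log(codePointHex("…")); // U+2026
```

A file written this way is pure ASCII, so it is also immune to the charset problem mentioned above. The cost is that `\u03c0` is hardly easier to read than `pi`.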


PointedEars
--
Danny Goodman's books are out of date and teach practices that are
positively harmful for cross-browser scripting.
-- Richard Cornford, cljs, <cife6q$253$1$(E-Mail Removed)> (2004)
 
Tim Streater
      05-17-2011
In article <(E-Mail Removed)>,
Thomas 'PointedEars' Lahn <(E-Mail Removed)> wrote:

> As for practical editing issues, it is not only the editor, but also the
> input method that needs to be available and to allow for easy typing. At
> least on my current X.org keyboard setup it is considerably harder to type
> `น' than `pi' (except in GNOME applications where I could type C-S-u
> $HEXCODEPOINT; but that would still be four keypresses more). (I really
> don't seem to need the THORN letters, so I could define GREEK LETTER ษ PI
> for M-P instead. But not all people can do this, and even if they could
> they may not want to.)


On my Mac น is option-p (alt-p if you prefer) - in all applications.

--
Tim

"That excessive bail ought not to be required, nor excessive fines imposed,
nor cruel and unusual punishments inflicted" -- Bill of Rights 1689
 
Thomas 'PointedEars' Lahn
      05-17-2011
Tim Streater wrote:

> Thomas 'PointedEars' Lahn <(E-Mail Removed)> wrote:
>> As for practical editing issues, it is not only the editor, but also the
>> input method that needs to be available and to allow for easy typing. At
>> least on my current X.org keyboard setup it is considerably harder to
>> type `น' than `pi' (except in GNOME applications where I could type C-S-u
>> $HEXCODEPOINT; but that would still be four keypresses more). (I really
>> don't seem to need the THORN letters, so I could define GREEK LETTER ษ PI
>> for M-P instead. But not all people can do this, and even if they could
>> they may not want to.)

>
> On my Mac น is option-p (alt-p if you prefer) - in all applications.


But what good is a handy input method if you have the wrong application?
For example, you did not post GREEK SMALL LETTER PI, but something else
(UniView says U+0E19 THAI CHARACTER NO NU); you even managed to mangle my
proper Unicode pi and ellipsis when quoting them.

So I think we can add lack of proper Unicode support in some newsreaders to
the list of technical reasons for not using non-ASCII characters in source
code.


PointedEars
--
var bugRiddenCrashPronePieceOfJunk = (
navigator.userAgent.indexOf('MSIE 5') != -1
&& navigator.userAgent.indexOf('Mac') != -1
) // Plone, register_function.js:16
 
Tim Streater
      05-17-2011
In article <(E-Mail Removed)>,
Thomas 'PointedEars' Lahn <(E-Mail Removed)> wrote:

> Tim Streater wrote:
>
> > Thomas 'PointedEars' Lahn <(E-Mail Removed)> wrote:
> >> As for practical editing issues, it is not only the editor, but also the
> >> input method that needs to be available and to allow for easy typing. At
> >> least on my current X.org keyboard setup it is considerably harder to
> >> type `น' than `pi' (except in GNOME applications where I could type C-S-u
> >> $HEXCODEPOINT; but that would still be four keypresses more). (I really
> >> don't seem to need the THORN letters, so I could define GREEK LETTER ษ PI
> >> for M-P instead. But not all people can do this, and even if they could
> >> they may not want to.)

> >
> > On my Mac น is option-p (alt-p if you prefer) - in all applications.

>
> But what good is a handy input method if you have the wrong application?
> For example, you did not post GREEK SMALL LETTER PI, but something else
> (UniView says U+0E19 THAI CHARACTER NO NU); you even managed to mangle my
> proper Unicode pi and ellipsis when quoting them.
>
> So I think we can add lack of proper Unicode support in some newsreaders to
> the list of technical reasons for not using non-ASCII characters in source
> code.


Yes, MT-NewsWatcher does seem to have some issues in this regard (it
claims to send UTF-8 but is obviously lying). Shame really as it's quite
good in most other respects for my purposes.
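One reconstruction is consistent with the characters that showed up in the quoted text, though this is an assumption and the thread does not establish the mechanism. Suppose the newsreader emitted Mac OS Roman bytes, where π is 0xB9 and the ellipsis is 0xC9. A reader decoding those bytes as TIS-620 (Thai) gets U+0E19 THAI CHARACTER NO NU and U+0E29 THAI CHARACTER SO RUSI, which are exactly "น" and "ษ". The sketch below reproduces this with a deliberately tiny, hypothetical Mac OS Roman table.

```javascript
// Hypothetical reconstruction: text encoded as Mac OS Roman, then decoded as
// TIS-620. Only the two non-ASCII Mac OS Roman characters needed here are mapped.
const MAC_ROMAN = new Map([
  ["\u03C0", 0xB9], // π GREEK SMALL LETTER PI
  ["\u2026", 0xC9], // … HORIZONTAL ELLIPSIS
]);

// TIS-620: ASCII below 0x80; 0xA1..0xDA and 0xDF..0xFB map to U+0E01..U+0E5B
// by adding 0x0D60. Other bytes are treated as unassigned and become U+FFFD.
function decodeTis620Byte(b) {
  if (b < 0x80) return String.fromCharCode(b);
  if ((b >= 0xA1 && b <= 0xDA) || (b >= 0xDF && b <= 0xFB)) {
    return String.fromCharCode(b + 0x0D60);
  }
  return "\uFFFD";
}

function macRomanAsTis620(text) {
  let out = "";
  for (const ch of text) {
    const code = ch.codePointAt(0);
    const byte = code < 0x80 ? code : MAC_ROMAN.get(ch);
    if (byte === undefined) throw new RangeError("not in this sketch's table: " + ch);
    out += decodeTis620Byte(byte);
  }
  return out;
}

console.log(macRomanAsTis620("`\u03C0' than `pi'"));     // `น' than `pi'
console.log(macRomanAsTis620("GREEK LETTER \u2026 PI")); // GREEK LETTER ษ PI
```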

--
Tim
 
Erwin Moller
      05-18-2011
On 5/17/2011 10:24 PM, Tim Streater wrote:
> In article <(E-Mail Removed)>,
> Thomas 'PointedEars' Lahn <(E-Mail Removed)> wrote:
>
>> As for practical editing issues, it is not only the editor, but also
>> the input method that needs to be available and to allow for easy
>> typing. At least on my current X.org keyboard setup it is considerably
>> harder to type `น' than `pi' (except in GNOME applications where I
>> could type C-S-u $HEXCODEPOINT; but that would still be four
>> keypresses more). (I really don't seem to need the THORN letters, so I
>> could define GREEK LETTER ษ PI for M-P instead. But not all people can
>> do this, and even if they could they may not want to.)

>
> On my Mac น is option-p (alt-p if you prefer) - in all applications.
>


Did anybody else notice the change in the topic when Tim replied?

The original quotes around "extended characters" have been replaced by
something else.
Funny, considering the discussion at hand.

Regards,
Erwin Moller

--
"That which can be asserted without evidence, can be dismissed without
evidence."
-- Christopher Hitchens
 
Tim Streater
      05-18-2011
In article <4dd36c43$0$49038$(E-Mail Removed)4all.nl>,
Erwin Moller
<Since_humans_read_this_I_am_spammed_too_much@spam yourself.com> wrote:

> On 5/17/2011 10:24 PM, Tim Streater wrote:
> > In article <(E-Mail Removed)>,
> > Thomas 'PointedEars' Lahn <(E-Mail Removed)> wrote:
> >
> >> As for practical editing issues, it is not only the editor, but also
> >> the input method that needs to be available and to allow for easy
> >> typing. At least on my current X.org keyboard setup it is considerably
> >> harder to type `น' than `pi' (except in GNOME applications where I
> >> could type C-S-u $HEXCODEPOINT; but that would still be four
> >> keypresses more). (I really don't seem to need the THORN letters, so I
> >> could define GREEK LETTER ษ PI for M-P instead. But not all people can
> >> do this, and even if they could they may not want to.)

> >
> > On my Mac น is option-p (alt-p if you prefer) - in all applications.

>
> Did anybody else notice the change in the topic when Tim replied?
>
> The original quotes around "extended characters" have been replaced by
> something else.
> Funny, considering the discussion at hand.


Quite so

--
Tim
 
Dr J R Stockton
      05-18-2011
In comp.lang.javascript message <iqqs00$2u5$(E-Mail Removed)>, Mon, 16
May 2011 12:49:51, Jukka K. Korpela <(E-Mail Removed)> posted:

>The syntax of ECMAScript has allowed “extended characters” in
>identifiers since 3rd edition (1999). This means, among other things,
>allowing any Unicode letters, like Greek, Arabic, and Cyrillic letters
>as well as e.g. Chinese ideographs. As far as I can see, this has been
>supported in web browsers for a long time (e.g., ever since IE 5.5).
>
>So is it really safe to use them, writing, say
>
>var π = Math.PI;
>var ผลบวก = 0;
>function Götterdämmerung()
>
>or are there some pitfalls? Various coding conventions as well as
>practical editing issues (you can’t be sure of always being able to
>edit your code on a Unicode-enabled editor) aside, is there still some
>real technical reason to stick to the A–Z, a–z, 0–9, ”$”, ”_”
>repertoire?



It creates interesting possibilities of writing code which looks
incorrect but will execute, or /vice versa/ - for example, "while" can
be used as an ordinary identifier, but not as a reserved word, if the
third character is \u2170. In common fonts, at least, that numeric
character is likely to look very much like \x69.

One can likewise attack 'var' and 'extends'. And \u03bf or \u0531 can
be used in 'for'.

One can presumably defeat Google Translate by exchanging visually
equivalent Greek, Cyrillic, and Latin characters.

Code might be visually obfuscated by renaming one's variables to
incorporate, or comprise, various non-inking characters, especially
\u008d.

ENTIRELY UNTESTED.
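These tricks are easy to check in a current engine (Node.js or a modern browser is assumed). `parses` is a hypothetical helper that compiles a snippet with `new Function` without running it. Under these assumptions, the lookalike names parse as ordinary identifiers, but U+008D does not: it is a C1 control character, which is not permitted in identifiers. By contrast, ZERO WIDTH JOINER (U+200D) has been allowed inside identifiers since ES5 and is invisible, so it does work for this purpose.

```javascript
// Compiles (but does not run) a snippet; false means SyntaxError.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// The string escapes below insert the actual characters into each snippet.
console.log(parses("var while = 1;"));      // false: reserved word
console.log(parses("var wh\u2170le = 1;")); // true: U+2170 SMALL ROMAN NUMERAL ONE (Nl) instead of "i"
console.log(parses("var f\u03bfr = 1;"));   // true: U+03BF GREEK SMALL LETTER OMICRON instead of "o"
console.log(parses("var a\u008db = 1;"));   // false: U+008D is a C1 control character
console.log(parses("var a\u200db = 1;"));   // true: U+200D ZERO WIDTH JOINER, invisible
```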

But the French are rather proud of their language, and IIRC a
well-placed accent can completely change the meaning of a word - I can
understand a French programmer wanting to use accented identifiers.
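One pitfall with accented identifiers is that ECMAScript does not normalize them. A precomposed "ä" (U+00E4) and "a" followed by U+0308 COMBINING DIAERESIS look the same but name different variables. The sketch below assumes an engine with `String.prototype.normalize` (ES2015); `twoVariables` is a hypothetical name.

```javascript
// Two spellings of the same visible name.
const precomposed = "G\u00F6tterd\u00E4mmerung";  // ö, ä as single code points (NFC)
const decomposed = "Go\u0308tterda\u0308mmerung"; // o, a + U+0308 COMBINING DIAERESIS (NFD)

console.log(precomposed === decomposed);                  // false
console.log(precomposed === decomposed.normalize("NFC")); // true

// Both are valid identifiers, and the engine keeps them apart.
const twoVariables = new Function(
  `var ${precomposed} = "NFC"; var ${decomposed} = "NFD";` +
  ` return [${precomposed}, ${decomposed}];`
);
console.log(twoVariables()); // [ 'NFC', 'NFD' ]
```

Normalizing source files to NFC before committing them would avoid this, but the language itself will not do it for you.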

--
(c) John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v6.05 MIME.
Web <http://www.merlyn.demon.co.uk/> - FAQish topics, acronyms, & links.
Proper <= 4-line sig. separator as above, a line exactly "-- " (SonOfRFC1036)
Do not Mail News to me. Before a reply, quote with ">" or "> " (SonOfRFC1036)
 
Jukka K. Korpela
      05-19-2011
18.5.2011 20:53, Dr J R Stockton wrote:

> It creates interesting possibilities of writing code which looks
> incorrect but will execute, or /vice versa/ - for example, "while" can
> be used as an ordinary identifier, but not as a reserved word, if the
> third character is \u2170.


Non-Ascii characters in identifiers could be used for a variety of
purposes, yes. There has been a lot of discussion about similar issues
with non-Ascii characters in domain names, where the risk (both
probability and possible damage) of intentionally caused confusion is
much greater.

With identifiers in JavaScript, the risks are already with us, without
any precautions like the complex rules for domain names (e.g. rules
against mixing letters from different writing systems in a word). So I
don't think the risks could be used as an argument against appropriate use.
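To show what a domain-name-style precaution might look like for identifiers, here is a deliberately simplified mixed-script check. It assumes an engine with Unicode property escapes (ES2018). It only looks at four scripts and is not the actual IDN or Unicode UTS #39 rule set; `scriptsUsed` and `isMixedScript` are hypothetical names.

```javascript
// Only these scripts are considered; digits, "_" and "$" (Common) are ignored.
const SCRIPTS = {
  Latin: /\p{Script=Latin}/u,
  Greek: /\p{Script=Greek}/u,
  Cyrillic: /\p{Script=Cyrillic}/u,
  Thai: /\p{Script=Thai}/u,
};

function scriptsUsed(identifier) {
  const found = new Set();
  for (const ch of identifier) {
    for (const [name, re] of Object.entries(SCRIPTS)) {
      if (re.test(ch)) found.add(name);
    }
  }
  return [...found];
}

function isMixedScript(identifier) {
  return scriptsUsed(identifier).length > 1;
}

console.log(isMixedScript("Götterdämmerung")); // false: all Latin
console.log(isMixedScript("π"));               // false: Greek only
console.log(isMixedScript("f\u03bfr"));        // true: Latin f, r + Greek omicron
console.log(isMixedScript("wh\u2170le"));      // false
```

Note its limits. U+2170 SMALL ROMAN NUMERAL ONE belongs to the Latin script, so the `wh\u2170le` lookalike passes unflagged.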

I was somewhat surprised to see that both http://www.jslint.com/ and
http://jshint.com/ apparently report any non-Ascii characters in
identifiers as errors, without even offering an option to allow them.

> But the French are rather proud of their language, and IIRC a
> well-placed accent can completely change the meaning of a word - I can
> understand a French programmer wanting to use accented identifiers.


It's not that common to find French word pairs that differ only in the
use of accents. In Swedish or Finnish, it's much easier, and letters
like å and ä aren't treated as letters with accents but as separate
letters of the alphabet. But Greek, Bulgarian, Thai, and Japanese are
better examples of languages that need non-Ascii letters.

--
Yucca, http://www.cs.tut.fi/~jkorpela/
 