Allowing non-ASCII identifiers (François Pinard)

 
 
Doug Fort
 
      02-09-2004
This is an excerpt from a much longer post on the python-dev mailing list.
I'm responding here, to avoid cluttering up python-dev.

[François Pinard]
<snip>
>Some English readers might not really imagine, but it is a constant
>misery, having to mangle identifiers while documenting and thinking
>in languages other than English, merely because the Python notion of
>letter is limited to the English subset. Granted, keywords and standard
>library use English, this is Python, and this is not at stake here!
>However, there is a good part of code in local (or in-house) programs
>which is thought as our crafted code, and even the linguistic change is
>useful (to us) for segregating between what comes from the language and
>what comes from us. The idea is extremely appealing of being able to
>craft and polish our code (comments, strings, identifiers) to make it as
>nice as it could get, while thinking in our native, natural language.
>--
>François Pinard http://www.iro.umontreal.ca/~pinard

</snip>

Monoglot English speakers, like me, might also benefit from reading
well-crafted Python code with non-English identifiers and comments. I learn
best by anchoring new ideas in a familiar context.

One of my (non-programmer) friends is improving his French by working
through the French versions of the Harry Potter novels.


 
 
 
 
 
Paul Prescod
 
      02-09-2004
Doug Fort wrote:

> [François Pinard]
> <snip>
>
>>Some English readers might not really imagine, but it is a constant
>>misery, having to mangle identifiers while documenting and thinking
>>in languages other than English, merely because the Python notion of
>>letter is limited to the English subset. Granted, keywords and standard
>>library use English, this is Python, and this is not at stake here!
>>However, there is a good part of code in local (or in-house) programs
>>which is thought as our crafted code, and even the linguistic change is
>>useful (to us) for segregating between what comes from the language and
>>what comes from us. The idea is extremely appealing of being able to
>>craft and polish our code (comments, strings, identifiers) to make it as
>>nice as it could get, while thinking in our native, natural language.


I wonder if the proposal would be more palatable if it were restricted
to 8-bit encodings (what we used to call "code pages"). This is at least
a first step in the right direction that would help westerners and could
be made to work even if Python were compiled without Unicode support.
(it is still possible to compile Python without Unicode isn't it?)

Paul Prescod



 
 
 
 
 
François Pinard
 
      02-09-2004
[Paul Prescod]

> I wonder if the proposal would be more palatable if it were restricted
> to 8-bit encodings (what we used to call "code pages"). This is at
> least a first step in the right direction that would help westerners
> and could be made to work even if Python were compiled without Unicode
> support.


To repeat something I was writing to python-dev earlier today, it
already works by some kind of accident. A smallish main program
could do:

import locale
locale.setlocale(locale.LC_ALL, '')  # adopt the code page set in the environment
import the_real_application  # placeholder for the real application's module

to activate your code page, given your environment is already set for
it. This will activate proper classification of characters in <ctype.c>
and then, Python seems to behave properly with non-ASCII identifiers
within the imported application.

It is an accident because it was not meant this way by Guido, at least
as far as I know. The trick might break in various places, who knows.
I did not test it seriously, and do not intend to rely on it, as Guido
might even choose to consider this a bug to be corrected.

The plan rather seems to be to support non-ASCII identifiers widely
instead of parsimoniously, if Python ever does it, or not at all. The
decision has not been taken yet, Guido wants a PEP and a discussion
first.
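
For readers coming to this thread later: that PEP eventually arrived as PEP 3131, and Python 3 accepts non-ASCII identifiers without any locale step. The sketch below assumes a Python 3 interpreter and uses made-up names; it only shows which strings the language treats as identifiers and is not a reproduction of the locale trick above.

```python
# Assumes Python 3; all names here are illustrative only.
candidates = ["élève", "łódź", "Σ", "2fast"]

# str.isidentifier() reports whether a string is a syntactically valid identifier.
valid = {name: name.isidentifier() for name in candidates}

# A non-ASCII identifier used directly in source text.
namespace = {}
exec("élève = 'student'", namespace)
```

Here `valid` maps the first three names to True and "2fast" to False, since an identifier may not start with a digit.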

In my experience, such discussions are often rough (or at least
demanding), because people have a lot of emotions on linguistic
issues, and do not always show the real relations between emotions and
rationalisations, which sometimes get convoluted.

> (it is still possible to compile Python without Unicode isn't it?)


I would guess that Unicode in Python is central if you want codecs to
work, in particular for all code pages which Python currently supports.

--
François Pinard http://www.iro.umontreal.ca/~pinard

 
 
Martin v. Löwis
 
      02-09-2004
Paul Prescod wrote:
> I wonder if the proposal would be more palatable if it were restricted
> to 8-bit encodings (what we used to call "code pages"). This is at least
> a first step in the right direction that would help westerners and could
> be made to work even if Python were compiled without Unicode support.
> (it is still possible to compile Python without Unicode isn't it?)


I doubt that it would matter much to those currently opposed; I know
that *I* would be opposed to such a strategy: Allowing arbitrary source
code encoding is no technical challenge whatsoever, and restricting
it to single-byte encodings is an arbitrary restriction.

I believe Guido's concern is more along the lines "How do I call a
function that has a ł in its name, or a Σ?", or, even, "How can I
find out what the function does, by looking at its name and doc
string, if that is in Polish or Greek?" The fact that there is
a single-byte encoding for either character doesn't really help
here.

So this is about social issues, coding policies, guidelines, etc -
not about technical issues.

Regards,
Martin


 
 
Paul Prescod
 
      02-09-2004
Martin v. Löwis wrote:

> Paul Prescod wrote:
>
>> I wonder if the proposal would be more palatable if it were restricted
>> to 8-bit encodings (what we used to call "code pages"). This is at
>> least a first step in the right direction that would help westerners
>> and could be made to work even if Python were compiled without Unicode
>> support. (it is still possible to compile Python without Unicode isn't
>> it?)

>
>
> I doubt that it would matter much to those currently opposed; I know
> that *I* would be opposed to such a strategy: Allowing arbitrary source
> code encoding is no technical challenge whatsoever, and restricting
> it to single-byte encodings is an arbitrary restriction.


You are right. Re-reading Guido's complaint I understand what you mean.
But I have heard the argument in the past that Unicode source files
would break introspection tools. If that isn't a concern this time
around then disregard my suggestion.

Paul Prescod



 
 
John Roth
 
      02-10-2004

"Paul Prescod" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
Martin v. Löwis wrote:

> Paul Prescod wrote:
>
>> I wonder if the proposal would be more palatable if it were restricted
>> to 8-bit encodings (what we used to call "code pages"). This is at
>> least a first step in the right direction that would help westerners
>> and could be made to work even if Python were compiled without Unicode
>> support. (it is still possible to compile Python without Unicode isn't
>> it?)

>
>
> I doubt that it would matter much to those currently opposed; I know
> that *I* would be opposed to such a strategy: Allowing arbitrary source
> code encoding is no technical challenge whatsoever, and restricting
> it to single-byte encodings is an arbitrary restriction.


You are right. Re-reading Guido's complaint I understand what you mean.
But I have heard the argument in the past that Unicode source files
would break introspection tools. If that isn't a concern this time
around then disregard my suggestion.

[JR]
I believe that unicode (actually UTF-8) source code files
are legitimate if you declare them properly in the encoding
line. In fact, UTF-8 is the example in the documentation.
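
A minimal sketch of what the encoding line does, assuming a Python 3 interpreter. It uses ISO-8859-1 rather than UTF-8 so that the declaration visibly matters (UTF-8 is already the default in Python 3); the variable names and the sample string are invented for illustration.

```python
# Source text stored as ISO-8859-1 bytes, with a PEP 263 declaration on line 1.
source_bytes = (
    "# -*- coding: iso-8859-1 -*-\n"
    "greeting = 'déjà vu'\n"
).encode("iso-8859-1")

# compile() reads the declaration and decodes the remaining bytes accordingly.
namespace = {}
exec(compile(source_bytes, "<declared-latin-1>", "exec"), namespace)
greeting = namespace["greeting"]
```

After this runs, `greeting` holds the properly decoded text rather than mis-decoded bytes.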

I'm all in favor of going to unicode all the way. I'd like to
have the proper mathematical symbols for logical and set
operations, as well as integer divide. They're all there in the
unicode character set, after all; why should we have to
settle for archaic character restrictions?

John Roth
[/JR]

Paul Prescod




 
 
AdSR
 
      02-10-2004
"John Roth" <(E-Mail Removed)> wrote in message news:<(E-Mail Removed)>...
> I'm all in favor of going to unicode all the way. I'd like to
> have the proper mathematical symbols for logical and set
> operations, as well as integer divide. They're all there in the
> unicode character set, after all; why should we have to
> settle for archaic character restrictions?


Java allows for Unicode identifiers and I'm yet to see a single source
file that uses anything but ASCII. Actually, so far I have only seen
non-ASCII in Polish Logo many years ago, and that was only for
educational purposes.

As a non-native English speaker, coming from Polish and Portuguese
background, I could argue in favor of non-ASCII identifiers, but I'm
against them. Do we really need those? Even if program output is in
Polish, all my code is "identified" and commented in English, which I
think of as a good habit. (With exception of HTML, where comments
are closely related to content.)

I don't have any _really_ solid reasons against Unicode identifiers,
except for simplicity. It's just the way I feel about programming.

On a side note, one place where I think non-ASCII really should be
avoided is domain names, something that is being much debated
recently.

AdSR
 
 
Michael Hudson
 
      02-10-2004
AdSR writes:

> On a side note, one place where I think non-ASCII really should be
> avoided is domain names, something that is being much debated
> recently.


And something Python supports already
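
This presumably refers to the "idna" codec in the standard library (an assumption; the post does not name it). A minimal sketch, written for Python 3 and using a made-up host name:

```python
# Hypothetical internationalized host name, spelled with an escape for clarity.
hostname = "b\u00fccher.example"

# The "idna" codec converts each label to its ASCII-compatible form and back.
ascii_form = hostname.encode("idna")
round_trip = ascii_form.decode("idna")
```

Here `ascii_form` is `b"xn--bcher-kva.example"`, and decoding it gives back the original name.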

Cheers,
mwh

--
Windows XP: Big cow. Stands there, not especially malevolent
but constantly crapping on your carpet. Eventually you have to
open a window to let the crap out or you die.
-- Jim's pedigree of operating systems, asr
 
 
Scott David Daniels
 
      02-10-2004
John Roth wrote:
> ...
> I believe that unicode (actually UTF-8) source code files
> are legitimate if you declare them properly in the encoding
> line. In fact, UTF-8 is the example in the documentation.
>
> I'm all in favor of going to unicode all the way. I'd like to
> have the proper mathematical symbols for logical and set
> operations, as well as integer divide. They're all there in the
> unicode character set, after all; why should we have to
> settle for archaic character restrictions?

Because some of us use archaic systems and/or fonts which are
incapable of displaying such symbols. Never mind whether we
can read them.

Also, we would have to solve the issue of multiple representations
for the same identifier (normalized identifiers). For example, the
word "élève" has four equivalent representations:

(u'\N{Latin small letter e with acute}l'
u'\N{Latin small letter e with grave}ve')

(u'\N{Latin small letter e with acute}l'
u'e\N{Combining grave accent}ve')

(u'e\N{Combining acute accent}l'
u'\N{Latin small letter e with grave}ve')

(u'e\N{Combining acute accent}l'
u'e\N{Combining grave accent}ve')

Unicode says we should treat these four identically. Further,
they each have a distinct hash code, so a dictionary will not
necessarily even try to compare them to find them equal.
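
A minimal sketch of the problem and one possible remedy, assuming Python 3 and the standard unicodedata module. Choosing NFC as the normal form is an assumption made for illustration; the post itself does not prescribe one.

```python
import unicodedata

E_ACUTE = "\N{LATIN SMALL LETTER E WITH ACUTE}"
E_GRAVE = "\N{LATIN SMALL LETTER E WITH GRAVE}"
ACUTE = "\N{COMBINING ACUTE ACCENT}"
GRAVE = "\N{COMBINING GRAVE ACCENT}"

# The four spellings listed above, as Python 3 strings.
spellings = [
    E_ACUTE + "l" + E_GRAVE + "ve",
    E_ACUTE + "l" + "e" + GRAVE + "ve",
    "e" + ACUTE + "l" + E_GRAVE + "ve",
    "e" + ACUTE + "l" + "e" + GRAVE + "ve",
]

# Compared code point by code point, all four are different strings.
distinct_raw = set(spellings)

# After normalizing to one form (NFC here), they collapse to a single string.
distinct_nfc = {unicodedata.normalize("NFC", s) for s in spellings}
```

Without normalization the set holds four entries; with it, one. A dictionary keyed on identifiers would therefore need to normalize before hashing in order to treat the four spellings as the same name.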


--
-Scott David Daniels
 
 
Martin v. Löwis
 
      02-10-2004
Paul Prescod wrote:

> You are right. Re-reading Guido's complaint I understand what you mean.
> But I have heard the argument in the past that Unicode source files
> would break introspection tools. If that isn't a concern this time
> around then disregard my suggestion.


That might be a problem, indeed. OTOH, those tools likely also
break if you use non-ASCII byte strings for identifiers.

Regards,
Martin

 
 
 
 