unicode mystery

 
 
Steve Holden
 
      01-13-2005
(E-Mail Removed) wrote:

> John Machin wrote:
>
>
>>I regard continued usage of octal as a pox and a pestilence.

>
>
> Quite agree. I was disappointed that it ever made it into Python.
>
> Octal's only use is:
>
> a) umasks
> b) confusing the hell out of normal non-programmers for whom a
> leading zero is in no way magic
>
> (a) does not outweigh (b).
>
> In Mythical Future Python I would like to be able to use any base in
> integer literals, which would be better. Example random syntax:
>
> flags= 2x00011010101001
> umask= 8x664
> answer= 10x42
> addr= 16x0E800004 # 16x == 0x
> gunk= 36x8H6Z9A0X
>
> But either way, I want rid of 0->octal.
>
>
>>Is it not regretted?

>
>
> Maybe the problem just doesn't occur to people who have used C too
> long.
>



> OT: Also, if Google doesn't stop lstrip()ing my posts I may have to get
> a proper news feed. What use is that on a Python newsgroup? Grr.


I remember using a language (Icon?) in which arbitrary bases up to 36
could be used with numeric literals. IIRC, the literals had to begin
with the base in decimal, followed by a "b" followed by the digits of
the value using a through z for digits from ten to thirty-five. So

gunk = 36b8H6Z9A0X

would have been valid.
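
For comparison, literals of this general shape can already be read at run time in Python, because the built-in int(text, base) accepts bases 2 through 36. A minimal sketch, assuming the "<base><separator><digits>" layout from the examples in this thread; parse_based_literal is a hypothetical helper name, not something Python provides:

```python
def parse_based_literal(text, sep="x"):
    """Parse '<base><sep><digits>', e.g. '16x0E800004' or '36b8H6Z9A0X'.

    Hypothetical helper for illustration; not part of Python itself.
    """
    # The base is written in decimal, so the first separator ends it.
    base_text, found, digits = text.partition(sep)
    if not found or not base_text.isdigit() or not digits:
        raise ValueError(f"expected <base>{sep}<digits>, got {text!r}")
    base = int(base_text)
    if not 2 <= base <= 36:
        raise ValueError(f"base must be in 2..36, got {base}")
    # int() converts the digits and rejects any digit too large for the base.
    return int(digits, base)


print(parse_based_literal("2x00011010101001"))
print(parse_based_literal("8x664"))
print(parse_based_literal("16x0E800004"))
print(parse_based_literal("36b8H6Z9A0X", sep="b"))
```

Splitting on the first separator is enough here because the base itself contains only decimal digits, so a later "x" or "b" among the digits is left alone.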

nothing-new-under-the-sun-ly y'rs - steve
--
Steve Holden http://www.holdenweb.com/
Python Web Programming http://pydish.holdenweb.com/
Holden Web LLC +1 703 861 4237 +1 800 494 3119

 
 
 
 
 
Dan Sommers
 
      01-13-2005
On Thu, 13 Jan 2005 09:56:15 -0500,
Steve Holden <(E-Mail Removed)> wrote:

> I remember using a language (Icon?) in which arbitrary bases up to 36
> could be used with numeric literals. IIRC, the literals had to begin
> with the base in decimal, followed by a "b" followed by the digits of
> the value using a through z for digits from ten to thirty-five. So


> gunk = 36b8H6Z9A0X


> would have been valid.


Lisp also allows for literals in bases from 2 through 36.

Lisp also allows programs to change the default base (away from decimal), so
that, with the base set to 16, an "identifier" like aa is read by the parser as a
numeric constant with the decimal value of 170. Obviously, this has to be used with
care, but makes reading external data files written in strange bases
very easy.

> nothing-new-under-the-sun-ly y'rs - steve


every-language-wants-to-be-lisp-ly y'rs,
Dan

--
Dan Sommers
<http://www.tombstonezero.net/dan/>
Never play leapfrog with a unicorn.
 
 
 
 
 
Leif K-Brooks
 
      01-13-2005
Tim Roberts wrote:
> Stephen Thorne <(E-Mail Removed)> wrote:
>
>>I would actually like to see pychecker pick up conceptual errors like this:
>>
>>import datetime
>>datetime.datetime(2005, 04,04)

>
>
> Why is that a conceptual error? Syntactically, this could be a valid call
> to a function. Even if you have parsed and executed datetime, so that you
> know datetime.datetime is a class, it's quite possible that the creation
> and destruction of an object might have useful side effects.


I'm guessing that Stephen is saying that PyChecker should have special
knowledge of the datetime module and of the fact that dates are often
specified with a leading zero, and therefore complain that they shouldn't
be used that way in Python source code.
 
 
Bengt Richter
 
      01-13-2005
On Thu, 13 Jan 2005 08:18:25 -0500, Peter Hansen <(E-Mail Removed)> wrote:

>(E-Mail Removed) wrote:
>> In Mythical Future Python I would like to be able to use any base in
>> integer literals, which would be better. Example random syntax:
>>
>> flags= 2x00011010101001
>> umask= 8x664
>> answer= 10x42
>> addr= 16x0E800004 # 16x == 0x
>> gunk= 36x8H6Z9A0X

>
>I think I kinda like this idea. Allowing arbitrary values,
>however, would probably be pointless, as there are very
>few bases in common enough use that a language should make
>it easy to write literals in any of them. So I think "36x"
>is silly, and would suggest limiting this to 2, 8, 10, and
>16. At the very least, a range of 2-16 should be used.
>(It would be cute but pointless to allow 1x000000000.)
>

My concern is negative numbers when you are interested in the
bits of a typical two's-complement number. (BTW, please don't tell me
that's platform-specific, hardware-oriented stuff: two's complement is
a fine abstraction for interpreting a bit vector, which is another
fine abstraction.)

One way to do it consistently is to have a sign digit as the first
digit after the x, which is either 0 or base-1 -- e.g., +3 and -3 would be

2x011 2x101
8x03 8x75
16x03 16xfd
10x03 10x97

Then the "sign digit" can be extended indefinitely to the left without
changing the value (noting that -3 == 97-100 == 997-1000), and similarly
for other bases: -3 == binary 101-1000 == 111101-1000000 etc. IOW, you just
subtract base**<number of digits in representation> if the first digit
is base-1, to get the negative value.
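
The decoding rule can be sketched in Python as follows. This only illustrates the scheme described above: decode_sign_digit is a hypothetical helper, the "<base>x<digits>" spelling is the proposed syntax rather than anything Python accepts, and rejecting a first digit other than 0 or base-1 is an assumption of the sketch.

```python
import string

DIGITS = string.digits + string.ascii_lowercase  # digit symbols for bases up to 36


def decode_sign_digit(literal):
    """Decode '<base>x<digits>' where the first digit is the sign digit."""
    base_text, found, digits = literal.lower().partition("x")
    if not found or not base_text.isdigit() or not digits:
        raise ValueError(f"expected <base>x<digits>, got {literal!r}")
    base = int(base_text)
    if not 2 <= base <= 36:
        raise ValueError(f"base must be in 2..36, got {base}")
    if any(d not in DIGITS[:base] for d in digits):
        raise ValueError(f"invalid digit for base {base} in {literal!r}")
    value = int(digits, base)
    sign_digit = DIGITS.index(digits[0])
    if sign_digit == base - 1:
        # Negative: subtract base**<number of digits in representation>.
        return value - base ** len(digits)
    if sign_digit != 0:
        raise ValueError(f"first digit must be 0 or {DIGITS[base - 1]}")
    return value


for text in ("2x011", "2x101", "8x03", "8x75", "16x03", "16xfd", "10x03", "10x97"):
    print(text, decode_sign_digit(text))
```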

This would let us have a %<width>.<base>b format to generate the digits part
and would get around the ugly hack for writing hex literals of negative numbers.

def __repr__(self): return '<%s object at %08.16b>' %(type(self).__name__, id(self))

and you could write based literals in the above formats with, e.g.,
'16x%.16b' % number
or
'2x%.2b' % number
etc.

Regards,
Bengt Richter
 
 
Jeff Epler
 
      01-13-2005
On Thu, Jan 13, 2005 at 11:04:21PM +0000, Bengt Richter wrote:
> One way to do it consistently is to have a sign digit as the first
> digit after the x, which is either 0 or base-1 -- e.g., +3 and -3 would be
>
> 2x011 2x101
> 8x03 8x75
> 16x03 16xfd
> 10x03 10x97


... so that 0x8 and 16x8 would be different? So that 2x1 and 2x01 would
be different?

Jeff


 
 
Bengt Richter
 
      01-14-2005
On Thu, 13 Jan 2005 17:43:01 -0600, Jeff Epler <(E-Mail Removed)> wrote:

>
>
>On Thu, Jan 13, 2005 at 11:04:21PM +0000, Bengt Richter wrote:
>> One way to do it consistently is to have a sign digit as the first
>> digit after the x, which is either 0 or base-1 -- e.g., +3 and -3 would be
>>
>> 2x011 2x101
>> 8x03 8x75
>> 16x03 16xfd
>> 10x03 10x97

>
>... so that 0x8 and 16x8 would be different? So that 2x1 and 2x01 would
>be different?


I guess I didn't make the encoding clear enough, sorry ;-/

16x8 would be illegal as a literal, since the first digit after x is not 0 or f (base-1)
0x8 would be spelled 16x08 in <base>x format.

2x1 _could_ be allowed as a degenerate form of the general negative format for -1, where
any number of leading base-1 "sign digits" compute to the same value. Hence

2x1 == 1*2**0 - 2**1 == 1 - 2 == -1
2x11 == 1*2**0 + 1*2**1 - 2**2 == 1 + 2 - 4 == -1
2x111 == 1*2**0 + 1*2**1 + 1*2**2 - 2**3 == 1 + 2 + 4 - 8 == -1

16xf == 15*16**0 - 16**1 == 15 - 16 == -1
16xff == 15*16**0 + 15*16**1 - 16**2 == 15 + 240 - 256 == -1

etc. Although IMO it will be more consistent to require a leading "sign digit" even if redundant.
The one exception would be zero, since 00 doesn't look too cool in company with a collection
of other numbers if output minimally without their (known) base prefixes, e.g.,
-2,-1,0,1,2 => 110 11 0 01 010

Of course these numbers can be output in constant width with leading pads of 0 or base-1 digit according
to sign, e.g., 1110 1111 0000 0001 0010 for the same set width 4. That would be format '%04.2b' and
if you wanted the literal-making prefix, '2x%04.2b' would produce legal fixed width literals 2x1110 2x1111 etc.
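
Python's % operator has no such format, so the following is only a sketch of what '%04.2b' might compute. The name format_sign_digit is made up, and the rule for the minimal width (one sign digit in front of the digits of the magnitude) is an assumption inferred from the examples in this thread.

```python
import string

DIGITS = string.digits + string.ascii_lowercase  # digit symbols for bases up to 36


def to_digits(n, base):
    """Digits of a non-negative integer in the given base, without padding."""
    if n == 0:
        return "0"
    out = []
    while n:
        n, remainder = divmod(n, base)
        out.append(DIGITS[remainder])
    return "".join(reversed(out))


def format_sign_digit(number, base, width=0):
    """Render number with a leading sign digit (0 or base-1), padded to width."""
    if not 2 <= base <= 36:
        raise ValueError(f"base must be in 2..36, got {base}")
    if number == 0:
        body, pad = "0", "0"  # the one exception: zero needs no extra sign digit
    elif number > 0:
        body, pad = "0" + to_digits(number, base), "0"
    else:
        n_digits = len(to_digits(-number, base)) + 1  # magnitude plus sign digit
        body, pad = to_digits(base ** n_digits + number, base), DIGITS[base - 1]
    # Extending the sign digit to the left does not change the value.
    return body.rjust(width, pad)


print(" ".join(format_sign_digit(n, 2) for n in (-2, -1, 0, 1, 2)))
print(" ".join(format_sign_digit(n, 2, 4) for n in (-2, -1, 0, 1, 2)))
print("16x" + format_sign_digit(-3, 16))
```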

Regards,
Bengt Richter
 
 
Simon Brunning
 
      01-14-2005
On Thu, 13 Jan 2005 16:50:56 -0500, Leif K-Brooks <(E-Mail Removed)> wrote:
> Tim Roberts wrote:
> > Stephen Thorne <(E-Mail Removed)> wrote:
> >
> >>I would actually like to see pychecker pick up conceptual errors like this:
> >>
> >>import datetime
> >>datetime.datetime(2005, 04,04)

> >
> >
> > Why is that a conceptual error? Syntactically, this could be a valid call
> > to a function. Even if you have parsed and executed datetime, so that you
> > know datetime.datetime is a class, it's quite possible that the creation
> > and destruction of an object might have useful side effects.

>
> I'm guessing that Stephen is saying that PyChecker should have special
> knowledge of the datetime module and of the fact that dates are often
> specified with a leading zero, and therefore complain that they shouldn't
> be used that way in Python source code.


It would be useful if PyChecker warned you when you specify an octal
literal and where the value would differ from what you might expect if
you didn't realise that you were specifying an octal literal.

x = 04 # This doesn't need a warning: 04 == 4
#x = 09 # This doesn't need a warning: it will fail to compile
x= 012 # This *does* need a warning: 012 == 10
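
A rough sketch of such a check, written as a standalone function rather than against PyChecker's real internals (which are not shown here). It assumes Python 2 style source, where a leading 0 means octal, and it scans the text with a regular expression, so it will also look inside strings and comments. find_misleading_octals is a made-up name.

```python
import re

# A leading 0 followed only by octal digits, not part of a longer name or float.
LEGACY_OCTAL = re.compile(r"(?<![\w.])0[0-7]+(?![\w.])")


def find_misleading_octals(source):
    """Return (line_number, literal, octal_value) where octal and decimal readings differ."""
    findings = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for match in LEGACY_OCTAL.finditer(line):
            literal = match.group()
            octal_value = int(literal, 8)
            # Only report literals a decimal-minded reader would misread.
            if octal_value != int(literal, 10):
                findings.append((line_number, literal, octal_value))
    return findings


sample = "x = 04\ny = 012\nz = datetime.datetime(2005, 04, 04)\n"
for line_number, literal, value in find_misleading_octals(sample):
    print(f"line {line_number}: {literal} is octal for {value}")
```

Something like 09 never matches the pattern, which fits the point above that it fails to compile anyway.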

--
Cheers,
Simon B,
(E-Mail Removed),
http://www.brunningonline.net/simon/blog/
 
 
Reinhold Birkenfeld
 
      01-14-2005
Simon Brunning wrote:
> On Thu, 13 Jan 2005 16:50:56 -0500, Leif K-Brooks <(E-Mail Removed)> wrote:
>> Tim Roberts wrote:
>> > Stephen Thorne <(E-Mail Removed)> wrote:
>> >
>> >>I would actually like to see pychecker pick up conceptual errors like this:
>> >>
>> >>import datetime
>> >>datetime.datetime(2005, 04,04)
>> >
>> >
>> > Why is that a conceptual error? Syntactically, this could be a valid call
>> > to a function. Even if you have parsed and executed datetime, so that you
>> > know datetime.datetime is a class, it's quite possible that the creation
>> > and destruction of an object might have useful side effects.

>>
>> I'm guessing that Stephen is saying that PyChecker should have special
>> knowledge of the datetime module and of the fact that dates are often
>> specified with a leading zero, and therefore complain that they shouldn't
>> be used that way in Python source code.

>
> It would be useful if PyChecker warned you when you specify an octal
> literal and where the value would differ from what you might expect if
> you didn't realise that you were specifying an octal literal.
>
> x = 04 # This doesn't need a warning: 04 == 4
> #x = 09 # This doesn't need a warning: it will fail to compile
> x= 012 # This *does* need a warning: 012 == 10


Well, this would generate warnings for all octal literals except
01, 02, 03, 04, 05, 06 and 07.

However, I would vote +1 for adding such an option to PyChecker. For
code that explicitly uses octals, it can be turned off and it is _very_
confusing to newbies...

Reinhold
 
 
Reinhold Birkenfeld
 
      01-14-2005
Bengt Richter wrote:
> On Thu, 13 Jan 2005 08:18:25 -0500, Peter Hansen <(E-Mail Removed)> wrote:
>
>>(E-Mail Removed) wrote:
>>> In Mythical Future Python I would like to be able to use any base in
>>> integer literals, which would be better. Example random syntax:
>>>
>>> flags= 2x00011010101001
>>> umask= 8x664
>>> answer= 10x42
>>> addr= 16x0E800004 # 16x == 0x
>>> gunk= 36x8H6Z9A0X

>>
>>I think I kinda like this idea. Allowing arbitrary values,
>>however, would probably be pointless, as there are very
>>few bases in common enough use that a language should make
>>it easy to write literals in any of them. So I think "36x"
>>is silly, and would suggest limiting this to 2, 8, 10, and
>>16. At the very least, a range of 2-16 should be used.
>>(It would be cute but pointless to allow 1x000000000.)
>>

> My concern is negative numbers when you are interested in the
> bits of a typical two's-complement number. (BTW, please don't tell me
> that's platform-specific, hardware-oriented stuff: two's complement is
> a fine abstraction for interpreting a bit vector, which is another
> fine abstraction.)
>
> One way to do it consistently is to have a sign digit as the first
> digit after the x, which is either 0 or base-1 -- e.g., +3 and -3 would be
>
> 2x011 2x101
> 8x03 8x75
> 16x03 16xfd
> 10x03 10x97


Why not just -2x11? IMHO, Py2.4 does not produce negative values out of
hex or oct literals any longer, so your proposal would be inconsistent.

Reinhold
 
 
JCM
 
      01-14-2005
(E-Mail Removed) wrote:
....
> In Mythical Future Python I would like to be able to use any base in
> integer literals, which would be better. Example random syntax:


> flags= 2x00011010101001
> umask= 8x664
> answer= 10x42
> addr= 16x0E800004 # 16x == 0x
> gunk= 36x8H6Z9A0X


I'd prefer using the leftmost character as a two's complement
extension bit.

0x1 : 1 in hex notation
1xf : -1 in hex notation, or conceptually an infinitely long string of 1s
0c12 : 10 in octal notation
1c12 : -54 in octal (I think)
0d12 : 12 in decimal
0b10 : 2 in binary
etc

I leave it to the reader to decide whether I'm joking.
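
Joking or not, the arithmetic is easy to check. A sketch under the assumption that the leading character is 0 or 1 and stands for an endless run of either 0 digits or the largest digit of the base; decode_extension and the marker table are illustrative names only:

```python
BASES = {"b": 2, "c": 8, "d": 10, "x": 16}  # markers used in the examples above


def decode_extension(literal):
    """Decode '<0|1><marker><digits>' with the first character as the extension."""
    if len(literal) < 3 or literal[0] not in "01" or literal[1] not in BASES:
        raise ValueError(f"expected <0|1><marker><digits>, got {literal!r}")
    base = BASES[literal[1]]
    digits = literal[2:]
    if not digits.isalnum():
        raise ValueError(f"invalid digits in {literal!r}")
    value = int(digits, base)  # rejects digits too large for the base
    if literal[0] == "1":
        # An endless run of the largest digit is worth -(base ** len(digits)).
        value -= base ** len(digits)
    return value


for text in ("0x1", "1xf", "0c12", "1c12", "0d12", "0b10"):
    print(text, decode_extension(text))
```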
 
 
 
 