Velocity Reviews - Computer Hardware Reviews

Few things

 
 
bearophile
      11-26-2004
Hello,
here are four more questions (or suggestions) for the language
(people have probably already discussed some or all of these):

I've seen the contracts for Python:
http://www.wayforward.net/pycontract/
http://www.python.org/peps/pep-0316.html
They look interesting and nice. How do Python developers feel about
accepting something like this into the standard language? (Maybe they
are a bit complex.)


I think a small standard 'stat' module that computes permutations,
combinations, the median (via quickselect), etc. would be useful. There
is even a C implementation (of most of them):
http://probstat.sourceforge.net/
Some Windows users would probably appreciate having this already
compiled (and built in).


A command like this:
print 0x9f, 054135
This prints a hex and an octal number. I think the syntax for hex is a
bit ugly, and the syntax for octal looks just dangerous (and wrong) to
me.


In some Python source code that I come across, I find things like:
def foo():
    '''This is just a
silly text'''
    ...

Because:
def foo():
    '''This is just a
    silly text'''
print foo.__doc__

Outputs:
This is just a
    silly text

I think a better rule for such multiline strings would be: from the
beginning of every line after the first, remove a number of spaces equal
to the column of the opening ''' in the source code.
With this rule, that print outputs:
This is just a
silly text

Note: lines after the first that are indented less than that can simply
be left as they are:
def foo2():
    '''This is just a
  silly text'''
print foo2.__doc__

Outputs:
This is just a
silly text

Hello,
Bearophile
 
Josiah Carlson
      11-26-2004

bearophile wrote:
>
> Hello,
> here are four more questions (or suggestions) for the language
> (people have probably already discussed some or all of these):
>
> I've seen the contracts for Python:
> http://www.wayforward.net/pycontract/
> http://www.python.org/peps/pep-0316.html
> They look interesting and nice. How do Python developers feel about
> accepting something like this into the standard language? (Maybe they
> are a bit complex.)


Decorators can do this without additional syntax. Think @accepts and
@returns.
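
For illustration, a minimal sketch of what such decorators could look like. The names accepts and returns follow the suggestion above; the bodies are an assumption made for this example (positional arguments only, written for current Python), not an existing standard-library API.

```python
import functools

def accepts(*types):
    """Check the types of positional arguments before the call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            for value, expected in zip(args, types):
                if not isinstance(value, expected):
                    raise TypeError("%r is not a %s" % (value, expected.__name__))
            return func(*args)
        return wrapper
    return decorator

def returns(expected):
    """Check the type of the return value after the call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            result = func(*args)
            if not isinstance(result, expected):
                raise TypeError("%r is not a %s" % (result, expected.__name__))
            return result
        return wrapper
    return decorator

@accepts(list)
@returns(list)
def sorted_copy(a):
    # Hypothetical example function: returns a new sorted list.
    return sorted(a)
```

Calling sorted_copy([3, 1, 2]) goes through both checks; calling it with a tuple raises TypeError before the function body runs.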


> I think a small standard 'stat' module that computes permutations,
> combinations, the median (via quickselect), etc. would be useful. There
> is even a C implementation (of most of them):
> http://probstat.sourceforge.net/
> Some Windows users would probably appreciate having this already
> compiled (and built in).


Having a 'faq' for permutation and combination generation would be 99%
of the way there. Quickselect, really, doesn't gain you a whole lot.
Sure, it's a log factor faster to select a median, but many algorithms
involving selecting medians (at least the ones that I run into in CS
theory) end up repeatedly (logn) time selecting the 'kth' smallest
element (varying k's), where sorting would actually run slightly faster.
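
As a rough idea of what such a FAQ entry might contain, here is a sketch of both generators in plain Python. The function names and the choice to yield lists are assumptions for this example; the cookbook recipes may be organised differently.

```python
def permutations(items):
    """Yield every ordering of items as a list."""
    items = list(items)
    if len(items) <= 1:
        yield items
        return
    for i in range(len(items)):
        # Fix items[i] as the head, then permute everything else.
        rest = items[:i] + items[i + 1:]
        for tail in permutations(rest):
            yield [items[i]] + tail

def combinations(items, k):
    """Yield every k-element selection of items, keeping input order."""
    items = list(items)
    if k == 0:
        yield []
        return
    for i in range(len(items) - k + 1):
        # Choose items[i], then pick the remaining k-1 from what follows it.
        for tail in combinations(items[i + 1:], k - 1):
            yield [items[i]] + tail
```

Both are generators, so a caller can stop early without building the full list of results.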

As for the rest of it, be specific about what you would want in this
mythical 'statistics' module ('stat' is already used for the filesystem
stat module). A single-pass average/standard deviation has already been
discussed for such a module, as well as a "give me the k smallest items
of this sequence" function, etc., but the idea was tossed by Raymond
Hettinger due to the limited demand for such a module.
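
To make those two pieces concrete, a small sketch. The single-pass mean and standard deviation below uses Welford's method, which is an assumption on my part (the earlier discussion may have had a different algorithm in mind), and the name mean_and_stddev is made up for this example. The k smallest items come from heapq.nsmallest, which is in the standard library as of Python 2.4.

```python
import heapq
import math

def mean_and_stddev(values):
    """Return (mean, sample standard deviation) in one pass over values."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    if count < 2:
        raise ValueError("need at least two values")
    return mean, math.sqrt(m2 / (count - 1))

# The three smallest items of a sequence, in ascending order.
smallest_three = heapq.nsmallest(3, [5, 1, 4, 2, 3])
```

Because mean_and_stddev only iterates once, it also works on generators and other one-shot iterables.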


> A command like this:
> print 0x9f, 054135
> This prints a hex and an octal number. I think the syntax for hex is a
> bit ugly, and the syntax for octal looks just dangerous (and wrong) to
> me.


Internally those values are Python integers, there would need to be a
special way to tag integers as being originally hex or octal. Or the
pyc would need to store the fact that it was originally one of those
other methods specifically for the print statement.

The preferred way for doing such things (printing some internal type via
some special method) is via string interpolation:
print "0x%x 0%o"%(0x9f, 054135)

Ugly or not, "Special cases aren't special enough to break the rules."
Don't hold your breath for print doing anything special with integers.


> In some Python source code that I come across, I find things like:
> def foo():
>     '''This is just a
> silly text'''
>     ...
>
> Because:
> def foo():
>     '''This is just a
>     silly text'''
> print foo.__doc__
>
> Outputs:
> This is just a
>     silly text
>
> I think a better rule for such multiline strings would be: from the
> beginning of every line after the first, remove a number of spaces equal
> to the column of the opening ''' in the source code.
> With this rule, that print outputs:
> This is just a
> silly text
>
> Note: lines after the first that are indented less than that can simply
> be left as they are:
> def foo2():
>     '''This is just a
>   silly text'''
> print foo2.__doc__
>
> Outputs:
> This is just a
> silly text


It is a wart. An option is to use:
def foo():
    '''\
This is just a
silly text'''

Me, I just don't use docstrings. I put everything in comments indented
with the code. I have contemplated writing an import hook to do
pre-processing of modules to convert such comments to docstrings, but I
never actually use docstrings, so have never written the hook.


- Josiah

 
Josiah Carlson
      11-26-2004

Josiah Carlson wrote:

> theory) end up repeatedly (logn) time selecting the 'kth' smallest
> element (varying k's), where sorting would actually run slightly faster.


That should have read:
theory) end up repeatedly (logn times) selecting the...

- Josiah


 
Nick Coghlan
      11-26-2004
Josiah Carlson wrote:
>>A command like this:
>>print 0x9f, 054135
>>This prints an hex and octal. I think the syntax for the hex is a bit
>>ugly; and the syntax for the octal looks just dangerous (and wrong) to
>>me.

>
>
> Internally those values are Python integers, there would need to be a
> special way to tag integers as being originally hex or octal. Or the
> pyc would need to store the fact that it was originally one of those
> other methods specifically for the print statement.


I believe the OP was objecting to the spelling of "this integer literal is hex"
and "this integer literal is octal".

Python stole these spellings directly from C. Saying it's ugly without
suggesting an alternative isn't likely to result in developers taking any
action, though. (Not that that is particularly likely on this point, regardless)

If the spelling really bothers the OP, the following works:

print int("9f", 16), int("54135", 8)

That's harder to type, is a lot slower at run-time and uses more memory, though.

Cheers,
Nick.
 
Nick Coghlan
      11-26-2004
bearophile wrote:
> I think a better rule for such multiline strings would be: from the
> beginning of every line after the first, remove a number of spaces equal
> to the column of the opening ''' in the source code.


Indeed, a similar rule is used by docstring parsing tools (e.g. the builtin
help() function). The raw text is kept around, but the display tools clean it up
according to whatever algorithm best suits their needs.

>>> def foo():
...     '''This is just a
...     silly text'''
...
>>> print foo.__doc__
This is just a
    silly text
>>> help(foo)
Help on function foo in module __main__:

foo()
    This is just a
    silly text

Raw docstrings are rarely what you want to be looking at. The best place for
info on recommended docstring formats is:
http://www.python.org/peps/pep-0257.html

If you absolutely, positively need the raw docstrings to be nicely formatted,
then line continuations can help you out:

>>> def bar():
...     "This is actually "\
...     "all one line\n"\
...     "but this is a second line"
...
>>> print bar.__doc__
This is actually all one line
but this is a second line

I'd advise against actually doing this, though - it's non-standard, handling the
whitespace manually is a serious pain, and most docstring parsers clean up the
indentation of the conventional form automatically.
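
To see that clean-up in isolation: the standard inspect module exposes it as getdoc(), and as cleandoc() for arbitrary strings from Python 2.6 on. A small sketch, written for current Python; foo is just the example function from this thread.

```python
import inspect

def foo():
    '''This is just a
    silly text'''

# getdoc() returns the docstring with the indentation of the
# continuation lines removed, PEP 257 style.
cleaned = inspect.getdoc(foo)

# cleandoc() applies the same clean-up to any string.
also_cleaned = inspect.cleandoc("This is just a\n    silly text")
```

Both calls leave the stored docstring untouched; they only return a cleaned copy.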

Cheers,
Nick.
 
Carlos Ribeiro
      11-26-2004
On 25 Nov 2004 16:41:00 -0800, bearophile wrote:
> I think a better rule for such multiline strings would be: from the
> beginning of every line after the first, remove a number of spaces equal
> to the column of the opening ''' in the source code.


I was thinking exactly about this earlier today. There is a utility
function described somewhere in the docutils documentation that does
that. I've borrowed that code and called it "stripindent". It handles
all stuff that you mentioned and also tabs & space conversion. I
already call it almost everywhere I use the triple-quote strings.
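
For readers who want something similar without copying the docutils code, here is a minimal stand-in built on textwrap.dedent from the standard library. It is only a sketch: the name stripindent is reused for illustration, build_query is hypothetical, and unlike the helper described above it does no tab/space conversion.

```python
import textwrap

def stripindent(text):
    """Remove the common leading whitespace and the blank first/last lines."""
    return textwrap.dedent(text).strip("\n")

def build_query():
    # The literal is indented to match the code around it;
    # stripindent() removes that indentation again.
    sqlquery = stripindent("""
        select column1, column2
        from sometable
        where column1 > 0
        """)
    return sqlquery
```

The result starts at column zero on every line, so it can be logged or compared without worrying about where the literal sat in the source.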

The end result is that my code is full of constructs of the form:

sqlquery = stripindent("""
    select column1, column2
    from sometable
    where column1 > blahblahblah
    """)

And I thought, "wouldn't it be nice if Python automatically
reformatted such strings"? Of course, this is not a change to be taken
lightly. Some pros and cons:

0) it automatically supports what is already done by tools such as
pydoc, docutils, doctest, and every Python-enabled IDE that gets
information from docstrings.

1) the source code reads much better; the intention of the writer in
the case above is clearly *not* to have all those extra spaces
cluttering the string contents.

2) It encourages use of triple-quoted strings in real code (by making
it more practical) and avoids idioms such as:

s = stripindent("""...
""")
s = "abcdef..." +
"rstuvwxyz..."
s = "abcdef..." \
"rstuvwxyz..."

3) it uses indentation to change the string parsing behavior.
Indentation already has meaning in Python, but not in this situation.

4) It's a change, and people are usually afraid of changes, especially
in this case, where it may look like there is so little to gain from
it.

5) it may break old code that uses triple-quoted strings, and that may
require the extra spaces at the beginning of each line.

6) it may lead to surprises in some cases (especially for Python old-timers).


At this point, this is not yet a serious proposal, but more like a
"Call for Comments". I have another bunch of ideas being worked out
for possible future PEPs ("iunpack" & named tuples), so why not give
this one a try?

The idea is as follows:

1) triple-quote strings will automatically be reformatted to remove
any extra space on the left side due to indentation. The indentation
will be controlled by the position of the leftmost non-space character
in the string.

2) raw triple-quoted strings will *NOT* be reformatted. Any space to
the left side is deemed to be significant.

This is indeed quite a simple idea, with the potential to simplify
some code. It will also encourage people to write long strings as
triple-quoted strings, which is something that people usually avoid
because of the extra space.


--
Carlos Ribeiro
Consultoria em Projetos
blog: http://rascunhosrotos.blogspot.com
blog: http://pythonnotes.blogspot.com
 
Josiah Carlson
      11-26-2004

Nick Coghlan wrote:
>
> Josiah Carlson wrote:
> >>A command like this:
> >>print 0x9f, 054135
> >>This prints an hex and octal. I think the syntax for the hex is a bit
> >>ugly; and the syntax for the octal looks just dangerous (and wrong) to
> >>me.

> >
> >
> > Internally those values are Python integers, there would need to be a
> > special way to tag integers as being originally hex or octal. Or the
> > pyc would need to store the fact that it was originally one of those
> > other methods specifically for the print statement.

>
> I believe the OP was objecting to the spelling of "this integer literal is hex"
> and "this integer literal is octal".
>
> Python stole these spellings directly from C. Saying it's ugly without
> suggesting an alternative isn't likely to result in developers taking any
> action, though. (Not that that is particularly likely on this point, regardless)
>
> If the spelling really bothers the OP, the following works:
>
> print int("9f", 16), int("54135", 8)
>
> That's harder to type, is a lot slower at run-time and uses more memory, though.


Perhaps, though I thought he was talking specifically about printing
(hence using a print statement). Regardless, I also don't believe the
"I don't like this" without "this is the way it should be" will result
in anything.

- Josiah

 
Nick Coghlan
      11-27-2004
Carlos Ribeiro wrote:
> The idea is as follows:
>
> 1) triple-quote strings will automatically be reformatted to remove
> any extra space on the left side due to indentation. The indentation
> will be controlled by the position of the leftmost non-space character
> in the string.
>
> 2) raw triple-quoted strings will *NOT* be reformatted. Any space to
> the left side is deemed to be significant.
>
> This is indeed quite a simple idea, with the potential to simplify
> some code. It will also encourage people to write long strings as
> triple-quoted strings, which is something that people usually avoid
> because of the extra space.


I'd be +0, since the behaviour you suggest is what I generally want when I use
long strings. However, I almost always set up such strings as module globals, so
the indenting issue doesn't usually bother me in practice. . .

If the compatibility problems prove to be a deal breaker (i.e. someone somewhere
actually wants the extra space, and adding an 'r' character to the source for
compatibility with a new Python release is too much of a burden), then another
alternative is a new string type character (e.g. 't' for 'trimmed', to use the
PEP 257 terminology. 'i' for 'indented' would also work - the source code for
the string literal is indented, so that indenting should be removed from the
resulting string).

The argument against the inevitable suggestion of just using a function (as you
already do) is that a function call doesn't work for a docstring.

Cheers,
Nick.
 
bearophile
      11-30-2004
Thank you for the comments and answers, and sorry for my delay in
answering...

Josiah Carlson:

>Decorators can do this without additional syntax. Think @accepts and
@returns.<

The purpose of those pre/post conditions is to write something simple and
very *clean* that states what the inputs and outputs must be. This is an
example of pre/post conditions for a sorting function taken from
that site (all this is inside the docstring of the function):

pre:
    # must be a list
    isinstance(a, list)

    # all elements must be comparable with all other items
    forall(range(len(a)),
           lambda i: forall(range(len(a)),
                            lambda j: (a[i] < a[j]) ^ (a[i] >= a[j])))

post[a]:
    # length of array is unchanged
    len(a) == len(__old__.a)

    # all elements given are still in the array
    forall(__old__.a, lambda e: __old__.a.count(e) == a.count(e))

    # the array is sorted
    forall([a[i] >= a[i-1] for i in range(1, len(a))])


Surely such things can be passed (at least as strings) to the @accepts
and @returns decorators (using a "decorate" identifier instead of @ is
probably nicer, because the @ makes Python look more like Perl, but
I've seen that lots of people have already discussed this topic). The
testing performed by such decorators can be "switched off" with a
global boolean flag once the program has been debugged and tested.
So now someone can write, and get standardised, a couple of good
@accepts and @returns decorators/functors :-]


>Having a 'faq' for permutation and combination generation would be
99% of the way there.<

Uh, I'm sorry, but I don't understand :-]
Aren't such functions quite well defined?


>[Fixed] Quickselect, really, doesn't gain you a whole lot. Sure, it's
a log factor faster to select a median, but many algorithms involving
selecting medians (at least the ones that I run into in CS theory) end
up repeatedly (logn times) selecting the 'kth' smallest element
(varying k's), where sorting would actually run slightly faster.<

I've done some tests with a Quickselect that I have essentially
translated and adapted to pure Python from "Numerical Recipes" (it
seems a bit faster than the Quickselect coded by Raymond Hettinger
that can be seen in the cookbook). I have seen that on my PC, on
random sequences of floating-point numbers, a *single* Quickselect (to
find just the median) is faster than the standard sort only for lists
longer than about 3 million elements. So it's often useless.
But using Psyco, that Quickselect becomes 5-6 times faster (for long
lists), so it beats the (good) standard sort for lists longer than
600-3000 elements. If the Quickselect works in place (as the sort does)
then it leaves a partially ordered list, and you can use it to
quickly select other positions (so for close positions, like
computing the two central values for the median, the second select
takes nearly constant time).
So coding the Quickselect in C/Pyrex can probably make it useful.
If you are interested I can give the Python Quickselect code, etc.
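
To give an idea of the technique, here is a minimal in-place Quickselect sketch in pure Python. It is not the "Numerical Recipes" translation mentioned above and not the cookbook recipe; it is a generic version with a random pivot, and the names quickselect and median are chosen for this example only.

```python
import random

def quickselect(items, k):
    """Rearrange items in place so that items[k] is the k-th smallest
    (0-based), with nothing larger to its left; return items[k]."""
    lo, hi = 0, len(items) - 1
    if not 0 <= k <= hi:
        raise IndexError("k out of range")
    while lo < hi:
        pivot = items[random.randint(lo, hi)]
        i, j = lo, hi
        # Hoare-style partition of items[lo..hi] around the pivot value.
        while i <= j:
            while items[i] < pivot:
                i += 1
            while items[j] > pivot:
                j -= 1
            if i <= j:
                items[i], items[j] = items[j], items[i]
                i += 1
                j -= 1
        if k <= j:
            hi = j          # the k-th item is in the left part
        elif k >= i:
            lo = i          # the k-th item is in the right part
        else:
            break           # items[k] equals the pivot and is in place
    return items[k]

def median(values):
    """Median via one quickselect on a copy of the data."""
    data = list(values)
    n = len(data)
    if n == 0:
        raise ValueError("median of empty sequence")
    mid = n // 2
    upper = quickselect(data, mid)
    if n % 2:
        return upper
    # After the select, everything left of mid is <= data[mid], so the
    # other central value is simply the largest item of that left part.
    lower = max(data[:mid])
    return (lower + upper) / 2.0
```

For an even-length list the second central value is found with a plain max() over the left half rather than a second select; that is a simplification made for this sketch.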


>Raymond Hettinger<


I have already seen that this person is working a lot on Python, often
in the algorithmic parts.


Nick Coghlan>I believe the OP was objecting to the spelling of "this
integer literal is hex" and "this integer literal is octal".<

Right.


Josiah Carlson>Regardless, I also don't believe the "I don't like
this" without "this is the way it should be" will result in anything.<

You are right, I was mostly afraid of saying silly things... Here it is.
Such a syntax could be:
number<Separator><Base>

(Putting <Base><Separator> at the beginning of the number is probably
worse and it goes against normal base representation in mathematics,
where you often subscript the base number).

<Separator> cannot be "B" or "b" (standing for "base") because the
number can be a hex one containing B too... So <Separator> can be "_"
(this is the subscript in TeX markup, so it agrees with the normal
representation of the base).

<Base> can be:
1) just an integer representing the base (similar to the second
parameter of "int"; this also allows specifying any base).
2) a symbol representing a smaller class of possibilities, like 0=2,
1=8, 2=10, 3=16, 4=64. Instead of such digits a letter can be used:
a=2, b=8, c=10, etc.
I think the first option is better.

So integer numbers can be written like:
1010100111011_2
154545_10
777_8
afa35a_16
Fi3pK_64
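
As a run-time illustration of the first option, a tiny sketch that parses strings written in this notation. The helper name parse_suffixed is hypothetical, and it relies on the built-in int(), which only accepts bases 2 to 36, so the base-64 example cannot be handled this way.

```python
def parse_suffixed(literal):
    """Parse 'digits_base' text, e.g. '777_8', into an integer."""
    digits, sep, base = literal.rpartition("_")
    if not sep:
        raise ValueError("missing '_<base>' suffix: %r" % (literal,))
    # int() raises ValueError for bases it does not support.
    return int(digits, int(base))

octal_value = parse_suffixed("777_8")      # same value as int("777", 8)
hex_value = parse_suffixed("afa35a_16")    # same value as int("afa35a", 16)
```

This only mimics the proposal on strings; it says nothing about how the tokenizer would tell such a literal from an identifier.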


Thank you to Carlos Ribeiro for your development of such doc string
ideas, I appreciate them :-]

Bear hugs,
Bearophile
 
Josiah Carlson
      11-30-2004

bearophile wrote:
>
> Thank you for the comments and answers, and sorry for my delay in
> answering...
>
> Josiah Carlson:
>
> >Decorators can do this without additional syntax. Think @accepts and
> @returns.<
>
> The purpose of those pre/post conditions is to write something simple and
> very *clean* that states what the inputs and outputs must be. This is an
> example of pre/post conditions for a sorting function taken from
> that site (all this is inside the docstring of the function):
>
> pre:
>     # must be a list
>     isinstance(a, list)
>
>     # all elements must be comparable with all other items
>     forall(range(len(a)),
>            lambda i: forall(range(len(a)),
>                             lambda j: (a[i] < a[j]) ^ (a[i] >= a[j])))
>
> post[a]:
>     # length of array is unchanged
>     len(a) == len(__old__.a)
>
>     # all elements given are still in the array
>     forall(__old__.a, lambda e: __old__.a.count(e) == a.count(e))
>
>     # the array is sorted
>     forall([a[i] >= a[i-1] for i in range(1, len(a))])


That is simple and clean? In my opinion, if one wants to write such
complicated pre and post conditions, one should have to write the pre
and post condition functions that would do the test, and either use
decorators, or use calls within the function to do the tests. That is
the way it is done now, and I personally don't see a good reason to make
a change. Then again, I document and test, and haven't used pre/post
conditions in 5+ years.
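
A rough sketch of that approach, with the conditions written as ordinary functions and attached by a decorator. Everything here is an assumption for illustration: the contract decorator, the CHECK_CONTRACTS switch and the helper names are made up, and the code is written for current Python rather than taken from the pycontract package.

```python
import copy
import functools

CHECK_CONTRACTS = True  # hypothetical global switch to turn checking off

def contract(pre=None, post=None):
    """Wrap a function with optional pre- and postcondition functions."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            if not CHECK_CONTRACTS:
                return func(*args)
            if pre is not None and not pre(*args):
                raise AssertionError("precondition failed for %s" % func.__name__)
            old_args = copy.deepcopy(args)  # snapshot of the inputs before the call
            result = func(*args)
            if post is not None and not post(old_args, args, result):
                raise AssertionError("postcondition failed for %s" % func.__name__)
            return result
        return wrapper
    return decorator

def is_list(a):
    return isinstance(a, list)

def sorted_in_place(old_args, new_args, result):
    old, new = old_args[0], new_args[0]
    same_items = len(old) == len(new) and all(old.count(e) == new.count(e) for e in old)
    ordered = all(new[i - 1] <= new[i] for i in range(1, len(new)))
    return same_items and ordered

@contract(pre=is_list, post=sorted_in_place)
def sort(a):
    a.sort()
```

The deep copy is what makes an "old value" comparison possible, and it is also the expensive part, which is one reason to keep a switch for production runs.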


> Surely such things can be passed (at least as strings) to the @accepts
> and @returns decorators (using a "decorate" identifier instead of @ is
> probably nicer, because the @ makes Python look more like Perl, but
> I've seen that lots of people have already discussed this topic). The
> testing performed by such decorators can be "switched off" with a
> global boolean flag once the program has been debugged and tested.
> So now someone can write, and get standardised, a couple of good
> @accepts and @returns decorators/functors :-]


Discussion of the @ decorator syntax is a moot point. Python 2.4 final
was released within the last couple days and uses @, which Guido has
decided is the way it will be. You are around 6 months too late to have
anything to say about what syntax is better or worse. It is done.


> >Having a 'faq' for permutation and combination generation would be
> 99% of the way there.<
>
> Uh, I'm sorry, but I don't understand :-]
> Aren't such functions quite well defined?


Think of it like an 'example' in the documentation, where the code is
provided for doing both permutations and combinations. There exists a
FAQ for Python that addresses all sorts of "why does Python do A and not
B" questions. Regardless, both are offered in the Python cookbook.


> >[Fixed] Quickselect, really, doesn't gain you a whole lot. Sure, it's
> a log factor faster to select a median, but many algorithms involving
> selecting medians (at least the ones that I run into in CS theory) end
> up repeatedly (logn times) selecting the 'kth' smallest element
> (varying k's), where sorting would actually run slightly faster.<
>
> I've done some tests with a Quickselect that I have essentially
> translated and adapted to pure Python from "Numerical Recipes" (it
> seems a bit faster than the Quickselect coded by Raymond Hettinger
> that can be seen in the cookbook). I have seen that on my PC, on
> random sequences of floating-point numbers, a *single* Quickselect (to
> find just the median) is faster than the standard sort only for lists
> longer than about 3 million elements. So it's often useless.
> But using Psyco, that Quickselect becomes 5-6 times faster (for long
> lists), so it beats the (good) standard sort for lists longer than
> 600-3000 elements. If the Quickselect works in place (as the sort does)
> then it leaves a partially ordered list, and you can use it to
> quickly select other positions (so for close positions, like
> computing the two central values for the median, the second select
> takes nearly constant time).
> So coding the Quickselect in C/Pyrex can probably make it useful.
> If you are interested I can give the Python Quickselect code, etc.


No thank you, I have my own.


> Nick Coghlan>I believe the OP was objecting to the spelling of "this
> integer literal is hex" and "this integer literal is octal".<
>
> Right.
>
>
> Josiah Carlson>Regardless, I also don't believe the "I don't like
> this" without "this is the way it should be" will result in anything.<
>
> You are right, I was mostly afraid of saying silly things... Here it is.
> Such a syntax could be:
> number<Separator><Base>
>
> (Putting <Base><Separator> at the beginning of the number is probably
> worse and it goes against normal base representation in mathematics,
> where you often subscript the base number).
>
> <Separator> cannot be "B" or "b" (standing for "base") because the
> number can be a hex one containing B too... So <Separator> can be "_"
> (this is the subscript in TeX markup, so it agrees with the normal
> representation of the base).
>
> <Base> can be:
> 1) just an integer representing the base (similar to the second
> parameter of "int"; this also allows specifying any base).
> 2) a symbol representing a smaller class of possibilities, like 0=2,
> 1=8, 2=10, 3=16, 4=64. Instead of such digits a letter can be used:
> a=2, b=8, c=10, etc.
> I think the first option is better.
>
> So integer numbers can be written like:
> 1010100111011_2
> 154545_10
> 777_8
> afa35a_16
> Fi3pK_64


Ick. In Python, the language is generally read left to right, in a
similar fashion to English. The prefix notation of 0<octal> and 0x<hex>,
in my opinion, reads better than your postfix-with-punctuation notation.

I'll also mention that two of your examples, afa35a_16 and Fi3pK_64, are
valid Python variable names in all of the Python versions I have
access to, so they are ambiguous if you want to represent 'integer
literals', which have historically been 'unquoted strings prefixed with
a number'.

Furthermore, there /is/ already a postfix notation for representing
integers, though it doesn't support all bases at the moment, requires
a bit more punctuation, and is runtime-evaluated:

>>> int('1010', 2)
10
>>> int('1010100111011',2)
5435
>>> int('154545',10)
154545
>>> int('777', 8)
511
>>> int('afa35a',16)
11510618
>>> int('Fi3pK',64)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: int() base must be >= 2 and <= 36
>>>



Your second option (replacing the _2, _10, etc., with _1, _2, ...) is,
in my opinion, ****. You take something that is unambiguous (base
representation) and make it ambiguous through the use of a numbering of
a set of 'standard' bases. What is the use of representing base 10 as a
'2' or 'c'? I cannot think of a good reason to do so, unless being
almost unreadable is desirable.


An option if you want to get all of the base representations available
is a prefix notation that is similar to what already exists. I'm not
advocating it (because I also think it's crap), but the following fixes
the problems with your postfix notation, and is explicit about bases.
0<base>_<number>
like:
016_feff
02_10010010101
010_9329765872
08_767

The above syntax is:
1. unambiguous
2. readable from left-to-right

Note that I think that the syntax that I just provided is ugly. I much
prefer just using decimal and offering the proper base notation
afterwards in a comment...

val = 15 # 1111 in binary
val = 35 # 0x23 in hex
val = 17 # 021 in octal


- Josiah

 