Few things
Hello,
here are four more questions (or suggestions) for the language (people have probably already discussed some or all of these).

I've seen the contracts for Python:
http://www.wayforward.net/pycontract/
http://www.python.org/peps/pep-0316.html
They look interesting and nice. How do Python developers feel about accepting something like this into the standard language? (Maybe they are a bit complex.)

I think a small standard statistics module that computes permutations, combinations, the median (quickselect), etc. could be useful. There is even a C implementation of most of them:
http://probstat.sourceforge.net/
Some Windows users would probably appreciate having this already compiled (and built in).

A command like this:

    print 0x9f, 054135

prints a hex and an octal number. I think the syntax for hex is a bit ugly, and the syntax for octal looks just dangerous (and wrong) to me.

In some Python source code that I find around, I see things like:

    def foo():
        '''This is just a
    silly text'''
        ...

Because:

    def foo():
        '''This is just a
        silly text'''
    print foo.__doc__

Outputs:

    This is just a
        silly text

I think a better syntax for such multiline strings could be something like this: from the beginning of every line after the first one, remove a number of spaces equal to the position of the ''' in the source code.
With this syntax the print outputs:

    This is just a
    silly text

Note: lines after the first one that have even less indentation can simply be left alone:

    def foo2():
        '''This is just a
      silly text'''
    print foo2.__doc__

Outputs:

    This is just a
    silly text

Hello,
Bearophile
Re: Few things
bearophileHUGS@lycos.com (bearophile) wrote:
>
> Hello,
> here are a four more questions (or suggestions) for the language
> (probably people have already discussed some of/all such things:
>
> I've seen the contracts for Python:
> http://www.wayforward.net/pycontract/
> http://www.python.org/peps/pep-0316.html
> They look interesting and nice, how Python developers feel about
> accepting something like this in the standard language? (Maybe they
> are a bit complex).

Decorators can do this without additional syntax. Think @accepts and @returns.

> I think it can be useful a little stat standard module that computes
> permutations, combinations, median (quickselect), etc. There is even a
> C implementation (of most of them):
> http://probstat.sourceforge.net/
> Probably some Win users can appreciate to have this already compiled
> (and built in).

Having a 'faq' for permutation and combination generation would be 99% of the way there.

Quickselect, really, doesn't gain you a whole lot. Sure, it's a log factor faster to select a median, but many algorithms involving selecting medians (at least the ones that I run into in CS theory) end up repeatedly (logn) time selecting the 'kth' smallest element (varying k's), where sorting would actually run slightly faster.

As for the rest of it, be specific about what you would want to be in this mythical 'statistics' module ('stat' is already used for the filesystem stat module). A single-pass average/standard deviation has already been discussed for such a module, as well as "give me all the k smallest items of this sequence", etc., but it was tossed by Raymond Hettinger due to the limited demand for such a module.

> A command like this:
> print 0x9f, 054135
> This prints an hex and octal. I think the syntax for the hex is a bit
> ugly; and the syntax for the octal looks just dangerous (and wrong) to
> me.

Internally those values are Python integers; there would need to be a special way to tag integers as being originally hex or octal. Or the pyc would need to store the fact that it was originally one of those other methods specifically for the print statement.

The preferred way of doing such things (printing some internal type via some special method) is via string interpolation:

    print "0x%x 0%o"%(0x9f, 054135)

Ugly or not, "Special cases aren't special enough to break the rules." Don't hold your breath for print doing anything special with integers.

> In some python source codes that I'm finding around, I find things
> like:
> def foo():
>     '''This is just a
> silly text'''
>     ...
>
> Because:
> def foo():
>     '''This is just a
>     silly text'''
> print foo.__doc__
>
> Outputs:
> This is just a
>     silly text
>
> I think a better syntax for such multiline strings can be something
> like: remove from all the beginnings of the lines successive to the
> first one a number of spaces equal to the position of ''' in the
> soucecode.
> With this sintax such print outputs:
> This is just a
> silly text
>
> Note: even less indentation of the lines successive the first one can
> be simply ignored:
> def foo2():
>     '''This is just a
>   silly text'''
> print foo.__doc__
>
> Outputs:
> This is just a
> silly text

It is a wart. An option is to use:

    def foo():
        '''\
    This is just a
    silly text'''

Me, I just don't use docstrings. I put everything in comments indented with the code. I have contemplated writing an import hook to do pre-processing of modules to convert such comments to docstrings, but I never actually use docstrings, so I have never written the hook.

 - Josiah
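The "@accepts and @returns" idea mentioned in the reply above can be sketched as two small decorators. This is only an illustration under stated assumptions: the names accepts, returns and sorted_copy are hypothetical helpers (nothing in the thread defines them), they only check types with isinstance, and the code is written for current Python 3 rather than the 2.x syntax used elsewhere in the thread.

```python
import functools


def accepts(*types):
    """Check positional argument types before the call (hypothetical helper)."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args):
            for arg, expected in zip(args, types):
                if not isinstance(arg, expected):
                    raise TypeError("%s: expected %s, got %r"
                                    % (func.__name__, expected.__name__, arg))
            return func(*args)
        return wrapper
    return decorate


def returns(expected):
    """Check the type of the return value after the call (hypothetical helper)."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            if not isinstance(result, expected):
                raise TypeError("%s: expected to return %s, got %r"
                                % (func.__name__, expected.__name__, result))
            return result
        return wrapper
    return decorate


@accepts(list)
@returns(list)
def sorted_copy(a):
    # The "contract" lives in the decorators, not in the function body.
    return sorted(a)


print(sorted_copy([3, 1, 2]))
```

Passing a tuple instead of a list, as in sorted_copy((3, 1, 2)), raises TypeError before the function body runs. Richer pre/post conditions like those in PEP 316 would need more than type checks; this sketch does not attempt that.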
Re: Few things
Josiah Carlson <jcarlson@uci.edu> wrote:
> theory) end up repeatedly (logn) time selecting the 'kth' smallest
> element (varying k's), where sorting would actually run slightly faster.

That should have read:
theory) end up repeatedly (logn times) selecting the...

 - Josiah
Re: Few things
Josiah Carlson wrote:
>>A command like this:
>>print 0x9f, 054135
>>This prints an hex and octal. I think the syntax for the hex is a bit
>>ugly; and the syntax for the octal looks just dangerous (and wrong) to
>>me.
>
>
> Internally those values are Python integers, there would need to be a
> special way to tag integers as being originally hex or octal. Or the
> pyc would need to store the fact that it was originally one of those
> other methods specifically for the print statement.

I believe the OP was objecting to the spelling of "this integer literal is hex" and "this integer literal is octal".

Python stole these spellings directly from C. Saying it's ugly without suggesting an alternative isn't likely to result in developers taking any action, though. (Not that that is particularly likely on this point, regardless.)

If the spelling really bothers the OP, the following works:

    print int("9f", 16), int("54135", 8)

That's harder to type, is a lot slower at run-time and uses more memory, though.

Cheers,
Nick.
Re: Few things
bearophile wrote:
> I think a better syntax for such multiline strings can be something
> like: remove from all the beginnings of the lines successive to the
> first one a number of spaces equal to the position of ''' in the
> soucecode.

Indeed, a similar rule is used by docstring parsing tools (e.g. the builtin help() function). The raw text is kept around, but the display tools clean it up according to whatever algorithm best suits their needs.

    >>> def foo():
    ...     '''This is just a
    ...     silly text'''
    ...
    >>> print foo.__doc__
    This is just a
        silly text
    >>> help(foo)
    Help on function foo in module __main__:

    foo()
        This is just a
        silly text

Raw docstrings are rarely what you want to be looking at. The best place for info on recommended docstring formats is:
http://www.python.org/peps/pep-0257.html

If you absolutely, positively need the raw docstrings to be nicely formatted, then line continuations can help you out:

    >>> def bar():
    ...     "This is actually "\
    ...     "all one line\n"\
    ...     "but this is a second line"
    ...
    >>> print bar.__doc__
    This is actually all one line
    but this is a second line

I'd advise against actually doing this, though: it's non-standard, handling the whitespace manually is a serious pain, and most docstring parsers clean up the indentation of the conventional form automatically.

Cheers,
Nick.
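The clean-up that display tools apply, as described in the post above, can be tried directly with the standard inspect module. This is a minimal sketch in current Python 3 (not the 2.x syntax of the thread); the variable names raw and clean are just for illustration. The raw value of foo.__doc__ is deliberately not compared against anything here, because what the interpreter stores may differ between Python versions.

```python
import inspect


def foo():
    '''This is just a
    silly text'''


# inspect.getdoc() returns the docstring with the indentation cleaned up,
# following the PEP 257 convention.
print(inspect.getdoc(foo))

# inspect.cleandoc() applies the same clean-up to any string.
raw = '''This is just a
    silly text'''
clean = inspect.cleandoc(raw)
print(repr(raw))
print(repr(clean))
```

The same cleandoc() call works on any triple-quoted string, not only on docstrings, which is close to what the original poster asked for, just applied at run time instead of by the parser.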
Automatic reformatting of triple-quoted strings (was Re: Few things)
On 25 Nov 2004 16:41:00 -0800, bearophile <bearophilehugs@lycos.com> wrote:
> I think a better syntax for such multiline strings can be something
> like: remove from all the beginnings of the lines successive to the
> first one a number of spaces equal to the position of ''' in the
> soucecode.

I was thinking exactly about this earlier today. There is a utility function described somewhere in the docutils documentation that does that. I've borrowed that code and called it "stripindent". It handles everything you mentioned, and also tab and space conversion. I already call it almost everywhere I use triple-quoted strings. The end result is that my code is full of constructs of the form:

    sqlquery = stripindent("""
        select column1, column2
        from sometable
        where column1 > blahblahblah
        """)

And I thought, "wouldn't it be nice if Python automatically reformatted such strings"?

Of course, this is not a change to be taken lightly. Some pros and cons:

0) It automatically supports what is already done by tools such as pydoc, docutils, doctest, and every Python-enabled IDE that gets information from docstrings.

1) The source code reads much better; the intention of the writer in the case above is clearly *not* to have all those extra spaces cluttering the string contents.

2) It encourages the use of triple-quoted strings in real code (by making it more practical) and avoids idioms such as:

    s = stripindent("""...
        """)
    s = "abcdef..." + "rstuvwxyz..."
    s = "abcdef..." \
        "rstuvwxyz..."

3) It uses indentation to change the string parsing behavior. Indentation already has meaning in Python, but not in this situation.

4) It's a change, and people are usually afraid of changes, especially in this case, where it may look like there is so little to gain from it.

5) It may break old code that uses triple-quoted strings and that may require the extra spaces at the beginning of each line.

6) It may lead to surprises in some cases (especially for Python old-timers).

At this point, this is still not a serious proposal, but more like a "Call for Comments". I have another bunch of ideas being worked out for possible future PEPs ("iunpack" & named tuples), so why not give this one a try?

The idea is as follows:

1) Triple-quoted strings will automatically be reformatted to remove any extra space on the left side due to indentation. The indentation will be controlled by the position of the leftmost non-space character in the string.

2) Raw triple-quoted strings will *NOT* be reformatted. Any space on the left side is deemed to be significant.

This is indeed quite a simple idea, with the potential to simplify some code. It will also encourage people to write triple-quoted strings for long strings, which is something that people usually avoid because of the extra space.

--
Carlos Ribeiro
Consultoria em Projetos
blog: http://rascunhosrotos.blogspot.com
blog: http://pythonnotes.blogspot.com
mail: carribeiro@gmail.com
mail: carribeiro@yahoo.com
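A helper in the spirit of the "stripindent" function described above can be approximated with the standard library. This is a sketch, not the docutils-derived code from the post (which is not shown in the thread): it assumes textwrap.dedent is close enough, it additionally drops the leading and trailing newline, and it does not do the tab/space conversion the post mentions. It is written for current Python 3; build_query is a hypothetical wrapper used only to put the literal inside an indented block.

```python
import textwrap


def stripindent(text):
    """Remove the common left margin from a triple-quoted string.

    Hypothetical stand-in for the helper described in the post.
    """
    return textwrap.dedent(text).strip("\n")


def build_query():
    # The literal is indented to match the surrounding code, yet the
    # resulting string carries none of that indentation.
    sqlquery = stripindent("""
        select column1, column2
        from sometable
        where column1 > blahblahblah
        """)
    return sqlquery


print(build_query())
```

As noted in a later reply, a function call like this cannot be applied to a docstring literal in place, which is one argument for doing the clean-up in the language or in the tools that read docstrings.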
Re: Few things
Nick Coghlan <ncoghlan@email.com> wrote:
>
> Josiah Carlson wrote:
> >>A command like this:
> >>print 0x9f, 054135
> >>This prints an hex and octal. I think the syntax for the hex is a bit
> >>ugly; and the syntax for the octal looks just dangerous (and wrong) to
> >>me.
> >
> >
> > Internally those values are Python integers, there would need to be a
> > special way to tag integers as being originally hex or octal. Or the
> > pyc would need to store the fact that it was originally one of those
> > other methods specifically for the print statement.
>
> I believe the OP was objecting to the spelling of "this integer literal is hex"
> and "this integer literal is octal".
>
> Python stole these spellings directly from C. Saying it's ugly without
> suggesting an alternative isn't likely to result in developers taking any
> action, though. (Not that that is particularly likely on this point, regardless)
>
> If the spelling really bothers the OP, the following works:
>
> print int("9f", 16), int("54135", 8)
>
> That's harder to type, is a lot slower at run-time and uses more memory, though.

Perhaps, though I thought he was talking specifically about printing (hence using a print statement).

Regardless, I also don't believe the "I don't like this" without "this is the way it should be" will result in anything.

 - Josiah
Re: Automatic reformatting of triple-quoted strings (was Re: Fewthings)
Carlos Ribeiro wrote:
> The idea is as follows:
>
> 1) triple-quote strings will automatically be reformatted to remove
> any extra space on the left side due to indentation. The indentation
> will be controled by the position of the leftmost non-space character
> in the string.
>
> 2) raw triple-quoted strings will *NOT* be reformatted. Any space to
> the left side is deemed to be significant.
>
> This is indeed a quite simple idea, with the potential to simplify
> some code. It will also encourage people to write triple-quoted
> strings for long strings, which is something that people usually do to
> avoid the extra space.

I'd be +0, since the behaviour you suggest is what I generally want when I use long strings. However, I almost always set up such strings as module globals, so the indenting issue doesn't usually bother me in practice. . .

If the compatibility problems prove to be a deal breaker (i.e. someone somewhere actually wants the extra space, and adding an 'r' character to the source for compatibility with a new Python release is too much of a burden), then another alternative is a new string type character (e.g. 't' for 'trimmed', to use the PEP 257 terminology. 'i' for 'indented' would also work: the source code for the string literal is indented, so that indenting should be removed from the resulting string).

The argument against the inevitable suggestion of just using a function (as you already do) is that a function call doesn't work for a docstring.

Cheers,
Nick.
Re: Few things
Thank you for the comments and answers, and sorry for my delay in answering...

Josiah Carlson:

>Decorators can do this without additional syntax. Think @accepts and @returns.<

The purpose of those pre/post conditions is to write something simple and very *clean* that states what the inputs and outputs must be. This is an example of pre/post conditions for a sorting function, taken from that site (all of this is inside the docstring of the function):

    pre:
        # must be a list
        isinstance(a, list)

        # all elements must be comparable with all other items
        forall(range(len(a)),
               lambda i: forall(range(len(a)),
                                lambda j: (a[i] < a[j]) ^ (a[i] >= a[j])))

    post[a]:
        # length of array is unchanged
        len(a) == len(__old__.a)

        # all elements given are still in the array
        forall(__old__.a, lambda e: __old__.a.count(e) == a.count(e))

        # the array is sorted
        forall([a[i] >= a[i-1] for i in range(1, len(a))])

Surely such things can be passed (at least as strings) to the @accepts and @returns decorators (using a "decorate" identifier instead of @ is probably nicer, because the @ makes Python look more like Perl, but I've seen that lots of people have already discussed this topic). The testing performed by such decorators can be "switched off" with a global boolean flag once the program has been debugged and tested.
So now someone can write and standardise a couple of good @accepts and @returns decorators/functors :-]

>Having a 'faq' for permutation and combination generation would be 99% of the way there.<

Uh, I'm sorry, but I don't understand :-]
Aren't such functions quite well defined?

>[Fixed] Quickselect, really, doesn't gain you a whole lot. Sure, it's a log factor faster to select a median, but many algorithms involving selecting medians (at least the ones that I run into in CS theory) end up repeatedly (logn times) selecting the 'kth' smallest element (varying k's), where sorting would actually run slightly faster.<

I've done some tests with a Quickselect that I essentially translated and adapted to pure Python from "Numerical Recipes" (it seems a bit faster than the Quickselect coded by Raymond Hettinger that can be seen in the cookbook). I have seen that on my PC, on random sequences of FP numbers, a *single* Quickselect (to find just the median) is faster than the standard sort only for lists longer than about 3 million elements. So it's often useless.
But using Psyco, that Quickselect becomes 5-6 times faster (for long lists), so it beats the (good) standard sort for lists longer than 600-3000 elements. If the Quickselect works in place (as the sort does), then it leaves a partially ordered list, and you can use it to quickly select other positions (so for close positions, like computing the two central values for the median, the second select takes nearly constant time).
So coding the Quickselect in C/Pyrex can probably make it useful.
If you are interested I can give the Python Quickselect code, etc.

>Raymond Hettinger<

I have already seen that this person is working a lot on Python, often on the algorithmic parts.

Nick Coghlan>I believe the OP was objecting to the spelling of "this integer literal is hex" and "this integer literal is octal".<

Right.

Josiah Carlson>Regardless, I also don't believe the "I don't like this" without "this is the way it should be" will result in anything.<

You are right, I was mostly afraid of saying silly things... Here it is:
Such syntax can be like:

    number<Separator><Base>

(Putting <Base><Separator> at the beginning of the number is probably worse, and it goes against the normal base representation in mathematics, where you often subscript the base number.)

<Separator> cannot be "B" or "b" (standing for "base"), because the number can be a hex containing B too... So <Separator> can be "_" (this is the subscript in TeX markup, so it agrees with the normal representation of the base).

<Base> can be:
1) just an integer number representing the base (similar to the second parameter of "int"; this also allows any base to be specified).
2) a symbol representing a smaller class of possibilities, like 0=2, 1=8, 2=10, 3=16, 4=64. Instead of such digits a letter can be used: a=2, b=8, c=10, etc.
I think the first option is better.

So integer numbers can be written like:

    1010100111011_2
    154545_10
    777_8
    afa35a_16
    Fi3pK_64

Thank you to Carlos Ribeiro for your development of such docstring ideas, I appreciate them :-]

Bear hugs,
Bearophile
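For readers who have not met the algorithm discussed above, a quickselect can be sketched in a few lines. This is not the "Numerical Recipes" adaptation or the cookbook recipe mentioned in the post (neither is shown in the thread); it is a simple list-based version in current Python 3 that copies its input instead of working in place, so it does not leave the partially ordered list the post describes. The names quickselect and median are illustrative.

```python
import random


def quickselect(items, k):
    """Return the k-th smallest element (0-based) without a full sort."""
    pool = list(items)  # work on a copy; this sketch is not in place
    if not 0 <= k < len(pool):
        raise IndexError("k out of range")
    while True:
        pivot = random.choice(pool)
        smaller = [x for x in pool if x < pivot]
        equal = [x for x in pool if x == pivot]
        if k < len(smaller):
            pool = smaller                      # answer is left of the pivot
        elif k < len(smaller) + len(equal):
            return pivot                        # answer is the pivot itself
        else:
            k -= len(smaller) + len(equal)      # answer is right of the pivot
            pool = [x for x in pool if x > pivot]


def median(items):
    """Median via one select (odd length) or two selects (even length)."""
    n = len(items)
    if n % 2:
        return quickselect(items, n // 2)
    return (quickselect(items, n // 2 - 1) + quickselect(items, n // 2)) / 2.0


print(median([7, 1, 5, 3, 9]))
print(median([4, 1, 3, 2]))
```

Each pass discards part of the pool, which gives linear expected time, but in pure Python the constant factors are high; that is consistent with the post's observation that the built-in sort wins until the lists get very long.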
Re: Few things
bearophileHUGS@lycos.com (bearophile) wrote:
>
> Thank you for the comments and answers, and sorry for my answering
> delay...
>
> Josiah Carlson:
>
> >Decorators can do this without additional syntax. Think @accepts and
> @returns.<
>
> The purpose of those pre-post is to write something simile and very
> *clean* that states what inputs and outputs must be. This is an
> example of a pre-post conditional for a sorting function taken from
> that site (all this is inside the docstring of the function):
>
> pre:
>     # must be a list
>     isinstance(a, list)
>
>     # all elements must be comparable with all other items
>     forall(range(len(a)),
>            lambda i: forall(range(len(a)),
>                             lambda j: (a[i] < a[j]) ^ (a[i] >= a[j])))
>
> post[a]:
>     # length of array is unchanged
>     len(a) == len(__old__.a)
>
>     # all elements given are still in the array
>     forall(__old__.a, lambda e: __old__.a.count(e) == a.count(e))
>
>     # the array is sorted
>     forall([a[i] >= a[i-1] for i in range(1, len(a))])

That is simple and clean?

In my opinion, if one wants to write such complicated pre and post conditions, one should have to write the pre and post condition functions that would do the test, and either use decorators, or use calls within the function to do the tests. That is the way it is done now, and I personally don't see a good reason to make a change. Then again, I document and test, and haven't used pre/post conditions in 5+ years.

> Surely such things can be passed (at least as strings) to the @accepts
> and @returns decorators (using a "decorate" identifier instead of @ is
> probably nicer, because the @ makes Python look more like Perl, but
> I've seen that lots of people have already discussed such topic). Such
> testing performed by such decorators can be "switched off" with a
> global boolean flag when the program is debugged and tested.
> So now someone can write and let standardise a couple of good @accepts
> and @returns decorators/functors :-]

Discussion of the @ decorator syntax is a moot point. Python 2.4 final was released within the last couple of days and uses @, which Guido has decided is the way it will be. You are around 6 months too late to have anything to say about what syntax is better or worse. It is done.

> >Having a 'faq' for permutation and combination generation would be
> 99% of the way there.<
>
> Uh, I'm sorry, but I don't understand :-]
> Aren't such functions quite well defined?

Think of it like an 'example' in the documentation, where the code is provided for doing both permutations and combinations. There exists a FAQ for Python that addresses all sorts of "why does Python do A and not B" questions. Regardless, both are offered in the Python cookbook.

> >[Fixed] Quickselect, really, doesn't gain you a whole lot. Sure, it's
> a log factor faster to select a median, but many algorithms involving
> selecting medians (at least the ones that I run into in CS theory) end
> up repeatedly (logn times) selecting the 'kth' smallest element
> (varying k's), where sorting would actually run slightly faster.<
>
> I've done some tests with a Quickselect that I have essentially
> translated and adapted to pure Python from "Numerical Recipes" (it
> seems a bit faster than the Quickselect coded by Raymond Hettinger
> that can be seen in the cookbook). I have seen that on my PC, on
> random sequence of FP numbers, a *single* Quickselect (to find just
> the median) is faster than the standard sort for lists longer than
> about 3 million elements. So it's often useless.
> But using Psyco, that Quickselect becomes 5-6 times faster (for long
> lists), so it beats the (good) standard Sort for lists longer than
> 600-3000 elements. If the Quickselect works in place (as the sort)
> then it returns a partially ordered list, and you can use it to
> quickly select other positions (so for close positions, like the
> computing of the two central values for the median, the complexity of
> the second select is nearly a constant time).
> So coding the Quickselect in C/Pyrex can probably make it useful.
> If you are interested I can give the Python Quickselect code, etc.

No thank you, I have my own.

> Nick Coghlan>I believe the OP was objecting to the spelling of "this
> integer literal is hex" and "this integer literal is octal".<
>
> Right.
>
>
> Josiah Carlson>Regardless, I also don't believe the "I don't like
> this" without "this is the way it should be" will result in anything.<
>
> You are right, I was mostly afraid of saying silly things... Here is:
> Such syntax can be like:
> number<Separator><Base>
>
> (Putting <Base><Separator> at the beginning of the number is probably
> worse and it goes against normal base representation in mathematics,
> where you often subscript the base number).
>
> <Separator> cannot be "B" or "b" (that stands for "base") because
> number can be a Hex containing B too... So <Separator> can be "_"
> (this is the Subscript in TeX markup, so this agrees with normal
> representation of the base)
>
> <Base> can be:
> 1)just an integer number representing the base (similar to the second
> parameter of "int", this also allows to specify any base).
> 2) a symbol to represent a smaller class of possibilities, like 0=2,
> 1=8, 2=10, 3=16, 4=64. Instead
> of such digits a letter can be used: a=2, b=8, c=10, etc.
> I think the first option is better.
>
> So integer numbers can be written like:
> 1010100111011_2
> 154545_10
> 777_8
> afa35a_16
> Fi3pK_64

Ick. In Python, the language is generally read left to right, in a similar fashion to English. The prefix notation of 0<octal> and 0x<hex>, in my opinion, reads better than your postfix-with-punctuation notation.

I'll also mention that two of your examples, afa35a_16 and Fi3pK_64, are valid Python variable names in all of the Python versions I have access to, so they are ambiguous if you want to represent 'integer literals', which have historically been 'unquoted strings prefixed with a number'.

Furthermore, there /is/ already a postfix notation for representing integers, though it doesn't support all bases at the moment, requires a bit more punctuation, and is runtime-evaluated:

    >>> int('1010', 2)
    10
    >>> int('1010100111011',2)
    5435
    >>> int('154545',10)
    154545
    >>> int('777',8)
    511
    >>> int('afa35a',16)
    11510618
    >>> int('Fi3pK',64)
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    ValueError: int() base must be >= 2 and <= 36
    >>>

Your second option (replacing the _2, _10, etc., with _1, _2, ...) is, in my opinion, ****. You take something that is unambiguous (base representation) and make it ambiguous through the use of a numbering of a set of 'standard' bases. What is the use of representing base 10 as a '2' or 'c'? I cannot think of a good reason to do so, unless being almost unreadable is desirable.

An option, if you want to get all of the base representations available, is a prefix notation that is similar to what already exists. I'm not advocating it (because I also think it's crap), but the following fixes the problems with your postfix notation, and is explicit about bases.

    0<base>_<number>

like:

    016_feff
    02_10010010101
    010_9329765872
    08_767

The above syntax is:
1. unambiguous
2. readable from left to right

Note that I think that the syntax that I just provided is ugly. I much prefer just using decimal and offering the proper base notation afterwards in a comment...

    val = 15 # 1111 in binary
    val = 35 # 0x23 in hex
    val = 17 # 021 in octal

 - Josiah
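Both objections in the reply above, the clash with variable names and the base limit of int(), can be checked mechanically. The following is a sketch in current Python 3 (not the 2.x interpreter used in the thread): parse_suffixed is a hypothetical helper that reads the proposed digits_base spelling from a string, and str.isidentifier() is used as the test for "is already a legal variable name".

```python
def parse_suffixed(text):
    """Parse the proposed digits_base spelling, e.g. "777_8".

    Hypothetical helper: it works on strings at run time, it is not a
    new literal syntax.
    """
    digits, _, base = text.rpartition("_")
    return int(digits, int(base))


proposed = ["1010100111011_2", "154545_10", "777_8", "afa35a_16", "Fi3pK_64"]

# Spellings that are already legal variable names, hence ambiguous as literals.
ambiguous = [text for text in proposed if text.isidentifier()]
print(ambiguous)

for text in proposed:
    try:
        print(text, "->", parse_suffixed(text))
    except ValueError as exc:
        # int() only accepts bases from 2 to 36, so base 64 is rejected.
        print(text, "-> rejected:", exc)
```

The helper simply delegates to int(digits, base), so it inherits the same base limit shown in the interactive session above.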