# While we're talking about annoyances

Steven D'Aprano

 04-29-2007
Am I the only one who finds that I'm writing more documentation than code?

I recently needed to write a function to generate a rank table from a
list. That is, a list of ranks, where the rank of an item is the position
it would be in if the list were sorted:

alist = list('defabc')
ranks = [3, 4, 5, 0, 1, 2]

To do that, I needed to generate an index table first. In the book
"Numerical Recipes in Pascal" by William Press et al there is a procedure
to generate an index table (46 lines of code) and one for a rank table
(five lines).

In Python, my index function is four lines of code and my rank function is
five lines. I then wrote three more functions for verifying that my index
and rank tables were calculated correctly (17 more lines) and four more
lines to call doctest, giving a total of 30 lines of code.

I also have 93 lines of documentation, including doctests, or three
lines of documentation per line of code.

For those interested, here is how to generate an index table and rank
table in Python:

def index(sequence):
    decorated = zip(sequence, xrange(len(sequence)))
    decorated.sort()
    return [idx for (value, idx) in decorated]

def rank(sequence):
    table = [None] * len(sequence)
    for j, idx in enumerate(index(sequence)):
        table[idx] = j
    return table
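
For anyone trying this on Python 3, where zip() returns an iterator and xrange is gone, here is a minimal sketch of the same two functions. It is meant to behave like the versions above, not to replace them, and the prints just replay the 'defabc' example:

```python
def index(sequence):
    # Pair each value with its position, then sort the pairs by value.
    decorated = sorted(zip(sequence, range(len(sequence))))
    # Keep only the original positions, now in sorted-value order.
    return [idx for (value, idx) in decorated]

def rank(sequence):
    table = [None] * len(sequence)
    # The item at original position idx is the j-th smallest item.
    for j, idx in enumerate(index(sequence)):
        table[idx] = j
    return table

alist = list('defabc')
print(index(alist))  # [3, 4, 5, 0, 1, 2]
print(rank(alist))   # [3, 4, 5, 0, 1, 2]
```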

You can write your own damn documentation. *wink*

--
Steven.

GHUM

 04-29-2007
Steven,

> def index(sequence):
>     decorated = zip(sequence, xrange(len(sequence)))
>     decorated.sort()
>     return [idx for (value, idx) in decorated]

Wouldn't this be equivalent code?

def index(sequence):
    return [c for _, c in sorted((b, a) for a, b in
                                 enumerate(sequence))]

Tested; it gave the same results. But it worsens your doc2code ratio.

Harald Armin Massa
--

Michael Hoffman

 04-29-2007
GHUM wrote:
> Steven,
>
>> def index(sequence):
>>     decorated = zip(sequence, xrange(len(sequence)))
>>     decorated.sort()
>>     return [idx for (value, idx) in decorated]
>
> wouldn't that be equivalent code?
>
> def index(sequence):
>     return [c for _,c in sorted((b,a) for a, b in
>                                 enumerate(sequence))]

Or even these:

def index(sequence):
    return sorted(range(len(sequence)), key=sequence.__getitem__)

def rank(sequence):
    return sorted(range(len(sequence)),
                  key=index(sequence).__getitem__)

Hint: if you find yourself using a decorate-sort-undecorate pattern,
sorted(key=func) or sequence.sort(key=func) might be a better idea.
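
As a small illustration of that hint, here is a sketch that sorts a made-up word list by length both ways. The list and the variable names are only for the example; both approaches should produce the same order, since the hand-built tuples break ties on position and sorted() is stable:

```python
words = ["pear", "fig", "banana", "kiwi"]

# Decorate-sort-undecorate: build (key, position, item) tuples by hand.
decorated = [(len(w), i, w) for i, w in enumerate(words)]
decorated.sort()
dsu_result = [w for (_, _, w) in decorated]

# The same ordering with key=; the sort is stable, so ties keep input order.
key_result = sorted(words, key=len)

print(dsu_result)  # ['fig', 'pear', 'kiwi', 'banana']
print(key_result)  # ['fig', 'pear', 'kiwi', 'banana']
```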
--
Michael Hoffman

BJörn Lindqvist

 04-29-2007
On 4/29/07, Steven D'Aprano wrote:
> To do that, I needed to generate an index table first. In the book
> "Numerical Recipes in Pascal" by William Press et al there is a procedure
> to generate an index table (46 lines of code) and one for a rank table
> (five lines).

51 lines total.

> In Python, my index function is four lines of code and my rank function is
> five lines. I then wrote three more functions for verifying that my index
> and rank tables were calculated correctly (17 more lines) and four more
> lines to call doctest, giving a total of 30 lines of code.

So 9 lines for Python, excluding tests.

> I also have 93 lines of documentation, including doctests, or three
> lines of documentation per line of code.

Then, without documentation, Python is roughly 567% (51/9) as
efficient as Pascal. But with documentation (assuming you need the
same amount of documentation for the Python code as the Pascal code),
(51 + 93)/(9 + 93) = 1.41 so only 141% as efficient as Pascal.
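
Spelled out as a quick check (the variable names are mine, and the "same amount of documentation" figure is the assumption stated above, not a measurement):

```python
pascal_code = 46 + 5   # index + rank procedures from the book
python_code = 4 + 5    # index + rank functions, excluding tests
docs = 93              # assumed identical for both languages

print(round(pascal_code / python_code, 2))                    # 5.67
print(round((pascal_code + docs) / (python_code + docs), 2))  # 1.41
```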

I wonder what that means? Maybe Python the language is approaching the
upper bound for how efficient an imperative programming language can
be? On the other hand, there seems to be some progress that could be
made to reduce the amount of work in writing documentation.
Documentation in Esperanto instead of English maybe?

--
mvh Björn

Ben Finney

 04-29-2007
"BJörn Lindqvist" writes:

> On the other hand, there seems to be some progress that could be made
> to reduce the amount of work in writing documentation.
> Documentation in Esperanto instead of English maybe?

Lojban <URL:http://www.lojban.org/> is both easier to learn world-wide
than Euro-biased Esperanto, and computer-parseable. Seems a better[0]_
choice for computer documentation to me.

... _[0] ignoring the fact that it's spoken by even fewer people than
Esperanto.

--
\ "The greater the artist, the greater the doubt; perfect |
`\ confidence is granted to the less talented as a consolation |
_o__) prize." -- Robert Hughes |
Ben Finney

Jarek Zgoda

 04-29-2007
Ben Finney wrote:

>> On the other hand, there seems to be some progress that could be made
>> to reduce the amount of work in writing documentation.
>> Documentation in Esperanto instead of English maybe?
>
> Lojban <URL:http://www.lojban.org/> is both easier to learn world-wide
> than Euro-biased Esperanto, and computer-parseable. Seems a better[0]_
> choice for computer documentation to me.

German seems to be less "wordy" than English, despite the fact that most
of its nouns are much longer.

--
Jarek Zgoda
http://jpa.berlios.de/

Arnaud Delobelle

 04-29-2007
On Apr 29, 11:46 am, Michael Hoffman wrote:
> GHUM wrote:
> > Steven,
> >
> >> def index(sequence):
> >>     decorated = zip(sequence, xrange(len(sequence)))
> >>     decorated.sort()
> >>     return [idx for (value, idx) in decorated]
> >
> > wouldn't that be equivalent code?
> >
> > def index(sequence):
> >     return [c for _,c in sorted((b,a) for a, b in
> >                                 enumerate(sequence))]
>
> Or even these:
>
> def index(sequence):
>     return sorted(range(len(sequence)), key=sequence.__getitem__)
>
> def rank(sequence):
>     return sorted(range(len(sequence)),
>                   key=index(sequence).__getitem__)

Better still:

def rank(sequence):
    return index(index(sequence))

But really these two versions of rank are slower than the original one
(sorting a list is O(n log n), whereas filling a table with
precomputed values is O(n)).
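
To make the comparison concrete, here is a sketch that puts the two approaches side by side and checks that they agree on random data. The function names rank_by_sorting and rank_by_filling are made up for the example. Note that both still pay for the first sort inside index(), so the saving is only in the second step:

```python
import random

def index(seq):
    return sorted(range(len(seq)), key=seq.__getitem__)

def rank_by_sorting(seq):
    # Second O(n log n) sort: the index table of the index table.
    return index(index(seq))

def rank_by_filling(seq):
    # Single O(n) pass over the index table.
    table = [None] * len(seq)
    for j, idx in enumerate(index(seq)):
        table[idx] = j
    return table

data = [random.random() for _ in range(1000)]
assert rank_by_sorting(data) == rank_by_filling(data)
```

Wrapping each call in timeit.timeit() would be one way to measure the actual difference on a given machine.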

Anyway I would like to contribute my own index function:

def index(seq):
    return sum(sorted(map(list,enumerate(seq)), key=list.pop), [])

It's short and has the advantage of being self-documenting, which will
save Steven a lot of annoying typing, I hope. Who said Python
couldn't rival Perl?

--
Arnaud

Paul Rubin

 04-29-2007
Steven D'Aprano writes:
> I recently needed to write a function to generate a rank table from a
> list. That is, a list of ranks, where the rank of an item is the position
> it would be in if the list were sorted:
>
> alist = list('defabc')
> ranks = [3, 4, 5, 0, 1, 2]

import operator

fst = operator.itemgetter(0)  # these should be builtins...
snd = operator.itemgetter(1)

ranks = map(fst, sorted(enumerate(alist), key=snd))

Arnaud Delobelle

 04-29-2007
On Apr 29, 5:33 pm, Paul Rubin wrote:
> Steven D'Aprano writes:
> > I recently needed to write a function to generate a rank table from a
> > list. That is, a list of ranks, where the rank of an item is the position
> > it would be in if the list were sorted:
> >
> > alist = list('defabc')
> > ranks = [3, 4, 5, 0, 1, 2]
>
> fst = operator.itemgetter(0) # these should be builtins...
> snd = operator.itemgetter(1)
>
> ranks=map(fst, sorted(enumerate(alist), key=snd))

This is what the OP calls the index table, not the ranks table (both
are the same for the example above, but that's an unfortunate
coincidence...)
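
A quick sketch of an input where the two tables differ (the 'cab' list is just an example I picked). For 'defabc' they coincide because the index permutation there happens to be its own inverse:

```python
def index(seq):
    return sorted(range(len(seq)), key=seq.__getitem__)

def rank(seq):
    table = [None] * len(seq)
    for j, idx in enumerate(index(seq)):
        table[idx] = j
    return table

alist = list('cab')
print(index(alist))  # [1, 2, 0]  positions of 'a', 'b', 'c' in alist
print(rank(alist))   # [2, 0, 1]  'c' sorts last, 'a' first, 'b' second
```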

--
Arnaud

Raymond Hettinger

 04-29-2007
[Steven D'Aprano]
> I recently needed to write a function to generate a rank table from a
> list. That is, a list of ranks, where the rank of an item is the position
> it would be in if the list were sorted:
>
> alist = list('defabc')
> ranks = [3, 4, 5, 0, 1, 2]

...
> def rank(sequence):
>     table = [None] * len(sequence)
>     for j, idx in enumerate(index(sequence)):
>         table[idx] = j
>     return table

FWIW, you can do ranking faster and more succinctly with the sorted()
builtin:

def rank(seq):
    return sorted(range(len(seq)), key=seq.__getitem__)

Raymond