# tallying occurrences in list

kj

 06-04-2010

Task: given a list, produce a tally of all the distinct items in
the list (for some suitable notion of "distinct").

Example: if the list is ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b',
'c', 'a'], then the desired tally would look something like this:

[('a', 4), ('b', 3), ('c', 3)]

I find myself needing this simple operation so often that I wonder:

1. is there a standard name for it?
2. is there already a function to do it somewhere in the Python
standard library?

Granted, as long as the list consists only of items that can be
used as dictionary keys (and Python's equality test for hash keys
agrees with the desired notion of "distinctness" for the tallying),
then the following does the job passably well:

def tally(c):
    t = dict()
    for x in c:
        t[x] = t.get(x, 0) + 1
    return sorted(t.items(), key=lambda x: (-x[1], x[0]))
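
As a quick check of the function above, here is a minimal sketch that runs it on the example list. The optional `key` argument is a hypothetical addition, not part of kj's post; it illustrates one way to plug in a different notion of "distinct" (here, case-insensitive matching).

```python
def tally(items, key=None):
    """Count items, sorted by descending count, then by item.

    `key` is a hypothetical extension: when given, items are grouped
    by key(item) instead of by the item itself.
    """
    counts = {}
    for item in items:
        k = item if key is None else key(item)
        counts[k] = counts.get(k, 0) + 1
    # Negating the count sorts the most frequent items first;
    # ties fall back to the ordinary ordering of the items.
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


letters = ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', 'a']
print(tally(letters))                         # [('a', 4), ('b', 3), ('c', 3)]
print(tally(['A', 'a', 'B'], key=str.lower))  # [('a', 2), ('b', 1)]
```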

But, of course, if a standard library solution exists it would be
preferable. Otherwise I either cut-and-paste the above every time
I need it, or I create a module just for it. (I don't like either
of these, though I suppose that the latter is much better than the
former.)

So anyway, I thought I'd ask.

~K

Paul Rubin

 06-04-2010
kj <(E-Mail Removed)> writes:
> 1. is there a standard name for it?

I don't know of one, or a stdlib for it, but it's pretty trivial.

> def tally(c):
>     t = dict()
>     for x in c:
>         t[x] = t.get(x, 0) + 1
>     return sorted(t.items(), key=lambda x: (-x[1], x[0]))

I like to use defaultdict and tuple unpacking for code like that:

from collections import defaultdict
def tally(c):
    t = defaultdict(int)
    for x in c:
        t[x] += 1
    return sorted(t.iteritems(), key=lambda (k,v): (-v, k))
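
The version above is Python 2 code: `dict.iteritems()` and tuple-unpacking lambda parameters were both removed in Python 3. A rough Python 3 equivalent, offered as a sketch rather than as Paul's own code, might look like this:

```python
from collections import defaultdict


def tally(items):
    counts = defaultdict(int)  # missing keys start at 0
    for item in items:
        counts[item] += 1
    # Python 3 lambdas cannot unpack a tuple parameter, so index the
    # (item, count) pair explicitly.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


print(tally("abcabcabca"))  # [('a', 4), ('b', 3), ('c', 3)]
```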

Peter Otten

 06-04-2010
kj wrote:
> 2. is there already a function to do it somewhere in the Python
> standard library?

Python 3.1 has, and 2.7 will have collections.Counter:

>>> from collections import Counter
>>> c = Counter("abcabcabca")
>>> c.most_common()
[('a', 4), ('c', 3), ('b', 3)]
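
For readers on a current Python, here is a short sketch of a few other things `Counter` can do. The order of equal counts in `most_common()` has varied between Python versions, so the explicit sort below is one way to get the deterministic order kj asked for.

```python
from collections import Counter

c = Counter("abcabcabca")

print(c["a"])            # 4
print(c["z"])            # 0: missing items count as zero, no KeyError
print(c.most_common(1))  # [('a', 4)]

# Deterministic order: highest count first, ties broken by item.
print(sorted(c.items(), key=lambda kv: (-kv[1], kv[0])))
# [('a', 4), ('b', 3), ('c', 3)]

# Counters can be updated incrementally from another iterable.
c.update("zz")
print(c["z"])            # 2
```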

Peter

Magdoll

 06-04-2010
On Jun 4, 11:28 am, Paul Rubin <(E-Mail Removed)> wrote:
> kj <(E-Mail Removed)> writes:
> > 1. is there a standard name for it?
>
> I don't know of one, or a stdlib for it, but it's pretty trivial.
>
> > def tally(c):
> >     t = dict()
> >     for x in c:
> >         t[x] = t.get(x, 0) + 1
> >     return sorted(t.items(), key=lambda x: (-x[1], x[0]))
>
> I like to use defaultdict and tuple unpacking for code like that:
>
> from collections import defaultdict
> def tally(c):
>     t = defaultdict(int)
>     for x in c:
>         t[x] += 1
>     return sorted(t.iteritems(), key=lambda (k,v): (-v, k))

I would also very much like to see this become part of the standard
library. Sure, the code is easy to write, but I use this incredibly
often, and I've always wished I had a one-line function call that
produces the same output as the MySQL query:

"SELECT somefield, count(*) FROM table GROUP BY somefield"

Or maybe there is already a short solution to this that I'm not aware
of...
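
One way to approximate that query in Python, assuming the rows are available as dictionaries, is to feed the grouping column into `collections.Counter` (available from Python 2.7 and 3.1). The `rows` data and the `somefield` key below are made up for illustration.

```python
from collections import Counter

# Hypothetical rows, standing in for the result of "SELECT * FROM table".
rows = [
    {"id": 1, "somefield": "red"},
    {"id": 2, "somefield": "blue"},
    {"id": 3, "somefield": "red"},
]

# Rough equivalent of:
#   SELECT somefield, COUNT(*) FROM table GROUP BY somefield
counts = Counter(row["somefield"] for row in rows)

for value, n in sorted(counts.items()):
    print(value, n)
# blue 1
# red 2
```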

Magdoll

 06-04-2010
On Jun 4, 11:33 am, Peter Otten <(E-Mail Removed)> wrote:
> Python 3.1 has, and 2.7 will have collections.Counter:
>
> >>> from collections import Counter
> >>> c = Counter("abcabcabca")
> >>> c.most_common()
> [('a', 4), ('c', 3), ('b', 3)]

Thanks, Peter. I think you just answered my post.

MRAB

 06-04-2010
kj wrote:
> 2. is there already a function to do it somewhere in the Python
> standard library?

In Python 3 (3.1 and later) there's the 'Counter' class in the
'collections' module. It'll also be in Python 2.7.

For earlier versions there's this:

http://code.activestate.com/recipes/576611/

Lie Ryan

 06-04-2010
On 06/05/10 04:38, Magdoll wrote:
> Thanks Peter, I think you just answered my post

If you're using previous versions (2.4 and onwards) then:

import itertools
[(o, len(list(g))) for o, g in itertools.groupby(sorted(myList))]
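
A self-contained sketch of the expression above. The sample `my_list` is an assumption taken from kj's example. `groupby` only merges adjacent equal items, which is why the list is sorted first; that means the items must be orderable, and the sort costs O(n log n).

```python
import itertools

my_list = ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', 'a']

# sorted() places equal items next to each other so that groupby()
# yields exactly one (item, group) pair per distinct item.
tally = [(item, len(list(group)))
         for item, group in itertools.groupby(sorted(my_list))]
print(tally)  # [('a', 4), ('b', 3), ('c', 3)]

# Without sorting, runs of equal items are counted separately.
unsorted = [(item, len(list(group)))
            for item, group in itertools.groupby("aabaa")]
print(unsorted)  # [('a', 2), ('b', 1), ('a', 2)]
```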

kj

 06-04-2010

Thank you all!

~K

Sreenivas Reddy Thatiparthy

 06-05-2010
On Jun 4, 11:14 am, kj <(E-Mail Removed)> wrote:
> Task: given a list, produce a tally of all the distinct items in
> the list (for some suitable notion of "distinct").

set([(k,yourList.count(k)) for k in yourList])

Paul Rubin

 06-05-2010
Sreenivas Reddy Thatiparthy <(E-Mail Removed)> writes:
> set([(k,yourList.count(k)) for k in yourList])

That has a rather bad efficiency problem if the list is large.
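
To make the cost concrete: `list.count` scans the whole list, and the expression above calls it once per element, so the work grows roughly with the square of the list length. A single pass with a dictionary (or `collections.Counter`, where available) yields the same pairs in linear time. A small sketch, with `your_list` as made-up sample data:

```python
from collections import Counter

your_list = ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', 'a']

# Quadratic: one full scan of the list per element.
slow = set((k, your_list.count(k)) for k in your_list)

# Linear: each element is visited once.
fast = set(Counter(your_list).items())

print(slow == fast)   # True
print(sorted(fast))   # [('a', 4), ('b', 3), ('c', 3)]
```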