tallying occurrences in list

 
 
kj
      06-04-2010





Task: given a list, produce a tally of all the distinct items in
the list (for some suitable notion of "distinct").

Example: if the list is ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b',
'c', 'a'], then the desired tally would look something like this:

[('a', 4), ('b', 3), ('c', 3)]

I find myself needing this simple operation so often that I wonder:

1. is there a standard name for it?
2. is there already a function to do it somewhere in the Python
standard library?

Granted, as long as the list consists only of items that can be
used as dictionary keys (and Python's equality test for hashkeys
agrees with the desired notion of "distinctness" for the tallying),
then the following does the job passably well:

def tally(c):
    t = dict()
    for x in c:
        t[x] = t.get(x, 0) + 1
    return sorted(t.items(), key=lambda x: (-x[1], x[0]))

But, of course, if a standard library solution exists it would be
preferable. Otherwise I either cut-and-paste the above every time
I need it, or I create a module just for it. (I don't like either
of these, though I suppose that the latter is much better than the
former.)

So anyway, I thought I'd ask.

~K
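
On the caveat above about items that cannot be used as dictionary keys: one possible workaround, offered only as a sketch and not taken from the thread, is to tally by a hashable key derived from each item. The name tally_by and its key parameter are hypothetical, and the sketch assumes a Python version where sorted() and defaultdict are available.

```python
from collections import defaultdict

def tally_by(items, key):
    """Tally items by key(item); key must return something hashable."""
    counts = defaultdict(int)
    for item in items:
        counts[key(item)] += 1
    # Highest count first, ties broken by the key itself.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Lists are unhashable, so tally them by their tuple form.
rows = [[1, 2], [3, 4], [1, 2]]
result = tally_by(rows, key=tuple)
print(result)  # prints [((1, 2), 2), ((3, 4), 1)]
```

The same key argument also covers a looser notion of "distinct", for example key=str.lower to count strings case-insensitively.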
 
Paul Rubin
      06-04-2010
kj <(E-Mail Removed)> writes:
> 1. is there a standard name for it?


I don't know of one, or of a stdlib function for it, but it's pretty trivial.

> def tally(c):
>     t = dict()
>     for x in c:
>         t[x] = t.get(x, 0) + 1
>     return sorted(t.items(), key=lambda x: (-x[1], x[0]))


I like to use defaultdict and tuple unpacking for code like that:

from collections import defaultdict
def tally(c):
    t = defaultdict(int)
    for x in c:
        t[x] += 1
    return sorted(t.iteritems(), key=lambda (k,v): (-v, k))
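
The defaultdict version above is Python 2 code: dict.iteritems() and tuple-unpacking lambda parameters were both removed in Python 3. A rough Python 3 equivalent, as a sketch rather than the poster's own code, might look like this:

```python
from collections import defaultdict

def tally(c):
    t = defaultdict(int)
    for x in c:
        t[x] += 1
    # Python 3: items() replaces iteritems(), and the key function
    # receives one (item, count) pair and indexes into it.
    return sorted(t.items(), key=lambda kv: (-kv[1], kv[0]))

print(tally(['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', 'a']))
# prints [('a', 4), ('b', 3), ('c', 3)]
```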
 
Peter Otten
      06-04-2010

Python 3.1 has, and 2.7 will have collections.Counter:

>>> from collections import Counter
>>> c = Counter("abcabcabca")
>>> c.most_common()
[('a', 4), ('c', 3), ('b', 3)]

Peter
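
In the output above, 'c' comes before 'b' even though both have a count of 3: most_common() orders by count, and how ties come out can differ between Python versions. If the tie order matters, one option (a sketch, assuming the items can be compared with each other) is to sort the counter's items explicitly, as in the original tally function:

```python
from collections import Counter

counts = Counter("abcabcabca")

# Sort by descending count, then by item, so ties come out in a fixed order.
ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
print(ordered)                # prints [('a', 4), ('b', 3), ('c', 3)]

# most_common(n) is still handy when only the top entries are needed.
print(counts.most_common(1))  # prints [('a', 4)]
```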
 
Magdoll
      06-04-2010
On Jun 4, 11:28 am, Paul Rubin <(E-Mail Removed)> wrote:
> kj <(E-Mail Removed)> writes:
> > 1. is there a standard name for it?
>
> I don't know of one, or a stdlib for it, but it's pretty trivial.
>
> > def tally(c):
> >     t = dict()
> >     for x in c:
> >         t[x] = t.get(x, 0) + 1
> >     return sorted(t.items(), key=lambda x: (-x[1], x[0]))
>
> I like to use defaultdict and tuple unpacking for code like that:
>
> from collections import defaultdict
> def tally(c):
>     t = defaultdict(int)
>     for x in c:
>         t[x] += 1
>     return sorted(t.iteritems(), key=lambda (k,v): (-v, k))


I would also very much like to see this become part of the standard
library. Sure, the code is easy to write, but I use this incredibly
often, and I've always wished I had a one-line function call that
gives the same output as the MySQL query:

"SELECT somefield, count(*) FROM table GROUP BY somefield"

Or maybe there is already a short solution to this that I'm not aware
of...
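
For what it's worth, a one-line analogue of that GROUP BY query can be sketched with collections.Counter, assuming it is available (Python 2.7, or 3.1 and later). The rows list below is made-up data standing in for a table; only the field name somefield comes from the query above.

```python
from collections import Counter

# Hypothetical rows standing in for a database table.
rows = [
    {"id": 1, "somefield": "red"},
    {"id": 2, "somefield": "blue"},
    {"id": 3, "somefield": "red"},
]

# The one-liner: count rows per distinct value of somefield.
group_counts = Counter(row["somefield"] for row in rows)

print(group_counts["red"], group_counts["blue"])  # prints 2 1
```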
 
Magdoll
      06-04-2010
On Jun 4, 11:33 am, Peter Otten <(E-Mail Removed)> wrote:
> Python 3.1 has, and 2.7 will have collections.Counter:
>
> >>> from collections import Counter
> >>> c = Counter("abcabcabca")
> >>> c.most_common()
> [('a', 4), ('c', 3), ('b', 3)]
>
> Peter



Thanks Peter, I think you just answered my post.
 
MRAB
      06-04-2010

In Python 3 there's the 'Counter' class in the 'collections' module.
It'll also be in Python 2.7.

For earlier versions there's this:

http://code.activestate.com/recipes/576611/
 
Lie Ryan
      06-04-2010
On 06/05/10 04:38, Magdoll wrote:
> Thanks Peter, I think you just answered my post


If you're using an earlier version (2.4 onwards), then:

import itertools
[(o, len(list(g))) for o, g in itertools.groupby(sorted(myList))]
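
A runnable sketch of the groupby approach follows; the variable names are illustrative. Note that groupby only merges adjacent equal items, which is why the list is sorted first, and that the result comes out ordered by item rather than by count.

```python
import itertools

my_list = ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', 'a']

# Sorting puts equal items next to each other so groupby can merge them.
tally = [(item, len(list(group)))
         for item, group in itertools.groupby(sorted(my_list))]
print(tally)  # prints [('a', 4), ('b', 3), ('c', 3)]

# Without sorting, every run of equal neighbours becomes its own group.
unsorted_runs = [(item, len(list(group)))
                 for item, group in itertools.groupby(my_list)]
print(len(unsorted_runs))  # prints 10
```

This requires the items to be sortable, which is a stronger condition than being hashable.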
 
kj
      06-04-2010


Thank you all!

~K
 
Sreenivas Reddy Thatiparthy
      06-05-2010

How about this one-liner, if you prefer them:
set([(k, yourList.count(k)) for k in yourList])
 
Paul Rubin
      06-05-2010
Sreenivas Reddy Thatiparthy <(E-Mail Removed)> writes:
> How about this one liner, if you prefer them;
> set([(k,yourList.count(k)) for k in yourList])


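That has a rather bad efficiency problem if the list is large:
yourList.count(k) scans the entire list for every element, so the
running time grows quadratically with the list's length.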
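
To make the comparison concrete, here is a small sketch that puts the count()-based one-liner next to a single-pass tally. The function names are made up for illustration, and collections.Counter is assumed to be available (Python 2.7, or 3.1 and later).

```python
from collections import Counter

def tally_with_count(items):
    # list.count() walks the whole list and is called once per element,
    # so the total work grows with len(items) ** 2.
    return set([(k, items.count(k)) for k in items])

def tally_single_pass(items):
    # Counter visits each element exactly once.
    return set(Counter(items).items())

data = ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', 'a']
same = tally_with_count(data) == tally_single_pass(data)
print(same)  # prints True
```

Both return the same set of (item, count) pairs; only the amount of work differs as the list grows.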
 