Pythonic way to count sequences

CM

 04-25-2013
I have to count the number of various two-digit sequences in a list
such as this:

mylist = [(2,4), (2,4), (3,4), (4,5), (2,1)]  # (Here the (2,4) sequence appears 2 times.)

and tally up the results, assigning each to a variable. The inelegant
first pass at this was something like...

# Create names and set them all to 0
alpha = 0
beta = 0
delta = 0
gamma = 0
# etc...

# loop over all the tuple sequences and increment appropriately
for sequence_tuple in list_of_tuples:
    if sequence_tuple == (1,2):
        alpha += 1
    if sequence_tuple == (2,4):
        beta += 1
    if sequence_tuple == (2,5):
        delta += 1
    # etc... But I actually have more than 10 sequence types.

# Finally, I need a list created like this:
result_list = [alpha, beta, delta, gamma] #etc...in that order

I can sense there is very likely an elegant/Pythonic way to do this,
and probably with a dict, or possibly with some Python structure I
don't typically use. Suggestions sought. Thanks.

Chris Angelico

 04-25-2013
On Thu, Apr 25, 2013 at 3:05 PM, CM wrote:
> I have to count the number of various two-digit sequences in a list
> such as this:
>
> mylist = [(2,4), (2,4), (3,4), (4,5), (2,1)]  # (Here the (2,4) sequence appears 2 times.)
>
> and tally up the results, assigning each to a variable.

You can use a tuple as a dictionary key, just like you would a string.
So you can count them up directly with a dictionary:

count = {}
for sequence_tuple in list_of_tuples:
    count[sequence_tuple] = count.get(sequence_tuple,0) + 1

Also, since this is such a common thing to do, there's a standard
library way of doing it:

import collections
count = collections.Counter(list_of_tuples)

This doesn't depend on knowing ahead of time what your elements will
be. At the end of it, you can simply iterate over 'count' and get all
the counts:

for sequence,number in count.items():
    print("%d of %r" % (number,sequence))
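As a rough, self-contained sketch of the Counter approach, here it is run on the sample list from the original post. The names pairs_24 and pairs_99 are made up for illustration only.

```python
from collections import Counter

mylist = [(2, 4), (2, 4), (3, 4), (4, 5), (2, 1)]
count = Counter(mylist)

# Tuples are hashable, so they work as keys just like strings do.
pairs_24 = count[(2, 4)]

# A Counter returns 0 for a key it has never seen instead of raising
# KeyError, and the lookup does not add the key.
pairs_99 = count[(9, 9)]

print(pairs_24, pairs_99)
```

Because missing keys read as 0, there is no need to pre-create a zeroed name for every sequence type.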

ChrisA

Steven D'Aprano

 04-25-2013
On Wed, 24 Apr 2013 22:05:52 -0700, CM wrote:

> I have to count the number of various two-digit sequences in a list such
> as this:
>
> mylist = [(2,4), (2,4), (3,4), (4,5), (2,1)]  # (Here the (2,4) sequence appears 2 times.)
>
> and tally up the results, assigning each to a variable. The inelegant
> first pass at this was something like...
>
> # Create names and set them all to 0
> alpha = 0
> beta = 0
> delta = 0
> gamma = 0
> # etc...

Do they absolutely have to be global variables like that? Seems like a
bad design, especially if you don't know in advance exactly how many
there are.

> # loop over all the tuple sequences and increment appropriately
> for sequence_tuple in list_of_tuples:
>     if sequence_tuple == (1,2):
>         alpha += 1
>     if sequence_tuple == (2,4):
>         beta += 1
>     if sequence_tuple == (2,5):
>         delta += 1
>     # etc... But I actually have more than 10 sequence types.

counts = {}
for t in list_of_tuples:
    counts[t] = counts.get(t, 0) + 1

Or, use collections.Counter:

from collections import Counter
counts = Counter(list_of_tuples)

> # Finally, I need a list created like this:
> result_list = [alpha, beta, delta, gamma] #etc...in that order

Dicts are unordered, so getting the results in a specific order will be a
bit tricky. You could do this:

results = sorted(counts.items(), key=lambda t: t[0])
results = [t[1] for t in results]

if you are lucky enough to have the desired order match the natural order
of the tuples. Otherwise:

desired_order = [(2, 3), (3, 1), (1, 2), ...]
results = [counts.get(t, 0) for t in desired_order]
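To make the explicit-ordering idea concrete, here is a small sketch using the sample data from the original post. The contents of desired_order are an assumption: the thread never says which sequence types alpha, beta, delta and so on stand for beyond the three shown, so the list below is purely illustrative.

```python
from collections import Counter

list_of_tuples = [(2, 4), (2, 4), (3, 4), (4, 5), (2, 1)]
counts = Counter(list_of_tuples)

# Hypothetical ordering: the sequence types in the order the result
# list needs them. Replace with the real list of types.
desired_order = [(1, 2), (2, 4), (2, 5), (3, 4)]

# .get(t, 0) works for a plain dict as well as a Counter, and yields 0
# for sequence types that never occurred.
result_list = [counts.get(t, 0) for t in desired_order]
print(result_list)  # [0, 2, 0, 1]
```

Keeping the order in one list means adding an eleventh or twelfth sequence type is a one-line change rather than a new variable and a new if-branch.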

--
Steven

Serhiy Storchaka

 04-25-2013
25.04.13 08:26, Chris Angelico wrote:
> So you can count them up directly with a dictionary:
>
> count = {}
> for sequence_tuple in list_of_tuples:
>     count[sequence_tuple] = count.get(sequence_tuple,0) + 1

Or alternatives:

count = {}
for sequence_tuple in list_of_tuples:
    if sequence_tuple in count:
        count[sequence_tuple] += 1
    else:
        count[sequence_tuple] = 1

count = {}
for sequence_tuple in list_of_tuples:
    try:
        count[sequence_tuple] += 1
    except KeyError:
        count[sequence_tuple] = 1

import collections
count = collections.defaultdict(int)
for sequence_tuple in list_of_tuples:
    count[sequence_tuple] += 1

But of course collections.Counter is the preferable way now.
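These alternatives should all produce the same tallies. The following is a quick sketch to check that on the sample data from the original post; the variable names by_membership, by_default and by_counter are invented here for the comparison.

```python
from collections import Counter, defaultdict

list_of_tuples = [(2, 4), (2, 4), (3, 4), (4, 5), (2, 1)]

# Plain dict with an explicit membership test.
by_membership = {}
for t in list_of_tuples:
    if t in by_membership:
        by_membership[t] += 1
    else:
        by_membership[t] = 1

# defaultdict(int) supplies 0 the first time a key is touched.
by_default = defaultdict(int)
for t in list_of_tuples:
    by_default[t] += 1

# Counter does the whole loop internally.
by_counter = Counter(list_of_tuples)

# Convert to plain dicts so only keys and values are compared.
same = dict(by_membership) == dict(by_default) == dict(by_counter)
print(same)
```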

Denis McMahon

 04-25-2013
On Wed, 24 Apr 2013 22:05:52 -0700, CM wrote:

> I have to count the number of various two-digit sequences in a list such
> as this:
>
> mylist = [(2,4), (2,4), (3,4), (4,5), (2,1)]  # (Here the (2,4) sequence appears 2 times.)
>
> and tally up the results, assigning each to a variable. The inelegant
> first pass at this was something like...
>
> # Create names and set them all to 0
> alpha = 0
> beta = 0
> delta = 0
> gamma = 0
> # etc...
>
> # loop over all the tuple sequences and increment appropriately
> for sequence_tuple in list_of_tuples:
>     if sequence_tuple == (1,2):
>         alpha += 1
>     if sequence_tuple == (2,4):
>         beta += 1
>     if sequence_tuple == (2,5):
>         delta += 1
>     # etc... But I actually have more than 10 sequence types.
>
> # Finally, I need a list created like this:
> result_list = [alpha, beta, delta, gamma] #etc...in that order
>
> I can sense there is very likely an elegant/Pythonic way to do this, and
> probably with a dict, or possibly with some Python structure I don't
> typically use. Suggestions sought. Thanks.

mylist = [ (3,3), (1,2), "fred", ("peter",1,7), 1, 19, 37, 28.312,
("monkey"), "fred", "fred", (1,2) ]

bits = {}

for thing in mylist:
    if thing in bits:
        bits[thing] += 1
    else:
        bits[thing] = 1

for thing in bits:
    print("%s occurs %d times" % (thing, bits[thing]))

outputs:

(1, 2) occurs 2 times
1 occurs 1 times
('peter', 1, 7) occurs 1 times
(3, 3) occurs 1 times
28.312 occurs 1 times
fred occurs 3 times
19 occurs 1 times
monkey occurs 1 times
37 occurs 1 times

If you want to check that thing is a tuple of two ints, then use something like:

for thing in mylist:
    # On Python 2, test against (int, long) instead of int.
    if (isinstance(thing, tuple) and len(thing) == 2
            and isinstance(thing[0], int) and isinstance(thing[1], int)):
        if thing in bits:
            bits[thing] += 1
        else:
            bits[thing] = 1
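The same filtering can be combined with collections.Counter. The sketch below assumes Python 3 (where int covers what Python 2 split into int and long), and is_int_pair is a hypothetical helper name, not something from the standard library. Note that ("monkey") in the list above is just the string "monkey", since there is no trailing comma to make it a tuple, so it is written that way here.

```python
from collections import Counter

def is_int_pair(thing):
    """Return True only for tuples of exactly two ints (hypothetical helper)."""
    return (isinstance(thing, tuple)
            and len(thing) == 2
            and all(isinstance(item, int) for item in thing))

mylist = [(3, 3), (1, 2), "fred", ("peter", 1, 7), 1, 19, 37, 28.312,
          "monkey", "fred", "fred", (1, 2)]

# The generator expression skips everything that is not a pair of ints,
# so only (3, 3) and (1, 2) are counted.
bits = Counter(thing for thing in mylist if is_int_pair(thing))
print(bits)
```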

--
Denis McMahon

Modulok

 04-26-2013
On 4/25/13, Denis McMahon wrote:
> On Wed, 24 Apr 2013 22:05:52 -0700, CM wrote:
>
>> I have to count the number of various two-digit sequences in a list such
>> as this:
>>
>> mylist = [(2,4), (2,4), (3,4), (4,5), (2,1)]  # (Here the (2,4) sequence appears 2 times.)
>>
>> and tally up the results, assigning each to a variable.

....

Consider using the ``collections`` module::

from collections import Counter

mylist = [(2,4), (2,4), (3,4), (4,5), (2,1)]
count = Counter()
for k in mylist:
    count[k] += 1

print(count)

# Output looks like this:
# Counter({(2, 4): 2, (4, 5): 1, (3, 4): 1, (2, 1): 1})

You then have access to methods to return the most common items, etc. See more
examples here:

http://docs.python.org/3.3/library/c...ctions.Counter
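As a brief sketch of one of those methods, most_common(n) returns the n highest tallies as (element, count) pairs. The names top and total are chosen here for illustration.

```python
from collections import Counter

mylist = [(2, 4), (2, 4), (3, 4), (4, 5), (2, 1)]
count = Counter(mylist)

# The single most frequent sequence, as a list of (element, count) pairs.
top = count.most_common(1)
print(top)  # [((2, 4), 2)]

# Summing the values gives the total number of items that were counted.
total = sum(count.values())
print(total)  # 5
```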

Good luck!
-Modulok-

CM

 04-26-2013
expanding.

Matthew Gilson

 04-26-2013

information. The below example can be simplified:

from collections import Counter
count = Counter(mylist)

With the other example, you could have achieved the same thing (and stayed
backward compatible with Python 2.5) with:

from collections import defaultdict
count = defaultdict(int)
for k in mylist:
    count[k] += 1

On 4/25/13 9:16 PM, Modulok wrote:
> On 4/25/13, Denis McMahon wrote:
>> On Wed, 24 Apr 2013 22:05:52 -0700, CM wrote:
>>
>>> I have to count the number of various two-digit sequences in a list such
>>> as this:
>>>
>>> mylist = [(2,4), (2,4), (3,4), (4,5), (2,1)]  # (Here the (2,4) sequence appears 2 times.)
>>>
>>> and tally up the results, assigning each to a variable.

> ...
>
> Consider using the ``collections`` module::
>
>
> from collections import Counter
>
> mylist = [(2,4), (2,4), (3,4), (4,5), (2,1)]
> count = Counter()
> for k in mylist:
>     count[k] += 1
>
> print(count)
>
> # Output looks like this:
> # Counter({(2, 4): 2, (4, 5): 1, (3, 4): 1, (2, 1): 1})
>
>
> You then have access to methods to return the most common items, etc. See more
> examples here:
>
> http://docs.python.org/3.3/library/c...ctions.Counter
>
>
> Good luck!
> -Modulok-