Velocity Reviews > Modal value of an array

# Modal value of an array

datta.abhirup@gmail.com
Guest
Posts: n/a

 03-29-2007
Hi

How can I find out the modal value in an array. That is the value
which occurs maximum time in the sequence ..

e.g. if my array has values like [2,3,2,2,2,4,2,2] definitely the
maximum time 2 occurs in the array. so this function should be able to
return 2 as a result ..

So is there any function in built in python which can do that ?

Thanks

Abhirup

Ben Finney
Guest
Posts: n/a

 03-29-2007
"(E-Mail Removed)" <(E-Mail Removed)> writes:

> Hi
>
> How can I find out the modal value in an array. That is the value
> which occurs maximum time in the sequence ..
>
> e.g. if my array has values like [2,3,2,2,2,4,2,2] definitely the
> maximum time 2 occurs in the array. so this function should be able to
> return 2 as a result ..

That's not the only case though. What do you expect to be returned for
an input of ["eggs", "beans", "beans", "eggs", "spam"] ?

Assuming you want *a* mode value, and any one will do (e.g. any of
"spam", "eggs" or "beans" is okay), I'd write it this way as a first
guess:

>>> foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
>>> counts = [(foo.count(val), val) for val in set(foo)]
>>> counts

[(2, 'eggs'), (1, 'beans'), (4, 'spam')]
>>> sorted(counts)[-1]

(4, 'spam')
>>> sorted(counts)[-1][1]

'spam'

>>> foo = ["eggs", "beans", "beans", "eggs", "spam"]
>>> counts = [(foo.count(val), val) for val in set(foo)]
>>> sorted(counts)[-1][1]

'eggs'

--
\ "Anger makes dull men witty, but it keeps them poor." -- |
`\ Elizabeth Tudor |
_o__) |
Ben Finney

Guest
Posts: n/a

 03-29-2007
On Mar 29, 4:40 am, "(E-Mail Removed)"
<(E-Mail Removed)> wrote:
> Hi
>
> How can I find out the modal value in an array. That is the value
> which occurs maximum time in the sequence ..
>
> e.g. if my array has values like [2,3,2,2,2,4,2,2] definitely the
> maximum time 2 occurs in the array. so this function should be able to
> return 2 as a result ..
>
> So is there any function in built in python which can do that ?
>
> Thanks
>
> Abhirup

With the same assumptions as Ben Finney, I came up with this:

>>> import operator
>>> foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
>>> count = {}
>>> for item in foo: count[item] = count.get(item, 0) +1

....
>>> maxitem = max(count.items(), key= operator.itemgetter(1))
>>> maxitem

('spam', 4)
>>>

I was trying to minimise the iterations through the list.

Steven D'Aprano
Guest
Posts: n/a

 03-29-2007
On Wed, 28 Mar 2007 20:40:22 -0700, http://www.velocityreviews.com/forums/(E-Mail Removed) wrote:

> Hi
>
> How can I find out the modal value in an array. That is the value
> which occurs maximum time in the sequence ..
>
> e.g. if my array has values like [2,3,2,2,2,4,2,2] definitely the
> maximum time 2 occurs in the array. so this function should be able to
> return 2 as a result ..
>
> So is there any function in built in python which can do that ?

No. You need to create a frequency table, then do a reverse-lookup on the
frequency table. Assuming your data is small, this should be plenty fast
enough.

def mode(data):
# create a frequency table
freq = {}
for x in data:
freq[x] = freq.get(x, 0) + 1
# find the maximum frequency
F = max(freq.values())
# return the items (one or more) with that frequency
modes = []
for x, f in freq.items():
if f == F:
modes.append(x)
return modes

>>> mode([2,3,2,2,2,4,2,2])

[2]
>>> mode([2,3,2,3,2,3,4,1])

[2, 3]

--
Steven.

Alex Martelli
Guest
Posts: n/a

 03-29-2007
Ben Finney <(E-Mail Removed)> wrote:
...
> That's not the only case though. What do you expect to be returned for
> an input of ["eggs", "beans", "beans", "eggs", "spam"] ?
>
> Assuming you want *a* mode value, and any one will do (e.g. any of
> "spam", "eggs" or "beans" is okay), I'd write it this way as a first
> guess:
>
> >>> foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
> >>> counts = [(foo.count(val), val) for val in set(foo)]
> >>> counts

> [(2, 'eggs'), (1, 'beans'), (4, 'spam')]
> >>> sorted(counts)[-1]

> (4, 'spam')
> >>> sorted(counts)[-1][1]

> 'spam'

A bit more directly:

>>> foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
>>> max(foo, key=foo.count)

'spam'

Alex

bearophileHUGS@lycos.com
Guest
Posts: n/a

 03-29-2007
Alex Martelli:
> >>> foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
> >>> max(foo, key=foo.count)

It's a very nice solution, the shortest too. But I think it's better
to develop your own well tested and efficient stats module (and there
is one already done that can be found around) and import it when you
need functions, instead of using similar onliners or re-writing code.
As you know your solution becomes rather slow if the list is quite
long, and it works with lists only.
This uses more memory but it's probably much faster for longer
interables:

from collections import defaultdict

def mode(seq):
freqs = defaultdict(int)
for el in seq:
freqs[el] += 1
return max(freqs.itervalues())

Generally you may want to know what's the mode element(s) too:

def mode2(seq):
freqs = defaultdict(int)
for el in seq:
freqs[el] += 1
maxfreq = max(freqs.itervalues())
mode_els = [el for el,f in freqs.iteritems() if f == maxfreq]
return maxfreq, mode_els

foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
print mode(foo)
print mode2(foo)

Bye,
bearophile

Guest
Posts: n/a

 03-29-2007
On Mar 29, 8:49 am, (E-Mail Removed) (Alex Martelli) wrote:
> Ben Finney <(E-Mail Removed)> wrote:
>
> ...
>
> > That's not the only case though. What do you expect to be returned for
> > an input of ["eggs", "beans", "beans", "eggs", "spam"] ?

>
> > Assuming you want *a* mode value, and any one will do (e.g. any of
> > "spam", "eggs" or "beans" is okay), I'd write it this way as a first
> > guess:

>
> > >>> foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
> > >>> counts = [(foo.count(val), val) for val in set(foo)]
> > >>> counts

> > [(2, 'eggs'), (1, 'beans'), (4, 'spam')]
> > >>> sorted(counts)[-1]

> > (4, 'spam')
> > >>> sorted(counts)[-1][1]

> > 'spam'

>
> A bit more directly:
>
> >>> foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
> >>> max(foo, key=foo.count)

>
> 'spam'
>
> Alex

This doesn't call foo.count for duplicate entries by keeping a cache

>>> foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
>>> def cachecount(x, cache={}):

.... return cache.setdefault(x, foo.count(x))
....
>>> max(foo, key=cachecount)

'spam'
>>> cachecount.func_defaults

({'eggs': 2, 'beans': 1, 'spam': 4},)
>>>

Alex Martelli
Guest
Posts: n/a

 03-30-2007
...
> > A bit more directly:
> >
> > >>> foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
> > >>> max(foo, key=foo.count)

> >
> > 'spam'
> >
> > Alex

>
> This doesn't call foo.count for duplicate entries by keeping a cache
>
> >>> foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
> >>> def cachecount(x, cache={}):

> ... return cache.setdefault(x, foo.count(x))
> ...
> >>> max(foo, key=cachecount)

> 'spam'
> >>> cachecount.func_defaults

> ({'eggs': 2, 'beans': 1, 'spam': 4},)
> >>>

If you're willing to do that much extra coding to save some work (while
still being O(N squared)), then the further small extra needed to be
O(N) starts looking good:

counts = collections.defaultdict(int)
for item in foo: counts[item] += 1
max(foo, key=counts.get)

Alex

Guest
Posts: n/a

 03-30-2007
On Mar 30, 2:58 am, (E-Mail Removed) (Alex Martelli) wrote:
>
> ...
>
>
>
> > > A bit more directly:

>
> > > >>> foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
> > > >>> max(foo, key=foo.count)

>
> > > 'spam'

>
> > > Alex

>
> > This doesn't call foo.count for duplicate entries by keeping a cache

>
> > >>> foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
> > >>> def cachecount(x, cache={}):

> > ... return cache.setdefault(x, foo.count(x))
> > ...
> > >>> max(foo, key=cachecount)

> > 'spam'
> > >>> cachecount.func_defaults

> > ({'eggs': 2, 'beans': 1, 'spam': 4},)

>
> If you're willing to do that much extra coding to save some work (while
> still being O(N squared)), then the further small extra needed to be
> O(N) starts looking good:
>
> counts = collections.defaultdict(int)
> for item in foo: counts[item] += 1
> max(foo, key=counts.get)
>
> Alex

Yeh, My first answer is like that but I had to play around with your
original to try and 'fix' the idea in my head - it might be useful
someday.

Gabriel Genellina
Guest
Posts: n/a

 03-30-2007
En Thu, 29 Mar 2007 19:44:56 -0300, Paddy <(E-Mail Removed)>
escribió:

> On Mar 29, 8:49 am, (E-Mail Removed) (Alex Martelli) wrote:

>> >>> foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
>> >>> max(foo, key=foo.count)

>>
>> 'spam'

>
> This doesn't call foo.count for duplicate entries by keeping a cache
>
>>>> foo = ["spam", "eggs", "spam", "spam", "spam", "beans", "eggs"]
>>>> def cachecount(x, cache={}):

> ... return cache.setdefault(x, foo.count(x))

Unfortunately it does, because all arguments are evaluated *before* a
function call, so you gain nothing.

--
Gabriel Genellina