Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Python > Clustering the keys of a dict according to its values

Reply
Thread Tools

Clustering the keys of a dict according to its values

 
 
bearophileHUGS@lycos.com
Guest
Posts: n/a
 
      11-14-2008
Florian Brucker:
> We may assume that all values in the
> original dict/list can be used as dict keys.


Are you sure? Is this for school?

Bye,
bearophile
 
Reply With Quote
 
 
 
 
alex23
Guest
Posts: n/a
 
      11-14-2008
Florian Brucker <(E-Mail Removed)> wrote:
> Given a dictionary, I want to create a clustered version of it,
> collecting keys that have the same value:
>
> *>>> d = {'a':1, 'b':2, 'c':1, 'd':1, 'e':2, 'f':3}
> *>>> cluster(d)
> {1:['a', 'c', 'd'], 2:['b', 'e'], 3:['f']}
>
> That is, generate a new dict which holds for each value of the old dict
> a list of the keys of the old dict that have that very value.


>>> from collections import defaultdict
>>> def cluster(d):

.... if not hasattr(d, 'iteritems'):
.... d = dict((x,y) for x,y in enumerate(d))
.... cluster = defaultdict(list)
.... for key, value in d.iteritems():
.... cluster[value].append(key)
.... return cluster
....
>>> d1 = {'a':1, 'b':2, 'c':1, 'd':1, 'e':2, 'f':3}
>>> d2 = ['a','a','b','c','b','d','d']
>>> cluster(d1)

defaultdict(<type 'list'>, {1: ['a', 'c', 'd'], 2: ['b', 'e'], 3:
['f']})
>>> cluster(d2)

defaultdict(<type 'list'>, {'a': [0, 1], 'c': [3], 'b': [2, 4], 'd':
[5, 6]})

defaultdicts act pretty much exactly like dicts, except for the
default-value-for-everything behaviour, but if you'd rather have pure
dicts, just coerce with a dict(cluster) at the end of the function.

My version converts lists to dicts so the same core behaviour will
work. At worst it'll step through a sequence twice, but if it's passed
a dict it'll only do it once, unlike yours' which steps through each
twice no matter what (in order to obtain the unique values you
require).

A side note: if you're using Python2.4+, you can replace your unique()
function with the built-in set().
 
Reply With Quote
 
 
 
 
Arnaud Delobelle
Guest
Posts: n/a
 
      11-14-2008
On Nov 14, 1:16 pm, Florian Brucker <(E-Mail Removed)> wrote:
> Hi everybody!
>
> Given a dictionary, I want to create a clustered version of it,
> collecting keys that have the same value:
>
> >>> d = {'a':1, 'b':2, 'c':1, 'd':1, 'e':2, 'f':3}
> >>> cluster(d)

> {1:['a', 'c', 'd'], 2:['b', 'e'], 3:['f']}
>
> That is, generate a new dict which holds for each value of the old dict
> a list of the keys of the old dict that have that very value.
>
> Another requirement is that it should also work on lists, in that case
> with indices instead of keys. We may assume that all values in the
> original dict/list can be used as dict keys.
>
> Right now I'm doing it like this:
>
> def cluster(d):
> try:
> # is d a dict?
> values = unique(d.values())
> keys = d.keys()
> except AttributeError:
> # assume d is a list
> values = unique(d)
> keys = range(len(d))
>
> clusters = {}
> for value in values:
> clusters[value] = filter(lambda v: d[v] == value, keys)
>
> return clusters
>
> where unique() is from O'Reilly's Python Cookbook and returns a list
> containing each item of the given list exactly once.
>
> Now I'm pretty new to Python and chances are rather high that this could
> be done prettier and/or more efficient. Therefore any comment on the
> above code is greatly appreciated.
>
> Regards,
> Florian


Your implementation is pretty inefficient as it iterates over the
whole dictionary as many times as there are distinct values in it.
Here is a simpler solution.

from collections import defaultdict

def cluster(d):
clusters = defaultdict(list)
for key, val in d.iteritems():
clusters[val].append(key)
return clusters


Or without defaultdict:

def cluster(d):
clusters = {}
for key, val in d.iteritems():
clusters.setdefault(val, []).append(key)
return clusters


--
Arnaud
 
Reply With Quote
 
Florian Brucker
Guest
Posts: n/a
 
      11-14-2008
Hi everybody!

Given a dictionary, I want to create a clustered version of it,
collecting keys that have the same value:

>>> d = {'a':1, 'b':2, 'c':1, 'd':1, 'e':2, 'f':3}
>>> cluster(d)

{1:['a', 'c', 'd'], 2:['b', 'e'], 3:['f']}

That is, generate a new dict which holds for each value of the old dict
a list of the keys of the old dict that have that very value.

Another requirement is that it should also work on lists, in that case
with indices instead of keys. We may assume that all values in the
original dict/list can be used as dict keys.

Right now I'm doing it like this:

def cluster(d):
try:
# is d a dict?
values = unique(d.values())
keys = d.keys()
except AttributeError:
# assume d is a list
values = unique(d)
keys = range(len(d))

clusters = {}
for value in values:
clusters[value] = filter(lambda v: d[v] == value, keys)

return clusters

where unique() is from O'Reilly's Python Cookbook and returns a list
containing each item of the given list exactly once.

Now I'm pretty new to Python and chances are rather high that this could
be done prettier and/or more efficient. Therefore any comment on the
above code is greatly appreciated.


Regards,
Florian
 
Reply With Quote
 
bearophileHUGS@lycos.com
Guest
Posts: n/a
 
      11-14-2008
Not much tested:

from collections import defaultdict

def cluster(pairs):
"""
>>> d = {'a':1, 'b':2, 'c':1, 'd':1, 'e':2, 'f':3}
>>> cluster(d) == {1:['a', 'c', 'd'], 2:['b', 'e'], 3:['f']}

True
>>> p = [1, 2, 1, 1, 2, 3]
>>> cluster(p) == {1: [0, 2, 3], 2: [1, 4], 3: [5]}

True
"""
d = defaultdict(list)
if isinstance(pairs, list):
for k, v in enumerate(pairs):
d[v].append(k)
else:
for k, v in pairs.iteritems():
d[v].append(k)
return d

if __name__ == "__main__":
import doctest
doctest.testmod()
print "Doctests finished.\n"

Bye,
bearophile
 
Reply With Quote
 
bearophileHUGS@lycos.com
Guest
Posts: n/a
 
      11-14-2008
Alternative version:

def cluster(data):
d = defaultdict(list)
pairs = enumerate(data) if isinstance(data, list) else
data.iteritems()
for k, v in pairs:
d[v].append(k)
return d

Bye,
bearophile
 
Reply With Quote
 
alex23
Guest
Posts: n/a
 
      11-14-2008
On Nov 14, 11:24*pm, (E-Mail Removed) wrote:
> Alternative version:
>
> def cluster(data):
> * * d = defaultdict(list)
> * * pairs = enumerate(data) if isinstance(data, list) else
> data.iteritems()
> * * for k, v in pairs:
> * * * * d[v].append(k)
> * * return d
>
> Bye,
> bearophile


Very nice, +1.
 
Reply With Quote
 
Florian Brucker
Guest
Posts: n/a
 
      11-14-2008
> Are you sure? Is this for school?

Yes, I'm pretty sure (the values of the original dict are integers), and
no, this isn't for school If the "We may assume ..." made you think
so, I guess that's the way everybody formulates requirements after
having read to many math papers

If it is of interest to you: I'm trying to reproduce a certain sorted
version of the keys of a dict (sorted according to their value) where
the values are not unique, and thus the sorted version is not unique,
either. All I have left of the original sorting is some statistics, so I
can check for every sorted version if it's the same as the original. If
I have the clustering I'm talking about getting all the valid sorted
lists is a matter of concatenating permutations of the clusters.


Regards,
Florian
 
Reply With Quote
 
Bruno Desthuilliers
Guest
Posts: n/a
 
      11-14-2008
Florian Brucker a écrit :
> Hi everybody!
>
> Given a dictionary, I want to create a clustered version of it,
> collecting keys that have the same value:
>
> >>> d = {'a':1, 'b':2, 'c':1, 'd':1, 'e':2, 'f':3}
> >>> cluster(d)

> {1:['a', 'c', 'd'], 2:['b', 'e'], 3:['f']}
>
> That is, generate a new dict which holds for each value of the old dict
> a list of the keys of the old dict that have that very value.
>
> Another requirement is that it should also work on lists, in that case
> with indices instead of keys. We may assume that all values in the
> original dict/list can be used as dict keys.


from collections import defaultdict

def cluster(source):
iteritems = getattr(
source, "iteritems",
lambda : enumerate(source)
)
index = defaultdict(list)
for k, v in iteritems():
index[k].append(v)
return index

HTH
 
Reply With Quote
 
Bruno Desthuilliers
Guest
Posts: n/a
 
      11-14-2008
http://www.velocityreviews.com/forums/(E-Mail Removed) a écrit :
> Alternative version:
>
> def cluster(data):
> d = defaultdict(list)
> pairs = enumerate(data) if isinstance(data, list) else
> data.iteritems()



What is data is another type of sequence or iterable ?-)



> for k, v in pairs:
> d[v].append(k)
> return d
>
> Bye,
> bearophile

 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
dict: keys() and values() order guaranteed to be same? Henrik Faber Python 8 07-26-2012 09:06 PM
Learning Pyhton - Functional Programming - How intersect/differencetwo dict with dict/values? fast! macm Python 11 11-11-2010 10:04 AM
Re: dict.keys() and dict.values() are always the same order, is it? Cameron Simpson Python 6 04-21-2010 04:37 AM
dict.keys() and dict.values() are always the same order, is it? Menghan Zheng Python 1 04-20-2010 03:51 AM
Creating dict from keys and values Dirkjan Ochtman Python 1 05-29-2005 09:08 AM



Advertisments