Velocity Reviews - Computer Hardware Reviews

Remove duplicates from an array of objects based on an attribute

 
 
senthil
 
      03-06-2007
Hi all,
How do I remove duplicates from an array of objects based on an
attribute of the objects? For example,
I have an array of Ruby beans named diagnoses. I want to
remove duplicates from the array based on the diagnosis id. Assume each
diagnosis has the attributes id and weightage. For two diagnoses with the
same id and different weightage, the one with the lower weightage should be
removed.
Can anyone help me?

--
Posted via http://www.ruby-forum.com/.

 
Phrogz
 
      03-06-2007
On Mar 6, 7:03 am, senthil wrote:
> Hi all,
> How do I remove duplicates from an array of objects based on an
> attribute of the objects? For example,
> I have an array of Ruby beans named diagnoses. I want to
> remove duplicates from the array based on the diagnosis id. Assume each
> diagnosis has the attributes id and weightage. For two diagnoses with the
> same id and different weightage, the one with the lower weightage should be
> removed.


Here's my best shot at it:

require 'set'
class Array
  def uniq_by
    seen = Set.new
    select{ |x| seen.add?( yield( x ) ) }
  end
end

a = [ {:a=>1, :d=>1}, {:b=>2}, {:c=>3}, {:a=>1, :d=>3} ]
p a, a.uniq, a.uniq_by{ |h| h[:a] }
#=> [{:a=>1, :d=>1}, {:b=>2}, {:c=>3}, {:a=>1, :d=>3}]
#=> [{:a=>1, :d=>1}, {:b=>2}, {:c=>3}, {:a=>1, :d=>3}]
#=> [{:a=>1, :d=>1}, {:b=>2}]

(Note how :b=>2 and :c=>3 have the same value for :a (nil), so only
one is included.)
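
A note for readers on newer Rubies: as far as I know, Array#uniq has accepted a block since Ruby 1.9.2, which covers the same ground as uniq_by without reopening Array. The sketch below reuses the sample data from this post. The second half is only one possible way to handle the nil-key case described above, under the assumption that elements lacking the key should all be kept; the names first_per_key, keyed, unkeyed and keep_unkeyed are made up for illustration.

```ruby
# Same sample data as in the post above.
a = [ {:a=>1, :d=>1}, {:b=>2}, {:c=>3}, {:a=>1, :d=>3} ]

# Built-in Array#uniq with a block: keeps the first element seen
# for each distinct block value (nil counts as a value too).
first_per_key = a.uniq { |h| h[:a] }

# Assumption for illustration: elements without :a should all survive,
# so only the elements that actually have the key are deduplicated.
keyed, unkeyed = a.partition { |h| h.key?(:a) }
keep_unkeyed = keyed.uniq { |h| h[:a] } + unkeyed

p first_per_key
p keep_unkeyed
```

Note that the second variant moves the keyless elements to the end, so it does not preserve the original ordering when keyed and keyless elements are interleaved.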

Here's another (presumably slower) version that doesn't rely on Set:

class Array
  def uniq_by
    seen = {}
    select{ |x|
      v = yield(x)
      !seen[v] && (seen[v]=true)
    }
  end
end

 
brabuhr@gmail.com
 
      03-06-2007
On 3/6/07, senthil wrote:
> I have an array of Ruby beans named diagnoses. I want to
> remove duplicates from the array based on the diagnosis id. Assume each
> diagnosis has the attributes id and weightage. For two diagnoses with the
> same id and different weightage, the one with the lower weightage should be
> removed.
> Can anyone help me?


From: http://blade.nagaokaut.ac.jp/cgi-bin...by-talk/228538

module Enumerable
  def group_by(&b)
    h = Hash.new{ |hash, k| hash[k] = [] }
    each{ |x| h[x.instance_eval(&b)] << x }
    h.values
  end
end

old_diagnoses = [
  {:id => 1, :w => 30},
  {:id => 2, :w => 20},
  {:id => 3, :w => 10},
  {:id => 1, :w => 10},
  {:id => 1, :w => 40},
  {:id => 2, :w => 50},
  {:id => 4, :w => 60},
  {:id => 4, :w => 30},
  {:id => 2, :w => 20},
  {:id => 3, :w => 10}
]
new_diagnoses = []

groups = old_diagnoses.group_by{ |d| d[:id] }

groups.each do |group|
  new_diagnoses << group.sort_by{ |g| g[:w] }.last
end

p old_diagnoses
p new_diagnoses

[{:w=>30, :id=>1}, {:w=>20, :id=>2}, {:w=>10, :id=>3}, {:w=>10, :id=>1},
{:w=>40, :id=>1}, {:w=>50, :id=>2}, {:w=>60, :id=>4}, {:w=>30, :id=>4},
{:w=>20, :id=>2}, {:w=>10, :id=>3}]

[{:w=>40, :id=>1}, {:w=>50, :id=>2}, {:w=>10, :id=>3}, {:w=>60, :id=>4}]
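
For what it's worth, later Ruby versions ship their own Enumerable#group_by (returning a Hash rather than an array of groups) and Enumerable#max_by, so the same idea can be sketched without reopening Enumerable. This assumes the built-in group_by is in effect, not the redefinition above, and a Ruby where hashes keep insertion order (1.9 and later).

```ruby
# Same sample data as in the post above.
old_diagnoses = [
  {:id => 1, :w => 30},
  {:id => 2, :w => 20},
  {:id => 3, :w => 10},
  {:id => 1, :w => 10},
  {:id => 1, :w => 40},
  {:id => 2, :w => 50},
  {:id => 4, :w => 60},
  {:id => 4, :w => 30},
  {:id => 2, :w => 20},
  {:id => 3, :w => 10}
]

# Built-in group_by returns { id => [diagnoses with that id] }.
# For each group, max_by picks the entry with the highest weight.
new_diagnoses = old_diagnoses
  .group_by { |d| d[:id] }
  .values
  .map { |group| group.max_by { |d| d[:w] } }

p new_diagnoses
```

Using max_by instead of sort_by followed by last avoids sorting each group just to read off one element.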

 
Phrogz
 
      03-06-2007
On Mar 6, 7:27 am, "Phrogz" wrote:
> Here's another (presumably slower) version that doesn't rely on Set:


Huh...actually, the hash-based one seems faster than the Set-based
one:

require 'set'
class Array
  def uniq_by1
    seen = Set.new
    select{ |x| seen.add?( yield( x ) ) }
  end
  def uniq_by2
    seen = {}
    select{ |x| !seen[v=yield(x)] && (seen[v]=true) }
  end
end

require 'benchmark'
a = [ {:a=>1, :d=>1}, {:b=>2}, {:c=>3}, {:a=>1, :d=>3},
      {:a=>2, :e=>7}, {:a=>3, :b=>2}, {:a=>1}, {:a=>4}, {:f=>6} ]
N = 10_000
Benchmark.bmbm{ |x|
  x.report( 'with_set' ){
    N.times{
      a.uniq_by1{ |h| h[:a] }
      a.uniq_by1{ |h| h[:b] }
    }
  }
  x.report( 'with_hash' ){
    N.times{
      a.uniq_by2{ |h| h[:a] }
      a.uniq_by2{ |h| h[:b] }
    }
  }
}

#=> Rehearsal ---------------------------------------------
#=> with_set 1.840000 0.030000 1.870000 ( 2.40123
#=> with_hash 1.270000 0.030000 1.300000 ( 1.701307)
#=> ------------------------------------ total: 3.170000sec
#=>
#=> user system total real
#=> with_set 1.820000 0.020000 1.840000 ( 2.187477)
#=> with_hash 1.250000 0.020000 1.270000 ( 1.555490)

(Yes, my laptop is rather old and slow.)

 
Pit Capitain
 
      03-06-2007
senthil, please don't take this personally, your question is OK, but the
following sounds so very wrong:

> I have an array of Ruby beans (...)


All we have in Ruby are objects. No beans, POROs, ERBs, and all this cruft.

Regards,
Pit

 
Erik Veenstra
 
      03-06-2007
And here's the inevitable one-liner... :}

(But I do prefer the group_by version...)

Regards,
Erik V. - http://www.erikveen.dds.nl/

----------------------------------------------------------------

################################################################

arr = [
  {:id => 1, :w => 30},
  {:id => 2, :w => 20},
  {:id => 3, :w => 10},
  {:id => 1, :w => 10},
  {:id => 1, :w => 40},
  {:id => 2, :w => 50},
  {:id => 4, :w => 60},
  {:id => 4, :w => 30},
  {:id => 2, :w => 20},
  {:id => 3, :w => 10}
]

################################################################

res1=arr.inject({}){|h,o|(h[o[:id]]||=[])<<o;h}.values.map{|a|
a.sort_by{|o|o[:w]}.pop}

################################################################

res2 =
  arr.inject({}) do |h,o|
    (h[o[:id]] ||= []) << o ; h
  end.values.collect do |a|
    a.sort_by do |o|
      o[:w]
    end.pop
  end

################################################################

module Enumerable
  def hash_by(&block)
    inject({}){|h, o| (h[block.call(o)] ||= []) << o ; h}
  end

  def group_by(&block)
    hash_by(&block).sort.transpose.pop
  end
end

res3 =
  arr.group_by do |o|
    o[:id]
  end.collect do |a|
    a.sort_by do |o|
      o[:w]
    end.pop
  end

################################################################

p res1
p res2
p res3

################################################################

----------------------------------------------------------------


 
Phrogz
 
      03-06-2007
Erik Veenstra wrote:
> And here's the inevitable one-liner... :}


Not that we're golfing, but I like this one better in terms of
one-linedness:
Hash[ *map{ |o| [ o[:id], o ] }.flatten ].values

 
Phrogz
 
      03-06-2007
On Mar 6, 1:47 pm, "Phrogz" wrote:
> Erik Veenstra wrote:
> > And here's the inevitable one-liner... :}
>
> Not that we're golfing, but I like this one better in terms of
> one-linedness:
> Hash[ *map{ |o| [ o[:id], o ] }.flatten ].values


Oops, I meant:
Hash[ *a.map{ |o| [ o[:id], o ] }.flatten ].values

 
Phrogz
 
      03-06-2007
On Mar 6, 7:40 am, "Phrogz" wrote:
> On Mar 6, 7:27 am, "Phrogz" wrote:
>
> > Here's another (presumably slower) version that doesn't rely on Set:
>
> Huh...actually, the hash-based one seems faster than the Set-based
> one:


And faster still, by a hair, is a last-in approach. Upon reflection,
all these techniques rely only on methods already in Enumerable, so
they can be put there instead of being Array-specific.

require 'set'
module Enumerable
  def uniq_by1
    seen = Set.new
    select{ |x| seen.add?( yield( x ) ) }
  end
  def uniq_by2
    seen = {}
    select{ |x| !seen[v=yield(x)] && (seen[v]=true) }
  end
  def uniq_by3
    Hash[ *map{ |x| [ yield(x), x ] }.flatten ].values
  end

  def uniq_by4
    # fastest, preserves last-seen value for a key
    h = {}
    each{ |x| h[yield(x)] = x }
    h.values
  end

  def uniq_by5
    # near-fastest, preserves first-seen value for a key
    h = {}
    each{ |x| v=yield(x); h[v]=x unless h.include?(v) }
    h.values
  end
end

a = [ {:a=>1, :d=>1}, {:b=>2}, {:c=>3}, {:a=>1, :d=>3},
      {:a=>2, :e=>7}, {:a=>3, :b=>2}, {:a=>1}, {:a=>4}, {:f=>6} ]

require 'benchmark'
N = 20_000
Benchmark.bmbm{ |x|
  x.report( 'with set' ){
    N.times{
      a.uniq_by1{ |h| h[:a] }
      a.uniq_by1{ |h| h[:b] }
    }
  }
  x.report( 'with hash' ){
    N.times{
      a.uniq_by2{ |h| h[:a] }
      a.uniq_by2{ |h| h[:b] }
    }
  }
  x.report( 'Hash.[].values' ){
    N.times{
      a.uniq_by3{ |h| h[:a] }
      a.uniq_by3{ |h| h[:b] }
    }
  }
  x.report( '#values (last in)' ){
    N.times{
      a.uniq_by4{ |h| h[:a] }
      a.uniq_by4{ |h| h[:b] }
    }
  }
  x.report( '#values (first in)' ){
    N.times{
      a.uniq_by5{ |h| h[:a] }
      a.uniq_by5{ |h| h[:b] }
    }
  }
}

#=> Rehearsal ------------------------------------------------------
#=> with set 2.500000 0.016000 2.516000 ( 2.547000)
#=> with hash 1.312000 0.000000 1.312000 ( 1.313000)
#=> Hash.[].values 2.453000 0.000000 2.453000 ( 2.453000)
#=> #values (last in) 1.110000 0.000000 1.110000 ( 1.109000)
#=> #values (first in) 1.296000 0.000000 1.296000 ( 1.297000)
#=> --------------------------------------------- total: 8.687000sec
#=>
#=> user system total real
#=> with set 2.000000 0.000000 2.000000 ( 1.999000)
#=> with hash 1.297000 0.000000 1.297000 ( 1.297000)
#=> Hash.[].values 2.531000 0.000000 2.531000 ( 2.532000)
#=> #values (last in) 1.125000 0.015000 1.140000 ( 1.140000)
#=> #values (first in) 1.344000 0.000000 1.344000 ( 1.344000)
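
Tying the last-in approach back to the original question: one way to make "last in" mean "highest weightage" is to sort by weightage first and then let later entries overwrite earlier ones. The sketch below is only an illustration; the Diagnosis struct and the sample values are made up, standing in for whatever objects with id and weightage attributes the original poster has.

```ruby
# Hypothetical stand-in for the poster's objects (not from the thread).
Diagnosis = Struct.new(:id, :weightage)

diagnoses = [
  Diagnosis.new(1, 30),
  Diagnosis.new(2, 20),
  Diagnosis.new(1, 40),
  Diagnosis.new(2, 10)
]

# Sort ascending by weightage, then apply the last-in idea:
# for each id, the heaviest diagnosis is written last and so survives.
by_id = {}
diagnoses.sort_by { |d| d.weightage }.each { |d| by_id[d.id] = d }

# Hash order follows the sorted input, so re-sort by id for stable output.
heaviest = by_id.values.sort_by { |d| d.id }

p heaviest
```

The sort costs more than a plain single pass, so this trades some of the speed measured above for getting the "keep the higher weightage" rule the question asked for.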

 
Erik Veenstra
 
      03-06-2007
> Hash[ *a.map{ |o| [ o[:id], o ] }.flatten ].values

Not bad...

How does this ensure that the maximum :w is used?

Regards,
Erik V. - http://www.erikveen.dds.nl/
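
To make the question concrete: on its own, the Hash-based one-liner keeps whichever element for an id comes last in the array, so it only returns the maximum :w when the data happens to be ordered that way. The sketch below contrasts the last-in result with a single pass that explicitly keeps the higher :w. It uses the sample array from earlier in the thread; the variable names last_in, best and max_weight are made up for illustration.

```ruby
arr = [
  {:id => 1, :w => 30},
  {:id => 2, :w => 20},
  {:id => 3, :w => 10},
  {:id => 1, :w => 10},
  {:id => 1, :w => 40},
  {:id => 2, :w => 50},
  {:id => 4, :w => 60},
  {:id => 4, :w => 30},
  {:id => 2, :w => 20},
  {:id => 3, :w => 10}
]

# Last-in: later elements simply overwrite earlier ones with the same id.
h = {}
arr.each { |o| h[o[:id]] = o }
last_in = h.values

# Explicit maximum: only overwrite when the new element weighs more.
best = {}
arr.each do |o|
  current = best[o[:id]]
  best[o[:id]] = o if current.nil? || o[:w] > current[:w]
end
max_weight = best.values

p last_in     # id 2 ends up with :w => 20 and id 4 with :w => 30
p max_weight  # id 2 ends up with :w => 50 and id 4 with :w => 60
```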


 