Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Ruby > Speed Golf - Remove Early Dups

Reply
Thread Tools

Speed Golf - Remove Early Dups

 
 
Phrogz
Guest
Posts: n/a
 
      12-06-2005
SUMMARY
What's the fastest and/or shortest you can turn this:
in = ["i", "v", "w", "e", "l", "d", "u", "f", "e", "v", "f", "e",
"d", "e", "w", "d"]

into this:
["i", "l", "u", "v", "f", "e", "w", "d"]
?


DETAILS
The goal is, given a stream of items, to produce the same ordered
stream with no duplicates, where the relative position of the last item
wins. The shortest (and fastest?) for me turns out to be simply:
in.reverse.uniq.reverse

I have the question mark because the in-place version is sometimes
faster, sometimes slower in my tests:
out = in.dup; out.reverse!; out.uniq!; out.reverse!

I thought it might be faster to loop through the array in one pass
myself, but that turns out to be about 4x slower:
seen = {}
out = in.dup
out.each_with_index{ |v,i|
if n=h[v]
out[n] = nil
end
h[v] = i
}
out.compact!

The input for me isn't REALLY an array, but rather a series of items
that I receive one at a time from a depth-first traversal of a graph.
If this influences your answer, so be it


BACKGROUND
Why is this problem mildly interesting? See
http://phrogz.net/nodes/traversingdirectedgraph.asp

I was reading that page for nostalgia the other day and thought about
porting the calculation library to Ruby for fun, and to see how much
easier solving the problem would be in Ruby.

Here's the code I put together for fun (the cells and formulae in the
below correspond to the graphic on the above url), before adding the
'minimal update chain' optimization.

require 'set'
module ClassContainer
def initialize( *args )
self.class[ self.name ] = self
super
end

def self.included( klass )
klass.class_eval{
def self.inherited( subklass )
( @subclasses ||= [] ) << subklass
end

def self.all
@subclasses ||= []
@instances ||= {}
all = @instances.dup
@subclasses.each{ |sub| all.merge!( sub.all ) }
all
end

def self.each
all.each_value{ |i| yield i }
end

def self.[]( name )
all[ name ]
end

def self.[]=( name, instance )
( @instances ||= {} )[ name ] = instance
end

def self.sorted
all.values.sort_by{ |n| n.name.to_s }
end
}
end
end

module GraphNode
attr_reader utgoing_edges, :incoming_edges

def initialize( *args )
@outgoing_edges = Set.new
@incoming_edges = Set.new
end

def edges
@outgoing_edges + @incoming_edges
end

def add_edge_to( other_node )
@outgoing_edges << other_node
other_node.incoming_edges << self
self
end

def clear_incoming
@incoming_edges.each{ |n|
n.outgoing_edges.delete self
}
@incoming_edges = Set.new
self
end

def traverse( seen_nodes = Set.new, &block )
new_seen = seen_nodes.dup << self
@outgoing_edges.each{ |destination|
unless seen_nodes.include?( destination )
yield destination
destination.traverse( new_seen, &block )
end
}
end
end

class Cell
include GraphNode
include ClassContainer

attr_reader :name
attr_accessor :value
def initialize( name, value=0 )
raise "Duplicate ID" if Cell[ name ]
@name = name
@value = value * 1.0

super
end

def value=( new_value )
# Naive update approach
return if @value == new_value
@value = new_value
update_dependents
new_value
end

def update_dependents
traverse{ |formula|
formula.evaluate( true )
}
end

def to_s
'<%s "%s" value=%.2f>' % [ self.class.name, @name, @value ]
end
end

class Formula < Cell
def initialize( *args )
super
@formula, @value = @value.to_s, 0.0
update_dependencies
end

def update_dependencies
clear_incoming
@formula.scan( /@(\w+)/ ).flatten.each{ |source_name|
if incoming = Cell[ source_name.to_sym ]
incoming.add_edge_to( self )
end
}
end

def evaluate( skip_dependents = false )
scope = ValueSpace.new
Cell.each{ |n| scope.set( n.name, n.value ) }
@value = eval( @formula, scope.get_binding )
puts "Updated #{self}"
update_dependents unless skip_dependents
end

def to_s
'<%s "%s" formula="%s" value=%.2f>' % [ self.class.name, @name,
@formula, @value ]
end
end

class ValueSpace
def set( name, value )
instance_variable_set :"@#{name}", value
end
def get_binding
binding
end
end

if $0 == __FILE__
Cell.new( :a, 5 )
Cell.new( :d, 3 )
Cell.new( :e, 32 )
Cell.new( :i, 10 )
Cell.new( :l, 12 )

Formula.new( :b, "@d + @e + @a" )
Formula.new( :g, "@h * @i" ).value = 20
Formula.new( :h, "@g / @i" ).value = 2
Formula.new( :c, "@a + 3" )
Formula.new( :f, "@b + @c" )
Formula.new( :j, "@c + 8" )
Formula.new( :k, "@l + @j" )
Formula.new( :m, "@l + 10" )

# Need to call here to handle circular dependencies
Formula.each{ |f| f.update_dependencies }

# Naive approach...
puts "Naive initial evaluation..."
Formula.each{ |f| f.evaluate }

puts "*" * 40
puts "Initial values..."
puts Cell.sorted

puts "*" * 40
puts "Changing 'a' to 7..."
Cell[ :a ].value = 7

puts "*" * 40
puts "Changing 'i' to 36..."
Cell[ :i ].value = 36

puts '-' * 40
puts "Per cell dependency..."
Cell.each{ |cell|
puts "#{cell.name} -> #{cell.outgoing_edges.map{|c| c.name}.join(',
')}"
}

end

(which outputs)

Naive initial evaluation...
Updated <Formula "c" formula="@a + 3" value=8.00>
Updated <Formula "f" formula="@b + @c" value=8.00>
Updated <Formula "j" formula="@c + 8" value=16.00>
Updated <Formula "k" formula="@l + @j" value=28.00>
Updated <Formula "f" formula="@b + @c" value=8.00>
Updated <Formula "j" formula="@c + 8" value=16.00>
Updated <Formula "k" formula="@l + @j" value=28.00>
Updated <Formula "b" formula="@d + @e + @a" value=40.00>
Updated <Formula "f" formula="@b + @c" value=48.00>
Updated <Formula "k" formula="@l + @j" value=28.00>
Updated <Formula "g" formula="@h * @i" value=20.00>
Updated <Formula "h" formula="@g / @i" value=2.00>
Updated <Formula "m" formula="@l + 10" value=22.00>
Updated <Formula "h" formula="@g / @i" value=2.00>
Updated <Formula "g" formula="@h * @i" value=20.00>
****************************************
Initial values...
<Cell "a" value=5.00>
<Formula "b" formula="@d + @e + @a" value=40.00>
<Formula "c" formula="@a + 3" value=8.00>
<Cell "d" value=3.00>
<Cell "e" value=32.00>
<Formula "f" formula="@b + @c" value=48.00>
<Formula "g" formula="@h * @i" value=20.00>
<Formula "h" formula="@g / @i" value=2.00>
<Cell "i" value=10.00>
<Formula "j" formula="@c + 8" value=16.00>
<Formula "k" formula="@l + @j" value=28.00>
<Cell "l" value=12.00>
<Formula "m" formula="@l + 10" value=22.00>
****************************************
Changing 'a' to 7...
Updated <Formula "b" formula="@d + @e + @a" value=42.00>
Updated <Formula "f" formula="@b + @c" value=50.00>
Updated <Formula "c" formula="@a + 3" value=10.00>
Updated <Formula "f" formula="@b + @c" value=52.00>
Updated <Formula "j" formula="@c + 8" value=18.00>
Updated <Formula "k" formula="@l + @j" value=30.00>
****************************************
Changing 'i' to 36...
Updated <Formula "h" formula="@g / @i" value=0.56>
Updated <Formula "g" formula="@h * @i" value=20.00>
Updated <Formula "g" formula="@h * @i" value=20.00>
Updated <Formula "h" formula="@g / @i" value=0.56>
----------------------------------------
Per cell dependency...
c -> f, j
e -> b
f ->
i -> h, g
l -> k, m
j -> k
b -> f
k ->
g -> h
m ->
a -> b, c
h -> g
d -> b

 
Reply With Quote
 
 
 
 
ako...
Guest
Posts: n/a
 
      12-06-2005
perhaps in.reverse!.uniq!.reverse! ?

 
Reply With Quote
 
 
 
 
ako...
Guest
Posts: n/a
 
      12-06-2005
i meant in.reverse.uniq!.reverse! (the first reverse is non-destructive)

 
Reply With Quote
 
William James
Guest
Posts: n/a
 
      12-06-2005
Phrogz wrote:
> SUMMARY
> What's the fastest and/or shortest you can turn this:
> in = ["i", "v", "w", "e", "l", "d", "u", "f", "e", "v", "f", "e",
> "d", "e", "w", "d"]
>
> into this:
> ["i", "l", "u", "v", "f", "e", "w", "d"]
> ?
>
>
> DETAILS
> The goal is, given a stream of items, to produce the same ordered
> stream with no duplicates, where the relative position of the last item
> wins. The shortest (and fastest?) for me turns out to be simply:
> in.reverse.uniq.reverse
>

....
> The input for me isn't REALLY an array, but rather a series of items
> that I receive one at a time from a depth-first traversal of a graph.
> If this influences your answer, so be it


In = ["i", "v", "w", "e", "l", "d", "u", "f", "e", "v", "f", "e",
"d", "e", "w", "d"]

out = []

In.each{ |x| out.delete( x ); out << x }

p out

 
Reply With Quote
 
ako...
Guest
Posts: n/a
 
      12-06-2005
does this have a quadratic time complexity? doing this by sorting might
be faster... or am i mistaken?

konstantin

 
Reply With Quote
 
Gavin Kistner
Guest
Posts: n/a
 
      12-06-2005
On Dec 5, 2005, at 6:47 PM, ako... wrote:
> does this have a quadratic time complexity? doing this by sorting
> might
> be faster... or am i mistaken?


Quadratic? No. The suggested algorithms are O(2n) or O(3n).


 
Reply With Quote
 
Gavin Kistner
Guest
Posts: n/a
 
      12-06-2005
On Dec 5, 2005, at 6:27 PM, ako... wrote:
> i meant in.reverse.uniq!.reverse! (the first reverse is non-
> destructive)


Array#uniq! can return nil if no changes were made. This will not
occur given the above input, but can for the general case.

Using reverse rather than dup does shave a bit of time off, though,
which puts it as the winner on my machine. So far:

Rehearsal --------------------------------------------------------
William James 4.230000 0.060000 4.290000 ( 4.369104)
Simple Copies 1.180000 0.010000 1.190000 ( 1.227496)
Dup and In-Place 1.210000 0.010000 1.220000 ( 1.240726)
Reverse and In-Place 1.130000 0.010000 1.140000 ( 1.158873)
Hash and Compact 7.220000 0.090000 7.310000 ( 7.552673)
---------------------------------------------- total: 15.150000sec

user system total real
William James 4.230000 0.040000 4.270000 ( 4.465252)
Simple Copies 1.190000 0.010000 1.200000 ( 1.218064)
Dup and In-Place 1.220000 0.010000 1.230000 ( 1.268994)
Reverse and In-Place 1.120000 0.010000 1.130000 ( 1.15253
Hash and Compact 7.220000 0.060000 7.280000 ( 7.511923)


input = ["i", "v", "w", "e", "l", "d", "u", "f", "e", "v", "f", "e",
"d", "e", "w", "d"]

require 'benchmark'
Benchmark.bmbm( 20 ) do |r|
n = 100_000

r.report( 'William James' ){
n.times{
out = []
input.each{ |x| out.delete( x ); out << x }
}
}

r.report( 'Simple Copies' ){
n.times{
input.reverse.uniq.reverse
}
}

r.report( 'Dup and In-Place' ){
n.times{
out = input.dup
out.reverse!
out.uniq!
out.reverse!
out
}
}

r.report( 'Reverse and In-Place' ){
n.times{
out = input.reverse
out.uniq!
out.reverse!
out
}
}

r.report( "Hash and Compact" ){
n.times{
seen = {}
out = input.dup
out.each_with_index{ |val,idx|
if old_idx = seen[ val ]
out[ old_idx ] = nil
end
seen[ val ] = idx
}
}
}

end


 
Reply With Quote
 
Dominik Bathon
Guest
Posts: n/a
 
      12-06-2005
On Tue, 06 Dec 2005 02:27:44 +0100, ako... <(E-Mail Removed)> wrote:

> i meant in.reverse.uniq!.reverse! (the first reverse is non-destructive=

)

Don't chain bang methods:

>> ["i", "l", "u", "v", "f", "e", "w", "d"].reverse.uniq!.reverse!

NoMethodError: undefined method `reverse!' for nil:NilClass
from (irb):3
from :0

They return nil if nothing changed...


 
Reply With Quote
 
David A. Black
Guest
Posts: n/a
 
      12-06-2005
Hi --

On Tue, 6 Dec 2005, Phrogz wrote:

> SUMMARY
> What's the fastest and/or shortest you can turn this:
> in = ["i", "v", "w", "e", "l", "d", "u", "f", "e", "v", "f", "e",
> "d", "e", "w", "d"]
>
> into this:
> ["i", "l", "u", "v", "f", "e", "w", "d"]
> ?
>
>
> DETAILS
> The goal is, given a stream of items, to produce the same ordered
> stream with no duplicates, where the relative position of the last item
> wins. The shortest (and fastest?) for me turns out to be simply:
> in.reverse.uniq.reverse


This entry is not in contention on shortness or speed (don't even bother
to benchmark it -- it's completely off the charts), but I just thought it
looked really cool

a.inject([]){|r,*b|r-b+b}


David
__
David A. Black
http://www.velocityreviews.com/forums/(E-Mail Removed)

"Ruby for Rails", forthcoming from Manning Publications, April 2006!


 
Reply With Quote
 
ako...
Guest
Posts: n/a
 
      12-06-2005
i do not see why not. this is almost exactly a bubble sort. which is in
worst case quadratic.

 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
High Speed Golf club and ball collisions- images kinsman Digital Photography 4 08-10-2007 08:56 PM
AppendDataBoundItems giving dups David C ASP .Net 2 06-25-2007 05:45 PM
Totally OT: The Disc Golf Twins turn Seven =?iso-8859-1?Q?Frisbee=AE?= MCSE 6 03-21-2005 10:11 PM
remove dups from ArrayList DaveF ASP .Net 1 11-21-2004 10:42 PM
speed speed speed a.metselaar Computer Support 14 12-30-2003 03:34 AM



Advertisments