Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Ruby > Faster datastructure for lookups wanted

Reply
Thread Tools

Faster datastructure for lookups wanted

 
 
m94asr@gmail.com
Guest
Posts: n/a
 
      09-07-2006
Hi all,

maybe somebody can recommend me the right datastructure or
any other advice would be a big help.

My code spends most of its execution time doing lookups from
a hashtable with about 1M keys. The keys are strings and the values
are arrays of integers. Most of the time only of length 1.

I do not care how long the construction of the datastructure takes,
but the lookup should be as fast as possible.

xs.each{|x|
if found = hash[x]
#do sth.
end
}


Thanks a lot!
-Armin

 
Reply With Quote
 
 
 
 
Mauricio Fernandez
Guest
Posts: n/a
 
      09-07-2006
On Fri, Sep 08, 2006 at 06:55:12AM +0900, http://www.velocityreviews.com/forums/(E-Mail Removed) wrote:
> maybe somebody can recommend me the right datastructure or
> any other advice would be a big help.
>
> My code spends most of its execution time doing lookups from
> a hashtable with about 1M keys. The keys are strings and the values
> are arrays of integers. Most of the time only of length 1.
>
> I do not care how long the construction of the datastructure takes,
> but the lookup should be as fast as possible.


It hardly gets faster than a Hash in Ruby.
You can also try a trie (Patricia tree if you have long keys and care about
space), or Judy arrays (http://rjudy.sourceforge.net), but I wouldn't expect
major performance gains (Judy::JudySL, being more specialized than Hash,
might have a chance)...

--
Mauricio Fernandez - http://eigenclass.org - singular Ruby

 
Reply With Quote
 
 
 
 
Eero Saynatkari
Guest
Posts: n/a
 
      09-07-2006
Mauricio Fernandez wrote:
> On Fri, Sep 08, 2006 at 06:55:12AM +0900, (E-Mail Removed) wrote:
>> maybe somebody can recommend me the right datastructure or
>> any other advice would be a big help.
>>
>> My code spends most of its execution time doing lookups from
>> a hashtable with about 1M keys. The keys are strings and the values
>> are arrays of integers. Most of the time only of length 1.
>>
>> I do not care how long the construction of the datastructure takes,
>> but the lookup should be as fast as possible.

>
> It hardly gets faster than a Hash in Ruby.
> You can also try a trie (Patricia tree if you have long keys and care
> about
> space),


A Trie optimised by cutting off unambiguous traversal would
be a definite possibility.

> or Judy arrays (http://rjudy.sourceforge.net), but I wouldn't
> expect
> major performance gains (Judy::JudySL, being more specialized than Hash,
> might have a chance)...



--
Posted via http://www.ruby-forum.com/.

 
Reply With Quote
 
Gregory Seidman
Guest
Posts: n/a
 
      09-08-2006
On Fri, Sep 08, 2006 at 08:28:33AM +0900, Eero Saynatkari wrote:
} Mauricio Fernandez wrote:
} > On Fri, Sep 08, 2006 at 06:55:12AM +0900, (E-Mail Removed) wrote:
} >> maybe somebody can recommend me the right datastructure or
} >> any other advice would be a big help.
} >>
} >> My code spends most of its execution time doing lookups from
} >> a hashtable with about 1M keys. The keys are strings and the values
} >> are arrays of integers. Most of the time only of length 1.
} >>
} >> I do not care how long the construction of the datastructure takes,
} >> but the lookup should be as fast as possible.
} >
} > It hardly gets faster than a Hash in Ruby.
} > You can also try a trie (Patricia tree if you have long keys and care
} > about
} > space),
}
} A Trie optimised by cutting off unambiguous traversal would
} be a definite possibility.

There is a trie gem that implements a Patricia Trie.

http://gemjack.com/gems/trie-0.0.1/classes/Trie.html

Of course, a Patricia Trie assumes no a priori knowledge of your string
inputs. If you know something about your keys, you may be able to do better
with a hash of hashes (to however many layers is appropriate), splitting as
appropriate for your key space. For example, if you know that your keys are
IPv4 addresses that come in dotted quad notation (e.g. 127.0.0.1), you
could do better with (note: untested):

class SplittableHash
def initialize(split)
@split = split
@root = {}
end

def [](key)
key.split(@split).inject(@root) { |h,k| h[k] if h }
end

def []=(key, val)
path = key.split(@split)
key = path.pop
path.inject(@root) { |h,k| h[k] ||= {} }[key] = val
end

end

--Greg


 
Reply With Quote
 
Robert Klemme
Guest
Posts: n/a
 
      09-08-2006
(E-Mail Removed) wrote:
> Hi all,
>
> maybe somebody can recommend me the right datastructure or
> any other advice would be a big help.
>
> My code spends most of its execution time doing lookups from
> a hashtable with about 1M keys. The keys are strings and the values
> are arrays of integers. Most of the time only of length 1.
>
> I do not care how long the construction of the datastructure takes,
> but the lookup should be as fast as possible.
>
> xs.each{|x|
> if found = hash[x]
> #do sth.
> end
> }


As others said already, a Hash is pretty much the fastest for the
general case. How do your string keys look like? Maybe it is worth
trying symbols instead of strings?

If you unveil a bit more about your application we might be able to come
up with more suggestions.

Kind regards

robert
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
What datastructure to use to store nodes/locations I've visited 6tc1@qlink.queensu.ca Java 3 06-23-2005 02:28 AM
Index-based datastructure Sharp Java 1 03-14-2005 11:56 AM
Set DataStructure Anony! Java 2 08-13-2004 02:32 AM
HELP WANTED HELP WANTED HELP WANTED Harvey ASP .Net 1 07-16-2004 01:12 PM
HELP WANTED HELP WANTED HELP WANTED Harvey ASP .Net 0 07-16-2004 10:00 AM



Advertisments