Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Java > SymbolTable and string dictionnary

Reply
Thread Tools

SymbolTable and string dictionnary

 
 
Sigfried
Guest
Posts: n/a
 
      11-26-2008
I need to do some lookup on a set of strings with associated data. I've
tried HashMap and TreeMap (wich are slower).

I'd like to get more performance, so i've read about Ternary Search Tree
which are supposed to be equally fast with hashmap on successful searchs.

So is there faster than that ? I mean at least 30 % faster than HashMap
? Is there a digital search tree java implementation (with benchmark?)
around ? I didn't find any on google.
 
Reply With Quote
 
 
 
 
Mark Space
Guest
Posts: n/a
 
      11-26-2008
Sigfried wrote:

> So is there faster than that ? I mean at least 30 % faster than HashMap
> ? Is there a digital search tree java implementation (with benchmark?)
> around ? I didn't find any on google.


No, I think HashMap is as fast as it gets, regardless of language.
There may be some tricks you can plan for a specific application, but
you didn't let us in on what your app is doing.
 
Reply With Quote
 
 
 
 
Roedy Green
Guest
Posts: n/a
 
      11-26-2008
On Wed, 26 Nov 2008 17:12:40 +0100, Sigfried <(E-Mail Removed)>
wrote, quoted or indirectly quoted someone who said :

>I need to do some lookup on a set of strings with associated data. I've
>tried HashMap and TreeMap (wich are slower).
>
>I'd like to get more performance, so i've read about Ternary Search Tree
>which are supposed to be equally fast with hashmap on successful searchs.
>
>So is there faster than that ? I mean at least 30 % faster than HashMap
>? Is there a digital search tree java implementation (with benchmark?)
>around ? I didn't find any on google.


For HashMap to work well it needs plenty of breathing room. Increase
the capacity till you find the optimum setting.

HashMap itself has almost no overhead. About the only thing it does is
call hash and chains of collisions. If the capacity is high enough
there won't be many collisions.

Think about tuning hash -- caching, computing a faster way...

--
Roedy Green Canadian Mind Products
http://mindprod.com
"Humanity is conducting an unintended, uncontrolled, globally pervasive experiment
whose ultimate consequences could be second only to global nuclear war."
~ Environment Canada (The Canadian equivalent of the EPA on global warming)
 
Reply With Quote
 
Sigfried
Guest
Posts: n/a
 
      11-27-2008
Mark Space a écrit :
> Sigfried wrote:
>
>> So is there faster than that ? I mean at least 30 % faster than
>> HashMap ? Is there a digital search tree java implementation (with
>> benchmark?) around ? I didn't find any on google.

>
> No, I think HashMap is as fast as it gets, regardless of language. There
> may be some tricks you can plan for a specific application, but you
> didn't let us in on what your app is doing.


You are right. Since i know the total number of elements in the HashMap,
i can pass the initial capacity, and i can hope that there won't be any
collision at all. It's sad there is no public method to check collisions
count.

So String caches the hashCode, and there is one array access in the
table so it can't be faster than that.

So i would need to be sure that there is no collision by using a more
open implementation, which can specify the hash method (the same string
can belongs to several symbol table).
 
Reply With Quote
 
Roedy Green
Guest
Posts: n/a
 
      11-27-2008
On Thu, 27 Nov 2008 09:18:01 +0100, Sigfried <(E-Mail Removed)>
wrote, quoted or indirectly quoted someone who said :

>
>You are right. Since i know the total number of elements in the HashMap,
>i can pass the initial capacity, and i can hope that there won't be any
>collision at all. It's sad there is no public method to check collisions
>count.


if you know the keys ahead of time, i.e. they don't change, you can
construct a hashtable with zero collisions.

Sorry I don't have an URL for the algorithm. It is quite old. It
constructs a special purpose hash function.

--
Roedy Green Canadian Mind Products
http://mindprod.com
"Humanity is conducting an unintended, uncontrolled, globally pervasive experiment
whose ultimate consequences could be second only to global nuclear war."
~ Environment Canada (The Canadian equivalent of the EPA on global warming)
 
Reply With Quote
 
Roedy Green
Guest
Posts: n/a
 
      11-28-2008
On Thu, 27 Nov 2008 20:49:54 -0800, Patricia Shanahan <(E-Mail Removed)>
wrote, quoted or indirectly quoted someone who said :

>The search term for this is "perfect hash".


see http://mindprod.com/jgloss/perfecthash.html
--
Roedy Green Canadian Mind Products
http://mindprod.com
"Humanity is conducting an unintended, uncontrolled, globally pervasive experiment
whose ultimate consequences could be second only to global nuclear war."
~ Environment Canada (The Canadian equivalent of the EPA on global warming)
 
Reply With Quote
 
Sigfried
Guest
Posts: n/a
 
      11-28-2008
Roedy Green a écrit :
> On Thu, 27 Nov 2008 20:49:54 -0800, Patricia Shanahan <(E-Mail Removed)>
> wrote, quoted or indirectly quoted someone who said :
>
>> The search term for this is "perfect hash".

>
> see http://mindprod.com/jgloss/perfecthash.html


Thnaks but i don't know the keys at compile time. I load them once from
data and then i don't update the hashmap.
 
Reply With Quote
 
Tom Anderson
Guest
Posts: n/a
 
      11-28-2008
On Fri, 28 Nov 2008, Sigfried wrote:

> Roedy Green a écrit :
>> On Thu, 27 Nov 2008 20:49:54 -0800, Patricia Shanahan <(E-Mail Removed)>
>> wrote, quoted or indirectly quoted someone who said :
>>
>>> The search term for this is "perfect hash".

>>
>> see http://mindprod.com/jgloss/perfecthash.html

>
> Thnaks but i don't know the keys at compile time. I load them once from
> data and then i don't update the hashmap.


You could still set up the perfect hash at runtime. It might not be worth
it, though - it all depends on how expensive the construction is, how much
you save on each lookup by doing it, and how many times you do lookups, of
course.

tom

--
Would you like to remember more?
 
Reply With Quote
 
Sigfried
Guest
Posts: n/a
 
      12-01-2008
Tom Anderson a écrit :
> On Fri, 28 Nov 2008, Sigfried wrote:
>
>> Roedy Green a écrit :
>>> On Thu, 27 Nov 2008 20:49:54 -0800, Patricia Shanahan <(E-Mail Removed)>
>>> wrote, quoted or indirectly quoted someone who said :
>>>
>>>> The search term for this is "perfect hash".
>>>
>>> see http://mindprod.com/jgloss/perfecthash.html

>>
>> Thnaks but i don't know the keys at compile time. I load them once
>> from data and then i don't update the hashmap.

>
> You could still set up the perfect hash at runtime. It might not be


The question is how ? I only know i will have a bunch of strings as keys...

> worth it, though - it all depends on how expensive the construction is,
> how much you save on each lookup by doing it, and how many times you do
> lookups, of course.
>
> tom
>

 
Reply With Quote
 
Tom Anderson
Guest
Posts: n/a
 
      12-02-2008
On Mon, 1 Dec 2008, Sigfried wrote:

> Tom Anderson a écrit :
>> On Fri, 28 Nov 2008, Sigfried wrote:
>>
>>> Roedy Green a écrit :
>>>> On Thu, 27 Nov 2008 20:49:54 -0800, Patricia Shanahan <(E-Mail Removed)>
>>>> wrote, quoted or indirectly quoted someone who said :
>>>>
>>>>> The search term for this is "perfect hash".
>>>>
>>>> see http://mindprod.com/jgloss/perfecthash.html
>>>
>>> Thnaks but i don't know the keys at compile time. I load them once from
>>> data and then i don't update the hashmap.

>>
>> You could still set up the perfect hash at runtime. It might not be

>
> The question is how ? I only know i will have a bunch of strings as keys...


There is an algorithm which sets up the hashtable - it takes a set of keys
as input, and gives you some kind of configuration variables for the table
as output. Traditionally, you run this when you write the code. Instead,
you run it after you've loaded the keys from the data.

tom

--
Caps lock is like cruise control for cool.
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Counting occurances of string A in string B, and adding it to string B Sandman Perl Misc 7 08-03-2004 08:46 PM
Dictionnary vs Class for configuration Famille Delorme Python 7 05-01-2004 01:27 PM
Re: Identity dictionnary Bob Ippolito Python 1 02-29-2004 02:49 AM
deflater/inflater and dictionnary for huffman NOBODY Java 2 10-17-2003 08:56 AM
newbie question about dictionnary ? Sophie Alléon Python 9 09-05-2003 06:53 PM



Advertisments