Re: Binary search trees or hash tables?

 
 
BGB / cr88192
      01-06-2010

"remod" <(E-Mail Removed)> wrote in message
news:4b42f961$0$1131$(E-Mail Removed). ..
> Talking about containers made me want to reconsider my choice of hash
> tables as the basis for the key/value mapping I have in my library.
>
> My (self-imposed) requirements are:
>
> - container for key/value mapping
> - simplicity of implementation
> - reasonable fast insert/search/delete
> - reasonable overhead
>


<snip>

>
> This very rough estimate made me lean toward hashtables. Did I miss
> anything?
>
> I should now make a more precise analysis and take a final decision:
> rewrite/optimize the current hashtable functions or move to BST?
>
> What would you suggest?
>


personally, I have not used BSTs much for key/value lookup and similar
uses.

typically, I have used either:
large hashes, which can be very fast for certain uses;
or hashed arrays, possibly with chaining between the array elements.

so, yeah, large hashes allow quick lookup, but do have costs for resizes or
deletes (as a result, most places where I have used them, they have been
fixed-size without support for delete).

a hashed array is typically a bit better, since in this case one can have a
small fixed-size hash, and use it to speed up lookup into a potentially much
larger array (and, as well, the array need not have bunches of wasted
space).
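
to be concrete, what I mean by a hashed array is roughly like this (just a
sketch, with made-up names and sizes; a real one would grow the array and
check bounds):

#include <string.h>

/* small fixed-size hash of chain heads, indexing into a larger entry
   array; chaining is done via a 'next' index rather than pointers. */
#define HASH_SZ 256                  /* small power-of-2 hash */

typedef struct {
    const char *key;
    int         value;
    int         next;                /* next entry in this chain, -1 = end */
} Entry;

static Entry entries[4096];          /* the potentially much larger array */
static int   n_entries = 0;
static int   hash_head[HASH_SZ];     /* init all slots to -1 before use */

static unsigned hash_str(const char *s)
{
    unsigned h = 0;
    while (*s) h = h*31 + (unsigned char)*s++;
    return h & (HASH_SZ - 1);
}

static int lookup(const char *key)   /* returns entry index, or -1 */
{
    int i;
    for (i = hash_head[hash_str(key)]; i >= 0; i = entries[i].next)
        if (!strcmp(entries[i].key, key))
            return i;
    return -1;
}

static int insert(const char *key, int value)   /* no bounds check here */
{
    unsigned h = hash_str(key);
    int i = n_entries++;
    entries[i].key = key;  entries[i].value = value;
    entries[i].next = hash_head[h];
    hash_head[h] = i;
    return i;
}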

sorted arrays are also nice, because they allow an O(log n) lookup without
the added bulk or rebalancing of a BST, but may have a higher cost for
inserts or deletes if the array needs to be kept sorted. sorted arrays are
possibly a good option if one needs fast lookup but inserts/deletes are
infrequent (a BST will often involve a little higher overhead for lookups,
due mostly to node stepping, and possibly a notably higher cost if one
implements them via recursion rather than a loop).

another observation: for small N, a linear lookup may actually be faster
than either a binary lookup or a BST (for N<=5 or so, linear-search seems to
be the best bet).

yes, IME, one can find oneself doing a very large number of lookups over
a small number of items, in contrast to the much more "popular" case of a
small number of lookups over a large number of items (well, except if
lookups are very rare, where the cost of maintaining a BST or sorted array
could easily exceed the cost of just doing lookups with a linear search).
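
for illustration, a lookup over a sorted array that falls back to a linear
scan for small N might look like this (sketch only; the cutoff of 8 is a
guess and would want measuring):

#include <stddef.h>

/* lookup in a sorted int array: linear scan for small n, binary search
   otherwise.  returns the index of key, or -1 if it is absent. */
static int sorted_lookup(const int *keys, size_t n, int key)
{
    size_t lo, hi;

    if (n <= 8) {                        /* small N: plain linear scan */
        size_t i;
        for (i = 0; i < n; i++)
            if (keys[i] == key) return (int)i;
        return -1;
    }

    lo = 0; hi = n;                      /* larger N: binary search on [lo,hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if      (keys[mid] < key) lo = mid + 1;
        else if (keys[mid] > key) hi = mid;
        else return (int)mid;
    }
    return -1;
}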


so, I don't think there is a generally "best" option here; rather, there are
specific needs in specific use cases.


in my case, I have mostly used binary trees for spatial lookup tasks, since
these are not readily optimized with other strategies (one can't exactly
"hash" a spatial coordinate).

one algo I like is actually derived partly from quicksort, since I had noted
that with slight tweaking a quicksort-like strategy can build BSPs in
O(n log n) time, vs the O(n^2) or O(n^3) more commonly the case for BSP
building, but this is at the cost that the resulting trees tend to be far
from optimal.
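
roughly, the idea looks something like this (just my sketch of the general
idea, not the actual code; a real BSP splits on polygon planes, here I use
axis-aligned splits through a pivot point to keep it short):

#include <stdlib.h>

typedef struct { float v[3]; } Point;

typedef struct Node {
    int          axis;               /* 0=x, 1=y, 2=z */
    float        split;              /* splitting coordinate */
    struct Node *back, *front;       /* children: coord < split, coord >= split */
    Point       *pts;                /* leaf payload */
    int          n_pts;
} Node;

/* quicksort-like build: pick a splitter like a pivot, partition the points
   in place, recurse.  expected O(n log n), but the tree quality is whatever
   the pivots happen to give, i.e. generally far from optimal. */
static Node *build(Point *pts, int n, int depth)
{
    Node *nd = calloc(1, sizeof(Node));
    int axis = depth % 3, i, j;
    float split;

    if (n <= 4) {                        /* small leaf: just keep the points */
        nd->pts = pts; nd->n_pts = n;
        return nd;
    }

    split = pts[n / 2].v[axis];          /* pivot: coord of a middle-ish point */

    for (i = 0, j = 0; i < n; i++) {     /* quicksort-style partition */
        if (pts[i].v[axis] < split) {
            Point tmp = pts[i]; pts[i] = pts[j]; pts[j] = tmp;
            j++;
        }
    }
    if (j == 0 || j == n) {              /* degenerate split: make a leaf */
        nd->pts = pts; nd->n_pts = n;
        return nd;
    }

    nd->axis = axis;  nd->split = split;
    nd->back  = build(pts,     j,     depth + 1);
    nd->front = build(pts + j, n - j, depth + 1);
    return nd;
}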

so, it is usually useful for building a BSP in real-time, but not so good if
one needs a good static BSP...


 
BGB / cr88192
      01-06-2010

"Richard Harter" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> On Wed, 6 Jan 2010 12:19:19 -0700, "BGB / cr88192"
> <(E-Mail Removed)> wrote:
>
> [snip]
>
>>another observation: for small N, a linear lookup may actually be faster
>>than either a binary lookup or a BST (for N<=5 or so, linear-search seems
>>to be the best bet).

>
> For searches of integers linear lookups using a sentinel are
> typically faster up to about N=20 to 25.
>


ok, I hadn't checked exactly...

where I had observed this mostly was while using it for looking up a value
within a range:
min<=value<max.

I tried switching to a binary lookup, but had observed that it was actually
slower (and also that N was generally small), but had not gone on to figure
out where the "break-even" point was.

my estimate of "5 or so" was based roughly on algorithmic complexity,
noting that, for example, log2(20) is a bit less than 5, so raw comparison
counts alone would suggest the crossover being fairly low, ...
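
for reference, I take the sentinel trick Richard mentions to be roughly
this (my guess at it, anyway): park the key in a spare slot at the end, so
the inner loop needs no bounds check:

/* sentinel linear search over ints: a[0..n-1] holds the data, a[n] is a
   spare slot used as the sentinel.  returns the index of key, or -1. */
static int linear_sentinel(int *a, int n, int key)
{
    int i = 0;
    a[n] = key;                  /* sentinel: guarantees the loop stops */
    while (a[i] != key)
        i++;
    return (i < n) ? i : -1;
}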



 
BGB / cr88192
      01-06-2010

"remod" <(E-Mail Removed)> wrote in message
news:4b44f637$0$1119$(E-Mail Removed). ..
> Richard Harter wrote:
>> For searches of integers linear lookups using a sentinel are
>> typically faster up to about N=20 to 25.

>
> This would make me lean even further towards hashtables. I was thinking
> about implementing some form of cuckoo hashing, but I learned yesterday
> about "Hopscotch hashing" (which I had never heard of) and that would be
> interesting to use.
>


I have usually implemented the step-hopping via a small PRNG (pseudo-random
number generator).
the PRNG is usually a fairly simple expression:
key1=key*prime;

which can then be masked (or shifted and masked, depending on the prime
chosen, ...) to get the hash-spot.

IME this usually works fairly well...
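
something like this, roughly (details made up for illustration; my real
tables differ, and a real version would bound the probe count or resize):

#define TBL_BITS  12
#define TBL_SIZE  (1u << TBL_BITS)
#define PRIME     65521u                     /* some prime; the shift depends on it */

/* returns the slot where 'key' lives or could be inserted; in this sketch
   keys[slot] == 0 means the slot is empty.  each collision re-multiplies
   by the prime (the "PRNG" step) and takes a new shifted+masked slot. */
static unsigned find_slot(const unsigned *keys, unsigned key)
{
    unsigned h = key;
    for (;;) {
        unsigned slot;
        h   *= PRIME;                        /* key1 = key*prime */
        slot = (h >> 8) & (TBL_SIZE - 1);    /* shift+mask to get the hash-spot */
        if (keys[slot] == 0 || keys[slot] == key)
            return slot;
        /* occupied by something else: hop to the next pseudo-random spot */
    }
}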


oddly, I have never heard of anyone else doing this.

then again, I am probably also in a minority for using a memory allocator
based on bit-maps as well, ...



 
Moi
      01-06-2010
On Wed, 06 Jan 2010 14:10:33 -0700, BGB / cr88192 wrote:

> "remod" <(E-Mail Removed)> wrote in message
> news:4b44f637$0$1119$(E-Mail Removed). ..
>> Richard Harter wrote:


> I have usually implemented the step-hopping via a small PRNG
> (pseudo-random number generator).
> the PRNG is usually a fairly simple expression: key1=key*prime;



odd * odd   -->  odd
odd * even  -->  even
even * even -->  even

Now consider what happens when your prime is odd (which is very likely):
the low bit of key*prime is then just the low bit of key, so it never gets
mixed.


> which can then be masked (or shifted and masked, depending on the prime
> chosen, ...) to get the hash-spot.
>
> IME this usually works fairly well...
>
>
> oddly, I have never heard of anyone else doing this.


Maybe there is a reason for that.

AvK
 
BGB / cr88192
      01-06-2010

"Moi" <(E-Mail Removed)> wrote in message
news:69aee$4b4505ac$5350c024$(E-Mail Removed) abel.net...
> On Wed, 06 Jan 2010 14:10:33 -0700, BGB / cr88192 wrote:
>
>> "remod" <(E-Mail Removed)> wrote in message
>> news:4b44f637$0$1119$(E-Mail Removed). ..
>>> Richard Harter wrote:

>
>> I have usually implemented the step-hopping via a small PRNG
>> (pseudo-random number generator).
>> the PRNG is usually a fairly simple expression: key1=key*prime;

>
>
> odd * odd   -->  odd
> odd * even  -->  even
> even * even -->  even
>
> Now consider what happens when your prime is odd (which is very likely):
>


I often do a right shift as well (to get the slot index).

so, if my prime is 4093, I often use >>12.
however, if the prime is Mersenne, it seems to be better not to shift.

this basic strategy was grabbed from 'rand()', and in general seems to have
fairly good distribution properties.
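
i.e., concretely, something like this (sketch; 8191 is just my pick of a
Mersenne prime for the no-shift case):

static unsigned slot_4093(unsigned key, unsigned mask)
{
    return ((key * 4093u) >> 12) & mask;   /* prime 4093: shift by 12, then mask */
}

static unsigned slot_mersenne(unsigned key, unsigned mask)
{
    return (key * 8191u) & mask;           /* Mersenne prime 8191 = 2^13-1: mask only */
}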


>
>> which can then be masked (or shifted and masked, depending on the prime
>> chosen, ...) to get the hash-spot.
>>
>> IME this usually works fairly well...
>>
>>
>> oddly, I have never heard of anyone else doing this.

>
> Maybe there is a reason for that.
>


dunno...

most people seem to do a lot of funkiness with xor, but in my tests the
statistical distribution tends to be worse with these xor-based tricks, and
also there does not seem to be much speed difference between an integer
multiply and an xor.


> AvK



 
BGB / cr88192
      01-07-2010

"remod" <(E-Mail Removed)> wrote in message
news:4b450975$0$1115$(E-Mail Removed). ..
> BGB / cr88192 wrote:
>
>> "remod" <(E-Mail Removed)> wrote in message
>>> [...] I learned yesterday about "Hopscotch hashing" (which
>>> I had never heard of) and that would be interesting to use.

>
>> I have usually implemented the step-hopping via a small PRNG
>> (pseudo-random number generator). oddly, I have never heard of anyone
>> else doing this.

>
> Probably because it's not easy to ensure the PRNG has a period that is
> long enough. The technique is well documented in the literature; this
> report, for example, gives a good explanation (section 2.2 and section 3)
> on why the generator (5i mod 4*n)/4 works well with tables that have a
> number of slots n that is a power of 2:
> http://citeseerx.ist.psu.edu/viewdoc...10.1.1.95.5606
>
> The Hopscotch hashing should be more cache-friendly than using a PRNG,
> since it's highly probable that elements that hash to the same value
> are also in the same cache line.
>


well, I never really looked into the theory that much, and most of this was
actually a result of occasional use of statistical tests and benchmarking...
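
though, if I follow the description, that generator would be something like
this for a power-of-two table (my reading of it; names and the starting
value are guesses):

#define N_SLOTS 1024u                         /* must be a power of 2 */

/* probe sequence based on (5*i mod 4*n)/4: i stays congruent to 1 mod 4,
   and i/4 then visits every slot exactly once before the sequence repeats.
   keys[slot] == 0 means empty in this sketch. */
static unsigned probe_slot(unsigned hash, const unsigned *keys, unsigned key)
{
    unsigned i = 4u * (hash & (N_SLOTS - 1u)) + 1u;   /* start point, == 1 mod 4 */
    unsigned t;

    for (t = 0; t < N_SLOTS; t++) {
        unsigned slot = i >> 2;                       /* i / 4 */
        if (keys[slot] == 0 || keys[slot] == key)
            return slot;
        i = (5u * i) & (4u * N_SLOTS - 1u);           /* i = 5*i mod 4*n */
    }
    return (unsigned)-1;                              /* table is full */
}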


> Anyway, it seems that I have discarded BSTs again!
>


ok.


> R.D.



 