Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Perl > Perl Misc > a promising hash

Reply
Thread Tools

a promising hash

 
 
bob_jenkins@burtleburtle.net
Guest
Posts: n/a
 
      08-16-2006
The C code
for (hash=seed, i=0; i<length; ++i)
hash = tab[(key[i] ^ hash) & 0xff] ^ (hash >> ;
with the bytes of tab[] being permutations of 0..255 is a nice portable
hash, but not all that fast. Or it's a CRC if you fill the table
appropriately. A CRC is linear, has many nice mathematical properties
you may want (or may want to not have). Filling tab[] with
permutations, I've got to check whether that's linear in some sense,
but I don't think it is. Nonlinear functions make better
approximations of random mappings.

A CRC can be made twice as fast,
for (hash=seed, i=0; i<length; ++i)
hash = tab[(key[i] ^ hash) & 0xffff] ^ (hash >> 16);
if key[] has 2-byte entries instead of 1-byte entries, and it can
compute the same hash as the 1-byte version (though you'd need
different tables for different endian machines). In fact you could
wrap the one byte version around the two byte version, using the two
byte version on an aligned subset of the data and the one byte version
on any unaligned head and tail.

Can the same trick be used where tab[] fills each byte with
permutations, or was that trick possible only due to the linearity of
CRC? Apparently the trick can be used no matter how tab[] is filled.
Looking at individual bytes:
a, b, c, d (let x = d^k[i])
=> ta[x], tb[x] ^ a, tc[x] ^ b, td[x] ^ c (let y = td[x] ^ (c^k[i+1]))
=> ta[y], tb[y] ^ ta[x], tc[y] ^ tb[x] ^ a, td[y] ^ tc[x] ^ b
All the bytes of the 64k lookup table are well defined, using (x,y) as
index and combinations of ta, tb, tc, td to define the values. Given
d^k[i], c^k[i+1], and td[] we can get x,y. The ^(hash>>16) would take
care of XORing a,b to the bottom two bytes. One iteration of the two
byte version produces the same result as two iterations of the one byte
version.

I've got to do some more tuning and timings and tests, but this strikes
me as a winner for Perl's internal hash function. It can be fast for
short strings (using the 1-byte version), fast for long strings (mostly
using the 2-byte version), doesn't care about alignment, and can give
the same results on both big endian and little endian machines. The
most likely thing to kill it would be if the 64k array for the two-byte
version gets paged out often enough that the cost of loading it is more
than the benefit of using it. Or if it's linear in GF(2^^ with
unfortunate properties or something like that. I've also got to time
it against Paul Hsieh's hash.

 
Reply With Quote
 
 
 
 
Sherm Pendley
Guest
Posts: n/a
 
      08-16-2006
writes:

> I've got to do some more tuning and timings and tests, but this strikes
> me as a winner for Perl's internal hash function.


I'd suggest bringing this up on p5p, aka the perl5-porters mailing list.
Info on the list is here:

<http://lists.cpan.org/showlist.cgi?name=perl5-porters>

Keep in mind that working patches and benchmark results will be taken a
lot more seriously than ideas and theory.

sherm--

--
Web Hosting by West Virginians, for West Virginians: http://wv-www.net
Cocoa programming in Perl: http://camelbones.sourceforge.net
 
Reply With Quote
 
 
 
 
bob_jenkins@burtleburtle.net
Guest
Posts: n/a
 
      08-20-2006
Sherm Pendley wrote:
> writes:
>
> > I've got to do some more tuning and timings and tests, but this strikes
> > me as a winner for Perl's internal hash function.

>
> I'd suggest bringing this up on p5p, aka the perl5-porters mailing list.
> Info on the list is here:
>
> <http://lists.cpan.org/showlist.cgi?name=perl5-porters>
>
> Keep in mind that working patches and benchmark results will be taken a
> lot more seriously than ideas and theory.
>
> sherm--
>
> --
> Web Hosting by West Virginians, for West Virginians: http://wv-www.net
> Cocoa programming in Perl: http://camelbones.sourceforge.net


Nevermind. At least on a Pentium 4 running gcc 3.3.3, my latest hash
(http://burtleburtle.net/bob/c/lookup3.c) can also give consistent
results regardless of endianness, is faster for 100-byte strings than
the crc-like hash I described above regardless of endianness (2x faster
on little endian, 1.5x faster on big endian), and its speed is roughly
the same for 4-byte strings. Plus it has no tables instead of that 64k
lookup table, so caching would be less of an issue.

Last I checked Perl was using my one-at-a-time hash, which is slower
than either of those hashes. Someone did benchmarks recently showing
Paul Hsieh's hash would be much faster than one-at-a-time, which is
true. Which of Paul's and lookup3's hash was faster varied with the
chip and compiler I tried; but I can plug my lookup3 by saying it's
more thorough and you can extract a 64-bit result from it for the same
price as a 32-bit result. All these hashes (the crc-like one, lookup3,
one-at-a-time, and Paul's) are thorough enough that you can use hash
tables that are powers of two and probably never notice any flaws in
the way they hash.

 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
hash of hash of hash of hash in c++ rp C++ 1 11-10-2011 04:45 PM
Hash#select returns an array but Hash#reject returns a hash... Srijayanth Sridhar Ruby 19 07-02-2008 12:49 PM
In 'HashMap.put', "if (e.hash == hash && eq(k, e.key))" ? Red Orchid Java 3 01-30-2006 07:04 PM
HEXUS.lifestyle :: Avivo still promising more than it delivers? Silverstrand Front Page News 0 12-23-2005 04:13 PM
New Raw converter looks promising Bart van der Wolf Digital Photography 22 02-28-2005 04:03 AM



Advertisments
 



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57