Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Java > HashMap vs linear table lookup

Reply
Thread Tools

HashMap vs linear table lookup

 
 
Roedy Green
Guest
Posts: n/a
 
      02-15-2008
For sufficiently small tables, a linear array lookup to find a string
would be faster than a HashMap. Has anyone experimented to get a rough
idea where the break-even point is?
--

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com
 
Reply With Quote
 
 
 
 
Mark Space
Guest
Posts: n/a
 
      02-15-2008
Roedy Green wrote:
> For sufficiently small tables, a linear array lookup to find a string
> would be faster than a HashMap. Has anyone experimented to get a rough
> idea where the break-even point is?



If you try to measure this, I think it will depend on the length of the
Strings. The hashing algorithm for Strings used to only pick a few
characters to "save time." This didn't work well for large hash tables
since many similar strings would have the letters chosen in common,
resulting in poor hash distribution. So now I think every letter in a
String is used when computing the hash.

So anyway, yeah, don't forget about string size.
 
Reply With Quote
 
 
 
 
Christian
Guest
Posts: n/a
 
      02-15-2008
Roedy Green schrieb:
> For sufficiently small tables, a linear array lookup to find a string
> would be faster than a HashMap. Has anyone experimented to get a rough
> idea where the break-even point is?
> --
>
> Roedy Green Canadian Mind Products
> The Java Glossary
> http://mindprod.com


well Intresting.. hope you accept this as valid test:

public class TestString {

/**
* @param args
*/
public static void main(String[] args) {

for (int size=1;size < 100; size++) {
if (testHashMap(size) <= testArray(size)) {
System.out.println("hashmap wins at: "+size);
}
}


}


private static long testHashMap(int size) {
Set<String> map = new HashSet<String>();
for (int i=0 ; i < size; i++) {
map.add(i+"+"+(-i)+"= 0");
}

long time = System.nanoTime();
for (int repetitions=0; repetitions < 10000; repetitions++ ) {
for (int i=0;i < size; i++) {
if (!map.contains(i+"+"+(-i)+"= 0")) {
throw new IllegalStateException();
}
}
}
return System.nanoTime()-time;
}

private static long testArray(int size) {
String[] array = new String[size];
for (int i=0 ; i < size; i++) {
array[i] =i+"+"+(-i)+"= 0";
}

long time = System.nanoTime();
for (int repetitions=0; repetitions < 10000; repetitions++ ) {
for (int i=0;i < size; i++) {
String onSearch = i+"+"+(-i)+"= 0";
for (int x=0; x < size; x++) {
if (onSearch.equals(array[x])) {
break;
}
}
}
}
return System.nanoTime()-time;
}
}

on my machine with 5 elements it seems about even .. sometimes the
HashSet wins .. and sometimes the array.. after that HashSet is the winner.

Christian
 
Reply With Quote
 
Roedy Green
Guest
Posts: n/a
 
      02-15-2008
On Fri, 15 Feb 2008 19:31:58 GMT, Roedy Green
<> wrote, quoted or indirectly quoted
someone who said :

>For sufficiently small tables, a linear array lookup to find a string
>would be faster than a HashMap. Has anyone experimented to get a rough
>idea where the break-even point is?
>--


I wrote this code to experiment. Turns out binary search is the pits.
HashSet starts to get better around 11 items.
--

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com
 
Reply With Quote
 
Roedy Green
Guest
Posts: n/a
 
      02-15-2008
On Fri, 15 Feb 2008 20:29:30 GMT, Roedy Green
<> wrote, quoted or indirectly quoted
someone who said :

>I wrote this code to experiment. Turns out binary search is the pits.
>HashSet starts to get better around 11 items.


The sample keys are nicely scrambled to make HashSet happier than it
deserves to be.


package com.mindprod.example;

import java.util.Arrays;
import java.util.HashSet;
import java.util.Random;

/*
* Test relative speed of 3 methods of looking up a key to see if it is
is the list of legal keys.
* HashSet, binary search, linear search for different numbers of keys
n.
* <p/>
* composed with IntelliJ IDEA
*
* @author Roedy Green, Canadian Mind Products
* @version 1.0, 2008-02-15
*/
public class TestFinding
{
// ------------------------------ FIELDS
------------------------------

/**
* lookup the random strings
*/
private static HashSet<String> h;

/**
* random number generator
*/
private final static Random wheel = new Random();

/**
* list of strings we want to look up
*/
private static String[] keys;

// -------------------------- STATIC METHODS
--------------------------

/**
* build lookup structure
*/
private static void build()
{
// build HashSet
h = new HashSet<String>( Arrays.asList( keys ) );

// build binary search
Arrays.sort( keys );
}

/**
* find all the keys by BinarySearch
*/
private static void findAllViaBinarySearch()
{
for ( String lookFor : keys )
{
int whereFound = Arrays.binarySearch( keys, lookFor );
// -ve means not found +ve tells where found.
}
}

/**
* find all the keys by HashSet
*/
private static void findAllViaHashSet()
{
for ( String lookFor : keys )
{
boolean found = h.contains( lookFor );
}
}

/**
* find all the keys by linear search
*/
private static void findAllViaLinearSearch()
{
for ( String lookFor : keys )
{
boolean found = false;
for ( String candidate : keys )
{
if ( lookFor.equals( candidate ) )
{
found = true;
break;
}
}
}
}

/**
* generate N random keys and their lookup structure
*/
private static void generate( int n )
{
keys = new String[n];
for ( int i = 0; i < n; i++ )
{
keys[ i ] = Long.toString( wheel.nextLong() );
}
}

// --------------------------- main() method
---------------------------

/**
* main method
*
* @param args not used
*/
public static void main( String[] args )
{

for ( int n = 1; n <= 10000; n *= 10 )
{
generate( n );
build();
final long t1 = System.nanoTime();
findAllViaBinarySearch();
final long t2 = System.nanoTime();
findAllViaHashSet();
final long t3 = System.nanoTime();
findAllViaLinearSearch();
final long t4 = System.nanoTime();

System.out
.println( "n:" +
n +
" binary search: " +
( t2 - t1 ) +
" HashSet: " +
( t3 - t2 ) +
" Linear: " +
( t4 - t3 ) );
}
// Here in output on my machine With JDK 1.6.0_04
//Linear is best for 10 or fewer, then HashSet.
// Binary search is never optimal.
// n:1 binary search: 21600 HashSet: 9720 Linear: 6400*
// n:10 binary search: 58720 HashSet: 11360 Linear: 11120*
// n:100 binary search: 707320 HashSet: 97520* Linear: 695960
// n:1000 binary search: 1294360 HashSet: 1255480* Linear:
10911040
// n:10000 binary search: 9823600 HashSet: 3920200* Linear:
1513846600

// Here in output on my machine with Jet
// n:1 binary search: 5520 HashSet: 4760 Linear: 1240*
// n:10 binary search: 3560 HashSet: 2480 Linear: 2400*
// n:100 binary search: 28160 HashSet: 5480* Linear: 44840
// n:1000 binary search: 408840 HashSet: 43560* Linear:
7862960
// n:10000 binary search: 9142440 HashSet: 2295320* Linear:
1852690520
}
}
--

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com
 
Reply With Quote
 
Lord Zoltar
Guest
Posts: n/a
 
      02-15-2008
On Feb 15, 3:14*pm, Mark Space <marksp...@sbc.global.net> wrote:
> Roedy Green wrote:
> > For sufficiently small tables, a linear array lookup to find a string
> > would be faster than a HashMap. Has anyone experimented to get a rough
> > idea where the break-even point is?

>
> If you try to measure this, I think it will depend on the length of the
> Strings. *The hashing algorithm for Strings used to only pick a few
> characters to "save time." *This didn't work well for large hash tables
> since many similar strings would have the letters chosen in common,
> resulting in poor hash distribution. *So now I think every letter in a
> String is used when computing the hash.
>
> So anyway, yeah, don't forget about string size.


It would probably also be possible to write your own hash function, if
you know more about the input (such as avg length of strings, if the
strings only use a certain subset of all characters, etc...). Doing
that might provide better performance, in certain cases. You'd just
have to try it to see
 
Reply With Quote
 
Patricia Shanahan
Guest
Posts: n/a
 
      02-15-2008
Roedy Green wrote:
....
> The sample keys are nicely scrambled to make HashSet happier than it
> deserves to be.

....

Although the approximate break-even point is interesting, if I had a
performance problem involving this sort of lookup I would do my own
experiments, using typical table contents and probe sequences from my
application.

Patricia
 
Reply With Quote
 
Stefan Ram
Guest
Posts: n/a
 
      02-15-2008
Lord Zoltar <> writes:
>It would probably also be possible to write your own hash function,


Here is an example, using a lookup indexed by the final
two letters of the month TLA. (This might actually be a
perfect hash, I am not sure about this.)

A small table might be used instead of the »switch« in »show«.
This is no hash map, but a direct array look up by a special
hash function, I believe O(1).

public class Main
{
int[] v;

public Main()
{ v = new int[ 256 ]; for( int i = 0; i < 256; ++i )v[ i ]=15;
v[ 97 ]= 0; v[ 98 ]= 9; v[ 99 ]= 4; v[ 101 ]= 2; v[ 103 ]= 9;
v[ 108 ]= 8; v[ 110 ]= 2; v[ 111 ]= 4; v[ 112 ]= 3; v[ 114 ]= 1;
v[ 116 ]= 3; v[ 117 ]= 1; v[ 118 ]= 4; v[ 121 ]= 0; }

int hash( final java.lang.String s )
{ return v[ s.charAt( 1 )]+v[ s.charAt( 2 )] ; }

void show( final java.lang.String s )
{ switch ( hash( s ))
{ case 4: case 3: case 5: case 8:
java.lang.System.out.println( 30 ); break;
case 11: break;
default: java.lang.System.out.println( 31 ); break; }}

public static void main( final java.lang.String[] args )
{ new Main().show( "Jun" ); new Main().show( "Oct" ); }}

It prints:

30
31

 
Reply With Quote
 
Mike Schilling
Guest
Posts: n/a
 
      02-16-2008
Roedy Green wrote:
> For sufficiently small tables, a linear array lookup to find a
> string
> would be faster than a HashMap. Has anyone experimented to get a
> rough
> idea where the break-even point is?


I presume that it's small enough that, in all nromal cases, they're


 
Reply With Quote
 
Roedy Green
Guest
Posts: n/a
 
      02-16-2008
On Fri, 15 Feb 2008 20:31:19 GMT, Roedy Green
<> wrote, quoted or indirectly quoted
someone who said :

> for ( int n = 1; n <= 10000; n *= 10 )


at n = 100,000 the benchmark becomes impossibly slow. The linear
search total time is o(n^2).
--

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
reuse HashMap$Entry (or HashMap in total) to avoid millions of allocations Vince Darley Java 4 03-02-2010 07:48 AM
java.util.Properties extending from HashMap<Object, Object> insteadof HashMap<String, String> Rakesh Java 10 04-08-2008 04:22 AM
Hash table: linear probe vs. chaining Ian Pilcher Java 18 12-13-2005 05:27 AM
Code for linear congruences, diophantine linear equations alessandra_cabrini@virgilio.it Java 1 11-15-2005 12:23 PM
populating an asp list box from a simple access lookup list (single column not a table) gerry ASP .Net 0 04-24-2004 09:21 AM



Advertisments
 



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57