Velocity Reviews

Velocity Reviews (http://www.velocityreviews.com/forums/index.php)
-   C Programming (http://www.velocityreviews.com/forums/f42-c-programming.html)
-   -   hash table - simple implementation (http://www.velocityreviews.com/forums/t953495-hash-table-simple-implementation.html)

wkevin 10-17-2012 09:02 AM

hash table - simple implementation
 
Hi,
I tried to look for as much as simple implementation of a hash table in "c".

The performance and quality of the hash function do not interest me for this
task.

I found quite a lot of such example. When I reviewed the code, I always found
that something seems to be more complex than needed, and some logic which seems
that can be simplified, some redundant code, etc.

So I tried writing my own.

I tried it on some test cases and it was ok.
I would appreciate if you could review it correctness; I mean, are there
inputs where it will not work correctly?


Here is the code:

#include <stdlib.h>
#include <string.h>
#include <stdio.h>

struct entry {
char *key;
char *val;
struct entry *next;
};


typedef struct entry entry_t;

struct hashtable {
struct entry **table;
int size;
};

typedef struct hashtable hashtable_t;

int hashfn(char *key, hashtable_t *htable)
{
int sum=0, i;
for (i=0; i<strlen(key);i++)
sum+=key[i];
return sum%htable->size;
}

void insert(hashtable_t *htable, char *key, char *val)
{
int hash;
hash = hashfn(key,htable);
entry_t *entry = htable->table[hash];
entry_t *last;
entry_t *new_entry;
while (entry != NULL) {
// replace
if (strcmp(entry->key,key) == 0) {
entry->val = val;
return;
}
last = entry;
entry = entry->next;
}

new_entry = malloc(sizeof(entry_t));
if (!new_entry) {
printf("could not allocate memory for new_entry\n");
return;
}

new_entry->key = key;
new_entry->val = val;
new_entry->next = NULL;
if (htable->table[hash] == NULL)
htable->table[hash] = new_entry;
else
{
// first
if (entry == htable->table[hash])
htable->table[hash] = new_entry;
else
last->next = new_entry;
}
}

char *get(hashtable_t *htable, char *key)
{
int hash = hashfn(key, htable);
entry_t *entry = htable->table[hash];
while (entry != NULL) {
if (strcmp(entry->key,key) == 0)
return entry->val;
entry = entry->next;
}
return NULL;
}


hashtable_t *ht_create(int size)
{
int i;
hashtable_t *htable = malloc(sizeof(hashtable_t));
if (!htable)
return NULL;
htable->table = malloc(size * sizeof(entry_t *));
if (!htable->table)
return NULL;
for (i=0; i<size; i++)
htable->table[i] = NULL;
htable->size = size;
return htable;
}

int main()
{
hashtable_t *htable = ht_create(10);
if (!htable)
return;
insert(htable,"aba","111");
insert(htable,"bbb","222");
insert(htable,"aab","333");
printf("get(aba)=%s\n", get(htable,"aba"));
printf("get(bbb)=%s\n", get(htable,"bbb"));
printf("get(aab)=%s\n", get(htable,"aab"));
}



rgs,
Kevin


Phil Carmody 10-17-2012 12:12 PM

Re: hash table - simple implementation
 
wkevin <wkevils@gmail.com> writes:
> int hashfn(char *key, hashtable_t *htable)
> {
> int sum=0, i;
> for (i=0; i<strlen(key);i++)


I know you said that performance didn't interest you, but I'm always
hypersensitive to unnecessary use of strlen. There's no point iterating
down the string twice. You don't need the length, you know when to stop
already - when you reach the \0.

for (i=0; key[i]; i++)

> sum+=key[i];
> return sum%htable->size;
> }


Phil

--
Regarding TSA regulations:
How are four small bottles of liquid different from one large bottle?
Because four bottles can hold the components of a binary liquid explosive,
whereas one big bottle can't. -- camperdave responding to MacAndrew on /.

Ben Bacarisse 10-17-2012 12:50 PM

Re: hash table - simple implementation
 
wkevin <wkevils@gmail.com> writes:

> Hi,
> I tried to look for as much as simple implementation of a hash table in "c".
>
> The performance and quality of the hash function do not interest me for this
> task.
>
> I found quite a lot of such example. When I reviewed the code, I
> always found that something seems to be more complex than needed, and
> some logic which seems that can be simplified, some redundant code,
> etc.


I think you have some redundant code too.

> So I tried writing my own.
>
> I tried it on some test cases and it was ok.
> I would appreciate if you could review it correctness; I mean, are there
> inputs where it will not work correctly?


You don't give a specification so there is nothing to verify the code
against. There certainly are parameter values that can be supplied that
make the code go wrong. For example, a zero or negative size for
ht_create, or a null pointer used as a table pointer or as a key. But
the most significant problem is that there are also some key strings
that could cause problems on some systems (mine for example!).

You didn't ask for any code improvement but I've made a couple of
suggestions none the less.

> Here is the code:
>
> #include <stdlib.h>
> #include <string.h>
> #include <stdio.h>
>
> struct entry {
> char *key;
> char *val;
> struct entry *next;
> };
>
>
> typedef struct entry entry_t;
>
> struct hashtable {
> struct entry **table;
> int size;
> };
>
> typedef struct hashtable hashtable_t;
>
> int hashfn(char *key, hashtable_t *htable)


I'd use const where appropriate:

int hashfn(const char *key, const hashtable_t *htable)

> {
> int sum=0, i;
> for (i=0; i<strlen(key);i++)
> sum+=key[i];
> return sum%htable->size;


The problem here is that char can be signed so the resulting index may
be negative. All you need do is declare sum as unsigned.

> }
>
> void insert(hashtable_t *htable, char *key, char *val)


Again, I'd use const here (three times).

Also, whenever I write a function that does not return a value, I feel
I've probably gone wrong. Is there not some value that could be useful
to the caller? Here I think you could usefully return the old string
(if there is one). If the strings are allocated, your code will
introduce a memory leak because the old pointer gets lost forever.

> {
> int hash;
> hash = hashfn(key,htable);
> entry_t *entry = htable->table[hash];
> entry_t *last;
> entry_t *new_entry;
> while (entry != NULL) {
> // replace
> if (strcmp(entry->key,key) == 0) {
> entry->val = val;
> return;
> }
> last = entry;
> entry = entry->next;
> }
>
> new_entry = malloc(sizeof(entry_t));


I like to write

new_entry = malloc(sizeof *new_entry);

so that I don't have to check the type -- I know the size is right just
be looking here.

> if (!new_entry) {
> printf("could not allocate memory for new_entry\n");


Errors should be output on stderr. Also, I'd be tempted to exit here.
If you are not going to signal the error to the caller, there's probably
very little point in carrying on.

> return;
> }
>
> new_entry->key = key;
> new_entry->val = val;
> new_entry->next = NULL;
> if (htable->table[hash] == NULL)
> htable->table[hash] = new_entry;
> else
> {
> // first
> if (entry == htable->table[hash])
> htable->table[hash] = new_entry;
> else
> last->next = new_entry;


This second "if" is redundant -- only the "else" branch can ever be
taken.

Also, you may be over-complicating the code since you don't have any a
priori reason to suppose that adding the new entry to the end of the
chain is any better that putting it at the front.

> }
> }
>
> char *get(hashtable_t *htable, char *key)
> {
> int hash = hashfn(key, htable);
> entry_t *entry = htable->table[hash];
> while (entry != NULL) {
> if (strcmp(entry->key,key) == 0)
> return entry->val;
> entry = entry->next;
> }
> return NULL;
> }
>
>
> hashtable_t *ht_create(int size)


I'd copy this prefix and use it everywhere: ht_get, ht_insert etc.

You might want to review your use of signed and unsigned types. If you
want to put the onus of correctness on the caller, you could declare
this parameter to be an unsigned integer type (and I'd then opt for
size_t). It won't prevent problems, of course -- passing -1 will still
be wrong, but it alerts the user of the code to beware. If you leave it
as a signed type, then you should probably check for < 0. In both cases
(signed and unsigned) I'd check for 0.

> {
> int i;
> hashtable_t *htable = malloc(sizeof(hashtable_t));
> if (!htable)
> return NULL;
> htable->table = malloc(size * sizeof(entry_t *));


You can use 'htable->table = malloc(size * sizeof *htable->table);' to,
again, avoid making the reader check the type. (Ditto three lines
above.)

> if (!htable->table)
> return NULL;


Here you have a memory leak, though maybe not a very significant
one. Still, I'd want to free(htable) before returning NULL.

> for (i=0; i<size; i++)
> htable->table[i] = NULL;
> htable->size = size;
> return htable;
> }
>
> int main()


int main(void) is preferred in C.

> {
> hashtable_t *htable = ht_create(10);
> if (!htable)
> return;


You need to return and int! If your compiler did not tell you about
this, turn up the warning level.

> insert(htable,"aba","111");
> insert(htable,"bbb","222");
> insert(htable,"aab","333");
> printf("get(aba)=%s\n", get(htable,"aba"));
> printf("get(bbb)=%s\n", get(htable,"bbb"));
> printf("get(aab)=%s\n", get(htable,"aab"));


Here, you don't need to return anything if you are using C99 or C11 --
the newer standards permit "falling off the end of main", but you don't
use any other features of C99, so I'd add a return here too.

> }


--
Ben.

Ike Naar 10-17-2012 12:51 PM

Re: hash table - simple implementation
 
On 2012-10-17, Phil Carmody <thefatphil_demunged@yahoo.co.uk> wrote:
> wkevin <wkevils@gmail.com> writes:
>> int hashfn(char *key, hashtable_t *htable)
>> {
>> int sum=0, i;
>> for (i=0; i<strlen(key);i++)

>
> I know you said that performance didn't interest you, but I'm always
> hypersensitive to unnecessary use of strlen. There's no point iterating
> down the string twice. You don't need the length, you know when to stop
> already - when you reach the \0.
>
> for (i=0; key[i]; i++)
>
>> sum+=key[i];
>> return sum%htable->size;
>> }


It's even possible that the number of iterations down the string in

>> for (i=0; i<strlen(key);i++)


is larger than two. The compiler is allowed to compute strlen(key)
for every iteration of the loop, which would make the runtime
quadratic.

Eric Sosman 10-17-2012 01:28 PM

Re: hash table - simple implementation
 
On 10/17/2012 5:02 AM, wkevin wrote:
> Hi,
> I tried to look for as much as simple implementation of a hash table in "c".
>
> The performance and quality of the hash function do not interest me for this
> task.


Clearly.

> I found quite a lot of such example. When I reviewed the code, I always found
> that something seems to be more complex than needed, and some logic which seems
> that can be simplified, some redundant code, etc.
>
> So I tried writing my own.
>
> I tried it on some test cases and it was ok.
> I would appreciate if you could review it correctness; I mean, are there
> inputs where it will not work correctly?


Yes, or at least "Yes, possibly."

> Here is the code:
>
> #include <stdlib.h>
> #include <string.h>
> #include <stdio.h>
>
> struct entry {
> char *key;
> char *val;
> struct entry *next;
> };
>
>
> typedef struct entry entry_t;
>
> struct hashtable {
> struct entry **table;
> int size;
> };
>
> typedef struct hashtable hashtable_t;
>
> int hashfn(char *key, hashtable_t *htable)
> {
> int sum=0, i;
> for (i=0; i<strlen(key);i++)
> sum+=key[i];
> return sum%htable->size;


If the key is long enough that the sum of its character codes
exceeds the capacity of an `int' (which might be only 32767), you
get undefined behavior. One of the more likely outcomes is that
`sum' will become negative, in which case the returned value will
be negative or zero. Since you use the returned value as an index
into the array of list headers, a negative value would send you
completely off the rails.

Even though you say you're not interested in the quality of
the hash function, this one is so appallingly bad that I feel duty-
bound to opine that the word "quality" doesn't even apply.

> }
>
> void insert(hashtable_t *htable, char *key, char *val)
> {
> int hash;
> hash = hashfn(key,htable);
> entry_t *entry = htable->table[hash];
> entry_t *last;
> entry_t *new_entry;
> while (entry != NULL) {
> // replace
> if (strcmp(entry->key,key) == 0) {
> entry->val = val;
> return;
> }
> last = entry;
> entry = entry->next;
> }
>
> new_entry = malloc(sizeof(entry_t));
> if (!new_entry) {
> printf("could not allocate memory for new_entry\n");
> return;
> }
>
> new_entry->key = key;
> new_entry->val = val;
> new_entry->next = NULL;
> if (htable->table[hash] == NULL)
> htable->table[hash] = new_entry;
> else
> {
> // first
> if (entry == htable->table[hash])
> htable->table[hash] = new_entry;
> else
> last->next = new_entry;
> }


This should work, but it seems to be an example of the "more
complex than needed" code you set out to eliminate. Why do you
need three cases (including one that never executes!) to add one
item to a linked list?

> }
>
> char *get(hashtable_t *htable, char *key)
> {
> int hash = hashfn(key, htable);
> entry_t *entry = htable->table[hash];
> while (entry != NULL) {
> if (strcmp(entry->key,key) == 0)
> return entry->val;
> entry = entry->next;
> }
> return NULL;
> }
>
>
> hashtable_t *ht_create(int size)
> {
> int i;
> hashtable_t *htable = malloc(sizeof(hashtable_t));
> if (!htable)
> return NULL;
> htable->table = malloc(size * sizeof(entry_t *));
> if (!htable->table)
> return NULL;


Memory leak here.

> for (i=0; i<size; i++)
> htable->table[i] = NULL;
> htable->size = size;
> return htable;
> }
>
> int main()
> {
> hashtable_t *htable = ht_create(10);
> if (!htable)
> return;


Compiler should have squawked here.

> insert(htable,"aba","111");
> insert(htable,"bbb","222");
> insert(htable,"aab","333");
> printf("get(aba)=%s\n", get(htable,"aba"));


Good thing it didn't return NULL, eh?

> printf("get(bbb)=%s\n", get(htable,"bbb"));
> printf("get(aab)=%s\n", get(htable,"aab"));
> }


Overall impressions: It's missing features one expects in a
hash table (deletion, iteration, resizing), and it's utterly
devoid of defensive coding. Still, it'll probably "work" if you
don't push it very hard.

--
Eric Sosman
esosman@comcast-dot-net.invalid

Willem 10-17-2012 03:22 PM

Re: hash table - simple implementation
 
Ike Naar wrote:
)>> for (i=0; i<strlen(key);i++)
)
) is larger than two. The compiler is allowed to compute strlen(key)
) for every iteration of the loop, which would make the runtime
) quadratic.

ITYM: required.

The compiler is allowed to compute it only once, if it can prove that it is
invariant (i.e. that nothing changes the string during the loop).


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

Kaz Kylheku 10-17-2012 04:33 PM

Re: hash table - simple implementation
 
On 2012-10-17, wkevin <wkevils@gmail.com> wrote:
> Hi,
> I tried to look for as much as simple implementation of a hash table in "c".
>
> The performance and quality of the hash function do not interest me for this
> task.


If performance and quality do not interest you, then use an associative
list which is exhaustively searched for items.

Hash tables are used when performance matters.

Ike Naar 10-17-2012 08:36 PM

Re: hash table - simple implementation
 
On 2012-10-17, Willem <willem@turtle.stack.nl> wrote:
> Ike Naar wrote:
> )>> for (i=0; i<strlen(key);i++)
> )
> ) is larger than two. The compiler is allowed to compute strlen(key)
> ) for every iteration of the loop, which would make the runtime
> ) quadratic.
>
> ITYM: required.


Given the context, no.

> The compiler is allowed to compute it only once, if it can prove that it is
> invariant (i.e. that nothing changes the string during the loop).


(Nit: if it can prove that nothing changes the *length of the
string* during the loop; one can change a string in ways that leave
the length invariant).

The loop shown in the context of this discussion does not modify
the string, so, in this context, repeated computation of strlen is
not required, but it is allowed.

Keith Thompson 10-17-2012 08:55 PM

Re: hash table - simple implementation
 
Phil Carmody <thefatphil_demunged@yahoo.co.uk> writes:
> wkevin <wkevils@gmail.com> writes:
>> int hashfn(char *key, hashtable_t *htable)
>> {
>> int sum=0, i;
>> for (i=0; i<strlen(key);i++)

>
> I know you said that performance didn't interest you, but I'm always
> hypersensitive to unnecessary use of strlen. There's no point iterating
> down the string twice. You don't need the length, you know when to stop
> already - when you reach the \0.
>
> for (i=0; key[i]; i++)
>
>> sum+=key[i];
>> return sum%htable->size;
>> }


And as Ike Naar points out, it doesn't just iterate over the string
twice; it iterates over the string once for each iteration of the loop.
(strlen() has to traverse the string to find the terminating '\0'.)

Checking for the '\0' yourself is a good approach, though I'd write it a
bit more explicitly:

for (i = 0; key[i] != '\0'; i ++) ...

Another option is to call strlen(), but only once:

int sum = 0;
size_t len = strlen(key);
for (size_t i = 0; i < len; i ++) {
sum += key[i];
}

This does traverse the string twice, but it's not nearly as bad as
recomputing strlen() in the for loop header.

--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
Will write code for food.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Eric Sosman 10-17-2012 10:28 PM

Re: hash table - simple implementation
 
On 10/17/2012 9:28 AM, Eric Sosman wrote:
> On 10/17/2012 5:02 AM, wkevin wrote:
>>[...]
>> int hashfn(char *key, hashtable_t *htable)
>> {
>> int sum=0, i;
>> for (i=0; i<strlen(key);i++)
>> sum+=key[i];
>> return sum%htable->size;

>
> If the key is long enough that the sum of its character codes
> exceeds the capacity of an `int' [...]


As Ben Bacarisse points out, even a short string will
get you in trouble on some systems. For a concrete example,

insert(htable, "", "cent sign");

has key string of only one character, but may quite possibly
produce a negative hash index. If you're using the gcc compiler
suite, build your code with the `-fsigned-char' option, run this
insertion, sit back and watch the fun ...

--
Eric Sosman
esosman@comcast-dot-net.invalid


All times are GMT. The time now is 07:04 PM.

Powered by vBulletin®. Copyright ©2000 - 2014, vBulletin Solutions, Inc.
SEO by vBSEO ©2010, Crawlability, Inc.