![]() |
hash table - simple implementation
Hi,
I tried to look for as much as simple implementation of a hash table in "c". The performance and quality of the hash function do not interest me for this task. I found quite a lot of such example. When I reviewed the code, I always found that something seems to be more complex than needed, and some logic which seems that can be simplified, some redundant code, etc. So I tried writing my own. I tried it on some test cases and it was ok. I would appreciate if you could review it correctness; I mean, are there inputs where it will not work correctly? Here is the code: #include <stdlib.h> #include <string.h> #include <stdio.h> struct entry { char *key; char *val; struct entry *next; }; typedef struct entry entry_t; struct hashtable { struct entry **table; int size; }; typedef struct hashtable hashtable_t; int hashfn(char *key, hashtable_t *htable) { int sum=0, i; for (i=0; i<strlen(key);i++) sum+=key[i]; return sum%htable->size; } void insert(hashtable_t *htable, char *key, char *val) { int hash; hash = hashfn(key,htable); entry_t *entry = htable->table[hash]; entry_t *last; entry_t *new_entry; while (entry != NULL) { // replace if (strcmp(entry->key,key) == 0) { entry->val = val; return; } last = entry; entry = entry->next; } new_entry = malloc(sizeof(entry_t)); if (!new_entry) { printf("could not allocate memory for new_entry\n"); return; } new_entry->key = key; new_entry->val = val; new_entry->next = NULL; if (htable->table[hash] == NULL) htable->table[hash] = new_entry; else { // first if (entry == htable->table[hash]) htable->table[hash] = new_entry; else last->next = new_entry; } } char *get(hashtable_t *htable, char *key) { int hash = hashfn(key, htable); entry_t *entry = htable->table[hash]; while (entry != NULL) { if (strcmp(entry->key,key) == 0) return entry->val; entry = entry->next; } return NULL; } hashtable_t *ht_create(int size) { int i; hashtable_t *htable = malloc(sizeof(hashtable_t)); if (!htable) return NULL; htable->table = malloc(size * sizeof(entry_t *)); if (!htable->table) return NULL; for (i=0; i<size; i++) htable->table[i] = NULL; htable->size = size; return htable; } int main() { hashtable_t *htable = ht_create(10); if (!htable) return; insert(htable,"aba","111"); insert(htable,"bbb","222"); insert(htable,"aab","333"); printf("get(aba)=%s\n", get(htable,"aba")); printf("get(bbb)=%s\n", get(htable,"bbb")); printf("get(aab)=%s\n", get(htable,"aab")); } rgs, Kevin |
Re: hash table - simple implementation
wkevin <wkevils@gmail.com> writes:
> int hashfn(char *key, hashtable_t *htable) > { > int sum=0, i; > for (i=0; i<strlen(key);i++) I know you said that performance didn't interest you, but I'm always hypersensitive to unnecessary use of strlen. There's no point iterating down the string twice. You don't need the length, you know when to stop already - when you reach the \0. for (i=0; key[i]; i++) > sum+=key[i]; > return sum%htable->size; > } Phil -- Regarding TSA regulations: How are four small bottles of liquid different from one large bottle? Because four bottles can hold the components of a binary liquid explosive, whereas one big bottle can't. -- camperdave responding to MacAndrew on /. |
Re: hash table - simple implementation
wkevin <wkevils@gmail.com> writes:
> Hi, > I tried to look for as much as simple implementation of a hash table in "c". > > The performance and quality of the hash function do not interest me for this > task. > > I found quite a lot of such example. When I reviewed the code, I > always found that something seems to be more complex than needed, and > some logic which seems that can be simplified, some redundant code, > etc. I think you have some redundant code too. > So I tried writing my own. > > I tried it on some test cases and it was ok. > I would appreciate if you could review it correctness; I mean, are there > inputs where it will not work correctly? You don't give a specification so there is nothing to verify the code against. There certainly are parameter values that can be supplied that make the code go wrong. For example, a zero or negative size for ht_create, or a null pointer used as a table pointer or as a key. But the most significant problem is that there are also some key strings that could cause problems on some systems (mine for example!). You didn't ask for any code improvement but I've made a couple of suggestions none the less. > Here is the code: > > #include <stdlib.h> > #include <string.h> > #include <stdio.h> > > struct entry { > char *key; > char *val; > struct entry *next; > }; > > > typedef struct entry entry_t; > > struct hashtable { > struct entry **table; > int size; > }; > > typedef struct hashtable hashtable_t; > > int hashfn(char *key, hashtable_t *htable) I'd use const where appropriate: int hashfn(const char *key, const hashtable_t *htable) > { > int sum=0, i; > for (i=0; i<strlen(key);i++) > sum+=key[i]; > return sum%htable->size; The problem here is that char can be signed so the resulting index may be negative. All you need do is declare sum as unsigned. > } > > void insert(hashtable_t *htable, char *key, char *val) Again, I'd use const here (three times). Also, whenever I write a function that does not return a value, I feel I've probably gone wrong. Is there not some value that could be useful to the caller? Here I think you could usefully return the old string (if there is one). If the strings are allocated, your code will introduce a memory leak because the old pointer gets lost forever. > { > int hash; > hash = hashfn(key,htable); > entry_t *entry = htable->table[hash]; > entry_t *last; > entry_t *new_entry; > while (entry != NULL) { > // replace > if (strcmp(entry->key,key) == 0) { > entry->val = val; > return; > } > last = entry; > entry = entry->next; > } > > new_entry = malloc(sizeof(entry_t)); I like to write new_entry = malloc(sizeof *new_entry); so that I don't have to check the type -- I know the size is right just be looking here. > if (!new_entry) { > printf("could not allocate memory for new_entry\n"); Errors should be output on stderr. Also, I'd be tempted to exit here. If you are not going to signal the error to the caller, there's probably very little point in carrying on. > return; > } > > new_entry->key = key; > new_entry->val = val; > new_entry->next = NULL; > if (htable->table[hash] == NULL) > htable->table[hash] = new_entry; > else > { > // first > if (entry == htable->table[hash]) > htable->table[hash] = new_entry; > else > last->next = new_entry; This second "if" is redundant -- only the "else" branch can ever be taken. Also, you may be over-complicating the code since you don't have any a priori reason to suppose that adding the new entry to the end of the chain is any better that putting it at the front. > } > } > > char *get(hashtable_t *htable, char *key) > { > int hash = hashfn(key, htable); > entry_t *entry = htable->table[hash]; > while (entry != NULL) { > if (strcmp(entry->key,key) == 0) > return entry->val; > entry = entry->next; > } > return NULL; > } > > > hashtable_t *ht_create(int size) I'd copy this prefix and use it everywhere: ht_get, ht_insert etc. You might want to review your use of signed and unsigned types. If you want to put the onus of correctness on the caller, you could declare this parameter to be an unsigned integer type (and I'd then opt for size_t). It won't prevent problems, of course -- passing -1 will still be wrong, but it alerts the user of the code to beware. If you leave it as a signed type, then you should probably check for < 0. In both cases (signed and unsigned) I'd check for 0. > { > int i; > hashtable_t *htable = malloc(sizeof(hashtable_t)); > if (!htable) > return NULL; > htable->table = malloc(size * sizeof(entry_t *)); You can use 'htable->table = malloc(size * sizeof *htable->table);' to, again, avoid making the reader check the type. (Ditto three lines above.) > if (!htable->table) > return NULL; Here you have a memory leak, though maybe not a very significant one. Still, I'd want to free(htable) before returning NULL. > for (i=0; i<size; i++) > htable->table[i] = NULL; > htable->size = size; > return htable; > } > > int main() int main(void) is preferred in C. > { > hashtable_t *htable = ht_create(10); > if (!htable) > return; You need to return and int! If your compiler did not tell you about this, turn up the warning level. > insert(htable,"aba","111"); > insert(htable,"bbb","222"); > insert(htable,"aab","333"); > printf("get(aba)=%s\n", get(htable,"aba")); > printf("get(bbb)=%s\n", get(htable,"bbb")); > printf("get(aab)=%s\n", get(htable,"aab")); Here, you don't need to return anything if you are using C99 or C11 -- the newer standards permit "falling off the end of main", but you don't use any other features of C99, so I'd add a return here too. > } -- Ben. |
Re: hash table - simple implementation
On 2012-10-17, Phil Carmody <thefatphil_demunged@yahoo.co.uk> wrote:
> wkevin <wkevils@gmail.com> writes: >> int hashfn(char *key, hashtable_t *htable) >> { >> int sum=0, i; >> for (i=0; i<strlen(key);i++) > > I know you said that performance didn't interest you, but I'm always > hypersensitive to unnecessary use of strlen. There's no point iterating > down the string twice. You don't need the length, you know when to stop > already - when you reach the \0. > > for (i=0; key[i]; i++) > >> sum+=key[i]; >> return sum%htable->size; >> } It's even possible that the number of iterations down the string in >> for (i=0; i<strlen(key);i++) is larger than two. The compiler is allowed to compute strlen(key) for every iteration of the loop, which would make the runtime quadratic. |
Re: hash table - simple implementation
On 10/17/2012 5:02 AM, wkevin wrote:
> Hi, > I tried to look for as much as simple implementation of a hash table in "c". > > The performance and quality of the hash function do not interest me for this > task. Clearly. > I found quite a lot of such example. When I reviewed the code, I always found > that something seems to be more complex than needed, and some logic which seems > that can be simplified, some redundant code, etc. > > So I tried writing my own. > > I tried it on some test cases and it was ok. > I would appreciate if you could review it correctness; I mean, are there > inputs where it will not work correctly? Yes, or at least "Yes, possibly." > Here is the code: > > #include <stdlib.h> > #include <string.h> > #include <stdio.h> > > struct entry { > char *key; > char *val; > struct entry *next; > }; > > > typedef struct entry entry_t; > > struct hashtable { > struct entry **table; > int size; > }; > > typedef struct hashtable hashtable_t; > > int hashfn(char *key, hashtable_t *htable) > { > int sum=0, i; > for (i=0; i<strlen(key);i++) > sum+=key[i]; > return sum%htable->size; If the key is long enough that the sum of its character codes exceeds the capacity of an `int' (which might be only 32767), you get undefined behavior. One of the more likely outcomes is that `sum' will become negative, in which case the returned value will be negative or zero. Since you use the returned value as an index into the array of list headers, a negative value would send you completely off the rails. Even though you say you're not interested in the quality of the hash function, this one is so appallingly bad that I feel duty- bound to opine that the word "quality" doesn't even apply. > } > > void insert(hashtable_t *htable, char *key, char *val) > { > int hash; > hash = hashfn(key,htable); > entry_t *entry = htable->table[hash]; > entry_t *last; > entry_t *new_entry; > while (entry != NULL) { > // replace > if (strcmp(entry->key,key) == 0) { > entry->val = val; > return; > } > last = entry; > entry = entry->next; > } > > new_entry = malloc(sizeof(entry_t)); > if (!new_entry) { > printf("could not allocate memory for new_entry\n"); > return; > } > > new_entry->key = key; > new_entry->val = val; > new_entry->next = NULL; > if (htable->table[hash] == NULL) > htable->table[hash] = new_entry; > else > { > // first > if (entry == htable->table[hash]) > htable->table[hash] = new_entry; > else > last->next = new_entry; > } This should work, but it seems to be an example of the "more complex than needed" code you set out to eliminate. Why do you need three cases (including one that never executes!) to add one item to a linked list? > } > > char *get(hashtable_t *htable, char *key) > { > int hash = hashfn(key, htable); > entry_t *entry = htable->table[hash]; > while (entry != NULL) { > if (strcmp(entry->key,key) == 0) > return entry->val; > entry = entry->next; > } > return NULL; > } > > > hashtable_t *ht_create(int size) > { > int i; > hashtable_t *htable = malloc(sizeof(hashtable_t)); > if (!htable) > return NULL; > htable->table = malloc(size * sizeof(entry_t *)); > if (!htable->table) > return NULL; Memory leak here. > for (i=0; i<size; i++) > htable->table[i] = NULL; > htable->size = size; > return htable; > } > > int main() > { > hashtable_t *htable = ht_create(10); > if (!htable) > return; Compiler should have squawked here. > insert(htable,"aba","111"); > insert(htable,"bbb","222"); > insert(htable,"aab","333"); > printf("get(aba)=%s\n", get(htable,"aba")); Good thing it didn't return NULL, eh? > printf("get(bbb)=%s\n", get(htable,"bbb")); > printf("get(aab)=%s\n", get(htable,"aab")); > } Overall impressions: It's missing features one expects in a hash table (deletion, iteration, resizing), and it's utterly devoid of defensive coding. Still, it'll probably "work" if you don't push it very hard. -- Eric Sosman esosman@comcast-dot-net.invalid |
Re: hash table - simple implementation
Ike Naar wrote:
)>> for (i=0; i<strlen(key);i++) ) ) is larger than two. The compiler is allowed to compute strlen(key) ) for every iteration of the loop, which would make the runtime ) quadratic. ITYM: required. The compiler is allowed to compute it only once, if it can prove that it is invariant (i.e. that nothing changes the string during the loop). SaSW, Willem -- Disclaimer: I am in no way responsible for any of the statements made in the above text. For all I know I might be drugged or something.. No I'm not paranoid. You all think I'm paranoid, don't you ! #EOT |
Re: hash table - simple implementation
On 2012-10-17, wkevin <wkevils@gmail.com> wrote:
> Hi, > I tried to look for as much as simple implementation of a hash table in "c". > > The performance and quality of the hash function do not interest me for this > task. If performance and quality do not interest you, then use an associative list which is exhaustively searched for items. Hash tables are used when performance matters. |
Re: hash table - simple implementation
On 2012-10-17, Willem <willem@turtle.stack.nl> wrote:
> Ike Naar wrote: > )>> for (i=0; i<strlen(key);i++) > ) > ) is larger than two. The compiler is allowed to compute strlen(key) > ) for every iteration of the loop, which would make the runtime > ) quadratic. > > ITYM: required. Given the context, no. > The compiler is allowed to compute it only once, if it can prove that it is > invariant (i.e. that nothing changes the string during the loop). (Nit: if it can prove that nothing changes the *length of the string* during the loop; one can change a string in ways that leave the length invariant). The loop shown in the context of this discussion does not modify the string, so, in this context, repeated computation of strlen is not required, but it is allowed. |
Re: hash table - simple implementation
Phil Carmody <thefatphil_demunged@yahoo.co.uk> writes:
> wkevin <wkevils@gmail.com> writes: >> int hashfn(char *key, hashtable_t *htable) >> { >> int sum=0, i; >> for (i=0; i<strlen(key);i++) > > I know you said that performance didn't interest you, but I'm always > hypersensitive to unnecessary use of strlen. There's no point iterating > down the string twice. You don't need the length, you know when to stop > already - when you reach the \0. > > for (i=0; key[i]; i++) > >> sum+=key[i]; >> return sum%htable->size; >> } And as Ike Naar points out, it doesn't just iterate over the string twice; it iterates over the string once for each iteration of the loop. (strlen() has to traverse the string to find the terminating '\0'.) Checking for the '\0' yourself is a good approach, though I'd write it a bit more explicitly: for (i = 0; key[i] != '\0'; i ++) ... Another option is to call strlen(), but only once: int sum = 0; size_t len = strlen(key); for (size_t i = 0; i < len; i ++) { sum += key[i]; } This does traverse the string twice, but it's not nearly as bad as recomputing strlen() in the for loop header. -- Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst> Will write code for food. "We must do something. This is something. Therefore, we must do this." -- Antony Jay and Jonathan Lynn, "Yes Minister" |
Re: hash table - simple implementation
On 10/17/2012 9:28 AM, Eric Sosman wrote:
> On 10/17/2012 5:02 AM, wkevin wrote: >>[...] >> int hashfn(char *key, hashtable_t *htable) >> { >> int sum=0, i; >> for (i=0; i<strlen(key);i++) >> sum+=key[i]; >> return sum%htable->size; > > If the key is long enough that the sum of its character codes > exceeds the capacity of an `int' [...] As Ben Bacarisse points out, even a short string will get you in trouble on some systems. For a concrete example, insert(htable, "¢", "cent sign"); has key string of only one character, but may quite possibly produce a negative hash index. If you're using the gcc compiler suite, build your code with the `-fsigned-char' option, run this insertion, sit back and watch the fun ... -- Eric Sosman esosman@comcast-dot-net.invalid |
| All times are GMT. The time now is 08:36 AM. |
Powered by vBulletin®. Copyright ©2000 - 2013, vBulletin Solutions, Inc.
SEO by vBSEO ©2010, Crawlability, Inc.