Steve555 wrote:

> Thanks for the comprehensive reply. I've just googled as much as I

> could to try and understand before replying.

> I really need to learn about hash tables, before I get why I need six

> of them!

> (I did say I was new to search techniques )
Feel free to use std::hash_map and forget about the internal structure

of hash tables.

> The lookup table (library of sequences) never changes. The elements

> will never be bigger than 171, so yes I could pack 4 of them into 32

> bits.

> Are you, then, saying to store each sequence-of-4 in a single

> unsigned long, and do some kind of bit search?
typedef unsigned char MyEntry[4];

will most likely fit your needs.

If you encounter problems because array types are a bit special in C/C++

wrap it in a struct:

struct MyEntry

{ unsigned char Column[4];

};

Objects of type MyEntry are now 32 bits.

struct MyHashKey

{ unsigned char Key[2];

};

MyHashKey is intended to be used as hash key for two arbitrary columns.

Because MyHashKey is no integral type you must define a hash function to

use it in a hashtable.

int MyHashFunc(MyHashKey key)

{ // portable version:

return key[0] | (key[1] <<

;

// fast non-portable version:

// return *(short*)key;

}

As content I would recommend to use something like:

template <typename Compare>

typedef MyHashEntry std::set<MyEntry, Compare>;

It will contain the matching MyEntry lines of your data ordered by the

remaining columns, to speed up lookups with one or no wildcards. If you

have two wildcards enumerate over the entire sublist.

The Compare template parameter is needed because Content is sorted by

different columns depending on your first key columns. It is a functor

that provides the comperator for two MyEntries.

Now you need the comparer mentioned above.

template <int COL1, int COL2>

bool MyColumnComparer(MyEntry l, MyEntry r)

{ unsigned lkey = (l.Column[COL1] <<

| l.Column[COL2];

unsigned rkey = (r.Column[COL1] <<

| r.Column[COL2];

return lkey < rkey;

}

And now you can define your root data structures:

std::hash_set<MyHashKey,

MyHashEntry<MyColumnComparer<3,4> >,

MyHashFunc> EntriesBy1234;

The above hashtable instance is intended to store the Entries hashed by

the first two columns and for each distinct values of the first two

colums ordered by the third column. In the same way you could define

further entry points for your searches.

std::hash_set<MyHashKey,

MyHashEntry<MyColumnComparer<2,4> >,

MyHashFunc> EntriesBy1324;

std::hash_set<MyHashKey,

MyHashEntry<MyColumnComparer<2,3> >,

MyHashFunc> EntriesBy1423;

std::hash_set<MyHashKey,

MyHashEntry<MyColumnComparer<1,4> >,

MyHashFunc> EntriesBy2314;

std::hash_set<MyHashKey,

MyHashEntry<MyColumnComparer<1,3> >,

MyHashFunc> EntriesBy2413;

std::hash_set<MyHashKey,

MyHashEntry<MyColumnComparer<1,2> >,

MyHashFunc> EntriesBy3412;

For lookups with only one wildcard you can choose an arbitrary list with

the required key columns. There is some redundancy in this case.

Note, that all of that is only a rough idea. It may even contain

syntactical errors because I wrote down it quite quickly.

Also note that the data structure is only optimized for lookups not for

the population process. Changes are quite slow because they have to be

applied at different places simultaneously.

> Sorry, I don't know how that would work... how would I differentiate

> between the ones the exist in the library, and those that don't?
Sorry, I didn't understand the your last question.

Marcel