Velocity Reviews (http://www.velocityreviews.com/forums/index.php)
-   C++ (http://www.velocityreviews.com/forums/f39-c.html)
-   -   hash_map (http://www.velocityreviews.com/forums/t514419-hash_map.html)

 aaragon 06-14-2007 12:16 AM

hash_map

Hello everyone,

I have a VERY BIG set of double values that I want to map to intervals
so I thought a clever way to do this was using a hash table. Let's say
that I want to map all double values in the range 0-0.5 to a single
std::pair<double,double>.

This is what I've done so far:

#include <iostream>
#include <ext/hash_map>
#include <boost/functional/hash.hpp>

using namespace __gnu_cxx;
using namespace std;

struct eqstr
{
bool operator()(const double& o, const double& p) const
{
return (o == p);
}
};

namespace __gnu_cxx{

template<>
struct hash<double>
{
size_t operator()(double __x) const
{
boost::hash<double> double_hash;
return double_hash(__x);
}
};

}

void lookup(const hash_map<double, pair<double,double> , hash<double>,
eqstr>& Map,
const double number)
{
hash_map<double, pair<double,double> , hash<double>,
eqstr>::const_iterator it
= Map.find(number);
cout << number << ": "
<< (it != Map.end() ? "present" : "not present")
<< endl;
}

int main()
{
hash_map<double, pair<double,double> , hash<double>, eqstr> HashMap;
HashMap.insert(make_pair(0.1,make_pair(0.,0.5)));
lookup(HashMap,0.1);
lookup(HashMap,0.05);
}

aaragon@aaragon-laptop:~/Desktop\$ ./a.out
0.1: present
0.05: not present

Now, the thing is that I can't map the value of 0.05 to the same pair
because my hashing function doesn't to this. Any ideas???

Thank you,

a^2

 BobR 06-14-2007 03:27 AM

Re: hash_map

aaragon <alejandro.aragon@gmail.com> wrote in message ...
> Hello everyone,
>
> I have a VERY BIG set of double values that I want to map to intervals
> so I thought a clever way to do this was using a hash table. Let's say
> that I want to map all double values in the range 0-0.5 to a single
> std::pair<double,double>.
>
> This is what I've done so far:
>
> #include <iostream>
> #include <ext/hash_map>

> #include <boost/functional/hash.hpp>

>
> using namespace __gnu_cxx;

OUCH!!

> using namespace std;

ouch!!

>
> struct eqstr
> {
> bool operator()(const double& o, const double& p) const
> {
> return (o == p);

Comparing doubles for equality is most often bad [1]. Compare to a range.

if( ( o + 0.0001 > p ) && ( o - 0.0001 < p ) ){ return true;}
return false;

Output the numbers using a stream with 'fixed' and a big 'precision', and
see if they are (really/almost) equal (in the limited output (rounding
errors in stream output)).

Not sure this is your problem, but, it (==) should be fixed.

> }
> };
>
> namespace __gnu_cxx{

Are you a GNU systems/compiler developer?

>
> template<> struct hash<double> {
> size_t operator()(double __x) const {
> boost::hash<double> double_hash;
> return double_hash(__x);
> }
> };
>
> }
>
> void lookup(const hash_map<double, pair<double,double> , hash<double>,
> eqstr>& Map, const double number){
> hash_map<double, pair<double,double> , hash<double>,
> eqstr>::const_iterator it = Map.find(number);
> cout << number << ": "
> << (it != Map.end() ? "present" : "not present")
> << endl;
> }
>
> int main(){
> hash_map<double, pair<double,double> , hash<double>, eqstr> HashMap;
> HashMap.insert(make_pair(0.1,make_pair(0.,0.5)));
> lookup(HashMap,0.1);
> lookup(HashMap,0.05);
> }
>
> aaragon@aaragon-laptop:~/Desktop\$ ./a.out
> 0.1: present
> 0.05: not present
>
> Now, the thing is that I can't map the value of 0.05 to the same pair
> because my hashing function doesn't to this. Any ideas???
> Thank you,
> a^2

[1]
{
std::ostringstream sos;
sos.setf( std::ios_base::fixed, std::ios::floatfield );
sos.precision( 40 );

double num( 0.1 );

sos<<"double num( 0.1 )="<<num<<std::endl;
std::cout<<sos.str()<<std::endl;
}
// out: double num( 0.1 )=0.10000000000000001

--
Bob R
POVrookie

 =?iso-8859-1?q?Erik_Wikstr=F6m?= 06-14-2007 06:34 AM

Re: hash_map

On 14 Juni, 02:16, aaragon <alejandro.ara...@gmail.com> wrote:
> Hello everyone,
>
> I have a VERY BIG set of double values that I want to map to intervals
> so I thought a clever way to do this was using a hash table. Let's say
> that I want to map all double values in the range 0-0.5 to a single
> std::pair<double,double>.

[snip]

> Now, the thing is that I can't map the value of 0.05 to the same pair
> because my hashing function doesn't to this. Any ideas???

I don't know how big your VERY BIG set of doubles is, but unless you
have performed some profiling using "standard solutions" I'd suggest
you do that before trying to optimize. In this case that means to use
std::map. Sure a good hash table is O(1), but if you don't have a good
hash-function you might end up with O(n) instead, a map is guaranteed
to be logarithmic, always.

There's of course the problem with comparing doubles like Bob R
pointed out, but a custom comparison functor would take care of that.

--
Erik Wikström

 James Kanze 06-14-2007 08:41 AM

Re: hash_map

On Jun 14, 8:34 am, Erik Wikström <eri...@student.chalmers.se> wrote:
> On 14 Juni, 02:16, aaragon <alejandro.ara...@gmail.com> wrote:
> > I have a VERY BIG set of double values that I want to map to intervals
> > so I thought a clever way to do this was using a hash table. Let's say
> > that I want to map all double values in the range 0-0.5 to a single
> > std::pair<double,double>.

> [snip]

> > Now, the thing is that I can't map the value of 0.05 to the same pair
> > because my hashing function doesn't to this. Any ideas???

> I don't know how big your VERY BIG set of doubles is, but unless you
> have performed some profiling using "standard solutions" I'd suggest
> you do that before trying to optimize. In this case that means to use
> std::map. Sure a good hash table is O(1), but if you don't have a good
> hash-function you might end up with O(n) instead, a map is guaranteed
> to be logarithmic, always.

> There's of course the problem with comparing doubles like Bob R
> pointed out, but a custom comparison functor would take care of that.

There are two problems: finding a hash function, and comparing.
For the first, I've yet to find a good solution to generate a
hash code from a double. It's far from trivial. For the
second... there's no guarantee that std::map won't have it as
well. There are two possible problems:

-- He has two double values which really aren't equal. In such
a case, std::map will not find the first using the second as
a key. If they're not equal, they're not equal.

-- He is calculating two values dynamically which should be
equal; the comparison function does something like:

bool operator()( Obj const& lhs, Obj const& rhs ) const
{
return lhs.f() < rhs.f() ;
}

Under certain conditions, with certain compilers, on certain
systems, this results in an unstable comparison functions,
because the intermediate results may be in extended
precision, and because the compiler spills one to memory
(thus reducing it to double precision), but keeps the other
in registers (in extended precision).

I'm very sceptical of the possibilities of using double for an
index, unless the doubles have a known source which ensures an
application correct rounding before they are used as an index.

--
James Kanze (GABI Software, from CAI) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

 James Kanze 06-14-2007 09:12 AM

Re: hash_map

On Jun 14, 5:27 am, "BobR" <removeBadB...@worldnet.att.net> wrote:
> aaragon <alejandro.ara...@gmail.com> wrote in message ...

[...]
> > struct eqstr
> > {
> > bool operator()(const double& o, const double& p) const
> > {
> > return (o == p);

> Comparing doubles for equality is most often bad [1].

Not really. Comparing doubles when you don't know what you're

> Compare to a range.

Rarely solves anything.

> if( ( o + 0.0001 > p ) && ( o - 0.0001 < p ) ){ return true;}
> return false;

Which isn't a legal comparison function for anything, since it
isn't transitive. To use double as a key in a hash table, you
need an equivalence relationship and a hash function which is
compatible with that equivalence relationship: a == b implies
hash(a) == hash(b).

> Output the numbers using a stream with 'fixed' and a big
> 'precision', and see if they are (really/almost) equal (in the
> limited output (rounding errors in stream output)).

And check how the hash function works. I've yet to find a good
portable way to hash doubles. (Non portably, you could probably
extract the exponent, sign and the mantissa as integer values,
and hash them as integer values. With special handling for 0,
so that +0.0 and -0.0 hash to the same value. And special
handling for infinities and NaN's---in some applications,
assertion failure would be an acceptable special handling in
such cases.)

> Not sure this is your problem, but, it (==) should be fixed.

> > }
> > };

> > namespace __gnu_cxx{

> Are you a GNU systems/compiler developer?

GNU currently provides a hash table similar to the one that will
be in the next version of the standard, and similar to the one
provided by most other compiler vendors. Since it isn't
standard (yet), they place it in a special namespace, to
indicate that it is a compiler specific extension. Obviously,
the name of this namespace must be in the implementation's
namespace. Thus the __.

The rules for using this namespace are exactly the same as for
using std. GNU authorizes users to explicitly specialize
templates which it contains on user defined types.

> > template<> struct hash<double> {
> > size_t operator()(double __x) const {
> > boost::hash<double> double_hash;
> > return double_hash(__x);
> > }
> > };
> > }

The question here is what double_hash does. (I'm not sure, but
Boost seems to adopt a strategy roughly like what I described
before, using frexp and ldexp to get at the integral values, and
ignoring the sign (to avoid problems due to +/- 0.0). So it
should be correct, except maybe for infinity and NaNs, but it
probably also is slow enough that an std::map would be faster.)

> > void lookup(const hash_map<double, pair<double,double> , hash<double>,
> > eqstr>& Map, const double number){
> > hash_map<double, pair<double,double> , hash<double>,
> > eqstr>::const_iterator it = Map.find(number);
> > cout << number << ": "
> > << (it != Map.end() ? "present" : "not present")
> > << endl;
> > }

> > int main(){
> > hash_map<double, pair<double,double> , hash<double>, eqstr> HashMap;
> > HashMap.insert(make_pair(0.1,make_pair(0.,0.5)));
> > lookup(HashMap,0.1);
> > lookup(HashMap,0.05);
> > }

> > aaragon@aaragon-laptop:~/Desktop\$ ./a.out
> > 0.1: present
> > 0.05: not present

> > Now, the thing is that I can't map the value of 0.05 to the
> > same pair because my hashing function doesn't to this. Any
> > ideas???

I'm not sure what you're trying to do, but maybe rounding or
whatever should be done on the values before comparing or
calling the boost hash function. Or just comparing, using
std::map. Basically, etablish a canonical representation for
each equivalence category, and use it.

--
James Kanze (GABI Software, from CAI) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

 BobR 06-14-2007 05:47 PM

Re: hash_map

James Kanze wrote in message...
On Jun 14, 5:27 am, "BobR" wrote:

/* """ quote

> > namespace __gnu_cxx{

> Are you a GNU systems/compiler developer?

GNU currently provides a hash table similar to the one that will
be in the next version of the standard, and similar to the one
provided by most other compiler vendors. Since it isn't
standard (yet), they place it in a special namespace, to
indicate that it is a compiler specific extension. Obviously,
the name of this namespace must be in the implementation's
namespace. Thus the __.
""" */ unquote

I remember GNU haveing 'hashtable', then it disappeared (into the 'backward'
directory) due to standards. So, now it's (with revisions, I assume) comeing
back? Go figure. <G>

Thanks for the other info.
--
Bob R
POVrookie

 aaragon 06-14-2007 05:55 PM

Re: hash_map

On Jun 14, 4:12 am, James Kanze <james.ka...@gmail.com> wrote:
> On Jun 14, 5:27 am, "BobR" <removeBadB...@worldnet.att.net> wrote:
>
> > aaragon <alejandro.ara...@gmail.com> wrote in message ...

>
> [...]
>
> > > struct eqstr
> > > {
> > > bool operator()(const double& o, const double& p) const
> > > {
> > > return (o == p);

> > Comparing doubles for equality is most often bad [1].

>
> Not really. Comparing doubles when you don't know what you're
>
> > Compare to a range.

>
> Rarely solves anything.
>
> > if( ( o + 0.0001 > p ) && ( o - 0.0001 < p ) ){ return true;}
> > return false;

>
> Which isn't a legal comparison function for anything, since it
> isn't transitive. To use double as a key in a hash table, you
> need an equivalence relationship and a hash function which is
> compatible with that equivalence relationship: a == b implies
> hash(a) == hash(b).
>
> > Output the numbers using a stream with 'fixed' and a big
> > 'precision', and see if they are (really/almost) equal (in the
> > limited output (rounding errors in stream output)).

>
> And check how the hash function works. I've yet to find a good
> portable way to hash doubles. (Non portably, you could probably
> extract the exponent, sign and the mantissa as integer values,
> and hash them as integer values. With special handling for 0,
> so that +0.0 and -0.0 hash to the same value. And special
> handling for infinities and NaN's---in some applications,
> assertion failure would be an acceptable special handling in
> such cases.)
>
> > Not sure this is your problem, but, it (==) should be fixed.
> > > }
> > > };
> > > namespace __gnu_cxx{

> > Are you a GNU systems/compiler developer?

>
> GNU currently provides a hash table similar to the one that will
> be in the next version of the standard, and similar to the one
> provided by most other compiler vendors. Since it isn't
> standard (yet), they place it in a special namespace, to
> indicate that it is a compiler specific extension. Obviously,
> the name of this namespace must be in the implementation's
> namespace. Thus the __.
>
> The rules for using this namespace are exactly the same as for
> using std. GNU authorizes users to explicitly specialize
> templates which it contains on user defined types.
>
> > > template<> struct hash<double> {
> > > size_t operator()(double __x) const {
> > > boost::hash<double> double_hash;
> > > return double_hash(__x);
> > > }
> > > };
> > > }

>
> The question here is what double_hash does. (I'm not sure, but
> Boost seems to adopt a strategy roughly like what I described
> before, using frexp and ldexp to get at the integral values, and
> ignoring the sign (to avoid problems due to +/- 0.0). So it
> should be correct, except maybe for infinity and NaNs, but it
> probably also is slow enough that an std::map would be faster.)
>
>
>
> > > void lookup(const hash_map<double, pair<double,double> , hash<double>,
> > > eqstr>& Map, const double number){
> > > hash_map<double, pair<double,double> , hash<double>,
> > > eqstr>::const_iterator it = Map.find(number);
> > > cout << number << ": "
> > > << (it != Map.end() ? "present" : "not present")
> > > << endl;
> > > }
> > > int main(){
> > > hash_map<double, pair<double,double> , hash<double>, eqstr> HashMap;
> > > HashMap.insert(make_pair(0.1,make_pair(0.,0.5)));
> > > lookup(HashMap,0.1);
> > > lookup(HashMap,0.05);
> > > }
> > > aaragon@aaragon-laptop:~/Desktop\$ ./a.out
> > > 0.1: present
> > > 0.05: not present
> > > Now, the thing is that I can't map the value of 0.05 to the
> > > same pair because my hashing function doesn't to this. Any
> > > ideas???

>
> I'm not sure what you're trying to do, but maybe rounding or
> whatever should be done on the values before comparing or
> calling the boost hash function. Or just comparing, using
> std::map. Basically, etablish a canonical representation for
> each equivalence category, and use it.
>
> --
> James Kanze (GABI Software, from CAI) email:james.ka...@gmail.com
> Conseils en informatique orientée objet/
> Beratung in objektorientierter Datenverarbeitung
> 9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Thanks for all your suggestions. It seems that the problem I'm dealing
with is harder than I thought. The problem is as follows:
I have a series of intervals that are obtained at run time:

|_________________|_______________________________ |
__________________|_____________|
0.1 0.5 1.3
1.5 1.65

Then, I have large data set of doubles that I need to map into each of
these intervals. Therefore, for example the double 0.56 would enter
the interval (0.5,1.3] (note that the interval is open on the left). I
could just use a sign approach:
double testSign = (0.56 - Imin)*(0.56*Imax);
where Imim and Imax correspond to the extreme values of the interval.
Of course, testSign would only be negative in the right interval so I
could use an if statement to increment a counter for that interval
whenever this happens;
if(testSign < 0)
incrementCounter(interval(i));

However, this would take a lot of time if I have a very BIG data set
(which is my case). So I thought that using a hashing function that
maps any double to the correct interval would be a very clever way to
do this in constant time. So basically, what I need is a function that
maps a double within an interval to a unique value so I can use the
hash_map provided by the __gnu_cxx namespace.

 James Kanze 06-15-2007 08:58 AM

Re: hash_map

On Jun 14, 7:47 pm, "BobR" <removeBadB...@worldnet.att.net> wrote:
> James Kanze wrote in message...

> On Jun 14, 5:27 am, "BobR" wrote:

> /* """ quote

> > > namespace __gnu_cxx{

> > Are you a GNU systems/compiler developer?

> GNU currently provides a hash table similar to the one that will
> be in the next version of the standard, and similar to the one
> provided by most other compiler vendors. Since it isn't
> standard (yet), they place it in a special namespace, to
> indicate that it is a compiler specific extension. Obviously,
> the name of this namespace must be in the implementation's
> namespace. Thus the __.
> """ */ unquote

You're using OE, too:-)?

(This didn't used to be a problem. Since I'm sure that OE
hasn't changed, it must be something either at Google or in our
firewall that has changed. I'm still looking.)

> I remember GNU haveing 'hashtable', then it disappeared (into
> the 'backward' directory) due to standards. So, now it's (with
> revisions, I assume) comeing back? Go figure. <G>

There was a proposal for a hash table right at the end of the
last standard's effort. It was too late to be really
considered, but many (most) implementations added it anyway.
Regretfully, they added it each with their own subtle variations
and differences.

It was quickly realized that adding things like this to the
standard namespace was not a good idea. Which left the library
writers in a crunch; they could move it to a private namespace
where it belonged, but this would break user code. G++ chose to
break user code, VC++ no.

When the standards committee got back to the question, they were
variant already in std::, and that the specifications for that
hash table varied. Rather than create even more problems with
existing user code, they chose to rename it unordered_[set,map].

The last time I looked at it, the specification used separate
template arguments for the equivalence function and the hash
function, which seems like a sure recepe for incompatible
versions to me. But nothing is final yet.

--
James Kanze (GABI Software, from CAI) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

 James Kanze 06-15-2007 09:03 AM

Re: hash_map

On Jun 14, 7:55 pm, aaragon <alejandro.ara...@gmail.com> wrote:

> Thanks for all your suggestions. It seems that the problem I'm
> dealing with is harder than I thought.

That's a frequent case with floating point:-).

> The problem is as follows: I have a series of intervals that
> are obtained at run time:

> |_________________|_______________________________ |__________________|_____________|
> 0.1 0.5 1.3 1.5 1.65

> Then, I have large data set of doubles that I need to map into each of
> these intervals. Therefore, for example the double 0.56 would enter
> the interval (0.5,1.3] (note that the interval is open on the left). I
> could just use a sign approach:
> double testSign = (0.56 - Imin)*(0.56*Imax);
> where Imim and Imax correspond to the extreme values of the interval.
> Of course, testSign would only be negative in the right interval so I
> could use an if statement to increment a counter for that interval
> whenever this happens;
> if(testSign < 0)
> incrementCounter(interval(i));

> However, this would take a lot of time if I have a very BIG data set
> (which is my case). So I thought that using a hashing function that
> maps any double to the correct interval would be a very clever way to
> do this in constant time. So basically, what I need is a function that
> maps a double within an interval to a unique value so I can use the
> hash_map provided by the __gnu_cxx namespace.

To calculate the hash code, however, you need to normalize the
value in the interval. Hash codes require a very strong concept
of equivalence.

This can be done very easily with std::map, which uses an
ordering relationship. And while std::map is O(lg n), rather
than O(1), it typically takes a very big set for the difference
to be measurable.

--
James Kanze (GABI Software, from CAI) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

 aaragon 06-15-2007 06:08 PM

Re: hash_map

On Jun 15, 4:03 am, James Kanze <james.ka...@gmail.com> wrote:
> On Jun 14, 7:55 pm, aaragon <alejandro.ara...@gmail.com> wrote:
>
> > Thanks for all your suggestions. It seems that the problem I'm
> > dealing with is harder than I thought.

>
> That's a frequent case with floating point:-).
>
>
>
> > The problem is as follows: I have a series of intervals that
> > are obtained at run time:
> > |_________________|_______________________________ |__________________|_____________|
> > 0.1 0.5 1.3 1.5 1.65
> > Then, I have large data set of doubles that I need to map into each of
> > these intervals. Therefore, for example the double 0.56 would enter
> > the interval (0.5,1.3] (note that the interval is open on the left). I
> > could just use a sign approach:
> > double testSign = (0.56 - Imin)*(0.56*Imax);
> > where Imim and Imax correspond to the extreme values of the interval.
> > Of course, testSign would only be negative in the right interval so I
> > could use an if statement to increment a counter for that interval
> > whenever this happens;
> > if(testSign < 0)
> > incrementCounter(interval(i));
> > However, this would take a lot of time if I have a very BIG data set
> > (which is my case). So I thought that using a hashing function that
> > maps any double to the correct interval would be a very clever way to
> > do this in constant time. So basically, what I need is a function that
> > maps a double within an interval to a unique value so I can use the
> > hash_map provided by the __gnu_cxx namespace.

>
> To calculate the hash code, however, you need to normalize the
> value in the interval. Hash codes require a very strong concept
> of equivalence.
>
> This can be done very easily with std::map, which uses an
> ordering relationship. And while std::map is O(lg n), rather
> than O(1), it typically takes a very big set for the difference
> to be measurable.

Well, with the map you still need a one to one correspondence between
keys and values. The different running time complexity relies in the
different data structures that are used by the map and the hash table
but they need the same mapping. Therefore, I just need to find a
function that maps several double values to a single std::pair (like a
mathematical function pair<double,double> = f(double)). Once I have
this, I could implement the mapping using either the std::map or the
hash table, it doesn't matter.

>
> --
> James Kanze (GABI Software, from CAI) email:james.ka...@gmail.com
> Conseils en informatique orientée objet/
> Beratung in objektorientierter Datenverarbeitung
> 9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

All times are GMT. The time now is 05:37 PM.