![]() |
hash_map
Hello everyone,
I have a VERY BIG set of double values that I want to map to intervals so I thought a clever way to do this was using a hash table. Let's say that I want to map all double values in the range 0-0.5 to a single std::pair<double,double>. This is what I've done so far: #include <iostream> #include <ext/hash_map> #include <boost/functional/hash.hpp> using namespace __gnu_cxx; using namespace std; struct eqstr { bool operator()(const double& o, const double& p) const { return (o == p); } }; namespace __gnu_cxx{ template<> struct hash<double> { size_t operator()(double __x) const { boost::hash<double> double_hash; return double_hash(__x); } }; } void lookup(const hash_map<double, pair<double,double> , hash<double>, eqstr>& Map, const double number) { hash_map<double, pair<double,double> , hash<double>, eqstr>::const_iterator it = Map.find(number); cout << number << ": " << (it != Map.end() ? "present" : "not present") << endl; } int main() { hash_map<double, pair<double,double> , hash<double>, eqstr> HashMap; HashMap.insert(make_pair(0.1,make_pair(0.,0.5))); lookup(HashMap,0.1); lookup(HashMap,0.05); } aaragon@aaragon-laptop:~/Desktop$ ./a.out 0.1: present 0.05: not present Now, the thing is that I can't map the value of 0.05 to the same pair because my hashing function doesn't to this. Any ideas??? Thank you, a^2 |
Re: hash_map
aaragon <alejandro.aragon@gmail.com> wrote in message ... > Hello everyone, > > I have a VERY BIG set of double values that I want to map to intervals > so I thought a clever way to do this was using a hash table. Let's say > that I want to map all double values in the range 0-0.5 to a single > std::pair<double,double>. > > This is what I've done so far: > > #include <iostream> > #include <ext/hash_map> non-standard header. (AFAIK) > #include <boost/functional/hash.hpp> non-standard header. > > using namespace __gnu_cxx; OUCH!! > using namespace std; ouch!! > > struct eqstr > { > bool operator()(const double& o, const double& p) const > { > return (o == p); Comparing doubles for equality is most often bad [1]. Compare to a range. if( ( o + 0.0001 > p ) && ( o - 0.0001 < p ) ){ return true;} return false; Output the numbers using a stream with 'fixed' and a big 'precision', and see if they are (really/almost) equal (in the limited output (rounding errors in stream output)). Not sure this is your problem, but, it (==) should be fixed. > } > }; > > namespace __gnu_cxx{ Are you a GNU systems/compiler developer? > > template<> struct hash<double> { > size_t operator()(double __x) const { > boost::hash<double> double_hash; > return double_hash(__x); > } > }; > > } > > void lookup(const hash_map<double, pair<double,double> , hash<double>, > eqstr>& Map, const double number){ > hash_map<double, pair<double,double> , hash<double>, > eqstr>::const_iterator it = Map.find(number); > cout << number << ": " > << (it != Map.end() ? "present" : "not present") > << endl; > } > > int main(){ > hash_map<double, pair<double,double> , hash<double>, eqstr> HashMap; > HashMap.insert(make_pair(0.1,make_pair(0.,0.5))); > lookup(HashMap,0.1); > lookup(HashMap,0.05); > } > > aaragon@aaragon-laptop:~/Desktop$ ./a.out > 0.1: present > 0.05: not present > > Now, the thing is that I can't map the value of 0.05 to the same pair > because my hashing function doesn't to this. Any ideas??? > Thank you, > a^2 [1] { std::ostringstream sos; sos.setf( std::ios_base::fixed, std::ios::floatfield ); sos.precision( 40 ); double num( 0.1 ); sos<<"double num( 0.1 )="<<num<<std::endl; std::cout<<sos.str()<<std::endl; } // out: double num( 0.1 )=0.10000000000000001 -- Bob R POVrookie |
Re: hash_map
On 14 Juni, 02:16, aaragon <alejandro.ara...@gmail.com> wrote:
> Hello everyone, > > I have a VERY BIG set of double values that I want to map to intervals > so I thought a clever way to do this was using a hash table. Let's say > that I want to map all double values in the range 0-0.5 to a single > std::pair<double,double>. [snip] > Now, the thing is that I can't map the value of 0.05 to the same pair > because my hashing function doesn't to this. Any ideas??? I don't know how big your VERY BIG set of doubles is, but unless you have performed some profiling using "standard solutions" I'd suggest you do that before trying to optimize. In this case that means to use std::map. Sure a good hash table is O(1), but if you don't have a good hash-function you might end up with O(n) instead, a map is guaranteed to be logarithmic, always. There's of course the problem with comparing doubles like Bob R pointed out, but a custom comparison functor would take care of that. -- Erik Wikström |
Re: hash_map
On Jun 14, 8:34 am, Erik Wikström <eri...@student.chalmers.se> wrote:
> On 14 Juni, 02:16, aaragon <alejandro.ara...@gmail.com> wrote: > > I have a VERY BIG set of double values that I want to map to intervals > > so I thought a clever way to do this was using a hash table. Let's say > > that I want to map all double values in the range 0-0.5 to a single > > std::pair<double,double>. > [snip] > > Now, the thing is that I can't map the value of 0.05 to the same pair > > because my hashing function doesn't to this. Any ideas??? > I don't know how big your VERY BIG set of doubles is, but unless you > have performed some profiling using "standard solutions" I'd suggest > you do that before trying to optimize. In this case that means to use > std::map. Sure a good hash table is O(1), but if you don't have a good > hash-function you might end up with O(n) instead, a map is guaranteed > to be logarithmic, always. > There's of course the problem with comparing doubles like Bob R > pointed out, but a custom comparison functor would take care of that. There are two problems: finding a hash function, and comparing. For the first, I've yet to find a good solution to generate a hash code from a double. It's far from trivial. For the second... there's no guarantee that std::map won't have it as well. There are two possible problems: -- He has two double values which really aren't equal. In such a case, std::map will not find the first using the second as a key. If they're not equal, they're not equal. -- He is calculating two values dynamically which should be equal; the comparison function does something like: bool operator()( Obj const& lhs, Obj const& rhs ) const { return lhs.f() < rhs.f() ; } Under certain conditions, with certain compilers, on certain systems, this results in an unstable comparison functions, because the intermediate results may be in extended precision, and because the compiler spills one to memory (thus reducing it to double precision), but keeps the other in registers (in extended precision). I'm very sceptical of the possibilities of using double for an index, unless the doubles have a known source which ensures an application correct rounding before they are used as an index. -- James Kanze (GABI Software, from CAI) email:james.kanze@gmail.com Conseils en informatique orientée objet/ Beratung in objektorientierter Datenverarbeitung 9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34 |
Re: hash_map
On Jun 14, 5:27 am, "BobR" <removeBadB...@worldnet.att.net> wrote:
> aaragon <alejandro.ara...@gmail.com> wrote in message ... [...] > > struct eqstr > > { > > bool operator()(const double& o, const double& p) const > > { > > return (o == p); > Comparing doubles for equality is most often bad [1]. Not really. Comparing doubles when you don't know what you're doing is usually bad. > Compare to a range. Rarely solves anything. > if( ( o + 0.0001 > p ) && ( o - 0.0001 < p ) ){ return true;} > return false; Which isn't a legal comparison function for anything, since it isn't transitive. To use double as a key in a hash table, you need an equivalence relationship and a hash function which is compatible with that equivalence relationship: a == b implies hash(a) == hash(b). > Output the numbers using a stream with 'fixed' and a big > 'precision', and see if they are (really/almost) equal (in the > limited output (rounding errors in stream output)). And check how the hash function works. I've yet to find a good portable way to hash doubles. (Non portably, you could probably extract the exponent, sign and the mantissa as integer values, and hash them as integer values. With special handling for 0, so that +0.0 and -0.0 hash to the same value. And special handling for infinities and NaN's---in some applications, assertion failure would be an acceptable special handling in such cases.) > Not sure this is your problem, but, it (==) should be fixed. > > } > > }; > > namespace __gnu_cxx{ > Are you a GNU systems/compiler developer? GNU currently provides a hash table similar to the one that will be in the next version of the standard, and similar to the one provided by most other compiler vendors. Since it isn't standard (yet), they place it in a special namespace, to indicate that it is a compiler specific extension. Obviously, the name of this namespace must be in the implementation's namespace. Thus the __. The rules for using this namespace are exactly the same as for using std. GNU authorizes users to explicitly specialize templates which it contains on user defined types. > > template<> struct hash<double> { > > size_t operator()(double __x) const { > > boost::hash<double> double_hash; > > return double_hash(__x); > > } > > }; > > } The question here is what double_hash does. (I'm not sure, but Boost seems to adopt a strategy roughly like what I described before, using frexp and ldexp to get at the integral values, and ignoring the sign (to avoid problems due to +/- 0.0). So it should be correct, except maybe for infinity and NaNs, but it probably also is slow enough that an std::map would be faster.) > > void lookup(const hash_map<double, pair<double,double> , hash<double>, > > eqstr>& Map, const double number){ > > hash_map<double, pair<double,double> , hash<double>, > > eqstr>::const_iterator it = Map.find(number); > > cout << number << ": " > > << (it != Map.end() ? "present" : "not present") > > << endl; > > } > > int main(){ > > hash_map<double, pair<double,double> , hash<double>, eqstr> HashMap; > > HashMap.insert(make_pair(0.1,make_pair(0.,0.5))); > > lookup(HashMap,0.1); > > lookup(HashMap,0.05); > > } > > aaragon@aaragon-laptop:~/Desktop$ ./a.out > > 0.1: present > > 0.05: not present > > Now, the thing is that I can't map the value of 0.05 to the > > same pair because my hashing function doesn't to this. Any > > ideas??? I'm not sure what you're trying to do, but maybe rounding or whatever should be done on the values before comparing or calling the boost hash function. Or just comparing, using std::map. Basically, etablish a canonical representation for each equivalence category, and use it. -- James Kanze (GABI Software, from CAI) email:james.kanze@gmail.com Conseils en informatique orientée objet/ Beratung in objektorientierter Datenverarbeitung 9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34 |
Re: hash_map
James Kanze wrote in message... On Jun 14, 5:27 am, "BobR" wrote: /* """ quote > > namespace __gnu_cxx{ > Are you a GNU systems/compiler developer? GNU currently provides a hash table similar to the one that will be in the next version of the standard, and similar to the one provided by most other compiler vendors. Since it isn't standard (yet), they place it in a special namespace, to indicate that it is a compiler specific extension. Obviously, the name of this namespace must be in the implementation's namespace. Thus the __. """ */ unquote I remember GNU haveing 'hashtable', then it disappeared (into the 'backward' directory) due to standards. So, now it's (with revisions, I assume) comeing back? Go figure. <G> Thanks for the other info. -- Bob R POVrookie |
Re: hash_map
On Jun 14, 4:12 am, James Kanze <james.ka...@gmail.com> wrote:
> On Jun 14, 5:27 am, "BobR" <removeBadB...@worldnet.att.net> wrote: > > > aaragon <alejandro.ara...@gmail.com> wrote in message ... > > [...] > > > > struct eqstr > > > { > > > bool operator()(const double& o, const double& p) const > > > { > > > return (o == p); > > Comparing doubles for equality is most often bad [1]. > > Not really. Comparing doubles when you don't know what you're > doing is usually bad. > > > Compare to a range. > > Rarely solves anything. > > > if( ( o + 0.0001 > p ) && ( o - 0.0001 < p ) ){ return true;} > > return false; > > Which isn't a legal comparison function for anything, since it > isn't transitive. To use double as a key in a hash table, you > need an equivalence relationship and a hash function which is > compatible with that equivalence relationship: a == b implies > hash(a) == hash(b). > > > Output the numbers using a stream with 'fixed' and a big > > 'precision', and see if they are (really/almost) equal (in the > > limited output (rounding errors in stream output)). > > And check how the hash function works. I've yet to find a good > portable way to hash doubles. (Non portably, you could probably > extract the exponent, sign and the mantissa as integer values, > and hash them as integer values. With special handling for 0, > so that +0.0 and -0.0 hash to the same value. And special > handling for infinities and NaN's---in some applications, > assertion failure would be an acceptable special handling in > such cases.) > > > Not sure this is your problem, but, it (==) should be fixed. > > > } > > > }; > > > namespace __gnu_cxx{ > > Are you a GNU systems/compiler developer? > > GNU currently provides a hash table similar to the one that will > be in the next version of the standard, and similar to the one > provided by most other compiler vendors. Since it isn't > standard (yet), they place it in a special namespace, to > indicate that it is a compiler specific extension. Obviously, > the name of this namespace must be in the implementation's > namespace. Thus the __. > > The rules for using this namespace are exactly the same as for > using std. GNU authorizes users to explicitly specialize > templates which it contains on user defined types. > > > > template<> struct hash<double> { > > > size_t operator()(double __x) const { > > > boost::hash<double> double_hash; > > > return double_hash(__x); > > > } > > > }; > > > } > > The question here is what double_hash does. (I'm not sure, but > Boost seems to adopt a strategy roughly like what I described > before, using frexp and ldexp to get at the integral values, and > ignoring the sign (to avoid problems due to +/- 0.0). So it > should be correct, except maybe for infinity and NaNs, but it > probably also is slow enough that an std::map would be faster.) > > > > > > void lookup(const hash_map<double, pair<double,double> , hash<double>, > > > eqstr>& Map, const double number){ > > > hash_map<double, pair<double,double> , hash<double>, > > > eqstr>::const_iterator it = Map.find(number); > > > cout << number << ": " > > > << (it != Map.end() ? "present" : "not present") > > > << endl; > > > } > > > int main(){ > > > hash_map<double, pair<double,double> , hash<double>, eqstr> HashMap; > > > HashMap.insert(make_pair(0.1,make_pair(0.,0.5))); > > > lookup(HashMap,0.1); > > > lookup(HashMap,0.05); > > > } > > > aaragon@aaragon-laptop:~/Desktop$ ./a.out > > > 0.1: present > > > 0.05: not present > > > Now, the thing is that I can't map the value of 0.05 to the > > > same pair because my hashing function doesn't to this. Any > > > ideas??? > > I'm not sure what you're trying to do, but maybe rounding or > whatever should be done on the values before comparing or > calling the boost hash function. Or just comparing, using > std::map. Basically, etablish a canonical representation for > each equivalence category, and use it. > > -- > James Kanze (GABI Software, from CAI) email:james.ka...@gmail.com > Conseils en informatique orientée objet/ > Beratung in objektorientierter Datenverarbeitung > 9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34 Thanks for all your suggestions. It seems that the problem I'm dealing with is harder than I thought. The problem is as follows: I have a series of intervals that are obtained at run time: |_________________|_______________________________ | __________________|_____________| 0.1 0.5 1.3 1.5 1.65 Then, I have large data set of doubles that I need to map into each of these intervals. Therefore, for example the double 0.56 would enter the interval (0.5,1.3] (note that the interval is open on the left). I could just use a sign approach: double testSign = (0.56 - Imin)*(0.56*Imax); where Imim and Imax correspond to the extreme values of the interval. Of course, testSign would only be negative in the right interval so I could use an if statement to increment a counter for that interval whenever this happens; if(testSign < 0) incrementCounter(interval(i)); However, this would take a lot of time if I have a very BIG data set (which is my case). So I thought that using a hashing function that maps any double to the correct interval would be a very clever way to do this in constant time. So basically, what I need is a function that maps a double within an interval to a unique value so I can use the hash_map provided by the __gnu_cxx namespace. |
Re: hash_map
On Jun 14, 7:47 pm, "BobR" <removeBadB...@worldnet.att.net> wrote:
> James Kanze wrote in message... > On Jun 14, 5:27 am, "BobR" wrote: > /* """ quote > > > namespace __gnu_cxx{ > > Are you a GNU systems/compiler developer? > GNU currently provides a hash table similar to the one that will > be in the next version of the standard, and similar to the one > provided by most other compiler vendors. Since it isn't > standard (yet), they place it in a special namespace, to > indicate that it is a compiler specific extension. Obviously, > the name of this namespace must be in the implementation's > namespace. Thus the __. > """ */ unquote You're using OE, too:-)? (This didn't used to be a problem. Since I'm sure that OE hasn't changed, it must be something either at Google or in our firewall that has changed. I'm still looking.) > I remember GNU haveing 'hashtable', then it disappeared (into > the 'backward' directory) due to standards. So, now it's (with > revisions, I assume) comeing back? Go figure. <G> There was a proposal for a hash table right at the end of the last standard's effort. It was too late to be really considered, but many (most) implementations added it anyway. Regretfully, they added it each with their own subtle variations and differences. It was quickly realized that adding things like this to the standard namespace was not a good idea. Which left the library writers in a crunch; they could move it to a private namespace where it belonged, but this would break user code. G++ chose to break user code, VC++ no. When the standards committee got back to the question, they were faced with the fact that many implementations had, or had had, a variant already in std::, and that the specifications for that hash table varied. Rather than create even more problems with existing user code, they chose to rename it unordered_[set,map]. The last time I looked at it, the specification used separate template arguments for the equivalence function and the hash function, which seems like a sure recepe for incompatible versions to me. But nothing is final yet. -- James Kanze (GABI Software, from CAI) email:james.kanze@gmail.com Conseils en informatique orientée objet/ Beratung in objektorientierter Datenverarbeitung 9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34 |
Re: hash_map
On Jun 14, 7:55 pm, aaragon <alejandro.ara...@gmail.com> wrote:
> Thanks for all your suggestions. It seems that the problem I'm > dealing with is harder than I thought. That's a frequent case with floating point:-). > The problem is as follows: I have a series of intervals that > are obtained at run time: > |_________________|_______________________________ |__________________|_____________| > 0.1 0.5 1.3 1.5 1.65 > Then, I have large data set of doubles that I need to map into each of > these intervals. Therefore, for example the double 0.56 would enter > the interval (0.5,1.3] (note that the interval is open on the left). I > could just use a sign approach: > double testSign = (0.56 - Imin)*(0.56*Imax); > where Imim and Imax correspond to the extreme values of the interval. > Of course, testSign would only be negative in the right interval so I > could use an if statement to increment a counter for that interval > whenever this happens; > if(testSign < 0) > incrementCounter(interval(i)); > However, this would take a lot of time if I have a very BIG data set > (which is my case). So I thought that using a hashing function that > maps any double to the correct interval would be a very clever way to > do this in constant time. So basically, what I need is a function that > maps a double within an interval to a unique value so I can use the > hash_map provided by the __gnu_cxx namespace. To calculate the hash code, however, you need to normalize the value in the interval. Hash codes require a very strong concept of equivalence. This can be done very easily with std::map, which uses an ordering relationship. And while std::map is O(lg n), rather than O(1), it typically takes a very big set for the difference to be measurable. -- James Kanze (GABI Software, from CAI) email:james.kanze@gmail.com Conseils en informatique orientée objet/ Beratung in objektorientierter Datenverarbeitung 9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34 |
Re: hash_map
On Jun 15, 4:03 am, James Kanze <james.ka...@gmail.com> wrote:
> On Jun 14, 7:55 pm, aaragon <alejandro.ara...@gmail.com> wrote: > > > Thanks for all your suggestions. It seems that the problem I'm > > dealing with is harder than I thought. > > That's a frequent case with floating point:-). > > > > > The problem is as follows: I have a series of intervals that > > are obtained at run time: > > |_________________|_______________________________ |__________________|_____________| > > 0.1 0.5 1.3 1.5 1.65 > > Then, I have large data set of doubles that I need to map into each of > > these intervals. Therefore, for example the double 0.56 would enter > > the interval (0.5,1.3] (note that the interval is open on the left). I > > could just use a sign approach: > > double testSign = (0.56 - Imin)*(0.56*Imax); > > where Imim and Imax correspond to the extreme values of the interval. > > Of course, testSign would only be negative in the right interval so I > > could use an if statement to increment a counter for that interval > > whenever this happens; > > if(testSign < 0) > > incrementCounter(interval(i)); > > However, this would take a lot of time if I have a very BIG data set > > (which is my case). So I thought that using a hashing function that > > maps any double to the correct interval would be a very clever way to > > do this in constant time. So basically, what I need is a function that > > maps a double within an interval to a unique value so I can use the > > hash_map provided by the __gnu_cxx namespace. > > To calculate the hash code, however, you need to normalize the > value in the interval. Hash codes require a very strong concept > of equivalence. > > This can be done very easily with std::map, which uses an > ordering relationship. And while std::map is O(lg n), rather > than O(1), it typically takes a very big set for the difference > to be measurable. Well, with the map you still need a one to one correspondence between keys and values. The different running time complexity relies in the different data structures that are used by the map and the hash table but they need the same mapping. Therefore, I just need to find a function that maps several double values to a single std::pair (like a mathematical function pair<double,double> = f(double)). Once I have this, I could implement the mapping using either the std::map or the hash table, it doesn't matter. > > -- > James Kanze (GABI Software, from CAI) email:james.ka...@gmail.com > Conseils en informatique orientée objet/ > Beratung in objektorientierter Datenverarbeitung > 9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34 |
| All times are GMT. The time now is 07:59 PM. |
Powered by vBulletin®. Copyright ©2000 - 2013, vBulletin Solutions, Inc.
SEO by vBSEO ©2010, Crawlability, Inc.