Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C++ > Container for Look Up to be Fast

Reply
Thread Tools

Container for Look Up to be Fast

 
 
Pallav singh
Guest
Posts: n/a
 
      08-13-2011
Hi All ,

i have data coming at variable rate and basic requirement was look up
and insertion should be very Fast
i used STL MAP for that , but when data was very HUGE(10 GB data was
to be stored in MAP ), its performance degraded.

Time to fetch data based on KEY increases exponentially.

when I use unordered_map of C++0x to store data, it got reduced
significantly.

Is there any Other container provided by Boost or Ace Framework, which
i can use to reduce my Look up time more.

Thanks
Pallav Singh


 
Reply With Quote
 
 
 
 
Dombo
Guest
Posts: n/a
 
      08-13-2011
Op 13-Aug-11 11:42, Pallav singh schreef:
> Hi All ,
>
> i have data coming at variable rate and basic requirement was look up
> and insertion should be very Fast
> i used STL MAP for that , but when data was very HUGE(10 GB data was
> to be stored in MAP ), its performance degraded.
>
> Time to fetch data based on KEY increases exponentially.


Considering the computational complexity of the std::map (O(log n)) I
wouldn't expect the lookup time to increase exponentially with the size
of the data set.

Probably you are experiencing the effects caching and/or virtual memory
which would explain why performance degrades dramatically when the data
set gets large. When there are a huge amount of elements in the map,
chances are that the nodes are spread out over memory. A lookup could
result in memory accesses all over the place which could result in many
cache misses and/or page faults. These days the way you access memory
(locality of reference) can have a bigger impact on performance than the
computational complexity.

> when I use unordered_map of C++0x to store data, it got reduced
> significantly.
>
> Is there any Other container provided by Boost or Ace Framework, which
> i can use to reduce my Look up time more.


As far as lookup performance is concerned hash based associative
containers are about as good as it gets, assuming the hash function is
good. Possibly you may be able to provide a better hashing function for
your data.

If that doesn't help the only other option I can think of is to tackle
the problem at a more fundamental (design) level.

 
Reply With Quote
 
 
 
 
Jorgen Grahn
Guest
Posts: n/a
 
      08-13-2011
On Sat, 2011-08-13, Pallav singh wrote:
> Hi All ,
>
> i have data coming at variable rate and basic requirement was look up
> and insertion should be very Fast
> i used STL MAP for that , but when data was very HUGE(10 GB data was
> to be stored in MAP ), its performance degraded.
>
> Time to fetch data based on KEY increases exponentially.


But what about insertions and deletions? I'd expect that to be even
worse.

> when I use unordered_map of C++0x to store data, it got reduced
> significantly.
>
> Is there any Other container provided by Boost or Ace Framework, which
> i can use to reduce my Look up time more.


I think I would do an experiment with a BerkeleyDB of some kind stored
on disk, to see if that's better for huge data sets. Maybe also try to
go from an unordered_map<Key, Value> to unordered_map<Key, Value*> to
see if that is more cache-friendly.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
 
Reply With Quote
 
Leo Equinox Gaspard
Guest
Posts: n/a
 
      08-13-2011
Le 13/08/2011 11:42, Pallav singh a écrit :
> Hi All ,
>
> i have data coming at variable rate and basic requirement was look up
> and insertion should be very Fast
> i used STL MAP for that , but when data was very HUGE(10 GB data was
> to be stored in MAP ), its performance degraded.
>
> Time to fetch data based on KEY increases exponentially.
>
> when I use unordered_map of C++0x to store data, it got reduced
> significantly.
>
> Is there any Other container provided by Boost or Ace Framework, which
> i can use to reduce my Look up time more.
>
> Thanks
> Pallav Singh
>
>


As you have 10GB of data, your OS is likely to swap, explaining the
performance issue (except is you have 10GB RAM ?). Moreover, even if
you have the RAM needed, RAM access isn't as fast as cache access.
So, maybe should you use something like a B-Tree with the number of
childs per node adapted to your processor cache, so you could get a
O(logK N) (with K the number of childs and N the number of elements),
and reducing RAM accesses by adapting to your cache processor.

But I'm not sure whether retrieving, say, 1Mo of data from RAM to
processor cache is better or worse in performance than retrieving 10
times 100Ko from RAM to cache.

However, if your OS is swapping (if you have not enough memory), then
the B-Tree is very likely to improve performance, compared to std::map.

Compared to std::unordered_map, I'm not sure which one is the best.
A B-tree is an improvement of a std::map for large amounts of data,
an unordered_map could also be improved.
For example, reserve enough buckets on the creation of the unordered_map,
of create a better hash function.

Then, maybe will some libraries offer you more data structures.

So, my advices :
- Try B-Trees
- For unordered_map :
- Reserve enough buckets to avoid re-hashing
- Provide a good hash function (examine the space your keys are in,
and try to make it fit with the most large space of integers, with
as little collisions as possible)
- Finally, maybe are there some other data structures I don't know yet.

Cheers & hth.,
Leo
 
Reply With Quote
 
Juha Nieminen
Guest
Posts: n/a
 
      08-14-2011
Dombo <> wrote:
> Probably you are experiencing the effects caching and/or virtual memory


I think the term you are looking for is "swapping".
 
Reply With Quote
 
Nick Keighley
Guest
Posts: n/a
 
      08-15-2011
On Aug 14, 7:21*pm, Juha Nieminen <nos...@thanks.invalid> wrote:
> Dombo <do...@disposable.invalid> wrote:
> > Probably you are experiencing the effects caching and/or virtual memory

>
> * I think the term you are looking for is "swapping".


not really
 
Reply With Quote
 
Juha Nieminen
Guest
Posts: n/a
 
      08-15-2011
Nick Keighley <> wrote:
> On Aug 14, 7:21*pm, Juha Nieminen <nos...@thanks.invalid> wrote:
>> Dombo <do...@disposable.invalid> wrote:
>> > Probably you are experiencing the effects caching and/or virtual memory

>>
>> * I think the term you are looking for is "swapping".

>
> not really


I don't think caching and virtual memory would cause std::map to slow
down from logaritmic to exponential. However, swapping sounds exactly
like it could.
 
Reply With Quote
 
Jorgen Grahn
Guest
Posts: n/a
 
      08-15-2011
On Mon, 2011-08-15, Juha Nieminen wrote:
> Nick Keighley <> wrote:
>> On Aug 14, 7:21*pm, Juha Nieminen <nos...@thanks.invalid> wrote:
>>> Dombo <do...@disposable.invalid> wrote:
>>> > Probably you are experiencing the effects caching and/or virtual memory
>>>
>>> * I think the term you are looking for is "swapping".

>>
>> not really

>
> I don't think caching and virtual memory would cause std::map to slow
> down from logaritmic to exponential. However, swapping sounds exactly
> like it could.


Are you two /really/ talking about exponential, or are you using the
word to mean "ridiculously slow"?

Even if you have a tree that's 40 levels deep and everything has been
swapped to disk, the number of disk hits needed to find a certain node
are proportional to the depth -- that is, increases logarithmically
with the size of the tree.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
 
Reply With Quote
 
Leo Equinox Gaspard
Guest
Posts: n/a
 
      08-16-2011
Le 13/08/2011 14:22, Jorgen Grahn a écrit :
[...]
> I think I would do an experiment with a BerkeleyDB of some kind stored
> on disk, to see if that's better for huge data sets. Maybe also try to
> go from an unordered_map<Key, Value> to unordered_map<Key, Value*> to
> see if that is more cache-friendly.
>
> /Jorgen
>


I don't think an unordered_map could be easily cached, because of its
random-like distribution. So, except if you are always requesting the
same elements, I don't think processor cache could choose which of the
thousands of data you got in your map it should cache.

BTW, to get better advices, you (Pallav singh) could say how much a
data is (10GB of 8B data isn't treated by the same data structure as
10GB of 1GB data).

Leo
 
Reply With Quote
 
Juha Nieminen
Guest
Posts: n/a
 
      08-16-2011
Jorgen Grahn <grahn+> wrote:
> Are you two /really/ talking about exponential, or are you using the
> word to mean "ridiculously slow"?


Ok, it's probably not technically speaking O(2^n) with swapping, just
extremely slow. My point was, however, that caching or virtual memory
is not the cause for this, but disk swapping.
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
container inside container in stl wolverine C++ 2 07-24-2006 03:08 PM
Copy elements from one STL container to another STL container Marko.Cain.23@gmail.com C++ 4 02-16-2006 05:03 PM
std::transform container => std::abs(container) Steven T. Hatton C++ 4 12-05-2004 07:10 AM
STL: container's values setup by another container Maitre Bart C++ 2 02-11-2004 12:11 AM
std::container::iterator vs std::container::pointer Vivi Orunitia C++ 11 02-04-2004 08:09 AM



Advertisments
 



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57