Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C++ > Meyers's preference for vectors over maps

Meyers's preference for vectors over maps

 
 
Fred Ma
      01-28-2004
Hello,

I was looking at Meyers's "Effective STL", item 23,
about choosing between vectors and maps (at least
that's the choice for me). In many cases, using
sorted vectors is faster for lookups. The two
reasons are (1) that vector elements are smaller,
since map elements need at least 3 extra pointers,
and (2) the contiguity of vector elements in memory
ensures fewer page faults.

For my part, I will have a container of objects that
are much bigger than pointers, so reason (1) doesn't
apply.

Reason (2) is what I'm trying to understand better.
Let me describe my application.

In my case, I will have N items in the container,
will look up n (n < N) items, do some processing,
then replace the n worst items in the container
with the n new items from processing. Here, "worst"
is defined by the sorting criterion. The problem
is that n can vary from 1 to N, making the relative
advantages of vector vs. map very unclear. For small n,
a map is better because I'm basically interspersing
one modification of the database with one lookup.
For large n, a vector is better because it's faster
to sort a whole batch of objects at once than to
build up a map piecemeal.

Since I can't get a clear answer from that line of
reasoning, I looked more carefully at Meyers's
reason (2). I will be using binary search for
lookup, and he claims that contiguity of elements
is good for binary search in particular. More
specifically, I understood him to mean that
closeness of elements in memory should match
closeness of elements in the traversal of the
container. However, my understanding of binary
search is that it doesn't traverse elements in
sequential order. So why would the contiguity of
elements in a vector be of any advantage?

Thanks for clarifying. And if there are any
other considerations that would make it easier
to decide on a proper container, I'd appreciate
those too. Right now, I'm tempted to go with a
vector because I haven't used a map before. But
for that very reason, I'm also thinking it wouldn't
be a bad idea to dabble in maps, too.

Fred
--
Fred Ma
Dept. of Electronics, Carleton University
1125 Colonel By Drive, Ottawa, Ontario
Canada, K1S 5B6
 
Hendrik Belitz
 
      01-28-2004
Fred Ma wrote:

> In my case, I will have N items in the container,
> will look up n (n<N) items, do some processing,
> then replace the n worst items in the container
> with the n new items from processing. Here, worse
> is defined by the sorting function. The problem
> is that n can vary from 1 to N, making the advantages
> of vector vs. map very unclear. For small n,
> map is better because I'm basically interspersing
> 1 modification of the database with 1 lookup.
> For large n, vector is better because it's faster
> to sort a whole bunch of objects at once rather
> than building a map in a piecemeal fashion.


That looks like a good candidate for a priority queue.
Removing items from both a vector and a map is slow, and since you need
a sorting criterion, a plain deque will not suffice.

--
To get my real email address, remove the two onkas
--
Dipl.-Inform. Hendrik Belitz
Central Institute of Electronics
Research Center Juelich
 
Fred Ma
 
      01-28-2004
Hendrik Belitz wrote:
>
> Fred Ma wrote:
>
> > In my case, I will have N items in the container,
> > will look up n (n<N) items, do some processing,
> > then replace the n worst items in the container
> > with the n new items from processing. Here, worse
> > is defined by the sorting function. The problem
> > is that n can vary from 1 to N, making the advantages
> > of vector vs. map very unclear. For small n,
> > map is better because I'm basically interspersing
> > 1 modification of the database with 1 lookup.
> > For large n, vector is better because it's faster
> > to sort a whole bunch of objects at once rather
> > than building a map in a piecemeal fashion.

>
> That looks like a good candidate for usage of a priority queue.
> Removing items from both a vector and a map is slow, and since you need
> a sorting criteria, a normal deque will not suffice.



Hi, Hendrik,

I looked up priority queues in Josuttis's "The C++
Standard Library". The adapter is built on top of the
standard iterator-supporting containers, with vector as
the default. Wouldn't I still be faced with the same
vector-vs-map tradeoffs for my usage? My impression is
that the container adapters simplify the interface
rather than changing the underlying mechanism of data
storage and retrieval. Admittedly, that could be a
misimpression on my part.

Fred
 
Hendrik Belitz
 
      01-28-2004
Fred Ma wrote:

> Hi, Hendrik,
>
> I looked up priority queues in Josuttis's "The C++
> Standard Library". It's built on top of the standard
> iterator-supporting containers, with the vector as a
> default. Wouldn't I still be faced with the comparison
> of issues between vectors and maps for my usage? My
> impression is that the special container adapters
> simplify the interface rather than changing the
> underlying mechanism of data storage and retrieval.
> Admittedly, this could be a misimpression on my part.


Uh well, I didn't even realize that priority queues are available as an
STL adapter. Normally, priority queues are implemented with Fibonacci heaps
and are very efficient. Maybe you should look for an implementation which
does it that way instead of using an STL container/adapter.

 
tom_usenet
 
      01-28-2004
On 28 Jan 2004 08:13:37 GMT, Fred Ma <(E-Mail Removed)> wrote:

>Hello,
>
>I was looking at Meyers's "Effective STL", item 23
>about choosing between vectors and maps (at least
>that the choice for me). In many cases, using
>sorted vectors is faster for lookups. The two
>reasons are (1) that vector elements are smaller,
>since map elements need at least 3 pointers, and
>(2) contiguity of vector elements in memory ensure
>fewer page faults.
>
>For me, I will a container of objects that are much
>bigger than pointers, so reason (1) doesn't apply.
>
>Reason (2) is what I'm trying to understand better.
>Let me describe my application.
>
>In my case, I will have N items in the container,


Is N fixed at runtime? IOW, is the size of the container constant?

>will look up n (n<N) items, do some processing,
>then replace the n worst items in the container
>with the n new items from processing. Here, worse
>is defined by the sorting function. The problem
>is that n can vary from 1 to N, making the advantages
>of vector vs. map very unclear. For small n,
>map is better because I'm basically interspersing
>1 modification of the database with 1 lookup.
>For large n, vector is better because it's faster
>to sort a whole bunch of objects at once rather
>than building a map in a piecemeal fashion.


I would recommend a vector. Keep another, empty vector handy. Start by
sorting your main vector. Collect your new range of n elements to add and
sort it (in O(n)), then merge that with the surviving part of your old
vector into the spare, same-sized vector:

tempvec.reserve(v.size());
std::sort(newrange.begin(), newrange.end());
// merge the N-n remaining elements of v with the new range:
std::merge(v.begin() + n, v.end(), newrange.begin(), newrange.end(),
           std::back_inserter(tempvec));

// swap the merged result into place
tempvec.swap(v);
tempvec.clear();

etc. That way the operation is only n log n, not N log N.

>
>Since I can't get a clear answer from that line of
>reasoning, I looked more carefully at Meyers's
>reason (2). I will be using binary search for
>lookup, and he claims that contiguity of elements
>is good for binary search in particular. More
>specifically, I understood him as meaning that
>closeness of elements in memory should match
>closeness of elements in the traversal of
>container elements. However, my understanding
>of binary search is that it doesn't traverse
>elements in sequential order. So why the
>contiguity of elements in a vector be of any
>advantage?


Because they are all close to each other in memory. If a 4K chunk of
the vector is pulled into the cache, you'll get a lot of cache
hits.

>Thanks for clarifying. And if there are any
>other considerations that would make it easier
>to decide upon a proper container, I'd appreciate
>those too. Right now, I'm tempted to go for a
>vector because I haven't used a map before. But
>for that reason, I'm also thinking it's not a bad
>reason to dabble in maps, too.


Given the fixed size, vector seems a good choice. If your elements are
expensive to copy, I'd go for a vector of smart pointers, or just
plain pointers (with manual memory management).

Tom

C++ FAQ: http://www.parashift.com/c++-faq-lite/
C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
 
Daniel T.
 
      01-28-2004
Fred Ma <(E-Mail Removed)> wrote:

> I was looking at Meyers's "Effective STL", item 23
> about choosing between vectors and maps (at least
> that the choice for me). In many cases, using
> sorted vectors is faster for lookups. The two
> reasons are (1) that vector elements are smaller,
> since map elements need at least 3 pointers, and
> (2) contiguity of vector elements in memory ensure
> fewer page faults.


I guess the obvious answer is to create an adaptor that allows you
to switch easily between the two. Then profile both and pick the more
efficient one.
 
tom_usenet
 
      01-28-2004
On Wed, 28 Jan 2004 12:49:02 +0000, tom_usenet
<(E-Mail Removed)> wrote:

>I would recommend a vector. Keep another empty vector handy. Start by
>sorting your main vector. Calculate your new range of n to add and
>sort it (in O(n)),


That's meant to say O(n log n)!

Tom

 
Dietmar Kuehl
 
      01-28-2004
Fred Ma <(E-Mail Removed)> wrote:
> I looked up priority queues in Josuttis's "The C++
> Standard Library". It's built on top of the standard
> iterator-supporting containers, with the vector as a
> default. Wouldn't I still be faced with the comparison
> of issues between vectors and maps for my usage?


If the operations provided by a priority queue fit your needs, it is almost
certainly the better choice: this data structure is specialized for finding
the smallest (or biggest, depending on the direction of your predicate)
element as fast as possible. That is, insertion into a priority queue does
just the work necessary to provide fast access to the smallest element;
likewise for removal.

If you need to access or change elements other than the smallest one, things
change quite a lot, however. The priority queue in the standard library is
not at all suited for this, but there are priority queues which are helpful
here, too. However, the difference from other data structures like e.g. maps
is then reduced.

> My
> impression is that the special container adapters
> simplify the interface rather than changing the
> underlying mechanism of data storage and retrieval.


This is true for stack and queue but not for priority_queue. The latter
implements the logic of a 2-heap (a special case of a d-heap, which is just
a balanced tree with d children per inner node, stored implicitly in the
underlying sequence). This is quite different from e.g. doing a binary
search. Also, the number of elements moved during an insert or removal is
not linear in the size of the vector but logarithmic.

Concerning Scott's comment about the advantages of vector over map: at first
sight, a map looks superior to a vector. The first insight into why this is
not so in all situations is the amount of work necessary to traverse the
map's internal tree, compared to a binary search within the vector, which
effectively uses an "ad-hoc" binary tree. To move from one node to its
child, the map has to dereference a pointer, whereas the vector simply
performs a clever addition or subtraction on an index. This is often much
faster. Towards the end of the search, where most of the work is done and
the steps become smaller, the "nodes" (for the map, the explicit tree nodes;
for the vector, the implicit current location) are more likely to sit on one
page for the vector, because they are [nearly] adjacent to each other. A
page miss is pretty expensive. Also, the vector's nodes are much smaller
(just the data you store there) compared to the map's nodes (which typically
contain at least three pointers for the map's maintenance in addition to
your data), increasing the chances of being on one page even more.
Especially for small user data, the vector's movements are likely to
eventually stay within one cache line of the CPU while the map is still
causing page misses, or at least cache misses.

For searching, it is pretty obvious that you want to use a vector in favour
of a map. If you can go with an unordered structure, a hash is an even
better choice, BTW.

Things become less obvious if you need to change the data. If the number of
searches relative to the number of modifications (insertions and removals;
a change is effectively a removal followed by an insertion for both a map
and a sorted vector, although it can be improved a little) becomes small,
i.e. if you have only a few searches between modifications, things start to
change: a huge collection of data will cause a lot of moving around for
vectors, while it will cause only a few pointer adjustments for maps. On
the other hand, the pointer adjustments are more involved than simply moving
a set of small PODs in a fairly small vector. The tricky part is figuring
out where one approach is better than the other in such a situation,
because apart from the CPU cycles, things like the memory footprint also
have an effect. In practice this often results in vectors of even a few
hundred elements being faster than maps.

The details depend on the exact setting, including the size of the elements,
the size of the container, the total amount of memory used, the access
patterns, etc. Thus, it is normally reasonable to implement a working
solution using an appropriate class and later profile the whole application
to find out whether this part is critical at all (normally the critical
parts are other than the ones one suspects...) and, if so, replace the
solution with alternatives and measure them, too. It may be reasonable to
use a class with an appropriate interface which is internally implemented
by one of the choices and whose implementation is later changed as needed.
--
<(E-Mail Removed)> <http://www.dietmar-kuehl.de/>
Phaidros eaSE - Easy Software Engineering: <http://www.phaidros.com/>
 
Dietmar Kuehl
 
      01-28-2004
Hendrik Belitz wrote:
> Uh well, I didn't even recognized that priority queues are available as an
> STL adapter. Normally priority queues are implemented by fibonacci heaps
> and are very efficient.


Are they, now? Well, the 'std::priority_queue' adapter is an order
of magnitude faster than the best Fibonacci heap implementation I could
do. Of course, this alone does not say much, but I could also improve on
the adapter implementation from the SGI STL, although only by something
like 5%.

The advantage of Fibonacci heaps over the 2-heap normally used by the
priority queue adapter comes when modifying elements: the standard adapter
does not support this operation at all, while it is the operation Fibonacci
heaps are good at. If this feature is needed, Fibonacci heaps are merely on
par with d-heaps (d == 2 is actually not the best choice here; I think
d == 5 was the optimum in my tests) extended to provide this feature, for
all numbers of elements I tested (up to 10 million, if I remember
correctly).

> Maybe you should look for some implementation
> which does it that way instead of using a stl container/adapter.


Have a look at <http://www.dietmar-kuehl.de/cxxrt/heaps.tar.gz>. This
is a collection of priority queues I have implemented. It includes
quite a few different ones, including Fibonacci heaps.
 
Fred Ma
 
      01-28-2004
Dietmar Kuehl wrote:
>
> If you need to access or change other elements than the smallest one, things
> change quite a lot, however. The priority queue in the standard library is
> not at all suited for this but there are priority queues which are helpful
> here, too. However, the difference to other data structures like eg. maps
> is reduced.


See, that's the problem. In each outermost iteration, I look up n items
that can occur anywhere in the N-length container, not just at the ends. If
n were either small or almost the same as N, then I could say whether I have
lots of lookups per re-sorting of the vector (after replacing the n
worst elements with new ones created from the n elements that were
looked up). (It's a genetic algorithm, in case that's of interest to
anyone.) If the standard C++ library's priority queue is not optimized
for this, then I should probably choose between a vector and a map. I'm
not averse to exploring more custom solutions, but for my first time
wandering away from the ol' vector, I prefer a smaller step, with lots of
published documentation (books, website references, news articles, FAQs).

<<SNIP>>

> Concerning Scott's comment about the advantages of vector over map: at first
> sight, a map looks superior to a vector.
> <<SNIP>>


Yes, I suspected that Meyers's reasoning about page misses would be most
applicable toward the end of the search, since you are bisecting ever-smaller
pieces of your vector. I just wasn't sure whether it was enough for it
to happen only toward the end. Also, how far toward the end depends on the
element sizes, as you mentioned, as well as the cache size. Each of my
elements is 24 doubles, plus various overhead, and I was trying to keep
away from having to consider hardware parameters, e.g. cache size.
But it seems there's no way to avoid it. Profiling seems to be the
only sure way. I will put that off for now, since I'm trying to get
some results for a deadline (no one's problem but my own, of course).

I also get the point about the 3 or more pointers per node in a map.
At first, I thought that the issue was the size of the pointers,
which is much smaller than my container elements. However,
you clarified that the dereferencing overhead also matters. Thanks for
clearing that up.

I think I'll stick with the ol' vector this time. Expanding my horizons
will happen another day, even an expansion as small as trying maps. And, of
course, a comparison between maps and vectors.

Thanks, all, for your elaborations.

Fred
--
Fred Ma
Dept. of Electronics, Carleton University
1125 Colonel By Drive, Ottawa, Ontario
Canada, K1S 5B6
 