Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C++ > Why is a vector find faster than a set with find()?

Thread Tools

Why is a vector find faster than a set with find()?

James Kanze
Posts: n/a
On Saturday, August 11, 2012 7:35:05 PM UTC+1, Leigh Johnston wrote:
> On 11/08/2012 19:16, James Kanze wrote:

> > On Tuesday, August 7, 2012 2:04:41 PM UTC+1, (unknown) wrote:

> >> I'm storing a large (100K) list of integer id's in a set which
> >> I need to frequenctly search for a given id. When I try doing
> >> this with a set using the standard find() function it takes 3
> >> times longer than doing the same with a vector!

> >> Why is this? Surely given that its ordered a set could be
> >> searched using a binary chop whereas a vector must be a linear
> >> search. Even if the set is searched linearly why does it take
> >> 3 times as long?

> > By definition, std::find does a linear search. That's part of
> > its definition. If performance is important, and you're dealing
> > with sorted data, there is std::lower_bound and company.

> > Practically speaking, neither std::find nor std::lower_bound
> > have any way of knowing whether the range they're iterating over
> > is sorted or not. It's up to you, the user, to tell it whether
> > the range is sorted or not, by choosing the appropriate
> > function. If you lie, it's undefined behavior, because
> > typically, the function has no way of knowing.

> > Finally: using an std::vector, and keeping it sorted, is often
> > superior to using std::set, because of greater locality. I'm
> > currently working on some high performance software, and we've
> > banned the use of node based containers for critical data,
> > because the loss of locality kills performance. We've also
> > banned them in large parts outside of the critical area, because
> > memory is very, very tight, and they tend to use a lot more
> > memory than std::vector as well. (Note that this is *not* a
> > general recommendation. We only took these steps because we had
> > to.)

> You can improve the locality of the node based containers by using a
> custom allocator; I got significant speed increases doing it this way
> (but how much was down to improved locality and how much was down to not
> using a general purpose allocator is unclear).

That would certainly help, but you'll still not get the locality of
std::vector, because each node is larger than an entry in a vector. And
of course, there's no way of telling the allocator where the element
being allocated will end up in the set, so that if you're iterating
through sequentially, you could still end up with relatively poor
locality. (If you had some way of telling the allocator the depth, you
could really win by regrouping the top nodes. Binary search on a vector
has poor locality at the beginning, since it's jumping all over the

In the end, there is no one perfect solution. If the abstraction is
fundamental to your application, define a class for it yourself; use
std::vector or std::set or whatever is simplest for you to implement it.
But provide it with a minimal interface (e.g. forward iterators, if
that's all that's needed, even if the implementation is std::vector), so
that you can easily switch implementation later, when you know where the
performance bottlenecks are.

James Kanze
Reply With Quote
Juha Nieminen
Posts: n/a
Melzzzzz <(E-Mail Removed)> wrote:
>> The amount of operations needed to increment a set iterator may be
>> O(log n) at worst. However, traversing the entire tree from beginning
>> to end is still just O(n) (and not O(n log n) as one could hastily
>> think). It might not be immediately intuitive why, but it's a good
>> thinking exercise.

> Are you sure?

Yes, I'm sure.

Think about the amount of steps required *per node* for a full traversal
from beginning to end. If you think about it, you'll see that the amount
of steps required per node is fixed, ie. O(1). (Thus for the entire tree
the total amount of steps is O(n).)

(If you still can't figure it out, the steps are, roughly: "Arrive at node,
go left, arrive from left, go right, arrive from right, go to parent." This
exact procedure is repeated for each node, and the amount of operations per
node does not depend on the size of the tree.)
Reply With Quote

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off

Similar Threads
Thread Thread Starter Forum Replies Last Post
Re: Which has faster access Binary Search of std::vector or std::set??? John H. C++ 0 03-01-2010 11:05 PM
Initializing vector<vector<int> > and other vector questions... pmatos C++ 6 04-26-2007 05:39 PM
findcontrol("PlaceHolderPrice") why why why why why why why why why why why Mr. SweatyFinger ASP .Net 2 12-02-2006 03:46 PM
Free memory allocate by a STL vector, vector of vector, map of vector C++ 8 02-18-2006 12:48 AM
[Numeric] column vector faster than row vector in mat multiply? Zhang Le Python 1 03-04-2005 08:27 PM