Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C++ > Why is a vector find faster than a set with find()?

Reply
Thread Tools

Why is a vector find faster than a set with find()?

 
 
Juha Nieminen
Guest
Posts: n/a
 
      08-07-2012
http://www.velocityreviews.com/forums/(E-Mail Removed) wrote:
> On Tue, 7 Aug 2012 14:59:12 +0000 (UTC)
> Juha Nieminen <(E-Mail Removed)> wrote:
>>(E-Mail Removed) wrote:
>>>>Yes, obviously (both functions do not do the same thing).
>>>
>>> Thats what I didn't realise. I assumed the generic find() would be
>>> specialised internally for the type of container it was operating on. I
>>> didn't realise it just did a dumb linear search on everything.

>>
>>Care to show us a version of find() that searches faster when it's given
>>iterators to a std::set?

>
> What?


My point is that if you actually try to implement a find() function that
tries to take advantage of the fact that the iterators are std::set
iterators and hence tries to make a O(log n) search, you'll quickly see
that it's not that trivial to do with those iterators only. (For one,
you don't have access to the "left" and "right" pointers of the nodes,
not in a standardized way.)

Perhaps <set> *could* provide a specialized version of std::find() that
it befriends and allows it to access those internal pointers in order to
make a O(log n) search. However, this isn't required by the standard and
thus you shouldn't rely on such a thing.
 
Reply With Quote
 
 
 
 
Casey Carter
Guest
Posts: n/a
 
      08-07-2012
On 2012-08-07 08:45, (E-Mail Removed) wrote:
> No, doesn't include filling up but thats so fast it would hardly register.
> I traced the delay down to the search of the set and timed that. Tried a
> vector and it was 3 times faster. Its annoying because I wanted to use a
> generic function with find() in it but now I'm having to explicitely check for
> a set and use the sets built in find() which is about 200 times faster and I'm
> presuming does do a binary chop.


If searches are an order of magnitude or two more common than
inserts/deletes - which seems to be the case from your description - you
will likely get best results from using std::lower_bound to binary
search a sorted vector.
 
Reply With Quote
 
 
 
 
rep_movsd
Guest
Posts: n/a
 
      08-09-2012

On Tuesday, August 7, 2012 6:34:41 PM UTC+5:30, (unknown) wrote:
> Hello
>
>
>
> I'm storing a large (100K) list of integer id's in a set which I need to
>
> frequenctly search for a given id. When I try doing this with a set using the
>
> standard find() function it takes 3 times longer than doing the same with a
>
> vector!
>
>
>
> Why is this? Surely given that its ordered a set could be searched using
>
> a binary chop whereas a vector must be a linear search. Even if the set is
>
> searched linearly why does it take 3 times as long?
>
>
>
> Thanks for any info
>
>
>
> B2003


You have a set of integers, the only sensible operations (which are close to log N) are :

Add int to a set
Remove int from a set
Know if an int is in that set
Finding the iterator pointing to a given int using set::find

std::find makes no assumptions about the iterators its given except that they have ++ and * and it does a linear search.

What exactly are you trying to achieve? The only thing I can think of is you look at a given int in the set and then iterate forward from there to get a certain range perhaps.
If you only need to test if the int is in the set then set::count() is your friend

 
Reply With Quote
 
James Kanze
Guest
Posts: n/a
 
      08-11-2012
On Tuesday, August 7, 2012 2:04:41 PM UTC+1, (unknown) wrote:

> I'm storing a large (100K) list of integer id's in a set which
> I need to frequenctly search for a given id. When I try doing
> this with a set using the standard find() function it takes 3
> times longer than doing the same with a vector!


> Why is this? Surely given that its ordered a set could be
> searched using a binary chop whereas a vector must be a linear
> search. Even if the set is searched linearly why does it take
> 3 times as long?


By definition, std::find does a linear search. That's part of
its definition. If performance is important, and you're dealing
with sorted data, there is std::lower_bound and company.

Practically speaking, neither std::find nor std::lower_bound
have any way of knowing whether the range they're iterating over
is sorted or not. It's up to you, the user, to tell it whether
the range is sorted or not, by choosing the appropriate
function. If you lie, it's undefined behavior, because
typically, the function has no way of knowing.

Finally: using an std::vector, and keeping it sorted, is often
superior to using std::set, because of greater locality. I'm
currently working on some high performance software, and we've
banned the use of node based containers for critical data,
because the loss of locality kills performance. We've also
banned them in large parts outside of the critical area, because
memory is very, very tight, and they tend to use a lot more
memory than std::vector as well. (Note that this is *not* a
general recommendation. We only took these steps because we had
to.)

--
James Kanze
 
Reply With Quote
 
Melzzzzz
Guest
Posts: n/a
 
      08-11-2012
On Sat, 11 Aug 2012 11:16:06 -0700 (PDT)
James Kanze <(E-Mail Removed)> wrote:

> On Tuesday, August 7, 2012 2:04:41 PM UTC+1, (unknown) wrote:
>
> > I'm storing a large (100K) list of integer id's in a set which
> > I need to frequenctly search for a given id. When I try doing
> > this with a set using the standard find() function it takes 3
> > times longer than doing the same with a vector!

>
> > Why is this? Surely given that its ordered a set could be
> > searched using a binary chop whereas a vector must be a linear
> > search. Even if the set is searched linearly why does it take
> > 3 times as long?

>
> By definition, std::find does a linear search. That's part of
> its definition. If performance is important, and you're dealing
> with sorted data, there is std::lower_bound and company.
>
> Practically speaking, neither std::find nor std::lower_bound
> have any way of knowing whether the range they're iterating over
> is sorted or not. It's up to you, the user, to tell it whether
> the range is sorted or not, by choosing the appropriate
> function. If you lie, it's undefined behavior, because
> typically, the function has no way of knowing.
>
> Finally: using an std::vector, and keeping it sorted, is often
> superior to using std::set, because of greater locality. I'm
> currently working on some high performance software, and we've
> banned the use of node based containers for critical data,
> because the loss of locality kills performance. We've also
> banned them in large parts outside of the critical area, because
> memory is very, very tight, and they tend to use a lot more
> memory than std::vector as well. (Note that this is *not* a
> general recommendation. We only took these steps because we had
> to.)
>


You could try Btree as it is combination of vector and tree .
Btree's have better lookup than standard red black trees,
while erasing and inserting is slower due nature
of vectors.


 
Reply With Quote
 
Melzzzzz
Guest
Posts: n/a
 
      08-11-2012
On Sat, 11 Aug 2012 19:35:05 +0100
Leigh Johnston <(E-Mail Removed)> wrote:

> On 11/08/2012 19:16, James Kanze wrote:
> > On Tuesday, August 7, 2012 2:04:41 PM UTC+1, (unknown) wrote:
> >
> >> I'm storing a large (100K) list of integer id's in a set which
> >> I need to frequenctly search for a given id. When I try doing
> >> this with a set using the standard find() function it takes 3
> >> times longer than doing the same with a vector!

> >
> >> Why is this? Surely given that its ordered a set could be
> >> searched using a binary chop whereas a vector must be a linear
> >> search. Even if the set is searched linearly why does it take
> >> 3 times as long?

> >
> > By definition, std::find does a linear search. That's part of
> > its definition. If performance is important, and you're dealing
> > with sorted data, there is std::lower_bound and company.
> >
> > Practically speaking, neither std::find nor std::lower_bound
> > have any way of knowing whether the range they're iterating over
> > is sorted or not. It's up to you, the user, to tell it whether
> > the range is sorted or not, by choosing the appropriate
> > function. If you lie, it's undefined behavior, because
> > typically, the function has no way of knowing.
> >
> > Finally: using an std::vector, and keeping it sorted, is often
> > superior to using std::set, because of greater locality. I'm
> > currently working on some high performance software, and we've
> > banned the use of node based containers for critical data,
> > because the loss of locality kills performance. We've also
> > banned them in large parts outside of the critical area, because
> > memory is very, very tight, and they tend to use a lot more
> > memory than std::vector as well. (Note that this is *not* a
> > general recommendation. We only took these steps because we had
> > to.)

>
> You can improve the locality of the node based containers by using a
> custom allocator; I got significant speed increases doing it this way
> (but how much was down to improved locality and how much was down to
> not using a general purpose allocator is unclear).
>
> /Leigh
>


I have tried to0, with custom allocator, but binary trees are inherently
not cache friendly, as after number of deletions and inserts, order
of nodes tend to became random, no matter how allocator is created.


 
Reply With Quote
 
Melzzzzz
Guest
Posts: n/a
 
      08-12-2012
On Tue, 7 Aug 2012 13:04:41 +0000 (UTC)
(E-Mail Removed) wrote:

> Hello
>
> I'm storing a large (100K) list of integer id's in a set which I need
> to frequenctly search for a given id. When I try doing this with a
> set using the standard find() function it takes 3 times longer than
> doing the same with a vector!


That is because sets iterators are complex. Incrementing sets iterator
is O(n), while incrementing vectors iterator is O(1).


>
> Why is this? Surely given that its ordered a set could be searched
> using a binary chop whereas a vector must be a linear search. Even if
> the set is searched linearly why does it take 3 times as long?


Do you mean sets member function find or generic find?

On my system standard find with vector is 200 times slower than sets
member function find with 100k elements.


 
Reply With Quote
 
Melzzzzz
Guest
Posts: n/a
 
      08-12-2012
On Sun, 12 Aug 2012 03:40:32 +0200
Melzzzzz <(E-Mail Removed)> wrote:

> On Tue, 7 Aug 2012 13:04:41 +0000 (UTC)
> (E-Mail Removed) wrote:
>
> > Hello
> >
> > I'm storing a large (100K) list of integer id's in a set which I
> > need to frequenctly search for a given id. When I try doing this
> > with a set using the standard find() function it takes 3 times
> > longer than doing the same with a vector!

>
> That is because sets iterators are complex. Incrementing sets iterator
> is O(n), while incrementing vectors iterator is O(1).
>
>

Pardon, O(log n)

 
Reply With Quote
 
Juha Nieminen
Guest
Posts: n/a
 
      08-12-2012
Melzzzzz <(E-Mail Removed)> wrote:
> That is because sets iterators are complex. Incrementing sets iterator
> is O(log n)


Actually it's not that simple.

The amount of operations needed to increment a set iterator may be O(log n)
at worst. However, traversing the entire tree from beginning to end is
still just O(n) (and not O(n log n) as one could hastily think). It might
not be immediately intuitive why, but it's a good thinking exercise.

The slowness of traversing a set from beginning to end compared to a
vector does not come from the computational complexity being larger
(it's the same for both). It's just that the set iterator has to do
more things to increment the iterator. (Also, there's a higher probability
of cache misses with a tree than a vector.)
 
Reply With Quote
 
Melzzzzz
Guest
Posts: n/a
 
      08-12-2012
On Sun, 12 Aug 2012 07:16:57 +0000 (UTC)
Juha Nieminen <(E-Mail Removed)> wrote:

> Melzzzzz <(E-Mail Removed)> wrote:
> > That is because sets iterators are complex. Incrementing sets
> > iterator is O(log n)

>
> Actually it's not that simple.
>
> The amount of operations needed to increment a set iterator may be
> O(log n) at worst. However, traversing the entire tree from beginning
> to end is still just O(n) (and not O(n log n) as one could hastily
> think). It might not be immediately intuitive why, but it's a good
> thinking exercise.


Are you sure? Tree iterator has to travel up-down through nodes (take
a look at Treap iterator I posted in thread about binary search trees)?
I didn't think much, but traversing trough several nodes to
reach target node, could not be O(n) when traversing whole tree, I
think.

>
> The slowness of traversing a set from beginning to end compared to a
> vector does not come from the computational complexity being larger
> (it's the same for both). It's just that the set iterator has to do
> more things to increment the iterator. (Also, there's a higher
> probability of cache misses with a tree than a vector.)


Cache misses are another problem, yes.

 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Re: Which has faster access Binary Search of std::vector or std::set??? John H. C++ 0 03-01-2010 11:05 PM
Initializing vector<vector<int> > and other vector questions... pmatos C++ 6 04-26-2007 05:39 PM
findcontrol("PlaceHolderPrice") why why why why why why why why why why why Mr. SweatyFinger ASP .Net 2 12-02-2006 03:46 PM
Free memory allocate by a STL vector, vector of vector, map of vector Allerdyce.John@gmail.com C++ 8 02-18-2006 12:48 AM
[Numeric] column vector faster than row vector in mat multiply? Zhang Le Python 1 03-04-2005 08:27 PM



Advertisments