Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C++ > Why is a vector find faster than a set with find()?

Reply
Thread Tools

Why is a vector find faster than a set with find()?

 
 
boltar2003@boltar.world
Guest
Posts: n/a
 
      08-07-2012
Hello

I'm storing a large (100K) list of integer id's in a set which I need to
frequenctly search for a given id. When I try doing this with a set using the
standard find() function it takes 3 times longer than doing the same with a
vector!

Why is this? Surely given that its ordered a set could be searched using
a binary chop whereas a vector must be a linear search. Even if the set is
searched linearly why does it take 3 times as long?

Thanks for any info

B2003

 
Reply With Quote
 
 
 
 
Alain Ketterlin
Guest
Posts: n/a
 
      08-07-2012
http://www.velocityreviews.com/forums/(E-Mail Removed) writes:

> I'm storing a large (100K) list of integer id's in a set which I need
> to frequenctly search for a given id. When I try doing this with a set
> using the standard find() function it takes 3 times longer than doing
> the same with a vector!


It is kind of surprising with such a large number of items. Linear scan
is certainly faster with small vectors/sets, but 10^5 should be enough
to make algorithmic complexity be the major factor.

What do you count exactly? Does your timer include filling up the set?
Can you post some (simplified) code exhibiting the phenomenon?

> Why is this? Surely given that its ordered a set could be searched
> using a binary chop whereas a vector must be a linear search. Even if
> the set is searched linearly why does it take 3 times as long?


Here is a plausible explanation. A vector stores integers in a
contiguous area of memory, and recent processors have pretty efficient
hardware prefetchers, which give their best when scanning an array
(basically what you are doing when traversing a vector). STL set, on the
other hand, allocates a new block for each element, so find needs to
follow pointers. There is no good reason for these pointers to exhibit
any kind of regularity, so prefetchers are inoperant, and you pay the
cache miss cost on every access (if set cells are about the size of a
cache line). Yes, data locality matters...

Still, it would be spectacular to observe this effect on 10^5 items.

-- Alain.
 
Reply With Quote
 
 
 
 
boltar2003@boltar.world
Guest
Posts: n/a
 
      08-07-2012
On Tue, 07 Aug 2012 15:36:19 +0200
Alain Ketterlin <(E-Mail Removed)-strasbg.fr> wrote:
>(E-Mail Removed) writes:
>
>> I'm storing a large (100K) list of integer id's in a set which I need
>> to frequenctly search for a given id. When I try doing this with a set
>> using the standard find() function it takes 3 times longer than doing
>> the same with a vector!

>
>It is kind of surprising with such a large number of items. Linear scan
>is certainly faster with small vectors/sets, but 10^5 should be enough
>to make algorithmic complexity be the major factor.
>
>What do you count exactly? Does your timer include filling up the set?
>Can you post some (simplified) code exhibiting the phenomenon?


No, doesn't include filling up but thats so fast it would hardly register.
I traced the delay down to the search of the set and timed that. Tried a
vector and it was 3 times faster. Its annoying because I wanted to use a
generic function with find() in it but now I'm having to explicitely check for
a set and use the sets built in find() which is about 200 times faster and I'm
presuming does do a binary chop.

>(basically what you are doing when traversing a vector). STL set, on the
>other hand, allocates a new block for each element, so find needs to
>follow pointers. There is no good reason for these pointers to exhibit


Ah , didn't realise that. I assume a set was the same internally as a vector
except kept ordered. I thought only lists and queues used pointers.

>Still, it would be spectacular to observe this effect on 10^5 items.


My code was nothing more than:

myiterator = find(mydata.begin(),mydata.end(),val);

So it would be fairly easy for someone to reproduce.

B2003

 
Reply With Quote
 
Joe Greer
Guest
Posts: n/a
 
      08-07-2012
(E-Mail Removed) wrote in news:jvr3p9$b7s$(E-Mail Removed):

> Hello
>
> I'm storing a large (100K) list of integer id's in a set which I need
> to frequenctly search for a given id. When I try doing this with a set
> using the standard find() function it takes 3 times longer than doing
> the same with a vector!
>
> Why is this? Surely given that its ordered a set could be searched
> using a binary chop whereas a vector must be a linear search. Even if
> the set is searched linearly why does it take 3 times as long?
>
> Thanks for any info
>
> B2003
>
>


If I had to guess (and I do ), I would guess that it has to do with the
CPU's cache and read prediction logic making things fly for the list of
integers. I suspect that things would start to change if you were dealing
with a list of pointers to an object instead. If the a class were big
enough, I suspect even a list of a non-trivial class would also start to
slow down.

joe
 
Reply With Quote
 
Alain Ketterlin
Guest
Posts: n/a
 
      08-07-2012
(E-Mail Removed) writes:

> On Tue, 07 Aug 2012 15:36:19 +0200
> Alain Ketterlin <(E-Mail Removed)-strasbg.fr> wrote:
>>(E-Mail Removed) writes:
>>
>>> I'm storing a large (100K) list of integer id's in a set which I need
>>> to frequenctly search for a given id. When I try doing this with a set
>>> using the standard find() function it takes 3 times longer than doing
>>> the same with a vector!

>>
>>It is kind of surprising with such a large number of items. Linear scan
>>is certainly faster with small vectors/sets, but 10^5 should be enough
>>to make algorithmic complexity be the major factor.
>>
>>What do you count exactly? Does your timer include filling up the set?
>>Can you post some (simplified) code exhibiting the phenomenon?

>
> No, doesn't include filling up but thats so fast it would hardly
> register.


Not sure. Inserting into a set may involve allocating memory, and that
is costly.

> I traced the delay down to the search of the set and timed that. Tried
> a vector and it was 3 times faster. Its annoying because I wanted to
> use a generic function with find() in it but now I'm having to
> explicitely check for a set and use the sets built in find() which is
> about 200 times faster and I'm presuming does do a binary chop.


You lost me. What is 200 times faster and what is 3 times longer? Do you
mean: myset.find(...) is fast, but find(myset.begin(),myset.end(),...)
is slow? Yes, obviously (both functions do not do the same thing).

OK, forget all this nonsense about cache lines etc. Of course
myset.find() is faster than find(myset.begin(), myset.end(),...). If you
want to stay generic, use a wrapper to std::find(), and specialize it
for sets.

-- Alain.
 
Reply With Quote
 
boltar2003@boltar.world
Guest
Posts: n/a
 
      08-07-2012
On Tue, 07 Aug 2012 16:23:17 +0200
Alain Ketterlin <(E-Mail Removed)-strasbg.fr> wrote:
>> No, doesn't include filling up but thats so fast it would hardly
>> register.

>
>Not sure. Inserting into a set may involve allocating memory, and that
>is costly.


Well, it seems quick when I time it. *shrug*

>> I traced the delay down to the search of the set and timed that. Tried
>> a vector and it was 3 times faster. Its annoying because I wanted to
>> use a generic function with find() in it but now I'm having to
>> explicitely check for a set and use the sets built in find() which is
>> about 200 times faster and I'm presuming does do a binary chop.

>
>You lost me. What is 200 times faster and what is 3 times longer? Do you


myset.find() is about 200 times faster than find(myset.begin().
Using the latter is also about 3 times slower on a set than a vector.

>mean: myset.find(...) is fast, but find(myset.begin(),myset.end(),...)
>is slow?


Yup.

>Yes, obviously (both functions do not do the same thing).


Thats what I didn't realise. I assumed the generic find() would be
specialised internally for the type of container it was operating on. I
didn't realise it just did a dumb linear search on everything.

B2003

 
Reply With Quote
 
Juha Nieminen
Guest
Posts: n/a
 
      08-07-2012
(E-Mail Removed) wrote:
>>Yes, obviously (both functions do not do the same thing).

>
> Thats what I didn't realise. I assumed the generic find() would be
> specialised internally for the type of container it was operating on. I
> didn't realise it just did a dumb linear search on everything.


Care to show us a version of find() that searches faster when it's given
iterators to a std::set?
 
Reply With Quote
 
boltar2003@boltar.world
Guest
Posts: n/a
 
      08-07-2012
On Tue, 7 Aug 2012 14:59:12 +0000 (UTC)
Juha Nieminen <(E-Mail Removed)> wrote:
>(E-Mail Removed) wrote:
>>>Yes, obviously (both functions do not do the same thing).

>>
>> Thats what I didn't realise. I assumed the generic find() would be
>> specialised internally for the type of container it was operating on. I
>> didn't realise it just did a dumb linear search on everything.

>
>Care to show us a version of find() that searches faster when it's given
>iterators to a std::set?


What?

B2003

 
Reply With Quote
 
Jorgen Grahn
Guest
Posts: n/a
 
      08-07-2012
On Tue, 2012-08-07, Juha Nieminen wrote:
> (E-Mail Removed) wrote:
>>>Yes, obviously (both functions do not do the same thing).

>>
>> Thats what I didn't realise. I assumed the generic find() would be
>> specialised internally for the type of container it was operating on. I
>> didn't realise it just did a dumb linear search on everything.

>
> Care to show us a version of find() that searches faster when it's given
> iterators to a std::set?


I don't think he claims there could be such a thing, or even that he's
criticizing our beloved STL! Hopefully he got enlightened instead.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
 
Reply With Quote
 
Jorgen Grahn
Guest
Posts: n/a
 
      08-07-2012
On Tue, 2012-08-07, Alain Ketterlin wrote:
> (E-Mail Removed) writes:
>
>> I'm storing a large (100K) list of integer id's in a set which I need
>> to frequenctly search for a given id. When I try doing this with a set
>> using the standard find() function it takes 3 times longer than doing
>> the same with a vector!

>
> It is kind of surprising with such a large number of items. Linear scan
> is certainly faster with small vectors/sets, but 10^5 should be enough
> to make algorithmic complexity be the major factor.

....
> Still, it would be spectacular to observe this effect on 10^5 items.


(Later postings showed that the OP didn't do tree-based lookups.)

FWIW, I expect that effect to be observable up to 20--100 items, on a
recent machine. I.e., if I really needed performance for semi-small
data sets, I'd do a benchmark before choosing std::map or similar.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Re: Which has faster access Binary Search of std::vector or std::set??? John H. C++ 0 03-01-2010 11:05 PM
Initializing vector<vector<int> > and other vector questions... pmatos C++ 6 04-26-2007 05:39 PM
findcontrol("PlaceHolderPrice") why why why why why why why why why why why Mr. SweatyFinger ASP .Net 2 12-02-2006 03:46 PM
Free memory allocate by a STL vector, vector of vector, map of vector Allerdyce.John@gmail.com C++ 8 02-18-2006 12:48 AM
[Numeric] column vector faster than row vector in mat multiply? Zhang Le Python 1 03-04-2005 08:27 PM



Advertisments