![]() |
Why is a vector find faster than a set with find()?
Hello
I'm storing a large (100K) list of integer id's in a set which I need to frequenctly search for a given id. When I try doing this with a set using the standard find() function it takes 3 times longer than doing the same with a vector! Why is this? Surely given that its ordered a set could be searched using a binary chop whereas a vector must be a linear search. Even if the set is searched linearly why does it take 3 times as long? Thanks for any info B2003 |
Re: Why is a vector find faster than a set with find()?
boltar2003@boltar.world writes:
> I'm storing a large (100K) list of integer id's in a set which I need > to frequenctly search for a given id. When I try doing this with a set > using the standard find() function it takes 3 times longer than doing > the same with a vector! It is kind of surprising with such a large number of items. Linear scan is certainly faster with small vectors/sets, but 10^5 should be enough to make algorithmic complexity be the major factor. What do you count exactly? Does your timer include filling up the set? Can you post some (simplified) code exhibiting the phenomenon? > Why is this? Surely given that its ordered a set could be searched > using a binary chop whereas a vector must be a linear search. Even if > the set is searched linearly why does it take 3 times as long? Here is a plausible explanation. A vector stores integers in a contiguous area of memory, and recent processors have pretty efficient hardware prefetchers, which give their best when scanning an array (basically what you are doing when traversing a vector). STL set, on the other hand, allocates a new block for each element, so find needs to follow pointers. There is no good reason for these pointers to exhibit any kind of regularity, so prefetchers are inoperant, and you pay the cache miss cost on every access (if set cells are about the size of a cache line). Yes, data locality matters... Still, it would be spectacular to observe this effect on 10^5 items. -- Alain. |
Re: Why is a vector find faster than a set with find()?
On Tue, 07 Aug 2012 15:36:19 +0200
Alain Ketterlin <alain@dpt-info.u-strasbg.fr> wrote: >boltar2003@boltar.world writes: > >> I'm storing a large (100K) list of integer id's in a set which I need >> to frequenctly search for a given id. When I try doing this with a set >> using the standard find() function it takes 3 times longer than doing >> the same with a vector! > >It is kind of surprising with such a large number of items. Linear scan >is certainly faster with small vectors/sets, but 10^5 should be enough >to make algorithmic complexity be the major factor. > >What do you count exactly? Does your timer include filling up the set? >Can you post some (simplified) code exhibiting the phenomenon? No, doesn't include filling up but thats so fast it would hardly register. I traced the delay down to the search of the set and timed that. Tried a vector and it was 3 times faster. Its annoying because I wanted to use a generic function with find() in it but now I'm having to explicitely check for a set and use the sets built in find() which is about 200 times faster and I'm presuming does do a binary chop. >(basically what you are doing when traversing a vector). STL set, on the >other hand, allocates a new block for each element, so find needs to >follow pointers. There is no good reason for these pointers to exhibit Ah , didn't realise that. I assume a set was the same internally as a vector except kept ordered. I thought only lists and queues used pointers. >Still, it would be spectacular to observe this effect on 10^5 items. My code was nothing more than: myiterator = find(mydata.begin(),mydata.end(),val); So it would be fairly easy for someone to reproduce. B2003 |
Re: Why is a vector find faster than a set with find()?
boltar2003@boltar.world wrote in news:jvr3p9$b7s$1@speranza.aioe.org:
> Hello > > I'm storing a large (100K) list of integer id's in a set which I need > to frequenctly search for a given id. When I try doing this with a set > using the standard find() function it takes 3 times longer than doing > the same with a vector! > > Why is this? Surely given that its ordered a set could be searched > using a binary chop whereas a vector must be a linear search. Even if > the set is searched linearly why does it take 3 times as long? > > Thanks for any info > > B2003 > > If I had to guess (and I do :) ), I would guess that it has to do with the CPU's cache and read prediction logic making things fly for the list of integers. I suspect that things would start to change if you were dealing with a list of pointers to an object instead. If the a class were big enough, I suspect even a list of a non-trivial class would also start to slow down. joe |
Re: Why is a vector find faster than a set with find()?
boltar2003@boltar.world writes:
> On Tue, 07 Aug 2012 15:36:19 +0200 > Alain Ketterlin <alain@dpt-info.u-strasbg.fr> wrote: >>boltar2003@boltar.world writes: >> >>> I'm storing a large (100K) list of integer id's in a set which I need >>> to frequenctly search for a given id. When I try doing this with a set >>> using the standard find() function it takes 3 times longer than doing >>> the same with a vector! >> >>It is kind of surprising with such a large number of items. Linear scan >>is certainly faster with small vectors/sets, but 10^5 should be enough >>to make algorithmic complexity be the major factor. >> >>What do you count exactly? Does your timer include filling up the set? >>Can you post some (simplified) code exhibiting the phenomenon? > > No, doesn't include filling up but thats so fast it would hardly > register. Not sure. Inserting into a set may involve allocating memory, and that is costly. > I traced the delay down to the search of the set and timed that. Tried > a vector and it was 3 times faster. Its annoying because I wanted to > use a generic function with find() in it but now I'm having to > explicitely check for a set and use the sets built in find() which is > about 200 times faster and I'm presuming does do a binary chop. You lost me. What is 200 times faster and what is 3 times longer? Do you mean: myset.find(...) is fast, but find(myset.begin(),myset.end(),...) is slow? Yes, obviously (both functions do not do the same thing). OK, forget all this nonsense about cache lines etc. Of course myset.find() is faster than find(myset.begin(), myset.end(),...). If you want to stay generic, use a wrapper to std::find(), and specialize it for sets. -- Alain. |
Re: Why is a vector find faster than a set with find()?
On Tue, 07 Aug 2012 16:23:17 +0200
Alain Ketterlin <alain@dpt-info.u-strasbg.fr> wrote: >> No, doesn't include filling up but thats so fast it would hardly >> register. > >Not sure. Inserting into a set may involve allocating memory, and that >is costly. Well, it seems quick when I time it. *shrug* >> I traced the delay down to the search of the set and timed that. Tried >> a vector and it was 3 times faster. Its annoying because I wanted to >> use a generic function with find() in it but now I'm having to >> explicitely check for a set and use the sets built in find() which is >> about 200 times faster and I'm presuming does do a binary chop. > >You lost me. What is 200 times faster and what is 3 times longer? Do you myset.find() is about 200 times faster than find(myset.begin(). Using the latter is also about 3 times slower on a set than a vector. >mean: myset.find(...) is fast, but find(myset.begin(),myset.end(),...) >is slow? Yup. >Yes, obviously (both functions do not do the same thing). Thats what I didn't realise. I assumed the generic find() would be specialised internally for the type of container it was operating on. I didn't realise it just did a dumb linear search on everything. B2003 |
Re: Why is a vector find faster than a set with find()?
boltar2003@boltar.world wrote:
>>Yes, obviously (both functions do not do the same thing). > > Thats what I didn't realise. I assumed the generic find() would be > specialised internally for the type of container it was operating on. I > didn't realise it just did a dumb linear search on everything. Care to show us a version of find() that searches faster when it's given iterators to a std::set? |
Re: Why is a vector find faster than a set with find()?
On Tue, 7 Aug 2012 14:59:12 +0000 (UTC)
Juha Nieminen <nospam@thanks.invalid> wrote: >boltar2003@boltar.world wrote: >>>Yes, obviously (both functions do not do the same thing). >> >> Thats what I didn't realise. I assumed the generic find() would be >> specialised internally for the type of container it was operating on. I >> didn't realise it just did a dumb linear search on everything. > >Care to show us a version of find() that searches faster when it's given >iterators to a std::set? What? B2003 |
Re: Why is a vector find faster than a set with find()?
On Tue, 2012-08-07, Juha Nieminen wrote:
> boltar2003@boltar.world wrote: >>>Yes, obviously (both functions do not do the same thing). >> >> Thats what I didn't realise. I assumed the generic find() would be >> specialised internally for the type of container it was operating on. I >> didn't realise it just did a dumb linear search on everything. > > Care to show us a version of find() that searches faster when it's given > iterators to a std::set? I don't think he claims there could be such a thing, or even that he's criticizing our beloved STL! Hopefully he got enlightened instead. /Jorgen -- // Jorgen Grahn <grahn@ Oo o. . . \X/ snipabacken.se> O o . |
Re: Why is a vector find faster than a set with find()?
On Tue, 2012-08-07, Alain Ketterlin wrote:
> boltar2003@boltar.world writes: > >> I'm storing a large (100K) list of integer id's in a set which I need >> to frequenctly search for a given id. When I try doing this with a set >> using the standard find() function it takes 3 times longer than doing >> the same with a vector! > > It is kind of surprising with such a large number of items. Linear scan > is certainly faster with small vectors/sets, but 10^5 should be enough > to make algorithmic complexity be the major factor. .... > Still, it would be spectacular to observe this effect on 10^5 items. (Later postings showed that the OP didn't do tree-based lookups.) FWIW, I expect that effect to be observable up to 20--100 items, on a recent machine. I.e., if I really needed performance for semi-small data sets, I'd do a benchmark before choosing std::map or similar. /Jorgen -- // Jorgen Grahn <grahn@ Oo o. . . \X/ snipabacken.se> O o . |
| All times are GMT. The time now is 07:22 AM. |
Powered by vBulletin®. Copyright ©2000 - 2013, vBulletin Solutions, Inc.
SEO by vBSEO ©2010, Crawlability, Inc.