Velocity Reviews (http://www.velocityreviews.com/forums/index.php)
-   C++ (http://www.velocityreviews.com/forums/f39-c.html)
-   -   Why is a vector find faster than a set with find()? (http://www.velocityreviews.com/forums/t949217-why-is-a-vector-find-faster-than-a-set-with-find.html)

 boltar2003@boltar.world 08-07-2012 01:04 PM

Why is a vector find faster than a set with find()?

Hello

I'm storing a large (100K) list of integer id's in a set which I need to
frequenctly search for a given id. When I try doing this with a set using the
standard find() function it takes 3 times longer than doing the same with a
vector!

Why is this? Surely given that its ordered a set could be searched using
a binary chop whereas a vector must be a linear search. Even if the set is
searched linearly why does it take 3 times as long?

Thanks for any info

B2003

 Alain Ketterlin 08-07-2012 01:36 PM

Re: Why is a vector find faster than a set with find()?

boltar2003@boltar.world writes:

> I'm storing a large (100K) list of integer id's in a set which I need
> to frequenctly search for a given id. When I try doing this with a set
> using the standard find() function it takes 3 times longer than doing
> the same with a vector!

It is kind of surprising with such a large number of items. Linear scan
is certainly faster with small vectors/sets, but 10^5 should be enough
to make algorithmic complexity be the major factor.

What do you count exactly? Does your timer include filling up the set?
Can you post some (simplified) code exhibiting the phenomenon?

> Why is this? Surely given that its ordered a set could be searched
> using a binary chop whereas a vector must be a linear search. Even if
> the set is searched linearly why does it take 3 times as long?

Here is a plausible explanation. A vector stores integers in a
contiguous area of memory, and recent processors have pretty efficient
hardware prefetchers, which give their best when scanning an array
(basically what you are doing when traversing a vector). STL set, on the
other hand, allocates a new block for each element, so find needs to
follow pointers. There is no good reason for these pointers to exhibit
any kind of regularity, so prefetchers are inoperant, and you pay the
cache miss cost on every access (if set cells are about the size of a
cache line). Yes, data locality matters...

Still, it would be spectacular to observe this effect on 10^5 items.

-- Alain.

 boltar2003@boltar.world 08-07-2012 01:45 PM

Re: Why is a vector find faster than a set with find()?

On Tue, 07 Aug 2012 15:36:19 +0200
Alain Ketterlin <alain@dpt-info.u-strasbg.fr> wrote:
>boltar2003@boltar.world writes:
>
>> I'm storing a large (100K) list of integer id's in a set which I need
>> to frequenctly search for a given id. When I try doing this with a set
>> using the standard find() function it takes 3 times longer than doing
>> the same with a vector!

>
>It is kind of surprising with such a large number of items. Linear scan
>is certainly faster with small vectors/sets, but 10^5 should be enough
>to make algorithmic complexity be the major factor.
>
>What do you count exactly? Does your timer include filling up the set?
>Can you post some (simplified) code exhibiting the phenomenon?

No, doesn't include filling up but thats so fast it would hardly register.
I traced the delay down to the search of the set and timed that. Tried a
vector and it was 3 times faster. Its annoying because I wanted to use a
generic function with find() in it but now I'm having to explicitely check for
a set and use the sets built in find() which is about 200 times faster and I'm
presuming does do a binary chop.

>(basically what you are doing when traversing a vector). STL set, on the
>other hand, allocates a new block for each element, so find needs to
>follow pointers. There is no good reason for these pointers to exhibit

Ah , didn't realise that. I assume a set was the same internally as a vector
except kept ordered. I thought only lists and queues used pointers.

>Still, it would be spectacular to observe this effect on 10^5 items.

My code was nothing more than:

myiterator = find(mydata.begin(),mydata.end(),val);

So it would be fairly easy for someone to reproduce.

B2003

 Joe Greer 08-07-2012 02:17 PM

Re: Why is a vector find faster than a set with find()?

boltar2003@boltar.world wrote in news:jvr3p9\$b7s\$1@speranza.aioe.org:

> Hello
>
> I'm storing a large (100K) list of integer id's in a set which I need
> to frequenctly search for a given id. When I try doing this with a set
> using the standard find() function it takes 3 times longer than doing
> the same with a vector!
>
> Why is this? Surely given that its ordered a set could be searched
> using a binary chop whereas a vector must be a linear search. Even if
> the set is searched linearly why does it take 3 times as long?
>
> Thanks for any info
>
> B2003
>
>

If I had to guess (and I do :) ), I would guess that it has to do with the
CPU's cache and read prediction logic making things fly for the list of
integers. I suspect that things would start to change if you were dealing
with a list of pointers to an object instead. If the a class were big
enough, I suspect even a list of a non-trivial class would also start to
slow down.

joe

 Alain Ketterlin 08-07-2012 02:23 PM

Re: Why is a vector find faster than a set with find()?

boltar2003@boltar.world writes:

> On Tue, 07 Aug 2012 15:36:19 +0200
> Alain Ketterlin <alain@dpt-info.u-strasbg.fr> wrote:
>>boltar2003@boltar.world writes:
>>
>>> I'm storing a large (100K) list of integer id's in a set which I need
>>> to frequenctly search for a given id. When I try doing this with a set
>>> using the standard find() function it takes 3 times longer than doing
>>> the same with a vector!

>>
>>It is kind of surprising with such a large number of items. Linear scan
>>is certainly faster with small vectors/sets, but 10^5 should be enough
>>to make algorithmic complexity be the major factor.
>>
>>What do you count exactly? Does your timer include filling up the set?
>>Can you post some (simplified) code exhibiting the phenomenon?

>
> No, doesn't include filling up but thats so fast it would hardly
> register.

Not sure. Inserting into a set may involve allocating memory, and that
is costly.

> I traced the delay down to the search of the set and timed that. Tried
> a vector and it was 3 times faster. Its annoying because I wanted to
> use a generic function with find() in it but now I'm having to
> explicitely check for a set and use the sets built in find() which is
> about 200 times faster and I'm presuming does do a binary chop.

You lost me. What is 200 times faster and what is 3 times longer? Do you
mean: myset.find(...) is fast, but find(myset.begin(),myset.end(),...)
is slow? Yes, obviously (both functions do not do the same thing).

OK, forget all this nonsense about cache lines etc. Of course
myset.find() is faster than find(myset.begin(), myset.end(),...). If you
want to stay generic, use a wrapper to std::find(), and specialize it
for sets.

-- Alain.

 boltar2003@boltar.world 08-07-2012 02:49 PM

Re: Why is a vector find faster than a set with find()?

On Tue, 07 Aug 2012 16:23:17 +0200
Alain Ketterlin <alain@dpt-info.u-strasbg.fr> wrote:
>> No, doesn't include filling up but thats so fast it would hardly
>> register.

>
>Not sure. Inserting into a set may involve allocating memory, and that
>is costly.

Well, it seems quick when I time it. *shrug*

>> I traced the delay down to the search of the set and timed that. Tried
>> a vector and it was 3 times faster. Its annoying because I wanted to
>> use a generic function with find() in it but now I'm having to
>> explicitely check for a set and use the sets built in find() which is
>> about 200 times faster and I'm presuming does do a binary chop.

>
>You lost me. What is 200 times faster and what is 3 times longer? Do you

myset.find() is about 200 times faster than find(myset.begin().
Using the latter is also about 3 times slower on a set than a vector.

>mean: myset.find(...) is fast, but find(myset.begin(),myset.end(),...)
>is slow?

Yup.

>Yes, obviously (both functions do not do the same thing).

Thats what I didn't realise. I assumed the generic find() would be
specialised internally for the type of container it was operating on. I
didn't realise it just did a dumb linear search on everything.

B2003

 Juha Nieminen 08-07-2012 02:59 PM

Re: Why is a vector find faster than a set with find()?

boltar2003@boltar.world wrote:
>>Yes, obviously (both functions do not do the same thing).

>
> Thats what I didn't realise. I assumed the generic find() would be
> specialised internally for the type of container it was operating on. I
> didn't realise it just did a dumb linear search on everything.

Care to show us a version of find() that searches faster when it's given
iterators to a std::set?

 boltar2003@boltar.world 08-07-2012 03:19 PM

Re: Why is a vector find faster than a set with find()?

On Tue, 7 Aug 2012 14:59:12 +0000 (UTC)
Juha Nieminen <nospam@thanks.invalid> wrote:
>boltar2003@boltar.world wrote:
>>>Yes, obviously (both functions do not do the same thing).

>>
>> Thats what I didn't realise. I assumed the generic find() would be
>> specialised internally for the type of container it was operating on. I
>> didn't realise it just did a dumb linear search on everything.

>
>Care to show us a version of find() that searches faster when it's given
>iterators to a std::set?

What?

B2003

 Jorgen Grahn 08-07-2012 04:18 PM

Re: Why is a vector find faster than a set with find()?

On Tue, 2012-08-07, Juha Nieminen wrote:
> boltar2003@boltar.world wrote:
>>>Yes, obviously (both functions do not do the same thing).

>>
>> Thats what I didn't realise. I assumed the generic find() would be
>> specialised internally for the type of container it was operating on. I
>> didn't realise it just did a dumb linear search on everything.

>
> Care to show us a version of find() that searches faster when it's given
> iterators to a std::set?

I don't think he claims there could be such a thing, or even that he's
criticizing our beloved STL! Hopefully he got enlightened instead.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

 Jorgen Grahn 08-07-2012 04:24 PM

Re: Why is a vector find faster than a set with find()?

On Tue, 2012-08-07, Alain Ketterlin wrote:
> boltar2003@boltar.world writes:
>
>> I'm storing a large (100K) list of integer id's in a set which I need
>> to frequenctly search for a given id. When I try doing this with a set
>> using the standard find() function it takes 3 times longer than doing
>> the same with a vector!

>
> It is kind of surprising with such a large number of items. Linear scan
> is certainly faster with small vectors/sets, but 10^5 should be enough
> to make algorithmic complexity be the major factor.

....
> Still, it would be spectacular to observe this effect on 10^5 items.

(Later postings showed that the OP didn't do tree-based lookups.)

FWIW, I expect that effect to be observable up to 20--100 items, on a
recent machine. I.e., if I really needed performance for semi-small
data sets, I'd do a benchmark before choosing std::map or similar.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

All times are GMT. The time now is 07:58 PM.