vector vs map iterator

 
 
Jim Langston
      07-02-2008
> "xyz" <> wrote in message
> news:85f819a0-6c66-4f13-8e22-...
> On Jul 2, 3:49 pm, Joe Greer <jgr...@doubletake.com> wrote:
> > xyz <lavanyaredd...@gmail.com> wrote in news:fb7c6ce6-5af3-4e24-8a0b-
> > dc4940bbf...@25g2000hsx.googlegroups.com:
> >
> > > I have to run the simulation of a trace file around (7gb contains
> > > 116million entries)...
> > > presently i am using vector iterators to check the conditions in my
> > > program.....
> > > it is taking 2 days to finish whole simulation.....

> >
> > > my question are the map iterators are faster than vector
> > > iterators.....
> > > does it improve the performance....
> > > thanks to all

> >
> > I am not sure what operations you are performing on your iterators, but
> > vector iterators are about as fast as you get. Maps are generally a tree
> > structure of some sort and moving from one node to the next will
> > generally
> > require loading a value from memory whereas with a vector, it will
> > simply
> > add or subtract a fixed amount from the address already in the iterator.
> > This tends to be faster. Same thing applies to a list. Vectors also have
> > better locality of reference so when an item is read, the whole page
> > comes
> > into memory and you get the objects surrounding the target for free.
> > maps,
> > sets, lists etc are better if their semantics match what you want to do
> > better. That is, if you find yourself searching for objects a lot, then
> > maps and sets are more natural. Though even then, sorting the vector and
> > using the binary search algorithms may perform better. The other thing
> > to
> > consider is that maps, sets, and lists have more overhead per object.
> > With
> > 116M objects, that can add up.


> for example i am doing the operation as below in my program
> whenever i receive a packet or something ...i will check does it
> contain in my vector list....
> so if I contain around 2 million entreis in my vector and i received
> around 100 packets...
> then the computation would be 100 packets * 2 million iterations...
>
> as my list goes on incresing I will have more iterations..
> this is the problem i am facing with my simulation..


Now we're getting somewhere. So the speed issue is on lookups. Yes, a map
should be much faster on a lookup. It is a balanced binary tree lookup, which
is O(log n), whereas a linear search through an unsorted vector is O(N).

It sounds like, however, that you can get away with a set if your data is
your key, what you are looking for. There is not much difference for lookup
times as far as I know between a map and a set, but a set should use up a
little less memory as it doesn't have to store both the key and the data,
only the data which is the key.

Be aware, however, that a set or a map is going to take up more memory.
This may, or may not, be a problem for you. A set and map store their data
in chunks. Each entry has its own memory. A vector stores its data in
one huge block of memory. A vector's overhead, therefore, is the size of a
pointer. A set's overhead is the size of the pointer for each entry, plus
the key entry and overhead.

Normally this wouldn't concern me that much, but you say you are using 2
million entries and depending on the size of the entry this could be
substantial. Or not.

You might also want to look at a hash map or hash set (if it exists?) if
your data is string (text) data. I understand this speeds up lookups
even a bit more.
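
To make the difference concrete, here is a minimal sketch of the two kinds of lookup discussed above. The entry type (std::string) and the function names are assumptions made for illustration; the original poster never showed his data type.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Linear search: may examine every element, O(N) per lookup.
bool in_vector(const std::vector<std::string>& entries, const std::string& key) {
    return std::find(entries.begin(), entries.end(), key) != entries.end();
}

// Tree lookup: std::set keeps its keys ordered, O(log N) per lookup.
bool in_set(const std::set<std::string>& entries, const std::string& key) {
    return entries.find(key) != entries.end();
}
```

With 2 million entries, the linear version may compare against all of them for every packet, while the set needs on the order of 21 comparisons (log2 of 2 million).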


 
Jim Langston
      07-02-2008
"xyz" <> wrote in message
news:004ef04f-96cb-4e78-a735-...
On Jul 2, 4:10 pm, "Jim Langston" <tazmas...@rocketmail.com> wrote:
> "xyz" <lavanyaredd...@gmail.com> wrote in message
>
> news:fb7c6ce6-5af3-4e24-8a0b-...
>
> >I have to run the simulation of a trace file around (7gb contains
> > 116million entries)...
> > presently i am using vector iterators to check the conditions in my
> > program.....
> > it is taking 2 days to finish whole simulation.....

>
> > my question are the map iterators are faster than vector
> > iterators.....
> > does it improve the performance....
> > thanks to all

>
> Performance for what? Insertions? Deletions? Insertions in middle? End?
> Beginning? Deletion in middle? End? Beginning? Lookups in beginning?
> End? Middle?
>
> Different containers (vector, set, map, etc...) are designed for different
> tasks and each has its strengths and its weaknesses.
>
> Maybe this link will help:
> http://www.linuxsoftware.co.nz/cppcontainers.html
> Maybe it won't. Check the bottom anyway to determine which container to
> choose.
>
> Also, the wiki has a little bit about the speed of some of the containers:
> http://en.wikipedia.org/wiki/Standard_Template_Library
>
> Really, without knowing what you are trying to optimize it is hard to say.
>
> std::vector<MyClass> MyVector;
> /*... */
> MyVector[SomeCounter]
> should be a relatively fast operation, very similar to pointer math.
>
> std::map<Mykey, MyClass> MyMap;
> /* ... */
> MyMap.find( SomeKey );
> is a binary search lookup.
> MyMap[SomeKey]
> is also a binary search lookup, with the addition of possibly adding the key
> and data.
>
> Without knowing how you are using the vector it is hard to say. One thing I
> would hope, however, is that you are preallocating enough memory for your
> vector so that it doesn't have to keep newing when it runs out of memory.
>
> I have no idea why your operation is taking 2 days. Maybe it should be.
> Maybe it shouldn't. But without knowing more about what you are actually
> doing, anything we come up with is a shot in the dark.
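
For reference, the difference between find() and operator[] described in the quoted post can be sketched as follows. The key and value types and the function name are assumptions chosen for illustration only.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Looks up two missing keys, once with find() and once with operator[],
// and returns the map size afterwards.
std::size_t size_after_lookups() {
    std::map<std::string, int> counts;
    counts["seen"] = 1;

    // find() never modifies the map; a miss just returns end().
    bool found = counts.find("missing") != counts.end();
    (void)found;

    // operator[] default-constructs and inserts a value for a missing key.
    int value = counts["other"];  // inserts {"other", 0}
    (void)value;

    return counts.size();  // "seen" and "other", but not "missing"
}
```

So a pure membership test should use find() (or count()); operator[] silently grows the map on every miss.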



As I said, if the element I am checking matches one of the elements in my
vector list, then I collect the statistics.
If it doesn't match any of the vector elements, then I move
the element being checked to a new vector.

I hope you all understand.

-----

Yes, I read your other post. Yes, a map or set would definitely be faster
for lookups. Read my other post.


 
Jerry Coffin
      07-02-2008
In article <85f819a0-6c66-4f13-8e22-eb11d431b773
@c58g2000hsc.googlegroups.com>, says...

[ ... ]

> for example i am doing the operation as below in my program
> whenever i receive a packet or something ...i will check does it
> contain in my vector list....
> so if I contain around 2 million entreis in my vector and i received
> around 100 packets...
> then the computation would be 100 packets * 2 million iterations...
>
> as my list goes on incresing I will have more iterations..
> this is the problem i am facing with my simulation..


[ and elsethread: ]

> as i said...if my checking element matches one of the element in my
> vector list then i will collect the statistics...
> if it doesnt matches any one of the vector elements then i will move
> this checking elment to a new vector...


Okay, from the sound of things, the issue is really with searches, not
with the iterators per se. From your description, you're currently using
a linear search, which is pretty slow, as you've pointed out. A binary
search should be much faster (logarithmic instead of linear complexity).

A std::set or std::map will use a binary search, but that does NOT
necessarily mean it's the best structure to use. I'm not _entirely_ sure
I follow what you're saying, but it _sounds_ like the searches are being
done in a fixed set of items -- i.e. you're _not_ (for example) adding
or removing items from that set during a single run.

If that's correct, then you're probably better off continuing to use a
vector, but sorting it and using binary_search or upper_bound (or
possibly lower_bound) to search it.

Given the number of items you're dealing with, you might also want to
try using an unordered_set instead -- this uses a hash table, so its
lookup time is normally nearly constant regardless of the number of items
being searched.
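
A minimal sketch of the sorted-vector approach, assuming the entries are std::string and the table is filled before any searching starts. The function names are made up for this example.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Sort once, up front. Only worthwhile if the table then stays fixed.
void prepare(std::vector<std::string>& table) {
    std::sort(table.begin(), table.end());
}

// O(log N) membership test on the sorted table.
bool contains(const std::vector<std::string>& sorted_table, const std::string& key) {
    return std::binary_search(sorted_table.begin(), sorted_table.end(), key);
}

// Same search, but returns the position so the matching entry can be used.
// Returns end() when the key is absent.
std::vector<std::string>::const_iterator
find_sorted(const std::vector<std::string>& sorted_table, const std::string& key) {
    auto it = std::lower_bound(sorted_table.begin(), sorted_table.end(), key);
    return (it != sorted_table.end() && *it == key) ? it : sorted_table.end();
}
```

binary_search only answers yes or no; lower_bound is the one to use when the matching element itself is needed, for example to update statistics stored with it.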

--
Later,
Jerry.

The universe is a figment of its own imagination.
 
James Kanze
      07-02-2008
On Jul 2, 4:26 pm, "Jim Langston" <tazmas...@rocketmail.com> wrote:
> > "xyz" <lavanyaredd...@gmail.com> wrote in message
> > > With
> > > 116M objects, that can add up.

> > for example i am doing the operation as below in my program
> > whenever i receive a packet or something ...i will check does it
> > contain in my vector list....
> > so if I contain around 2 million entreis in my vector and i received
> > around 100 packets...
> > then the computation would be 100 packets * 2 million iterations...


> > as my list goes on incresing I will have more iterations..
> > this is the problem i am facing with my simulation..


> Now we're getting somewhere. So the speed isue is on lookups.
> Yes, a map should be much faster on a lookup. It is a binary
> tree lookup which is something like O(log n) (not sure) where
> searching for data through a vector is O(N).


> It sounds like, however, that you can get away with a set if
> your data is your key, what you are looking for. There is not
> much difference for lookup times as far as I know between a
> map and a set, but a set should use up a little less memory as
> it doesn't have to store both the key and the data, only the
> data which is the key.


> Be aware, however, that a set or a map is going to take up
> more memory. This may, or may not, be a problem for you. A
> set and map store their data in chunks. Each entry has it's
> own memory. A vector stores it's data in one huge block of
> memory. A vector's overhead, therefore, is the size of a
> pointer. A sets overhead is the size of the pointer for each
> entry, plus the key entry and overhead.


The size of several pointers (two or three) for each entry. If
the table can be filled once, up front, then a sorted vector is
definitely the way to go (using std::lower_bound for lookup).
Or a hash table.

> Normally this wouldn't concern me that much, but you say you
> are using 2 million entries and depending on the size of the
> entry thsi could be substantial. Or not.


Note that if the implementation of std::string he's using
doesn't use the small string optimization, or his strings are too
long for it to optimize, a custom string class which doesn't use
dynamic memory could also make a significant difference. This
largely depends on the contents of the strings, however.

> You might also want to look at a hash map or hash set (if it
> exists?) if yoru data is string (text) data. I understand
> this increases lookup time even a bit more.


With a good hashing function, lookup in a hash table is O(1) on average.
With something like 2 million entries, the difference between
O(ln n) and O(1) can be significant. (In my own tests, hash
tables beat std::map or a sorted vector by a ratio of two to one
for 1200 entries, and almost five to one for a million entries.)
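
As a rough sketch of the hash table option: the code below assumes a C++11 compiler, where the container is spelled std::unordered_set (at the time of this thread it was std::tr1::unordered_set). The std::string key type and the function names are assumptions.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Build the hash table once from the known entries.
std::unordered_set<std::string> build_table(const std::vector<std::string>& entries) {
    std::unordered_set<std::string> table;
    table.reserve(entries.size());  // size the buckets up front to avoid rehashing
    table.insert(entries.begin(), entries.end());
    return table;
}

// Average O(1) lookup; the worst case is O(N) if many keys collide.
bool contains(const std::unordered_set<std::string>& table, const std::string& key) {
    return table.count(key) != 0;
}
```

How close this gets to constant time depends on the hash function and the data, so it is worth timing against a sorted vector on the real trace.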

--
James Kanze (GABI Software) email:
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
 
Duane Hebert
      07-02-2008
>as i said...if my checking element matches one of the element in my
>vector list then i will collect the statistics...
>if it doesnt matches any one of the vector elements then i will move
>this checking elment to a new vector...


If you add elements to a vector, it's going to be forced to maintain
them contiguously, so it will sometimes need to reallocate and copy the
whole thing. This can cause unexpected slowdowns unless you're able to
reserve enough memory up front.

If your slowdown is due to searches and growing the vectors then
you could try a map (or set) and benchmark it.
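
The effect of reserving can be observed directly by watching capacity() while filling a vector. This is only a sketch: the element type and the helper name are assumptions, and the exact number of reallocations without reserve() depends on the library's growth policy.

```cpp
#include <cstddef>
#include <vector>

// Appends n values and counts how many times the vector reallocated.
// reserve_first selects whether the capacity is requested up front.
std::size_t count_reallocations(std::size_t n, bool reserve_first) {
    std::vector<int> v;
    if (reserve_first) {
        v.reserve(n);  // one allocation, no copying while filling
    }
    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != last_capacity) {  // capacity changed: storage moved
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    return reallocations;
}
```

Each reallocation copies (or moves) every element already stored, which is where the hidden cost comes from with millions of entries.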


 
Duane Hebert
      07-03-2008

"James Kanze" <> wrote in message
news:40e49767-091e-406d-92c0-...

>With a good hashing function, look up in a hash table is O(1).
>With something like 2 million entries, the difference between
>O(ln n) and O(1) can be significant. (In my own tests, hash
>tables beat std::map or a sorted factor by a ratio of two to one
>for 1200 entries, and almost five to one for a million entries.)



True but he's saying that his code is bottlenecked in the
search but then if the object isn't found, he's pushing it
into a vector. It's not clear if it's the same vector.

If it is and he needs to keep it sorted, inserting in the middle
of the vector may offset any value from the hash table.
I guess it would depend on percentage of misses or
whatever. Just appending at the end may involve
reallocation.

While vectors are often the best choice, there are cases
where we have to look at the implementation and maintaining
contiguous values can cause overhead that you can't estimate
by complexity measures. The OP should benchmark different
strategies. My guess would be that your idea or a map will
probably be the best.


 
Michael DOUBEZ
      07-03-2008
Duane Hebert wrote:
> "James Kanze" <> wrote in message
> news:40e49767-091e-406d-92c0-...
>
>> With a good hashing function, look up in a hash table is O(1).
>> With something like 2 million entries, the difference between
>> O(ln n) and O(1) can be significant. (In my own tests, hash
>> tables beat std::map or a sorted factor by a ratio of two to one
>> for 1200 entries, and almost five to one for a million entries.)

>
>
> True but he's saying that his code is bottlenecked in the
> search but then if the object isn't found, he's pushing it
> into a vector. It's not clear if it's the same vector.
>
> If it is and he needs to keep it sorted, inserting in the middle
> of the vector may offset any value from the hash table.
> I guess it would depend on percentage of misses or
> whatever. Just appending at the end may involve
> reallocation.


If he is using a hash system (perhaps a std::tr1::unordered_map<>), then
the data structure and the dereferencing scheme can be adapted: the hash can
map each key to its index in the vector, or the data can be put directly in
the hash container.

And in the case of a hash, the vector doesn't need to be sorted.
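
A sketch of the first variant, where the hash maps a key to an index in an unsorted vector. It assumes C++11 (std::unordered_map rather than the TR1 version), and the Entry and Table types are hypothetical, since the thread never shows the real record.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical record type; the thread never shows the real one.
struct Entry {
    std::string key;
    long hits;
};

struct Table {
    std::vector<Entry> entries;                          // data, in arrival order
    std::unordered_map<std::string, std::size_t> index;  // key -> position in entries

    // Returns the entry for key, appending a new one if it is not present yet.
    Entry& find_or_add(const std::string& key) {
        auto it = index.find(key);
        if (it == index.end()) {
            it = index.emplace(key, entries.size()).first;
            entries.push_back(Entry{key, 0});
        }
        return entries[it->second];
    }
};
```

Storing indices rather than pointers or iterators means the index stays valid when the vector reallocates on growth.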

> While vectors are often the best choice, there are cases
> where we have to look at the implementation and maintaining
> contiguous values can cause overhead that you can't estimate
> by complexity measures. The OP should benchmark different
> strategies. My guess would be that your idea or a map will
> probably be the best.


A map only guarantees an O(log(n)) search and consumes at least 2 pointers
per node for the tree (with 2 million entries and 4-byte pointers, this means
a minimum of 16 Mbytes of pointer overhead); that weighs heavily in the balance.

A vector has a cost upon insertion. But this can be mitigated by having
a sorted vector/area and an unsorted vector/area; upon a trigger (size,
time, load, reserve depleted), all elements of the unsorted area are
merged into the sorted area.

Hashes are harder to tune, but very effective in search and insertion time.
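
One possible reading of the sorted/unsorted area idea, as a sketch only. It uses a single vector with a sorted prefix and an unsorted tail, and a size trigger; the class name, the threshold parameter and the std::string element type are all assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

class TwoAreaTable {
public:
    explicit TwoAreaTable(std::size_t merge_threshold) : threshold_(merge_threshold) {}

    // New elements go to the unsorted tail; merging is deferred until the
    // tail has reached the threshold.
    void insert(const std::string& key) {
        data_.push_back(key);
        if (data_.size() - sorted_size_ >= threshold_) {
            merge();
        }
    }

    // Binary search in the sorted area, linear scan of the (small) unsorted tail.
    bool contains(const std::string& key) const {
        auto mid = data_.begin() + static_cast<std::ptrdiff_t>(sorted_size_);
        return std::binary_search(data_.begin(), mid, key) ||
               std::find(mid, data_.end(), key) != data_.end();
    }

    std::size_t sorted_size() const { return sorted_size_; }

private:
    // Sort the tail, then merge it into the sorted area in one pass.
    void merge() {
        auto mid = data_.begin() + static_cast<std::ptrdiff_t>(sorted_size_);
        std::sort(mid, data_.end());
        std::inplace_merge(data_.begin(), mid, data_.end());
        sorted_size_ = data_.size();
    }

    std::vector<std::string> data_;
    std::size_t sorted_size_ = 0;
    std::size_t threshold_;
};
```

The threshold trades lookup cost (a longer linear scan of the tail) against insertion cost (more frequent merges), so it would need tuning on the real workload.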

--
Michael
 
Duane Hebert
      07-04-2008

"Michael DOUBEZ" <> wrote in message
news:486cce9d$0$24432$...
> A map only guarantee a O(log(n)) search and consumes at least 2 pointers
> per node for the tree (this means a minimum of 16Mbytes of pointers
> overhead); it is a hard hit in the balance.
>
> A vector has a cost upon insertion. But this can be leveraged by having a
> sorted vector/area and an unsorted vector/area; upon a trigger (size,
> time, load, reserve depleted), all elements of the unsorted area are
> inserted.
>
> Hash are harder to tune. But very effective in search and insertion time.


This sounds like an instance when a hard to tune hash may
be worth the trouble.

If the vectors don't need to be maintained sorted, that helps a
bunch but you still have to deal with the reallocation if you can't
reserve a size. Since he's talking about a large data set, this
could be an issue.

This is all pretty much speculation until it's benchmarked. Especially
considering that we don't know what the system limitations are.
Maybe 16 Mb of pointers isn't an issue in relation to maintaining
contiguous memory.


 
Michael DOUBEZ
      07-04-2008
Duane Hebert wrote:
> "Michael DOUBEZ" <> wrote in message
> news:486cce9d$0$24432$...
>> A map only guarantee a O(log(n)) search and consumes at least 2 pointers
>> per node for the tree (this means a minimum of 16Mbytes of pointers
>> overhead); it is a hard hit in the balance.
>>
>> A vector has a cost upon insertion. But this can be leveraged by having a
>> sorted vector/area and an unsorted vector/area; upon a trigger (size,
>> time, load, reserve depleted), all elements of the unsorted area are
>> inserted.
>>
>> Hash are harder to tune. But very effective in search and insertion time.

>
> This sounds like an instance when a hard to tune hash may
> be worth the trouble.




> If the vectors don't need to be maintained sorted, that helps a
> bunch but you still have to deal with the reallocation if you can't
> reserve a size. Since he's talking about a large data set, this
> could be an issue.


It depends on whether insertions occur a lot or not.
If the number of distinct elements grows logarithmically, I would
expect insertions to happen a lot at the beginning and less at the end; if
it grows exponentially, well, it is hard to handle however you do it.

From tuning you can deduce this kind of parameter, and in particular
the average number of elements, which can be reserved up front.

However, if he uses a hash, it is simpler to use it as a container and
not bother with a duplicated structure.
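
Using the hash directly as the container might look like the sketch below, which follows the OP's description (collect statistics on a match, otherwise set the element aside). It assumes C++11; the Stats type, the std::string key and the function name are hypothetical.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical per-entry statistics; the thread does not say what is collected.
struct Stats {
    long matches = 0;
};

// Counts a match if the packet key is known, otherwise records the key as unmatched.
void process(const std::string& packet_key,
             std::unordered_map<std::string, Stats>& known,
             std::vector<std::string>& unmatched) {
    auto it = known.find(packet_key);
    if (it != known.end()) {
        ++it->second.matches;
    } else {
        unmatched.push_back(packet_key);
    }
}
```

find() is used instead of operator[] so that a miss does not add the packet to the known set.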

> This is all pretty much speculation until it's benchmarked. Especially
> considering that we don't know what the system limitations are.
> Maybe 16 Mb of pointers isn't an issue in relation to maintaining
> contiguous memory.


Yes. That is why I simply stated the tradeoffs I saw without dismissing
any of them. It is true that, given the OP's problem, I don't
favor the map<> solution because it consumes a lot of memory for a very small
gain (insertion and search in logarithmic time). Now he may have other needs
that require a map<> (such as having a sorted container).


--
Michael
 
James Kanze
      07-04-2008
On Jul 3, 1:30 pm, "Duane Hebert" <s...@flarn2.com> wrote:
> "James Kanze" <james.ka...@gmail.com> wrote in message


> news:40e49767-091e-406d-92c0-...


> >With a good hashing function, look up in a hash table is O(1).
> >With something like 2 million entries, the difference between
> >O(ln n) and O(1) can be significant. (In my own tests, hash
> >tables beat std::map or a sorted factor by a ratio of two to one
> >for 1200 entries, and almost five to one for a million entries.)


> True but he's saying that his code is bottlenecked in the
> search but then if the object isn't found, he's pushing it
> into a vector. It's not clear if it's the same vector.


It's not clear at all what he does if he doesn't find the element.
(At one point, he says something about a "new" vector, but I'm
not sure what he's doing with it.)

> If it is and he needs to keep it sorted, inserting in the
> middle of the vector may offset any value from the hash table.


I'm afraid I don't understand. The hash table contains the
elements he's comparing against. Depending on what he's doing
he may want to insert the element he didn't find into the hash
table, or he may not. There's been no indication of having to
keep any sort of order (or at least, I didn't see it).

--
James Kanze (GABI Software) email:
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
 