Velocity Reviews

Velocity Reviews (http://www.velocityreviews.com/forums/index.php)
-   C++ (http://www.velocityreviews.com/forums/f39-c.html)
-   -   Fast Container (http://www.velocityreviews.com/forums/t805452-fast-container.html)

Thomas Lehmann 11-02-2011 03:21 PM

Fast Container
 
There are written so many articles but no article is really providing
the answer I'm looking for.

Given a scenario with 100000 data elements the day (500000 the week).
Potential we might have 3000-5000 updates for individual elements
each second.

Imagine a filter that decides on each update the a data element will
be added, updated or removed.

The final container which is sorted by a user defined criteria must be
able to handle this amount of data.

Info: basically each value of that sorted container is a pointer to the
real data.

Final scope: we display the data in a table which provides index based
positions (row)

NOW, what is the right way to implement a fast container for this?

Some of the problems I was confronted with:
- Using a std::vector deletion/insertion is expensive, is it?
- Using a std::set I have no index access. Or is there a way to handle this properly?
- Somebody told me something about a weighted balanced tree as possible solution.
- Somebody else told me to use standard containers only.

Victor Bazarov 11-02-2011 04:40 PM

Re: Fast Container
 
On 11/2/2011 11:21 AM, Thomas Lehmann wrote:
> There are written so many articles but no article is really providing
> the answer I'm looking for.
>
> Given a scenario with 100000 data elements the day (500000 the week).
> Potential we might have 3000-5000 updates for individual elements
> each second.
>
> Imagine a filter that decides on each update the a data element will
> be added, updated or removed.
>
> The final container which is sorted by a user defined criteria must be
> able to handle this amount of data.
>
> Info: basically each value of that sorted container is a pointer to the
> real data.
>
> Final scope: we display the data in a table which provides index based
> positions (row)
>
> NOW, what is the right way to implement a fast container for this?
>
> Some of the problems I was confronted with:
> - Using a std::vector deletion/insertion is expensive, is it?
> - Using a std::set I have no index access. Or is there a way to handle this properly?
> - Somebody told me something about a weighted balanced tree as possible solution.
> - Somebody else told me to use standard containers only.


You didn't say whether the order of the objects in a container is
important. If you can define deletion from your custom vector as
swapping with the last element and resizing by -1, it's fast. Insertion
at the end is not too bad, and causes reallocation only once in a while
(when the vector grows past its capacity), and you can kind of control
that by managing the capacity at "calm" times (shooter's rule: reload
when you have a chance, don't wait 'til you're empty).

Also, consider std::deque, it's appends are even faster than vector's.

V
--
I do not respond to top-posted replies, please don't ask

jacob navia 11-02-2011 05:09 PM

Re: Fast Container
 
Le 02/11/11 16:43, Leigh Johnston a écrit :
[snip]

if you want fast inserts/deletes
> anywhere but also want fast indexing then consider using my
> "segmented_array" container which can be found at:
>
> http://i42.co.uk/stuff/segmented_array.htm
>
> /Leigh


Actually this is a list of vectors. Why should be faster
than using list and vector?

As far as I can read from your code
you do not use those containers. Why?

Thanks for publishing that code, it is full of ideas.

jacob


Werner 11-02-2011 09:06 PM

Re: Fast Container
 
On Nov 2, 5:21*pm, Thomas Lehmann
<thomas.lehmann.priv...@googlemail.com> wrote:
> There are written so many articles but no article is really providing
> the answer I'm looking for.
>
> Given a scenario with 100000 data elements the day (500000 the week).
> Potential we might have 3000-5000 updates for individual elements
> each second.
>
> Imagine a filter that decides on each update the a data element will
> be added, updated or removed.
>
> The final container which is sorted by a user defined criteria must be
> able to handle this amount of data.
>
> Info: basically each value of that sorted container is a pointer to the
> real data.
>
> Final scope: we display the data in a table which provides index based
> positions (row)
>
> NOW, what is the right way to implement a fast container for this?
>
> Some of the problems I was confronted with:
> - Using a std::vector deletion/insertion is expensive, is it?
> - Using a std::set I have no index access. Or is there a way to handle this properly?
> - Somebody told me something about a weighted balanced tree as possible solution.
> - Somebody else told me to use standard containers only.


I think you can use a combination of list and vector.
The list is used for fast insertions, deletions etc of the data
elements,
and the vector contains iterators that are sorted iaw. criteria as
defined by the actual data elements, and not the iterator (list
unsorted).

The vector always remains sorted (insertion of iterator via
lower_bound).

You have the benefit of random access iterators for fast access on
sorted sequence.
You have the benefit of fast insertion, removal.

Adding data elements happen by pushing back onto the list, and adding
the iterator to the sorted sequence (vector).
Removing data elements happen by finding the item in the (sorted)
vector
and using the iterator to erase the data element from the list. The
"referring" iterators in the vector remain valid.

Kind regards,

Werner



Werner 11-02-2011 09:11 PM

Re: Fast Container
 
On Nov 2, 11:06*pm, Werner <wer...@gmail.com> wrote:

> I think you can use a combination of list and vector.
> The list is used for fast insertions, deletions etc of the data
> elements,
> and the vector contains iterators that are sorted iaw. criteria as
> defined by the actual data elements, and not the iterator (list
> unsorted).


typedef std::list<DataElement> ElementList;
typedef std::vector<ElementList::iterator> ElementRefSequence;
ElementList elementList;
ElementRefSequence elementRefSequence;


Thomas Lehmann 11-03-2011 02:01 PM

Re: Fast Container
 
Well, I did say that it is a sorted container.

Thomas Lehmann 11-03-2011 02:12 PM

Re: Fast Container
 
Maybe I'm doing something wrong but comparing to vector
and deque your segmented_array is not faster.

- I'm compiling on Windows Ultimate 64 Bit
- I've tried with 64 and 1024 segments.
- 200000 elements (with push_back)
- took 5ms (vector took 2ms, deque took 5ms)
- The code looks same for deque and vector:

<code>
BOOST_AUTO_TEST_CASE(SegmentedArrayAddValues)
{
const unsigned int limit = 200000;

SegmentedArrayTest<int>* container = new SegmentedArrayTest<int>();
container->createContainer();

// creating content
for (unsigned int ix = 0; ix < limit; ++ix)
container->addValue(ix);

BOOST_CHECK_EQUAL(container->size(), (size_t)limit);

// cleanup
delete container;
}
</code>

Victor Bazarov 11-03-2011 02:13 PM

Re: Fast Container
 
On 11/3/2011 10:01 AM, Thomas Lehmann wrote:
> Well, I did say that it is a sorted container.


You did, I wasn't paying attention, sorry.

Still, take a look at 'std::deque'.

V
--
I do not respond to top-posted replies, please don't ask

Nick Keighley 11-03-2011 03:45 PM

Re: Fast Container
 
On Nov 2, 3:21*pm, Thomas Lehmann
<thomas.lehmann.priv...@googlemail.com> wrote:

I'm feeling I've missed something...

> There are written so many articles but no article is really providing
> the answer I'm looking for.
>
> Given a scenario with 100000 data elements the day (500000 the week).
> Potential we might have 3000-5000 updates for individual elements
> each second.


how are "elements" identified?

> Imagine a filter that decides on each update [that] a data element will
> be added, updated or removed.
>
> The final container which is sorted by a user defined criteria must be
> able to handle this amount of data.
>
> Info: basically each value of that sorted container is a pointer to the
> real data.
>
> Final scope: we display the data in a table which provides index based
> positions (row)


"final scope"? So when it's sorted it goes in a "table" that's
indexed?

> NOW, what is the right way to implement a fast container for this?


use a relational database.

I was thinking std::map but I don't know how your elements are
identified. std::map has reasonable performance for insertion and
deletion- anywhere in the container. And you can choose what to index
it with. at the end you can copy it into whatever your user finds
conventient.

> Some of the problems I was confronted with:
> - Using a std::vector deletion/insertion is expensive, is it?


yes. Elements are always contiuous so if you delete say the 50th
element of a 100 element vector its going to have to move elements
51-99 down one. At 500000 elements that gets expensive.

> - Using a std::set I have no index access. Or is there a way to handle this properly?


so you want to be able to index it? How are these indexes assigned?
Time?

> - Somebody told me something about a weighted balanced tree as possible solution.


some sort of tree. (which is what a std::map usually is under the
hood). B-tree? (that's not a binary tree).

> - Somebody else told me to use standard containers only.


why? I mean its a good place to start but is it an absolute
constraint- and why?

Do you have any performance criteria?



All times are GMT. The time now is 01:35 PM.

Powered by vBulletin®. Copyright ©2000 - 2013, vBulletin Solutions, Inc.
SEO by vBSEO ©2010, Crawlability, Inc.


1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57