Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C++ > STL: how to get the sequence number of a newly added item into a set

Reply
Thread Tools

STL: how to get the sequence number of a newly added item into a set

 
 
Jerry Coffin
Guest
Posts: n/a
 
      05-27-2008
In article <g1fabi$rc$(E-Mail Removed)>, http://www.velocityreviews.com/forums/(E-Mail Removed)
says...

[ ... ]

> > > A better way would have been if the insert() and find() functions would
> > > optionally deliver the distance value on the fly

> >
> > They would have to traverse the tree to do that.

>
> Yes, of course! Therefore the traversion does not need
> to be done again in distance() ! That's the point I'm trying
> to make clear. If you reread my inital posting then you will see that
> I said that I need the distance value immediately after an insert()
> or find() call... Do you see what I mean?


I see what you mean, but I think you're wrong. Consider (for example),
inserting an item that's going to be in the right subtree. insert()
compares the new item to the root node, finds that the new item is
larger than the root, and proceeds to the node to the right of the root.
In doing so, it has ignored the _entire_ left sub-tree.

When you use distance to find the position of the new node in the tree,
it cannot ignore the left subtree. Instead, it traverses the entire left
subtree counting each node as it goes. It then proceeds to count through
the right subtree until it reaches the point at which the new node was
inserted.

> Currently the tree is traversed twice: once in insert() and find(),
> and the second time in distance(). The second traversion is
> IMO unneccessary if insert() and find() would store this value
> somewhere internally, and distance() then just returns that value
> w/o traversing the tree.


See above. The traversals are entirely _different_ from each other. The
traversal involved in an insertion or deletion is NOT sufficient to
determine the overall position of the new node in the tree.

You _could_ determine an _extremely_ rough estimate of the overall
position in the tree with only a little extra work. A set, map, etc., is
normally implemented as a balanced binary tree (e.g. red-black or AVL
tree). To maintain balancing, such a tree stores some information about
the _relative_ depths of the left and right subtrees in each node. They
limit the difference in height between the left and right subtrees
(though r-b trees and AVL trees impose slightly different restrictions).

Based upon that height difference, and the depth of the tree you _did_
traverse, you could very quickly compute upper and lower bounds for the
overall position of the new node in the tree -- but those bounds may
easily differ from each other by a factor of 2 or so. Getting the
correct number is going to be linear.

[ ... ]

> I just want to make this last point clear: I just have pointed out
> IMO a shortcoming, performance issues or missing features
> of the language that I in the field have encountered.
> It's of course up to the language and library designers to
> understand the problem and to fix or implement the functionality.
> I'm just informing the people who are more involved in such issues.
> If I were the designer of the library I would do it the way I described above.


Maybe so -- but if you did, I doubt much of anybody would use your
library. In fact, except for this specific situation, even YOU probably
wouldn't use it. Making things work as you seem to want would involve a
_large_ slowdown in common operations in the hope of achieving a _small_
speedup for a relatively rare sequence of operations.

--
Later,
Jerry.

The universe is a figment of its own imagination.
 
Reply With Quote
 
 
 
 
Frank Birbacher
Guest
Posts: n/a
 
      05-27-2008
Hi!

bilgekhan schrieb:
>> As others have pointed out, "sequence numbers" aren't all that useful in
>> sets. It sounds like you would be better served by using a std::vector
>> and sorting it (using std::sort,) then you can do lookups with
>> std::lower_bound, upper_bound, equal_range, and binary_search and
>> getting the "sequence number" is simply a matter of a little pointer
>> math.

>
> I need to get the seqnbr for each item that gets added to
> the sorted collection, so then especially the sorting would be an overkill.


Your answer doesn't make sense to me. I guess you misunderstood the
advise. The advise propses what I had done if I were you:

DO NOT USE A STD::SET.

It doesn't fit your needs. The fact that you need the "index" of some
element shows it.

Replace the std::set<Whatever> by std::vector<Whatever> and try if you
can get along.

> FYI: I'm doing this for the table-lookup routine (ie. a dictionary) of
> a special data compressor for data records with common structure;
> ie. it's not general purpose.

[snip]
> If I were the designer of the library I would do it the way I described above.


Maybe all people writing a non general purpose data compressor with a
table-lookup routine will use your library then.

I probably wouldn't. Don't be offensed. It is just that std::set will
never have a "index_last_added" method. It's by design. If you need a
function like that you need to implement a new set class of your own.
Having done that come back and tell us how easy it was to determine the
exact size of a tree branch which is never visited or how to keep the
counts up-to-date which tell you the exact size, how you then managed to
keep the insert in O(log n). I think this is too complicated compared to
a sorted vector solution.

Frank
 
Reply With Quote
 
 
 
 
Daniel Pitts
Guest
Posts: n/a
 
      05-27-2008
Frank Birbacher wrote:
> Hi!
>
> bilgekhan schrieb:
>>> As others have pointed out, "sequence numbers" aren't all that useful
>>> in sets. It sounds like you would be better served by using a
>>> std::vector and sorting it (using std::sort,) then you can do lookups
>>> with std::lower_bound, upper_bound, equal_range, and binary_search
>>> and getting the "sequence number" is simply a matter of a little
>>> pointer math.

>>
>> I need to get the seqnbr for each item that gets added to
>> the sorted collection, so then especially the sorting would be an
>> overkill.

>
> Your answer doesn't make sense to me. I guess you misunderstood the
> advise. The advise propses what I had done if I were you:
>
> DO NOT USE A STD::SET.
>
> It doesn't fit your needs. The fact that you need the "index" of some
> element shows it.
>
> Replace the std::set<Whatever> by std::vector<Whatever> and try if you
> can get along.
>
>> FYI: I'm doing this for the table-lookup routine (ie. a dictionary) of
>> a special data compressor for data records with common structure;
>> ie. it's not general purpose.

> [snip]
>> If I were the designer of the library I would do it the way I
>> described above.

>
> Maybe all people writing a non general purpose data compressor with a
> table-lookup routine will use your library then.
>
> I probably wouldn't. Don't be offensed. It is just that std::set will
> never have a "index_last_added" method. It's by design. If you need a
> function like that you need to implement a new set class of your own.
> Having done that come back and tell us how easy it was to determine the
> exact size of a tree branch which is never visited or how to keep the
> counts up-to-date which tell you the exact size, how you then managed to
> keep the insert in O(log n). I think this is too complicated compared to
> a sorted vector solution.
>
> Frank

I agree with your point, but feel like pointing out that it is possible
(almost trivial) to have every node on a tree keep track of how many
children it has. An insert operation would increment the value on the
return-trip if the insert was successful.

My question to the OP is, do you need to update the insert-position of
the elements *after* the last inserted element? say I have an existing set:

<1, 5, 6> so, 5 and 6 are at position 1 and 2 respectively...
I insert 3:
<1, 3, 5, 6> and determine that 3 is at position 1, do I need to update
the associated data with 5 and 6 to say they are now at position 2 and 3
respectively?

Also to the OP:
If you design your library for your specific use-cases, but in such a
way that it can easily be extended, you'll find yourself much happier.
--
Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>
 
Reply With Quote
 
Daniel Pitts
Guest
Posts: n/a
 
      05-28-2008
Daniel T. wrote:
> Daniel Pitts <(E-Mail Removed)> wrote:
>
>> ...it is possible (almost trivial) to have every node on a tree
>> keep track of how many children it has. An insert operation would
>> increment the value on the return-trip if the insert was successful.

>
> I'm not so sure if an extra unsigned long per element can be considered
> "trivial".

I said "almost trivial", which refers to difficulty, not space complexity.

Also, it could easily be a compile-time decision if you needed it or not.

--
Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>
 
Reply With Quote
 
bilgekhan
Guest
Posts: n/a
 
      05-28-2008
"Daniel Pitts" wrote:
> Frank Birbacher wrote:
> >
> > Don't be offensed. It is just that std::set will never have a
> > "index_last_added" method. It's by design. If you need a
> > function like that you need to implement a new set class of your own.
> > Having done that come back and tell us how easy it was to determine the
> > exact size of a tree branch which is never visited or how to keep the
> > counts up-to-date which tell you the exact size, how you then managed to
> > keep the insert in O(log n). I think this is too complicated compared to
> > a sorted vector solution.
> >

> I agree with your point, but feel like pointing out that it is possible
> (almost trivial) to have every node on a tree keep track of how many
> children it has. An insert operation would increment the value on the
> return-trip if the insert was successful.
>
> My question to the OP is, do you need to update the insert-position of
> the elements *after* the last inserted element? say I have an existing set:
>
> <1, 5, 6> so, 5 and 6 are at position 1 and 2 respectively...
> I insert 3:
> <1, 3, 5, 6> and determine that 3 is at position 1, do I need to update
> the associated data with 5 and 6 to say they are now at position 2 and 3
> respectively?


From point of application code it is sufficient to know the position of
just only the *last* successful insert() operation and of the
*last* successful find() operation.
Ie. in the application code there is no need to keep track of the positions
of the other items; just that of the latest successful insert() or find().

 
Reply With Quote
 
bilgekhan
Guest
Posts: n/a
 
      05-28-2008
"Daniel Pitts" wrote:
> Frank Birbacher wrote:
> >
> > Don't be offensed. It is just that std::set will never have a
> > "index_last_added" method. It's by design. If you need a
> > function like that you need to implement a new set class of your own.
> > Having done that come back and tell us how easy it was to determine the
> > exact size of a tree branch which is never visited or how to keep the
> > counts up-to-date which tell you the exact size, how you then managed to
> > keep the insert in O(log n). I think this is too complicated compared to
> > a sorted vector solution.
> >

> I agree with your point, but feel like pointing out that it is possible
> (almost trivial) to have every node on a tree keep track of how many
> children it has. An insert operation would increment the value on the
> return-trip if the insert was successful.
>
> My question to the OP is, do you need to update the insert-position of
> the elements *after* the last inserted element? say I have an existing set:
>
> <1, 5, 6> so, 5 and 6 are at position 1 and 2 respectively...
> I insert 3:
> <1, 3, 5, 6> and determine that 3 is at position 1, do I need to update
> the associated data with 5 and 6 to say they are now at position 2 and 3
> respectively?


From point of application code it is sufficient to know the position of
just only the *last* successful insert() operation and of the
*last* successful find() operation.
Ie. in the application code there is no need to keep track of the positions
of the other items; just that of the latest successful insert() or find().

In the example above, after inserting 3 into this set
it just shall be possible to get the position
of this latest new item (ie. here the position of 3, ie. 1).
There is no need for the library to keep track or to update
anything defined in the application code that references items of the set.
Ie. there are no such references (at least not in my code).

 
Reply With Quote
 
Kai-Uwe Bux
Guest
Posts: n/a
 
      05-28-2008
Daniel T. wrote:

> Daniel Pitts <(E-Mail Removed)> wrote:
>
>> ...it is possible (almost trivial) to have every node on a tree
>> keep track of how many children it has. An insert operation would
>> increment the value on the return-trip if the insert was successful.

>
> I'm not so sure if an extra unsigned long per element can be considered
> "trivial".


It's not really "extra": Any balanced tree has to store some additional
information. This information can be _traded_ for the unsigned long since
size information can be used to rebalance. Then, alignment and memory
quantization from the allocator will very likely imply that

2 pointers + 1 unsigned long + user data

is simply indistinguishable from

2 pointers + balancing info + user data



Best

Kai-Uwe Bux
 
Reply With Quote
 
bilgekhan
Guest
Posts: n/a
 
      05-28-2008
"bilgekhan" wrote:
> "Daniel Pitts" wrote:
> > Frank Birbacher wrote:
> > >
> > > Don't be offensed. It is just that std::set will never have a
> > > "index_last_added" method. It's by design. If you need a
> > > function like that you need to implement a new set class of your own.
> > > Having done that come back and tell us how easy it was to determine the
> > > exact size of a tree branch which is never visited or how to keep the
> > > counts up-to-date which tell you the exact size, how you then managed to
> > > keep the insert in O(log n). I think this is too complicated compared to
> > > a sorted vector solution.
> > >

> > I agree with your point, but feel like pointing out that it is possible
> > (almost trivial) to have every node on a tree keep track of how many
> > children it has. An insert operation would increment the value on the
> > return-trip if the insert was successful.
> >
> > My question to the OP is, do you need to update the insert-position of
> > the elements *after* the last inserted element? say I have an existing set:
> >
> > <1, 5, 6> so, 5 and 6 are at position 1 and 2 respectively...
> > I insert 3:
> > <1, 3, 5, 6> and determine that 3 is at position 1, do I need to update
> > the associated data with 5 and 6 to say they are now at position 2 and 3
> > respectively?

>
> From point of application code it is sufficient to know the position of
> just only the *last* successful insert() operation and of the
> *last* successful find() operation.
> Ie. in the application code there is no need to keep track of the positions
> of the other items; just that of the latest successful insert() or find().
>
> In the example above, after inserting 3 into this set
> it just shall be possible to get the position
> of this latest new item (ie. here the position of 3, ie. 1).
> There is no need for the library to keep track or to update
> anything defined in the application code that references items of the set.
> Ie. there are no such references (at least not in my code).


Besides the above functionality a furthergoing super solution
would be if the traversing of the container would be possible
for all these 3 cases:
1) traverse in default (sorted) order
2) traverse in raw (physical) order
3) traverse in insert order

The 1st one is of course already possible.

The 2nd case could also efficiently be realized in application code
if only the above mentioned functionality would be possible. Ie. one then
would need to store and update the returned pos of the inserted item
in a std::vector<int> at position pos. But before doing this one would need
to move (reversely) all items from pos to the end by 1 position,
and while doing this incrementing by 1 those items that have a
value >= the pos to insert...

The 3rd could be done for example like this:
cursor c; // stores last walked pos etc.
while ((it = s.next(c)) != s.end())
do_something with *it

Ie. one would need to implement a "next(cursor& c)" method in the library...

These are just some ideas of mine...
At the moment I unfortunately don't have time to implement these
IMO useful functionalities myself, besides this I don't know the STL
that well, much less its internals. It seems one would need to
extend the underlying code of the RB or AVL tree and add
a uint data member to each block and update it with each insert, delete etc.
Long time ago I had worked on Btree implementations and I can imagine
that the most complicated case would be when a block would need
to be splitted or when two blocks need to be joined...

People interessted in implementing the above features
can also take a look at this interessting project:

"STL containers map, set, and list with fast array access" by Ray Virzi
"Source code for STL compliant container classes that add
fast indexing capability to existing container types":
http://www.codeproject.com/KB/stl/indexlist.aspx

 
Reply With Quote
 
bilgekhan
Guest
Posts: n/a
 
      05-28-2008
<This a fix to my prev. posting>

"bilgekhan" wrote:
> "Daniel Pitts" wrote:
> > Frank Birbacher wrote:
> > >
> > > Don't be offensed. It is just that std::set will never have a
> > > "index_last_added" method. It's by design. If you need a
> > > function like that you need to implement a new set class of your own.
> > > Having done that come back and tell us how easy it was to determine the
> > > exact size of a tree branch which is never visited or how to keep the
> > > counts up-to-date which tell you the exact size, how you then managed to
> > > keep the insert in O(log n). I think this is too complicated compared to
> > > a sorted vector solution.
> > >

> > I agree with your point, but feel like pointing out that it is possible
> > (almost trivial) to have every node on a tree keep track of how many
> > children it has. An insert operation would increment the value on the
> > return-trip if the insert was successful.
> >
> > My question to the OP is, do you need to update the insert-position of
> > the elements *after* the last inserted element? say I have an existing set:
> >
> > <1, 5, 6> so, 5 and 6 are at position 1 and 2 respectively...
> > I insert 3:
> > <1, 3, 5, 6> and determine that 3 is at position 1, do I need to update
> > the associated data with 5 and 6 to say they are now at position 2 and 3
> > respectively?

>
> From point of application code it is sufficient to know the position of
> just only the *last* successful insert() operation and of the
> *last* successful find() operation.
> Ie. in the application code there is no need to keep track of the positions
> of the other items; just that of the latest successful insert() or find().
>
> In the example above, after inserting 3 into this set
> it just shall be possible to get the position
> of this latest new item (ie. here the position of 3, ie. 1).
> There is no need for the library to keep track or to update
> anything defined in the application code that references items of the set.
> Ie. there are no such references (at least not in my code).


Besides the above functionality a furthergoing super solution
would be if the traversing of the container would be possible
for all these 3 cases:
1) traverse in default (sorted) order
2) traverse in insert order
3) traverse in raw (physical) order

The 1st one is of course already possible.

The 2nd case could also efficiently be realized in application code
if only the above mentioned functionality would be possible. Ie. one then
would need to store and update the returned pos of the inserted item
in a std::vector<uint> at position pos. But before doing this one would need
to move (reversely) all items from pos to the end by 1 position,
and while doing this incrementing by 1 those items that have a
value >= the pos to insert...
But one would need a direct access (array access) to the items of the set via a uint handle...

The 3rd could be done for example like this:
cursor c; // stores last walked pos etc.
while ((it = s.next(c)) != s.end())
do_something with *it

Ie. one would need to implement a "next(cursor& c)" method in the library...

These are just some ideas of mine...
At the moment I unfortunately don't have time to implement these
IMO useful functionalities myself, besides this I don't know the STL
that well, much less its internals. It seems one would need to
extend the underlying code of the RB or AVL tree and add
a uint data member to each block and update it with each insert, delete etc.
Long time ago I had worked on Btree implementations and I can imagine
that the most complicated case would be when a block would need
to be splitted or when two blocks need to be joined...

People interessted in implementing the above features
can also take a look at this interessting project:

"STL containers map, set, and list with fast array access" by Ray Virzi
"Source code for STL compliant container classes that add
fast indexing capability to existing container types":
http://www.codeproject.com/KB/stl/indexlist.aspx

 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Re: How include a large array? Edward A. Falk C Programming 1 04-04-2013 08:07 PM
Solution for posterity: GridView, Datakeys, and "Item has already been added. Key in dictionary: 'CategoryID' Key being added: 'CategoryID'" ASP .Net 2 11-02-2006 04:48 AM
WebControl - CollectionEditor Problem. Changing id property of new added collection item causes not adding item to collection - Sergio ASP .Net Web Controls 0 05-29-2006 06:20 AM
embedding python: simulating sequence get item / set item methods Johannes Zellner Python 1 01-17-2006 01:09 AM
Set bgcolor of a newly added table row? Martin Javascript 2 11-19-2004 06:45 PM



Advertisments