Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C++ > Containers and sorting objects vs. pointers

Reply
Thread Tools

Containers and sorting objects vs. pointers

 
 
Matthias Kaeppler
Guest
Posts: n/a
 
      07-15-2005
Hi,

in my program, I have to sort containers of objects which can be 2000
items big in some cases. Since STL containers are based around copying
and since I need to sort these containers quite frequently, I thought
it'd be a better idea to manage additional containers which are
initialized with pointers to the objects in the primary containers and
sort those (only pointers have to be copied around then).

However, that also means if I have 2000 objects, I have to create *two*
containers holding 2000 objects each, the first with the real objects
and the second with pointers to those.

My question is:
If those objects I'm talking about are between 8 and 16 bytes, would it
be faster to simply perform the sort algos on the objects themselves or
is it better to create that additional container with pointers and
continue working with that?

--
Matthias Kaeppler
 
Reply With Quote
 
 
 
 
pven
Guest
Posts: n/a
 
      07-15-2005
Hi,

I think the best approach is to sort with the primary container itself
with your object structures.

Also, if you are planning to use an additional container that stores
the pointer to the objects in the primary container, ensure that the
objects in the primary container are on the heap.

The reason for this is, When a container rearranges itself (lets say
you keep adding objects and the container needs to move the objects to
a new memory location which is bigger), it will shift objects around by
calling the copy constructor of your object.

So after such an operation, your pointers in the other container might
become invalid.

 
Reply With Quote
 
 
 
 
pven
Guest
Posts: n/a
 
      07-15-2005
> Also, if you are planning to use an additional container that stores
> the pointer to the objects in the primary container, ensure that the
> objects in the primary container are on the heap.
>
> The reason for this is, When a container rearranges itself (lets say
> you keep adding objects and the container needs to move the objects to
> a new memory location which is bigger), it will shift objects around by
> calling the copy constructor of your object.
>
> So after such an operation, your pointers in the other container might
> become invalid.


My bad,

The above logic applies only if you are using a vector container.

After a little thought, I suggest you use a list container for storing
the objects, Sorting and re-arranging will be efficent. List is similar
to linked list datastructure..

 
Reply With Quote
 
Mercator
Guest
Posts: n/a
 
      07-15-2005
Matthias Kaeppler wrote:
> My question is:
> If those objects I'm talking about are between 8 and 16 bytes, would it
> be faster to simply perform the sort algos on the objects themselves or
> is it better to create that additional container with pointers and
> continue working with that?


C++ has time functions which help you finding the answer. Performance
assertions without measurement are moot. See also:
http://www.informit.com/guides/conte...lus&seqNum=156

 
Reply With Quote
 
Matthias Kaeppler
Guest
Posts: n/a
 
      07-15-2005
pven wrote:
> My bad,
>
> The above logic applies only if you are using a vector container.
>
> After a little thought, I suggest you use a list container for storing
> the objects, Sorting and re-arranging will be efficent. List is similar
> to linked list datastructure..


Yep, that's exactly how I do it right now; objects
in a linked list, pointers in an std::vector (the
latter mostly because IIRC std:artition doesn't
work on lists).
 
Reply With Quote
 
Matthias Kaeppler
Guest
Posts: n/a
 
      07-15-2005
Mercator wrote:
> Matthias Kaeppler wrote:
>
>>My question is:
>>If those objects I'm talking about are between 8 and 16 bytes, would it
>>be faster to simply perform the sort algos on the objects themselves or
>>is it better to create that additional container with pointers and
>>continue working with that?

>
>
> C++ has time functions which help you finding the answer. Performance
> assertions without measurement are moot. See also:
> http://www.informit.com/guides/conte...lus&seqNum=156
>


That's great, thanks. I have been looking for
something like this for quite some time.
Probably that'll answer my question most precisely
(whereas I tend to agree with the other poster
that it's probably less overhead to sort the real
objects, since they can be copied with a trivial
copy constructor and are quite lean).

Regards,
Matthias
 
Reply With Quote
 
Bart
Guest
Posts: n/a
 
      07-15-2005
Matthias Kaeppler wrote:
> Hi,
>
> in my program, I have to sort containers of objects which can be 2000
> items big in some cases. Since STL containers are based around copying
> and since I need to sort these containers quite frequently, I thought
> it'd be a better idea to manage additional containers which are
> initialized with pointers to the objects in the primary containers and
> sort those (only pointers have to be copied around then).
>
> However, that also means if I have 2000 objects, I have to create *two*
> containers holding 2000 objects each, the first with the real objects
> and the second with pointers to those.
>
> My question is:
> If those objects I'm talking about are between 8 and 16 bytes, would it
> be faster to simply perform the sort algos on the objects themselves or
> is it better to create that additional container with pointers and
> continue working with that?


As with anything performance-related, there's no point in discussing
this without performing some tests, but I suspect the sorting algorithm
would only copy the objects when required. Why do you think you would
incur any penalty over copying them yourself?

 
Reply With Quote
 
Stuart MacMartin
Guest
Posts: n/a
 
      07-15-2005
Sorry, I'm not impressed with the answers so far.

2000 is a small number. Try just sorting the container (use std::sort
and provide either an operator< or a predicate). If this is not a
performance problem, don't worry about it.

If it is a performance problem, first check that it's implemented
right:

1. Make sure a < b < c ==> a < c and !(b < a). f your < operator or
predicate is at all complicated you should add debugging tests to be
sure. Otherwise std::sort() might underrun and give the impression of
performance problem.

2. Make sure you aren't running on Windows with condition of a sorted
list + one unsorted item at the end. With only 2000 items this
shouldn't be an issue, but if you had 100X this amount you may find
yourself waiting minutes or hours for the sort to finish.

3. If the implementation is correct and you still have a problem, there
are two places your sort will be spending time:
a) comparing
b) swapping.

(b) can be significant with large objects, especially if the object
owns lots of additional data. There can be lots of swapping while you
sort. As a general rule I never sort anything other than vectors of
pointers or trivial objects, but I tend to deal with large numbers of
objects (10s of thousands to millions). Here's where that heap comment
someone else made came into play:

If you add an item to a vector, every item in the vector can be moved.
If you're going to have an auxiliary list of pointers, you must do one
of four things:

1) Allocate all items and never add another, or at least reserve() the
required space. Now the pointers are fixed.
2) Every time you add an item, toss your sorted list, and create it
again as needed
3) Use a linked list or other structure to hold your master list of
objects. This keeps your objects from moving around
4) Allocate objects on the heap, and only have vectors of pointers.

I assume you don't have a problem copying objects in general because
you're willing to have a vector of them by value. But this is the one
time when you might consider using a linked list over some other data
structure.

Stuart

 
Reply With Quote
 
Stuart MacMartin
Guest
Posts: n/a
 
      07-15-2005
Three other little points:

1. Sorting a singly linked list is really inefficent.
Depending on the sort algorithm, sorting a
doubly linked list can be inefficient

2. If the objects are tight enough that you haven't used a heap
but just have them in a vector, chances are sorting the vector
will be good enough. If not, revisit your other decisions.

3. The comparison operator makes a big difference
the runtime of your sort:
a) Make it as tight as possible
b) Use sort() with an inlined operator< or a function object
to make sure the comparison is inlined
c) Never treat two objects as equal in your ordering.
If you can have several equivalent objects, like "red" vs.
"black",
your comparison should return &obj1 < &obj2 if they are
both red or both black.

Stuart

 
Reply With Quote
 
Matthias Kaeppler
Guest
Posts: n/a
 
      07-16-2005
Thanks Stuart, that was quite insightful.
I am now simply holding a vector with the objects directly. Previously I
had used a doubly linked list to hold the objects and a vector to hold
the pointers.

Here is a predicate function for sorting File objects by name ascending:

struct FileLessNameA
: std::binary_function<const File&,const File&,bool> {
bool operator() (const File& lhs, const File& rhs) const {
std::string path1( lhs.get_name().c_str() );
std::string path2( rhs.get_name().c_str() );
boost::algorithm::to_lower( path1 );
boost::algorithm::to_lower( path2 );
return path1 < path2;
}
};

I am converting to all lowercase because I don't want FOO to be less
than foo.

PS: A File object is actually nothing more than a
boost::filesystem:ath and a reference counted smart pointer to a
Gnome::Vfs::FileInfo object. I think it's maybe 8-16 bytes in size.

--
Matthias Kaeppler
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Are sequence containers not a subset of general containers? Sebastian Mach C++ 5 10-06-2012 07:54 PM
pointers, pointers, pointers... cerr C Programming 12 04-07-2011 11:17 PM
Containers of iterators vs. containers of references clark.coleman@att.net C++ 7 01-25-2008 01:37 PM
Questions about pointers to objects and pointers to functions Marc Thrun C Programming 15 10-04-2005 05:47 PM
sorting with STL containers containing objects of different subclasses Koen C++ 1 06-30-2003 03:51 PM



Advertisments