Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Java > SortedSet faster than Collections.sort()

Reply
Thread Tools

SortedSet faster than Collections.sort()

 
 
Timo Nentwig
Guest
Posts: n/a
 
      02-24-2005
Hi!

Is a TreeSet faster than port-sorting an ArrayList? According to some rough
benchmarking I did it is by the factor of Collections.sort() but this
benchmark was not done under reliable circumstances.

Why can't I set an initial size for a TreeSet?


 
Reply With Quote
 
 
 
 
John C. Bollinger
Guest
Posts: n/a
 
      02-24-2005
Timo Nentwig wrote:

> Hi!
>
> Is a TreeSet faster than port-sorting an ArrayList? According to some rough
> benchmarking I did it is by the factor of Collections.sort() but this
> benchmark was not done under reliable circumstances.


TreeSet has to maintain its internal order, including keeping its tree
balanced, with every addition. This can be tremendously slower than
putting the same objects in a List and sorting them once, and we have
seen it to be so, given enough elements in the set. On the other hand,
it's already sorted whenever you need it.

If you change your Set reasonably often relative to iterating over it or
using the methods peculiar to SortedSet then a List that you sort at
need might be better. Likewise if you only sometimes need it to be in
order. Also consider that in many situations you don't really need
sorting at all. For example, finding the minimum element can be done in
O(N) by simply examining each element -- if that's all you need, and you
only need it a small number of times (fewer than log(N)) then sorting
the elements is wasteful.

> Why can't I set an initial size for a TreeSet?


What would it mean? You can't have the tree without the elements.


--
John Bollinger
http://www.velocityreviews.com/forums/(E-Mail Removed)
 
Reply With Quote
 
 
 
 
Alan Krueger
Guest
Posts: n/a
 
      02-25-2005
John C. Bollinger wrote:
> TreeSet has to maintain its internal order, including keeping its tree
> balanced, with every addition. This can be tremendously slower than
> putting the same objects in a List and sorting them once, and we have
> seen it to be so, given enough elements in the set.


Not sure how this can be true. TreeSet Javadoc says, "This
implementation provides guaranteed log(n) time cost for the basic
operations (add, remove and contains)." This implies O(N log N) time to
insert N items. Collections.sort Javadoc says, "This algorithm offers
guaranteed n log(n) performance."

> If you change your Set reasonably often relative to iterating over it or
> using the methods peculiar to SortedSet then a List that you sort at
> need might be better.


Doubtful, since adding an item to a TreeSet would take O(log N) time;
adding to a sorted list would be no faster and the add-and-resort for an
unsorted list would certainly be slower.
 
Reply With Quote
 
Chris Smith
Guest
Posts: n/a
 
      02-25-2005
Alan Krueger <(E-Mail Removed)> wrote:
> John C. Bollinger wrote:
> > TreeSet has to maintain its internal order, including keeping its tree
> > balanced, with every addition. This can be tremendously slower than
> > putting the same objects in a List and sorting them once, and we have
> > seen it to be so, given enough elements in the set.

>
> Not sure how this can be true. TreeSet Javadoc says, "This
> implementation provides guaranteed log(n) time cost for the basic
> operations (add, remove and contains)." This implies O(N log N) time to
> insert N items. Collections.sort Javadoc says, "This algorithm offers
> guaranteed n log(n) performance."


The difference is in the definition of "n". The time to perform some
set of add/remove/contains operations on a TreeSet would actually be on
the order of m * log(n), where 'm' is the total number of operations and
n is the average number of elements in the TreeSet. If all operations
are add operations as in your example, then this turns out to be the
same as n * log(n), but only because we can demonstrate that m == n / 2.

However, if m is not on the order of n, then the fact doesn't hold.
Furthermore, in reality, the number of operations on a long-lived data
structure is rarely on the order of the average size, but is instead
much greater. If the average size of the tree is on the order of the
square root of the number of operations, then the total time of the task
becomes O(n^2 * log(n)), which is looking considerably worse. If the
number of ordered traversals is constant or otherwise doesn't scale with
the size of the problem, then an explicit sort per traversal is still
only O(n * log(n)), and the TreeSet is looking pretty bad by comparison.

> Doubtful, since adding an item to a TreeSet would take O(log N) time;
> adding to a sorted list would be no faster and the add-and-resort for an
> unsorted list would certainly be slower.


Adding to a sorted list is definitely O(log(n)), but adding to an
unsorted list can be constant-time. Sorting at need can then be cheaper
than working with the SortedSet data structure, as explained above.

--
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
Reply With Quote
 
el goog
Guest
Posts: n/a
 
      02-25-2005
> Adding to a sorted list is definitely O(log(n)),

What do you mean by this? It's not true if your list is implemented
using an array or linked list, at least if you want to maintain the
list in sorted order.

 
Reply With Quote
 
Chris Smith
Guest
Posts: n/a
 
      02-25-2005
el goog <(E-Mail Removed)> wrote:
> > Adding to a sorted list is definitely O(log(n)),

>
> What do you mean by this? It's not true if your list is implemented
> using an array or linked list, at least if you want to maintain the
> list in sorted order.


You should read that as "O(log(n)) or worse", which doesn't impact the
original point at all.

--
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
Reply With Quote
 
el goog
Guest
Posts: n/a
 
      02-26-2005
> You should read that as "O(log(n)) or worse", which doesn't impact
the
> original point at all.


OK, I see your intent - you mean Theta(log(n)). You shouldn't use
big-Oh
notation with lower bounds. If you think about the definition, you'll
see
that "O(log(n)) or worse" is a vacuous statement.

I don't disagree with your point, just your abuse of notation.

 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Bug: SortedSet gives warning. John Carter Ruby 2 03-04-2005 03:34 AM
Comment : ASP.NET Performs 10 Times Faster than J2EE =?Utf-8?B?Sm9obiBQYXVsLiBBIChNQ1AgSUQjIDMwMTUxNzYp?= ASP .Net 6 07-07-2004 12:46 PM
Re-sorting a SortedSet Larry Coon Java 5 06-05-2004 12:48 AM
Reverse iterator for a SortedSet ? Sasha Java 3 01-13-2004 08:58 PM
Anything faster than stat() ? Ken Tucker Perl 1 07-08-2003 06:29 AM



Advertisments