Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Java > Java VM 1.6.0 Sorting Collections

Reply
Thread Tools

Java VM 1.6.0 Sorting Collections

 
 
petek1976
Guest
Posts: n/a
 
      06-10-2010
I have been having some trouble with the performance of Java. In my
code I have narrowed it down to the sort:

For instance:


import java.util.*;
public class SortTest
{
public static void main(String args[])
{
final int vecSize = 1000000;
ArrayList<Double> vec = new ArrayList<Double>(vecSize);
final int numItr = 10;
ArrayList<Double> times = new ArrayList<Double>(numItr);
for (int i=0; i<numItr; ++i)
{
for (int k=0; k<vecSize; ++k)
{
vec.add(k,Math.random());
}
long startTime = System.nanoTime();
java.util.Collections.sort(vec);
long endTime = System.nanoTime();
times.add(i,(endTime-startTime)*1.e-9);
vec = new ArrayList<Double>(vecSize);
}
double avg=0.0;
for (Double val:times)
{
avg += val;
}
avg /= times.size();
System.out.println("To sort " + vec.size() + " elements " +
numItr + " times took " + avg + " seconds on average");

}
}



This prints about 0.58 seconds on average. How can I optimize this
reasonably? Note I am only timing the sort function. Using an array
instead of ArrayList is not an option. I tried Vector but it didn't
help. The equivalent C++ code (using vector) runs in about 0.37
seconds when built with optimization (g++ version 4.2.1 using -03
optimization ). I am running on a MacBook Pro 2.66 GHz Intel Core i7
with 4 GB of ram. I also tried this example on Linux and saw no
difference. What is causing over 40% difference in speed? I must be
doing something wrong in my java code.


#include <iostream>
#include <vector>
#include <cstdlib>
#include <algorithm>
#include <sys/time.h>
#include <numeric>
using namespace std;

template <class T>
void FillRand(T start, T end)
{
while (start != end)
{
*start++ = double(rand())/RAND_MAX;
}
}

class TimeStamp
{
public:
void start()
{
gettimeofday(&st_val,0);
}
void stop()
{
gettimeofday(&end_val,0);
}
double elapsedSec() const
{
return end_val.tv_sec-st_val.tv_sec +
(end_val.tv_usec-st_val.tv_usec)*1.e-6;
}
private:
timeval st_val;
timeval end_val;
};

int main()
{
vector<double> vec(1000000);
const long num_itr = 10;
vector<double> time_stamps(num_itr);
TimeStamp ts;
for (long i=0; i<num_itr; ++i)
{
FillRand(vec.begin(),vec.end());
ts.start();
std::sort(vec.begin(),vec.end());
ts.stop();
time_stamps[i] = ts.elapsedSec();
}
double avg=std::accumulate(time_stamps.begin(),time_stamp s.end(),
0.0)/time_stamps.size();
cout << "To sort " << vec.size() << " elements took " << avg << "
seconds on average\n";
}
 
Reply With Quote
 
 
 
 
markspace
Guest
Posts: n/a
 
      06-10-2010
petek1976 wrote:

> This prints about 0.58 seconds on average. How can I optimize this
> reasonably? Note I am only timing the sort function. Using an array
> instead of ArrayList is not an option.



Unfortunately I think using an array of double is the only option. I
didn't see anything obviously wrong with your code.

I got about a 450% speed-up by switching to double[] instead of
ArrayList<Double>. Why is using double[] proscribed? If it's the need
for a List, you can roll your own.
 
Reply With Quote
 
 
 
 
Tom Anderson
Guest
Posts: n/a
 
      06-10-2010
On Wed, 9 Jun 2010, petek1976 wrote:

> This prints about 0.58 seconds on average. How can I optimize this
> reasonably?Note I am only timing the sort function. Using an array
> instead of ArrayList is not an option. I tried Vector but it didn't
> help. The equivalent C++ code (using vector) runs in about 0.37
> seconds


In terms of the java vs C++ comparison:

1. Java's standard sort has to be stable, which means in practice it's a
mergesort, whereas STL's doesn't, so it can be a quicksort. Quicksort will
typically be faster than mergesort for random data.

2. Java handles the doubles as objects on the heap here, whereas C++, i
believe, will handle them as primitives (because it resolves templates at
compile time, essentially, whereas java just erases types and uses the
same code at runtime for all element types). Which means:

(a) Java's data takes up more memory, and so requires more cache and
memory bandwidth to work with (you have a million objects, so you're
looking at 8 MB just for the raw numbers; the object overhead probably
about quadruples that).

(b) Java has to do comparisons by making virtual (worse - interface!)
method calls, whereas C++ can just use a single machine instruction.
Java's runtime compiler stands a good chance of optimising that overhead
away; i have no idea if it will catch none, some, most, or all of it. You
could try dumping the compiled code to see what HotSpot is up to:

http://wikis.sun.com/display/HotSpot.../PrintAssembly

If performance is vital but you want a List interface, than as Mark
suggested, write a List<Double> implementation that wraps a double[]. You
can then use Arrays.sort to sort the content - for doubles, this is a
quicksort and uses direct comparisons, so it should be fast. AbstractList
makes this rather easy to implement. You could also try this guy's
DoubleArrayList:

http://pcj.sourceforge.net/

to which you could easily add internal sorting. It would be nice if there
was such a class in the Apache or Google collections libraries, but there
isn't.

tom

--
I'm angry, but not Milk and Cheese angry. -- Mike Froggatt
 
Reply With Quote
 
Jim Janney
Guest
Posts: n/a
 
      06-10-2010
petek1976 <> writes:

> I have been having some trouble with the performance of Java. In my
> code I have narrowed it down to the sort:
>
> For instance:
>
>
> import java.util.*;
> public class SortTest
> {
> public static void main(String args[])
> {
> final int vecSize = 1000000;
> ArrayList<Double> vec = new ArrayList<Double>(vecSize);
> final int numItr = 10;
> ArrayList<Double> times = new ArrayList<Double>(numItr);
> for (int i=0; i<numItr; ++i)
> {
> for (int k=0; k<vecSize; ++k)
> {
> vec.add(k,Math.random());
> }
> long startTime = System.nanoTime();
> java.util.Collections.sort(vec);
> long endTime = System.nanoTime();
> times.add(i,(endTime-startTime)*1.e-9);
> vec = new ArrayList<Double>(vecSize);
> }
> double avg=0.0;
> for (Double val:times)
> {
> avg += val;
> }
> avg /= times.size();
> System.out.println("To sort " + vec.size() + " elements " +
> numItr + " times took " + avg + " seconds on average");
>
> }
> }
>
>
>
> This prints about 0.58 seconds on average. How can I optimize this
> reasonably? Note I am only timing the sort function. Using an array
> instead of ArrayList is not an option. I tried Vector but it didn't
> help. The equivalent C++ code (using vector) runs in about 0.37
> seconds when built with optimization (g++ version 4.2.1 using -03
> optimization ). I am running on a MacBook Pro 2.66 GHz Intel Core i7
> with 4 GB of ram. I also tried this example on Linux and saw no
> difference. What is causing over 40% difference in speed? I must be
> doing something wrong in my java code.
>
>
> #include <iostream>
> #include <vector>
> #include <cstdlib>
> #include <algorithm>
> #include <sys/time.h>
> #include <numeric>
> using namespace std;
>
> template <class T>
> void FillRand(T start, T end)
> {
> while (start != end)
> {
> *start++ = double(rand())/RAND_MAX;
> }
> }
>
> class TimeStamp
> {
> public:
> void start()
> {
> gettimeofday(&st_val,0);
> }
> void stop()
> {
> gettimeofday(&end_val,0);
> }
> double elapsedSec() const
> {
> return end_val.tv_sec-st_val.tv_sec +
> (end_val.tv_usec-st_val.tv_usec)*1.e-6;
> }
> private:
> timeval st_val;
> timeval end_val;
> };
>
> int main()
> {
> vector<double> vec(1000000);
> const long num_itr = 10;
> vector<double> time_stamps(num_itr);
> TimeStamp ts;
> for (long i=0; i<num_itr; ++i)
> {
> FillRand(vec.begin(),vec.end());
> ts.start();
> std::sort(vec.begin(),vec.end());
> ts.stop();
> time_stamps[i] = ts.elapsedSec();
> }
> double avg=std::accumulate(time_stamps.begin(),time_stamp s.end(),
> 0.0)/time_stamps.size();
> cout << "To sort " << vec.size() << " elements took " << avg << "
> seconds on average\n";
> }


Take a look at the source for Collections.sort:

public static <T extends Comparable<? super T>> void sort(List<T> list) {
Object[] a = list.toArray();
Arrays.sort(a);
ListIterator<T> i = list.listIterator();
for (int j=0; j<a.length; j++) {
i.next();
i.set((T)a[j]);
}
}

You're copying the list to an array, sorting the array, and then
copying the array back to the list. You might try coding up a simple
quicksort that operates directly on the ArrayList to see what kind of
times that gives you.

--
Jim Janney
 
Reply With Quote
 
Jim Janney
Guest
Posts: n/a
 
      06-10-2010
Jim Janney <> writes:

> You're copying the list to an array, sorting the array, and then
> copying the array back to the list. You might try coding up a simple
> quicksort that operates directly on the ArrayList to see what kind of
> times that gives you.


The evil approach would be to use reflection to get the elementData
field in ArrayList, and call Arrays.sort directly on that.

--
Jim Janney
 
Reply With Quote
 
Lew
Guest
Posts: n/a
 
      06-10-2010
petek1976 wrote:
> This prints about 0.58 seconds on average. How can I optimize this
> reasonably? Note I am only timing the sort function. Using an array
> instead of ArrayList is not an option. I tried Vector but it didn't
>


One would expect 'Vector' to be slightly slower than 'ArrayList'.

> help. The equivalent C++ code (using vector) runs in about 0.37
>


Java programs typically run at 50-105% of the speed of optimized C++
programs. What optimizations did you hand the JVM?

One would expect that you at least specified "-server", yes?

> seconds when built with optimization (g++ version 4.2.1 using -03
> optimization ). I am running on a MacBook Pro 2.66 GHz Intel Core i7
> with 4 GB of ram. I also tried this example on Linux and saw no
> difference. What is causing over 40% difference in speed? I must be
> doing something wrong in my java [sic] code.
>


HotSpot takes something like 10,000 visits to the same code to compile
to native code. Other optimizations may or may not take that many
iterations.

Run your loop 100,000 times or so without timing it, then repeat it
under timing. This should prime the optimizer.

Microbenchmarks are deceptive and usually not indicative of real-world
performance.

--
Lew
 
Reply With Quote
 
Tom Anderson
Guest
Posts: n/a
 
      06-10-2010
On Thu, 10 Jun 2010, Jim Janney wrote:

> Jim Janney <> writes:
>
>> You're copying the list to an array, sorting the array, and then
>> copying the array back to the list. You might try coding up a simple
>> quicksort that operates directly on the ArrayList to see what kind of
>> times that gives you.

>
> The evil approach would be to use reflection to get the elementData
> field in ArrayList, and call Arrays.sort directly on that.


Your ideas are intriguing to me, and i wish to subscribe to your
newsletter.

tom

--
They Set Up Us The Revolution - Now We Have Set Up It Them Back
 
Reply With Quote
 
Jim Janney
Guest
Posts: n/a
 
      06-10-2010
Lew <> writes:

> petek1976 wrote:
>> This prints about 0.58 seconds on average. How can I optimize this
>> reasonably? Note I am only timing the sort function. Using an array
>> instead of ArrayList is not an option. I tried Vector but it didn't
>>

>
> One would expect 'Vector' to be slightly slower than 'ArrayList'.
>
>> help. The equivalent C++ code (using vector) runs in about 0.37
>>

>
> Java programs typically run at 50-105% of the speed of optimized C++
> programs. What optimizations did you hand the JVM?
>
> One would expect that you at least specified "-server", yes?
>
>> seconds when built with optimization (g++ version 4.2.1 using -03
>> optimization ). I am running on a MacBook Pro 2.66 GHz Intel Core i7
>> with 4 GB of ram. I also tried this example on Linux and saw no
>> difference. What is causing over 40% difference in speed? I must be
>> doing something wrong in my java [sic] code.
>>

>
> HotSpot takes something like 10,000 visits to the same code to compile
> to native code. Other optimizations may or may not take that many
> iterations.
>
> Run your loop 100,000 times or so without timing it, then repeat it
> under timing. This should prime the optimizer.
>
> Microbenchmarks are deceptive and usually not indicative of real-world
> performance.


Assuming n log(n) performance for the sorting algorithm, sorting a
million-element list once gives you well over a million visits to the
comparison code, and presumably to the inner loops of the sort.

--
Jim Janney
 
Reply With Quote
 
Lew
Guest
Posts: n/a
 
      06-10-2010
Jim Janney wrote:
> Assuming n log(n) performance for the sorting algorithm, sorting a
> million-element list once gives you well over a million visits to the
> comparison code, and presumably to the inner loops of the sort.
>


Good point, but if he doesn't have "-server" then it might not even do
that optimization.

--
Lew
 
Reply With Quote
 
Jim Janney
Guest
Posts: n/a
 
      06-10-2010
Tom Anderson <> writes:

> On Thu, 10 Jun 2010, Jim Janney wrote:
>
>> Jim Janney <> writes:
>>
>>> You're copying the list to an array, sorting the array, and then
>>> copying the array back to the list. You might try coding up a simple
>>> quicksort that operates directly on the ArrayList to see what kind of
>>> times that gives you.

>>
>> The evil approach would be to use reflection to get the elementData
>> field in ArrayList, and call Arrays.sort directly on that.

>
> Your ideas are intriguing to me, and i wish to subscribe to your
> newsletter.
>


"Why you should buy what you hate"

http://online.wsj.com/article/SB1000...265955016.html

--
Jim Janney
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Sorting list vs sorting vector boltar2003@boltar.world C++ 2 07-06-2010 09:40 AM
Quick sorting through large image collections barman@jhu.edu Digital Photography 3 05-07-2007 08:16 PM
fired event Sorting which wasn't handled - sorting and SelectedIndexChanged Jason ASP .Net Web Controls 0 10-04-2006 02:19 PM
Sorting collections based on nested collections Doug Poland Java 9 09-27-2003 10:46 PM
InnerProperty Persistance for Collections containing other Collections mutex ASP .Net Building Controls 0 07-27-2003 02:45 PM



Advertisments
 



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57