Java VM 1.6.0 Sorting Collections

P

petek1976

I have been having some trouble with the performance of Java. In my
code I have narrowed it down to the sort:

For instance:


import java.util.*;
public class SortTest
{
public static void main(String args[])
{
final int vecSize = 1000000;
ArrayList<Double> vec = new ArrayList<Double>(vecSize);
final int numItr = 10;
ArrayList<Double> times = new ArrayList<Double>(numItr);
for (int i=0; i<numItr; ++i)
{
for (int k=0; k<vecSize; ++k)
{
vec.add(k,Math.random());
}
long startTime = System.nanoTime();
java.util.Collections.sort(vec);
long endTime = System.nanoTime();
times.add(i,(endTime-startTime)*1.e-9);
vec = new ArrayList<Double>(vecSize);
}
double avg=0.0;
for (Double val:times)
{
avg += val;
}
avg /= times.size();
System.out.println("To sort " + vec.size() + " elements " +
numItr + " times took " + avg + " seconds on average");

}
}



This prints about 0.58 seconds on average. How can I optimize this
reasonably? Note I am only timing the sort function. Using an array
instead of ArrayList is not an option. I tried Vector but it didn't
help. The equivalent C++ code (using vector) runs in about 0.37
seconds when built with optimization (g++ version 4.2.1 using -03
optimization ). I am running on a MacBook Pro 2.66 GHz Intel Core i7
with 4 GB of ram. I also tried this example on Linux and saw no
difference. What is causing over 40% difference in speed? I must be
doing something wrong in my java code.


#include <iostream>
#include <vector>
#include <cstdlib>
#include <algorithm>
#include <sys/time.h>
#include <numeric>
using namespace std;

template <class T>
void FillRand(T start, T end)
{
while (start != end)
{
*start++ = double(rand())/RAND_MAX;
}
}

class TimeStamp
{
public:
void start()
{
gettimeofday(&st_val,0);
}
void stop()
{
gettimeofday(&end_val,0);
}
double elapsedSec() const
{
return end_val.tv_sec-st_val.tv_sec +
(end_val.tv_usec-st_val.tv_usec)*1.e-6;
}
private:
timeval st_val;
timeval end_val;
};

int main()
{
vector<double> vec(1000000);
const long num_itr = 10;
vector<double> time_stamps(num_itr);
TimeStamp ts;
for (long i=0; i<num_itr; ++i)
{
FillRand(vec.begin(),vec.end());
ts.start();
std::sort(vec.begin(),vec.end());
ts.stop();
time_stamps = ts.elapsedSec();
}
double avg=std::accumulate(time_stamps.begin(),time_stamps.end(),
0.0)/time_stamps.size();
cout << "To sort " << vec.size() << " elements took " << avg << "
seconds on average\n";
}
 
M

markspace

petek1976 said:
This prints about 0.58 seconds on average. How can I optimize this
reasonably? Note I am only timing the sort function. Using an array
instead of ArrayList is not an option.


Unfortunately I think using an array of double is the only option. I
didn't see anything obviously wrong with your code.

I got about a 450% speed-up by switching to double[] instead of
ArrayList<Double>. Why is using double[] proscribed? If it's the need
for a List, you can roll your own.
 
T

Tom Anderson

This prints about 0.58 seconds on average. How can I optimize this
reasonably?Note I am only timing the sort function. Using an array
instead of ArrayList is not an option. I tried Vector but it didn't
help. The equivalent C++ code (using vector) runs in about 0.37
seconds

In terms of the java vs C++ comparison:

1. Java's standard sort has to be stable, which means in practice it's a
mergesort, whereas STL's doesn't, so it can be a quicksort. Quicksort will
typically be faster than mergesort for random data.

2. Java handles the doubles as objects on the heap here, whereas C++, i
believe, will handle them as primitives (because it resolves templates at
compile time, essentially, whereas java just erases types and uses the
same code at runtime for all element types). Which means:

(a) Java's data takes up more memory, and so requires more cache and
memory bandwidth to work with (you have a million objects, so you're
looking at 8 MB just for the raw numbers; the object overhead probably
about quadruples that).

(b) Java has to do comparisons by making virtual (worse - interface!)
method calls, whereas C++ can just use a single machine instruction.
Java's runtime compiler stands a good chance of optimising that overhead
away; i have no idea if it will catch none, some, most, or all of it. You
could try dumping the compiled code to see what HotSpot is up to:

http://wikis.sun.com/display/HotSpotInternals/PrintAssembly

If performance is vital but you want a List interface, than as Mark
suggested, write a List<Double> implementation that wraps a double[]. You
can then use Arrays.sort to sort the content - for doubles, this is a
quicksort and uses direct comparisons, so it should be fast. AbstractList
makes this rather easy to implement. You could also try this guy's
DoubleArrayList:

http://pcj.sourceforge.net/

to which you could easily add internal sorting. It would be nice if there
was such a class in the Apache or Google collections libraries, but there
isn't.

tom
 
J

Jim Janney

petek1976 said:
I have been having some trouble with the performance of Java. In my
code I have narrowed it down to the sort:

For instance:


import java.util.*;
public class SortTest
{
public static void main(String args[])
{
final int vecSize = 1000000;
ArrayList<Double> vec = new ArrayList<Double>(vecSize);
final int numItr = 10;
ArrayList<Double> times = new ArrayList<Double>(numItr);
for (int i=0; i<numItr; ++i)
{
for (int k=0; k<vecSize; ++k)
{
vec.add(k,Math.random());
}
long startTime = System.nanoTime();
java.util.Collections.sort(vec);
long endTime = System.nanoTime();
times.add(i,(endTime-startTime)*1.e-9);
vec = new ArrayList<Double>(vecSize);
}
double avg=0.0;
for (Double val:times)
{
avg += val;
}
avg /= times.size();
System.out.println("To sort " + vec.size() + " elements " +
numItr + " times took " + avg + " seconds on average");

}
}



This prints about 0.58 seconds on average. How can I optimize this
reasonably? Note I am only timing the sort function. Using an array
instead of ArrayList is not an option. I tried Vector but it didn't
help. The equivalent C++ code (using vector) runs in about 0.37
seconds when built with optimization (g++ version 4.2.1 using -03
optimization ). I am running on a MacBook Pro 2.66 GHz Intel Core i7
with 4 GB of ram. I also tried this example on Linux and saw no
difference. What is causing over 40% difference in speed? I must be
doing something wrong in my java code.


#include <iostream>
#include <vector>
#include <cstdlib>
#include <algorithm>
#include <sys/time.h>
#include <numeric>
using namespace std;

template <class T>
void FillRand(T start, T end)
{
while (start != end)
{
*start++ = double(rand())/RAND_MAX;
}
}

class TimeStamp
{
public:
void start()
{
gettimeofday(&st_val,0);
}
void stop()
{
gettimeofday(&end_val,0);
}
double elapsedSec() const
{
return end_val.tv_sec-st_val.tv_sec +
(end_val.tv_usec-st_val.tv_usec)*1.e-6;
}
private:
timeval st_val;
timeval end_val;
};

int main()
{
vector<double> vec(1000000);
const long num_itr = 10;
vector<double> time_stamps(num_itr);
TimeStamp ts;
for (long i=0; i<num_itr; ++i)
{
FillRand(vec.begin(),vec.end());
ts.start();
std::sort(vec.begin(),vec.end());
ts.stop();
time_stamps = ts.elapsedSec();
}
double avg=std::accumulate(time_stamps.begin(),time_stamps.end(),
0.0)/time_stamps.size();
cout << "To sort " << vec.size() << " elements took " << avg << "
seconds on average\n";
}


Take a look at the source for Collections.sort:

public static <T extends Comparable<? super T>> void sort(List<T> list) {
Object[] a = list.toArray();
Arrays.sort(a);
ListIterator<T> i = list.listIterator();
for (int j=0; j<a.length; j++) {
i.next();
i.set((T)a[j]);
}
}

You're copying the list to an array, sorting the array, and then
copying the array back to the list. You might try coding up a simple
quicksort that operates directly on the ArrayList to see what kind of
times that gives you.
 
J

Jim Janney

Jim Janney said:
You're copying the list to an array, sorting the array, and then
copying the array back to the list. You might try coding up a simple
quicksort that operates directly on the ArrayList to see what kind of
times that gives you.

The evil approach would be to use reflection to get the elementData
field in ArrayList, and call Arrays.sort directly on that.
 
L

Lew

petek1976 said:
This prints about 0.58 seconds on average. How can I optimize this
reasonably? Note I am only timing the sort function. Using an array
instead of ArrayList is not an option. I tried Vector but it didn't

One would expect 'Vector' to be slightly slower than 'ArrayList'.
help. The equivalent C++ code (using vector) runs in about 0.37

Java programs typically run at 50-105% of the speed of optimized C++
programs. What optimizations did you hand the JVM?

One would expect that you at least specified "-server", yes?
seconds when built with optimization (g++ version 4.2.1 using -03
optimization ). I am running on a MacBook Pro 2.66 GHz Intel Core i7
with 4 GB of ram. I also tried this example on Linux and saw no
difference. What is causing over 40% difference in speed? I must be
doing something wrong in my java [sic] code.

HotSpot takes something like 10,000 visits to the same code to compile
to native code. Other optimizations may or may not take that many
iterations.

Run your loop 100,000 times or so without timing it, then repeat it
under timing. This should prime the optimizer.

Microbenchmarks are deceptive and usually not indicative of real-world
performance.
 
T

Tom Anderson

The evil approach would be to use reflection to get the elementData
field in ArrayList, and call Arrays.sort directly on that.

Your ideas are intriguing to me, and i wish to subscribe to your
newsletter.

tom
 
J

Jim Janney

Lew said:
petek1976 said:
This prints about 0.58 seconds on average. How can I optimize this
reasonably? Note I am only timing the sort function. Using an array
instead of ArrayList is not an option. I tried Vector but it didn't

One would expect 'Vector' to be slightly slower than 'ArrayList'.
help. The equivalent C++ code (using vector) runs in about 0.37

Java programs typically run at 50-105% of the speed of optimized C++
programs. What optimizations did you hand the JVM?

One would expect that you at least specified "-server", yes?
seconds when built with optimization (g++ version 4.2.1 using -03
optimization ). I am running on a MacBook Pro 2.66 GHz Intel Core i7
with 4 GB of ram. I also tried this example on Linux and saw no
difference. What is causing over 40% difference in speed? I must be
doing something wrong in my java [sic] code.

HotSpot takes something like 10,000 visits to the same code to compile
to native code. Other optimizations may or may not take that many
iterations.

Run your loop 100,000 times or so without timing it, then repeat it
under timing. This should prime the optimizer.

Microbenchmarks are deceptive and usually not indicative of real-world
performance.

Assuming n log(n) performance for the sorting algorithm, sorting a
million-element list once gives you well over a million visits to the
comparison code, and presumably to the inner loops of the sort.
 
L

Lew

Jim said:
Assuming n log(n) performance for the sorting algorithm, sorting a
million-element list once gives you well over a million visits to the
comparison code, and presumably to the inner loops of the sort.

Good point, but if he doesn't have "-server" then it might not even do
that optimization.
 
R

Roedy Green

Using an array
instead of ArrayList is not an option

Why? You can always convert an ArrayList to an array and back again.
It might not buy you anything, but code outside that would not even
know about it.
 
R

Roedy Green

ArrayList<Double> vec = new ArrayList<Double>(vecSize);

Arraylist of Double is HUGELY fatter than array of double because
there is the object header overhead for every item in the ArrayList.

The sort comparison routine with the ArrayList has to find the values
to compare with extra indirection. That is mostly what consumes the
cycles in a sort.
 
T

Tom Anderson

Good point, but if he doesn't have "-server" then it might not even do
that optimization.

This document:

http://java.sun.com/performance/reference/whitepapers/tuning.html

is so old it's virtually scripture (last revised in 2005, for 1.5.0_06, i
think), so i don't know how applicable it is to the OP's JVM, but it
alleges that:

Most computers that have at least 2 CPU's and at least 2 GB of physical
memory are considered server-class machines which means that by default
the settings are:

* The -server compiler

If 'CPUs' means 'cores', and the OP's machine is reasonably recent,
there's a strong chance that -server is on already. Doesn't hurt to turn
it on explicitly, of course. I know i do!

tom
 
R

Roedy Green

The evil approach would be to use reflection to get the elementData
field in ArrayList, and call Arrays.sort directly on that.

You are still using Doubles rather than doubles, so you won't get the
full benefit of a true array.
 
J

Joshua Cranmer

1. Java's standard sort has to be stable, which means in practice it's a
mergesort, whereas STL's doesn't, so it can be a quicksort. Quicksort
will typically be faster than mergesort for random data.

Arrays.sort() will use quicksort for integral types.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,982
Messages
2,570,185
Members
46,736
Latest member
AdolphBig6

Latest Threads

Top