Java versus C or C++ for number crunching

johnmortal.forums

This is the sort of question that I hope won't start an unhappy
discussion, but I wanted to know whether there are any well-accepted
tests comparing Java to C++ (or C) for doing extensive number
crunching (e.g. multiplying 100,000 vectors in three-space by various
matrices, and maybe even a lot of trig used to generate those matrices,
so lots of addition and multiplication). I am using C++ for a number
crunching intensive project because I have been so insistently
informed that Java is slow at number crunching and because a discrete
fast Fourier procedure I wrote really did seem surprisingly slow in Java
(but hey, maybe that's my fault, though I did use the fast version of
the transform). But if Java is not really lagging on number crunching
I would love to switch as I like Java so much. I could write my own
little test program easily enough, but every time someone posts that
they have done so it seems like there are a lot of explanations posted
about why whatever they wrote is a bad test. Are there any accepted
"good tests" on number crunching that have been run recently?

Thank you
-John
 
Kenneth P. Turvey

This is the sort of question that I hope won't start an unhappy
discussion, but I wanted to know whether there are any well accepted
tests comparing Java to C++ (or C) for doing extensive number crunching
(e.g. multiplying 100,000 vectors in three space by various matrices, and
maybe even a lot of trig used to generate those matrices, so lots of
[Snip]

Some basic guidelines..

C++ will be faster with object creation and destruction .. a lot faster
in my experience. If you are going to be creating and destroying many
objects the math won't matter.

Java is faster, or as fast in math on primitive types.

Java on Intel will be much slower than C on trig functions. This problem
can be reduced a bit by having a wrapper class that does the trig calls
using JNI, but it will still be slower than C. On other hardware
platforms this isn't a problem. On some Intel it might not be a problem
based on the results of an experiment recently conducted in this forum on
another thread (see Sines and Cosines).
 
Peter Duniho

C++ will be faster with object creation and destruction .. a lot faster
in my experience. If you are going to be creating and destroying many
objects the math won't matter.

Is this really true?

In C#, allocations are _much_ faster than in C++, because of the heap
management differences between the two. Because C#'s advantage comes
primarily from the implementation of its garbage collection system, I had
just assumed that Java had a similar garbage collection implementation and
thus shared a similar advantage over C++ in that respect.

C# incurs some extra overhead relative to C++ memory management when it
eventually has to clean up objects, but this rarely impacts performance,
as the collection only happens when there's memory pressure and/or idle
moments. The net is that code that does a lot of allocations usually
performs better in C# than C++ (though most often there's very little
practical difference).

It surprises me to hear that Java is significantly _slower_ than C++.
That would imply that it's got the worst of both worlds: the reclaiming
overhead of a garbage collecting memory manager, and the allocation
overhead of a free-list based memory manager.

Surely that's not actually the case?

Pete
 
Mark Thornton

Kenneth said:
This is the sort of question that I hope won't start an unhappy
discussion, but I wanted to know whether there are any well accepted
tests comparing Java to C++ (or C) for doing extensive number crunching
(e.g. multiplying 100,000 vectors in three space by various matrices, and
maybe even a lot of trig used to generate those matrices, so lots of
[Snip]

Some basic guidelines..

C++ will be faster with object creation and destruction .. a lot faster
in my experience. If you are going to be creating and destroying many
objects the math won't matter.

It depends. Java can often be faster with multi threaded code --- the
standard C/C++ allocators have to use locking around every alloc/free
whereas Java allocators are often lock free even on multiprocessors. If
your storage structure (and object lifetime) is sufficiently complex
that your C++ code uses reference counting, then Java's garbage
collector can be a lot faster (reference counting with locking is
relatively slow).

Array access may be slower in Java if the JVM can't eliminate bounds
checks.

Java is faster, or as fast in math on primitive types.

Java on Intel will be much slower than C on trig functions. This problem

If your arguments are in the range +- PI/4 then the difference is not so
great. Larger arguments are slower, but then the result starts to
diverge as well (which may or may not matter to you). Try sin(PI).
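
The sin(PI) experiment suggested above can be tried with a few lines of Java. This is only a sketch; the class name SinAtPi is made up for illustration. Math.PI is the double nearest to pi rather than pi itself, so the exact sine of that double is a tiny nonzero number (on the order of 1e-16), and a library that reduces the argument accurately should return something very close to it. Comparing the printed values against what your C library prints for the same inputs shows whether the two diverge on your platform.

```java
// Sketch: print sin at a small argument and at Math.PI for comparison with C.
class SinAtPi {
    /** Returns Math.sin(Math.PI). Math.PI is the nearest double to pi, not pi itself. */
    static double sinOfPi() {
        return Math.sin(Math.PI);
    }

    public static void main(String[] args) {
        // Argument inside [-PI/4, PI/4]: no range reduction should be needed.
        System.out.println("Math.sin(PI/4)     = " + Math.sin(Math.PI / 4));
        // Argument outside that range: the library must reduce it first.
        System.out.println("Math.sin(PI)       = " + sinOfPi());
        // StrictMath gives the same bits on every platform.
        System.out.println("StrictMath.sin(PI) = " + StrictMath.sin(Math.PI));
    }
}
```

If the C program prints a noticeably different value for sin(PI), the speed difference is at least partly a difference in accuracy, which may or may not matter for the application.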

What quality of C/C++ compiler do you have available? When I could last
be bothered to run tests it wasn't all that hard to beat Microsoft's
then current compiler. Intel's best was usually a bit in front.

Mark Thornton
 
Mark Thornton

Peter said:
Is this really true?

In C#, allocations are _much_ faster than in C++, because of the heap
management differences between the two. Because C#'s advantage comes
primarily from the implementation of its garbage collection system, I
had just assumed that Java had a similar garbage collection
implementation and thus shared a similar advantage over C++ in that
respect.

C# incurs some extra overhead relative to C++ memory management when it
eventually has to clean up objects, but this rarely impacts performance,
as the collection only happens when there's memory pressure and/or idle
moments.

In big computational tasks you don't have idle moments and eventually
you do have to clean up. So you need to consider the overall cost of
allocation and deallocation. As with C#, an allocation in Java is pretty
trivial (not much more than a pointer increment). But garbage collection
is not free.

It surprises me to hear that Java is significantly _slower_ than C++.

It isn't, but it does depend on the memory use patterns. Tasks that can
be performed with a strict stack allocation pattern favour C++, those
with complex lifetimes (and especially if multithreaded) favour Java (or
C#).

Mark
 
Arne Vajhøj

[...] Are there any accepted
"good tests" on number crunching that have been run recently?

I am not aware of any general accepted tests.

In fact I doubt that it is possible to create such a test, because
what is a good test depends on the problem that needs to be solved.

Forget all the crap from mid-90's about Java being interpreted
and slow etc..

The JIT compilers used in modern JVMs are quite good.

That said, I would still expect C/C++ to be slightly faster than
Java for your usage. Java checks array indexes - C/C++ does not. And
in general I doubt that sufficient time has been spent optimizing
floating point in JVMs. Floating point is not a big usage area
for Java. Fortran and C still dominate that area.

Whether you will be willing to spend time tracking down various
memory overwrites and memory leaks in C/C++ to gain, let us guess, 10-20%
in performance is something you will have to decide on.

Arne
 
Arne Vajhøj

Kenneth said:
C++ will be faster with object creation and destruction .. a lot faster
in my experience. If you are going to be creating and destroying many
objects the math won't matter.

No.

All experience shows that GC is more efficient than explicit
deallocation at the cost of poorer real time characteristics.

Arne
 
Arne Vajhøj

Peter said:
Is this really true?

In C#, allocations are _much_ faster than in C++, because of the heap
management differences between the two. Because C#'s advantage comes
primarily from the implementation of its garbage collection system, I
had just assumed that Java had a similar garbage collection
implementation and thus shared a similar advantage over C++ in that
respect.

C# incurs some extra overhead relative to C++ memory management when it
eventually has to clean up objects, but this rarely impacts performance,
as the collection only happens when there's memory pressure and/or idle
moments. The net is that code that does a lot of allocations usually
performs better in C# than C++ (though most often there's very little
practical difference).

It surprises me to hear that Java is significantly _slower_ than C++.
That would imply that it's got the worst of both worlds: the reclaiming
overhead of a garbage collecting memory manager, and the allocation
overhead of a free-list based memory manager.

Surely that's not actually the case?

Nope.

I think Java and .NET GC are very similar.

Arne
 
Mark Thornton

Arne said:
Java for your usage. Java checks array indexes - C/C++ does not. And

for (int i = 0; i < a.length; i++) {
    // ... use a[i] ...
}

The server JVM will eliminate the bounds check in cases like this
(assuming 'a' isn't declared volatile). More generally whenever the loop
range is not changed within the loop.
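
To make the loop shape concrete, here is a small sketch with two hypothetical methods (the names BoundsCheckLoop, sum and sumIndirect are illustrative only). The first has the canonical form described above, where the index runs from 0 to a.length and neither the array reference nor the range changes inside the loop. The second reads the index from a lookup table, so the loop bounds alone say nothing about whether a[idx[i]] is in range. Whether a particular JVM actually removes the check in either case is an assumption to verify by measurement, not a guarantee.

```java
// Sketch: a loop shape that is friendly to bounds-check elimination, and one that is not.
class BoundsCheckLoop {
    /** Canonical loop: i runs over 0..a.length-1 and 'a' is not reassigned in the loop. */
    static double sum(double[] a) {
        double total = 0.0;
        for (int i = 0; i < a.length; i++) {
            total += a[i];
        }
        return total;
    }

    /** Indirect indexing: the range of idx[i] is not implied by the loop bounds. */
    static double sumIndirect(double[] a, int[] idx) {
        double total = 0.0;
        for (int i = 0; i < idx.length; i++) {
            total += a[idx[i]];
        }
        return total;
    }

    public static void main(String[] args) {
        double[] a = { 1.0, 2.0, 3.0 };
        System.out.println(sum(a));
        System.out.println(sumIndirect(a, new int[] { 2, 0 }));
    }
}
```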
in general I doubt that sufficient time has been spent optimizing
floating point in JVMs.

Nevertheless it is quite good at it. I think scalar SSE2 instructions
are used for example.

Mark
 
Peter Duniho

In big computational tasks you don't have idle moments and eventually
you do have to clean up.

I understand that. But that's a special case. For a very broad class of
algorithms, that caveat doesn't apply and the generalization stated --
"C++ will be faster with object creation and destruction" -- would not be
valid.

Also, while there is overhead associated with collecting objects, it can
be relatively inexpensive, especially if no heap compaction is required
(many allocation patterns lend themselves to that situation).

It's certainly true that one can come up with scenarios in which
C++ handles memory management faster than C# (and I guess from your
comments, Java). But the converse is true as well, and IMHO there's no
valid generalization that correctly describes the relative performance
characteristics of those languages. The best one can say is "it depends".

It isn't, but it does depend on the memory use patterns. Tasks that can
be performed with a strict stack allocation pattern favour C++, those
with complex lifetimes (and especially if multithreaded) favour Java (or
C#).

Okay, that makes more sense (and is different from what I was replying to).

Thanks,
Pete
 
Patricia Shanahan

[...] Are there any accepted
"good tests" on number crunching that have been run recently?

There is only one test that can accurately predict the performance of
your code - running your code.

Here's what I would do in your situation:

1. Extract from a few of your programs pieces of code that are
relatively small but take a high proportion of the run time.

2. Write programs around those pieces of code that set up typical test
data and check the results. These programs should also be dominated by
the code from step 1.

3. Re-implement a step 2 program in Java. Compare the new performance.
If it is good enough for your purposes, repeat for each of the programs.
If one of the jobs does not run well enough, rerun it on each new major
release of Java, but stick with C++.

If, on the other hand, Java does well enough on each of the tests, then
start writing some of your new programs in Java.

This procedure is not designed to answer some great absolute "Is Java
good for number crunching?" question. It is designed to answer the
question of whether you would get performance you like if you switched
to Java for the programs you are writing.
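
As a rough illustration of steps 1 to 3, here is what a small extracted kernel plus timing wrapper might look like for the workload in the original question (rotating 100,000 packed 3-vectors). Everything here is an assumption made for the sketch: the class name RotationBench, the helper names rotZ and transform, the row-major matrix layout, and the repetition counts. It runs the kernel once untimed so the JIT has a chance to compile it, then times a second pass; a dedicated harness such as JMH would be more rigorous.

```java
// Sketch: time a small number-crunching kernel after a warm-up pass.
class RotationBench {
    /** Builds a row-major 3x3 rotation about the z axis (uses trig, as in the question). */
    static double[] rotZ(double angle) {
        double c = Math.cos(angle), s = Math.sin(angle);
        return new double[] { c, -s, 0,   s, c, 0,   0, 0, 1 };
    }

    /** Multiplies each packed vector (x,y,z,x,y,z,...) by m; returns a checksum of the output. */
    static double transform(double[] m, double[] in, double[] out) {
        double checksum = 0.0;
        for (int i = 0; i + 2 < in.length; i += 3) {
            double x = in[i], y = in[i + 1], z = in[i + 2];
            out[i]     = m[0] * x + m[1] * y + m[2] * z;
            out[i + 1] = m[3] * x + m[4] * y + m[5] * z;
            out[i + 2] = m[6] * x + m[7] * y + m[8] * z;
            checksum += out[i] + out[i + 1] + out[i + 2];
        }
        return checksum;
    }

    public static void main(String[] args) {
        int n = 100_000;
        double[] in = new double[3 * n];
        double[] out = new double[3 * n];
        for (int i = 0; i < in.length; i++) in[i] = i % 7;   // arbitrary test data

        double sink = 0.0;   // printed at the end so the work cannot be discarded
        for (int r = 0; r < 200; r++) sink += transform(rotZ(r * 0.01), in, out);   // warm-up

        long start = System.nanoTime();
        for (int r = 0; r < 200; r++) sink += transform(rotZ(r * 0.01), in, out);   // timed pass
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("timed pass: " + elapsedMs + " ms (checksum " + sink + ")");
    }
}
```

The same kernel and the same test data would then be written in C++ and timed the same way, so that the two numbers answer the question for this workload rather than for number crunching in general.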

Patricia
 
Kenneth P. Turvey

I understand that. But that's a special case. For a very broad class
of algorithms, that caveat doesn't apply and the generalization stated
-- "C++ will be faster with object creation and destruction" -- would
not be valid.

Also, while there is overhead associated with collecting objects, it can
be relatively inexpensive, especially if no heap compaction is required
(many allocation patterns lend themselves to that situation).

It's certainly true that one can come up with scenarios in which C++
handles memory management faster than C# (and I guess from your
comments, Java). But the converse is true as well, and IMHO there's no
valid generalization that correctly describes the relative performance
characteristics of those languages. The best one can say is "it
depends".

I can't really give you the details on why these algorithms work out to
be faster in C, but I can give you my experience. The one thing that
might be an issue is that structures that might have been allocated on
the stack in C, end up in the heap in Java.

In my experience, and I'm strictly talking about code that doesn't really
ever have a time when it isn't doing something here, C++ will be much
faster if the code creates a lot of objects. I can't really give you the
reasons behind this, but only the results. I also can't really say
anything about C#, since I haven't programmed on that platform.
 
Kenneth P. Turvey

All experience shows that GC is more efficient than explicit deallocation
at the cost of poorer real time characteristics.

That may be the consensus, but I know that in my experience (primarily
evolutionary computation and image processing) Java has not performed as
well as C when one starts to create many objects.

There are many good reasons to choose Java, but performance isn't usually
one of them. YMMV
 
Peter Duniho

I can't really give you the details on why these algorithms work out to
be faster in C, but I can give you my experience. The one thing that
might be an issue is that structures that might have been allocated on
the stack in C, end up in the heap in Java.

Could very well be. And C# doesn't have that limitation, since it has the
concept of non-reference (value) types, which can be allocated on the
stack.

That said, it seems to me that your original statement could use
refinement. Specifically, you didn't qualify the general "C++ will be
faster" statement with "in my experience". Only the "a lot faster", which
implies to me that you're saying C++ is always faster, and in your
experience it's always faster by "a lot".

I believe in fact, especially given what else others have written here,
that it's likely that it's not true that C++ is always faster. I can
easily believe that for a certain class of algorithms, C++ is always
faster with respect to memory management, but that's a lot different from
saying that it's always faster.

That's all I'm trying to say.

Pete
 
Peter Duniho

[...]
There are many good reasons to choose Java, but performance isn't usually
one of them. YMMV

I agree that performance isn't usually the reason one chooses Java. And
especially in the context of this thread, I believe that's true ("number
crunching").

However, there are actually valid performance-based reasons for using an
environment like Java where a framework is provided. It is often the case
that application code spends very little time executing the code delivered
with the application. The API to which the application was written is
where most of the execution is done, and if that API has a
high-performance implementation, then one can often get better performance
using that API than trying to write it oneself (especially for a given
amount of effort).

I can't speak to any specific decision anyone's made along those lines
with respect to Java (I'm far too inexperienced with Java to have any
first-hand exposure to that sort of thing), but I have experience with
other APIs in which counter-intuitively it improved performance to code to
a framework/API that at first glance seems to add overhead. Because the
performance advantage from using thoroughly tested and optimized
implementations of costly operations exceeds the overhead of whatever's
required in order to use the framework/API, the net is a gain.

Again, I'm not sure any of that is relevant in this thread. It's just
that your comment brought it to mind, and I can't help but mention it. :)

Pete
 
Stefan Ram

Peter Duniho said:
for a certain class of algorithms, C++ is always
faster with respect to memory management,

Quotations regarding memory management:

»Your essay made me remember an interesting phenomenon I
saw in one system I worked on. There were two versions of
it, one in Lisp and one in C++. The display subsystem of
the Lisp version was faster. There were various reasons,
but an important one was GC: the C++ code copied a lot of
buffers because they got passed around in fairly complex
ways, so it could be quite difficult to know when one
could be deallocated. To avoid that problem, the C++
programmers just copied. The Lisp was GCed, so the Lisp
programmers never had to worry about it; they just passed
the buffers around, which reduced both memory use and CPU
cycles spent copying.«

<[email protected]>

A lot of us thought in the 1990s that the big battle would
be between procedural and object oriented programming, and
we thought that object oriented programming would provide
a big boost in programmer productivity. I thought that,
too. Some people still think that. It turns out we were
wrong. Object oriented programming is handy dandy, but
it's not really the productivity booster that was
promised. The real significant productivity advance we've
had in programming has been from languages which manage
memory for you automatically.

http://www.joelonsoftware.com/articles/APIWar.html

Regarding the topic of this thread:

»Java running faster than C«

http://paulbuchheit.blogspot.com/2007/06/java-is-faster-than-c.html

»Java theory and practice: Urban performance legends,
revisited Allocation is faster than you think, and getting
faster«

http://www.ibm.com/developerworks/java/library/j-jtp04223.html

»Java vs. C benchmark«

http://www.stefankrause.net/wp/?p=4
http://www.stefankrause.net/wp/?p=6

»Performance of Java versus C++«

http://www.idiom.com/~zilla/Computer/javaCbenchmark.html

»The Computer Language Benchmarks Game«

http://shootout.alioth.debian.org/

»How many times faster or smaller are the Java 6 -server
programs than the corresponding C GNU gcc programs?«

http://shootout.alioth.debian.org/debian/java.php
 
Lew

Kenneth said:
In my experience, and I'm strictly talking about code that doesn't really
ever have a time when it isn't doing something here, C++ will be much
faster if the code creates a lot of objects. I can't really give you the
reasons behind this, but only the results. I also can't really say
anything about C#, since I haven't programmed on that platform.

Have you actually measured these speed differences, or is this just a fuzzy
feeling you have?
 
Lew

Mark said:
It depends. Java can often be faster with multi threaded code --- the
standard C/C++ allocators have to use locking around every alloc/free
whereas Java allocators are often lock free even on multiprocessors. If
your storage structure (and object lifetime) is sufficiently complex
that your C++ code uses reference counting, then Java's garbage
collector can be a lot faster (reference counting with locking is
relatively slow).

To which of Java's several garbage collectors does your comment apply?

Young generation collections in Java are very fast, influenced only by the
number of live objects; dead ones do not add to the GC time.
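
One way to see how much collection work a run of short-lived allocations actually causes is to ask the JVM for its collector statistics afterwards. The sketch below is an assumption-laden illustration: the class name ShortLivedGarbage, the method churn and the allocation count are invented, and the JIT may remove some or all of these allocations through escape analysis, in which case few or no collections will be reported. The collector names printed depend on which garbage collector the JVM is running with.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Sketch: create lots of short-lived garbage, then print per-collector counts and times.
class ShortLivedGarbage {
    /** Allocates n small arrays that die immediately; returns a sum so the loop does real work. */
    static long churn(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            int[] tmp = new int[16];   // unreachable after this iteration
            tmp[i & 15] = i;
            total += tmp[i & 15];
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("total = " + churn(50_000_000));
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
    }
}
```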
 
Kenneth P. Turvey

Have you actually measured these speed differences, or is this just a
fuzzy feeling you have?

I haven't measured them, but they are clearly apparent in the algorithms
I've worked on. Usually the Java program takes longer than the C program
by a multiple greater than 2.

Now, if you program in Java in much the same way you would in C, that is
you don't create any objects, Java is actually faster much of the time.
If however, you program in a way that is natural in Java and best
describes the algorithm you are implementing, you'll find that Java is
much slower than C.

This should all be prefaced with, "In my experience.. ".
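
The two styles contrasted above might look like this for a simple task, summing a set of 3-vectors. This is a sketch with made-up names (LayoutSketch, Vec3, sumObjects, sumFlat), not code from anyone's project. The object style allocates a new Vec3 for every addition; the C style keeps the coordinates packed in one double[] and allocates nothing inside the loop. Which one is faster on a given JVM, and by how much, is exactly the kind of thing that has to be measured.

```java
// Sketch: the same computation in "natural Java" object style and in C-like flat-array style.
class LayoutSketch {
    /** Hypothetical immutable vector; every instance is a separate heap object. */
    static final class Vec3 {
        final double x, y, z;
        Vec3(double x, double y, double z) { this.x = x; this.y = y; this.z = z; }
        Vec3 plus(Vec3 o) { return new Vec3(x + o.x, y + o.y, z + o.z); }
    }

    /** Object style: one new Vec3 is allocated per addition. */
    static Vec3 sumObjects(Vec3[] vs) {
        Vec3 acc = new Vec3(0, 0, 0);
        for (Vec3 v : vs) {
            acc = acc.plus(v);
        }
        return acc;
    }

    /** C style: coordinates packed as x0,y0,z0,x1,y1,z1,...; no allocation inside the loop. */
    static double[] sumFlat(double[] xyz) {
        double sx = 0, sy = 0, sz = 0;
        for (int i = 0; i + 2 < xyz.length; i += 3) {
            sx += xyz[i];
            sy += xyz[i + 1];
            sz += xyz[i + 2];
        }
        return new double[] { sx, sy, sz };
    }

    public static void main(String[] args) {
        Vec3 a = sumObjects(new Vec3[] { new Vec3(1, 2, 3), new Vec3(4, 5, 6) });
        double[] b = sumFlat(new double[] { 1, 2, 3, 4, 5, 6 });
        System.out.println(a.x + " " + a.y + " " + a.z);
        System.out.println(b[0] + " " + b[1] + " " + b[2]);
    }
}
```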
 
Mark Thornton

Peter said:
I understand that. But that's a special case. For a very broad class
of algorithms, that caveat doesn't apply and the generalization stated
-- "C++ will be faster with object creation and destruction" -- would
not be valid.

The original question related to "Number Crunching" which usually falls
into that special case.

Mark Thornton
 
