Why is Java so slow????

  • Thread starter Java Performance Expert
Jack Marsh

Java said:
=========================================================================

import java.util.Date;

class t1
{
static public void main(String[] argv)
{
int lim = Integer.parseInt(argv[0]);
int nbench = Integer.parseInt(argv[1]);
int b;
for (b=0; b < nbench; b++) {
System.err.println("Bench " + b);
Date start = new Date();
mytest(lim);
Date now = new Date();
System.err.println("Took " + ((now.getTime() -
start.getTime())/1000) + " seconds");
}
}

static public void mytest(int lim)
{
int i;
for (i=0; i < lim; i++)
System.out.println("This is line " + i);
}
}
Try this version of mytest instead. It won't be as fast as C, but it
should be much faster than your code. On my XP computer it was a little
over 10x faster than your version. I used StringBuffer here, though
StringBuilder also has a length() method and would work the same way.

static public void mytest(int lim)
{
final int bufferSize = 100000;
StringBuffer b = new StringBuffer(bufferSize+100);

for (int i=0; i < lim; i++)
{
b.append("This is line " + i +"\n");
if ( b.length() >bufferSize) {
System.out.print(b.toString());
b = new StringBuffer(bufferSize+100);
}
}
// tidy up
if ( b.length() > 0) {
System.out.print(b.toString());
}

}
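
The same batching idea can be sketched with StringBuilder, which also exposes length() and skips StringBuffer's synchronization. This is only a sketch: the class name BufferedLines and the PrintStream parameter are made up here so the method can be pointed at any stream, and no speedup is claimed for it.

```java
import java.io.PrintStream;

class BufferedLines {
    // Hypothetical variant of mytest: batches lines in a StringBuilder
    // and hands them to the stream in large chunks.
    static void mytest(int lim, PrintStream out) {
        final int bufferSize = 100000;
        StringBuilder b = new StringBuilder(bufferSize + 100);
        for (int i = 0; i < lim; i++) {
            // Chained append avoids building a temporary String per line.
            b.append("This is line ").append(i).append('\n');
            if (b.length() > bufferSize) {
                out.print(b.toString());
                b.setLength(0); // reuse the builder instead of reallocating
            }
        }
        // tidy up
        if (b.length() > 0) {
            out.print(b.toString());
        }
        out.flush();
    }

    public static void main(String[] args) {
        int lim = args.length > 0 ? Integer.parseInt(args[0]) : 10;
        mytest(lim, System.out);
    }
}
```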
 
Mark Space

Bent said:
This could be because your console window needs to scroll less?


System.setOut(new PrintStream(System.out));
might work, after a fashion :)

I don't think this works the way you think it will.

First, System.out already is a PrintStream. When a PrintStream is
created, the constructor just takes the existing object, and stores it
in its "out" instance variable. So you end up with a new PrintStream,
that contains another PrintStream (the original System.out object) which
already has the autoflush variable set to true.

I haven't tried it, but I bet that is what the debugger would say.

Note that this is weird too:
OutputStream os = new BufferedOutputStream( System.out );

Guess what this makes?

Same as your idea, it makes a BufferedOutputStream that simply wraps the
existing PrintStream which is System.out (autoflush still set to true!).
System.out already wraps a BufferedOutputStream, however. And that
BufferedOutputStream wraps a FileOutputStream. Finally, something that
doesn't wrap something else! I think FileOutputStream handles all the
real work of writing characters to the OS.

Why is wrapping with a second BufferedOutputStream so much faster? I
dunno, but I'm guessing it does buffer, and it does cut down the calls
to PrintStream. PrintStream will still autoflush though, just perhaps
less often.
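
One way to check that guess without a debugger is to count how often the bottom of the chain actually gets written to. The sketch below is an assumption-laden stand-in: CountingSink is a made-up class that plays the role of the FileOutputStream, and an autoflushing PrintStream on top of it plays the role of System.out.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UncheckedIOException;

class WrapDemo {
    /** Hypothetical sink that only counts how often it is written to. */
    static class CountingSink extends OutputStream {
        long calls = 0;
        long bytes = 0;
        @Override public void write(int b) { calls++; bytes++; }
        @Override public void write(byte[] buf, int off, int len) { calls++; bytes += len; }
    }

    /** Lines printed straight to an autoflushing PrintStream, as with System.out. */
    static CountingSink direct(int lim) {
        CountingSink sink = new CountingSink();
        PrintStream ps = new PrintStream(sink, true);
        for (int i = 0; i < lim; i++) {
            ps.print("This is line " + i + "\n");
        }
        ps.flush();
        return sink;
    }

    /** Same lines, but through a BufferedOutputStream wrapped around the PrintStream. */
    static CountingSink wrapped(int lim) {
        CountingSink sink = new CountingSink();
        PrintStream ps = new PrintStream(sink, true);
        BufferedOutputStream bos = new BufferedOutputStream(ps, 8192);
        try {
            for (int i = 0; i < lim; i++) {
                bos.write(("This is line " + i + "\n").getBytes());
            }
            bos.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink;
    }

    public static void main(String[] args) {
        CountingSink d = direct(10000);
        CountingSink w = wrapped(10000);
        System.err.println("direct:  " + d.calls + " writes, " + d.bytes + " bytes");
        System.err.println("wrapped: " + w.calls + " writes, " + w.bytes + " bytes");
    }
}
```

The inner PrintStream still flushes after every chunk it receives, but it receives one chunk per full buffer rather than one per line.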
 
Roedy Green

I'm wondering if anyone can help me understand why my Java is being
very slow compared to an equivalent program written in "C".

But it isn't. It is slower to start. It has a bigger footprint, but
with Jet, Java optimisation beats hand coded assembler. Java is a
much more orderly language than C. So many of C's seat-of-the-pants
tricks with freewheeling pointers and overlaying different data in the
same cells inhibit optimisation.

see http://mindprod.com/jgloss/jet.html
 
Bent C Dalager

I don't think this works the way you think it will.

First, System.out already is a PrintStream. When a PrintStream is
created, the constructor just takes the existing object, and stores it
in its "out" instance variable. So you end up with a new PrintStream,
that contains another PrintStream (the original System.out object) which
already has the autoflush variable set to true.

From reading PrintStream's javadoc, what the wrapping PrintStream
ought to be doing is handle its own buffer and its own (absence of)
autoflush so that whatever data goes through to the wrapped one comes
in much larger chunks. That the wrapped one then autoflushes these
large chunks shouldn't really matter.

Of course, this may not actually be what happens :)
Why is wrapping with a second BufferedOutputStream so much faster? I
dunno, but I'm guessing it does buffer, and it does cut down the calls
to PrintStream. PrintStream will still autoflush though, just perhaps
less often.

The problem is that System.out's PrintStream autoflushes most things,
so the trick would be to wrap it into something that does /not/
autoflush but waits until some reasonably-sized buffer is full before
passing it on. My assumption was that a default PrintStream would do
this because it says that it defaults to not auto-flushing. A wrapping
BufferedOutputStream would basically do the same thing.

Of course, I haven't actually tried either of them so it's all
speculation.

Cheers,
Bent D
 
Nigel Wade

Java said:
Hi all,

I'm wondering if anyone can help me understand why my Java is being
very slow compared to an equivalent program written in "C".

I'm simply trying to print out the first N integers like

"This is line <nnnn>"

as a simple benchmark.

My Java version is over 60 times slower than my "C" version and I
would like to establish a lower bound on how long the very fastest
Java version could take, by applying every possible performance
speedup available in the Java environment.

I've profiled with "-Xrunhprof" and looked at the output (below) and
was surprised by what I saw. Over 50 different methods are involved
before I arrive at the point where 80% of the cumulative CPU usage for
the run is accounted for! What the heck is this stuff?????

Is this really happening, and is there a way to get around it?

My client is threatening to implement in "C" and I am trying to talk
him out of it.

I'd be very curious to see how this equivalent benchmark performs in
others' environments.

There may well be other issues with a benchmark which, on the surface, appears
to be a very simple text manipulation program.

As a simple test: on a mainstream Linux installation (RedHat 4), running the
standard GNU diff utility with the default UTF-8 character encoding is 40X
slower than the same diff with LANG=C set first. Java uses UTF, C uses ASCII.
Not basing the entire benchmark on character/string manipulation might be a
good starting point.

There is no such animal as a simple benchmark. The simpler it is (or appears to
be), the more likely it is to be misleading or downright wrong. At the end of
the day the only meaningful benchmark is the actual code you need to run, or
as near an approximation to that code as you can manage.
 
Java Performance Expert

Hey Lew,
could you do me a favor and run the above as:

$ time java-server -cp build/classes testit.TimePrin
$ time ./timepr

and post the results. Note the prepending of "time"

This will show the mix of user CPU and real time and I think this will
clear up our mystery.

Thx.

JH
 
Lew

Java said:
Hey Lew,
could you do me a favor and run the above as:

$ time java-server -cp build/classes testit.TimePrin
$ time ./timepr

and post the results. Note the prepending of "time"

This will show the mix of user CPU and real time and I think this will
clear up our mystery.

$ time java-server -cp build/classes testit.TimePrin
Elapsed: 88.367 secs.

real 1m29.279s
user 0m7.269s
sys 0m8.294s

$ time ./timepr
Elapsed: 80.000000 secs.

real 1m19.347s
user 0m2.652s
sys 0m8.116s

Still not the 60:1 the OP reported.

If String encoding really does involve 40:1 performance degradation for Java,
that renders this particular benchmark moot. I'm going to recode it not to
use Strings and run again.
 
bugbear

Java said:
The useful conclusion that can be drawn is that a test generator
written in "C" will run faster, and so be more desirable to use, than
one written in Java, to the extent that it is desirable for each and
every component to run as fast as possible

Unlikely.

I would recommend identifying bottlenecks,
and putting any analysis and optimization effort
(whether in environment, algorithm, communication protocol,
implementation language, etc.) into those bottlenecks.

In a finite development schedule, there is not enough time
to mindlessly make every component run as fast as possible.

I remember someone bragging that he'd optimised
some startup code (in a daemon module) to be 5 times
faster. He was confused when I wasn't impressed.

BugBear
 
Java Performance Expert

I would recommend identifying bottlenecks,
and putting any analysis and optimization effort
(whether in environment, algorithm, communication protocol,
implementation language, etc.) into those bottlenecks.

hi BugBear,

what I'm trying to do is convince management that there isn't anything
we cannot do as fast in Java as in "C" and this initial exercise was
a simple example. It is more about managing the risk of being "up
against a wall" with regard to the ability to speed up any arbitrary
portion of the system.

I understand I have the JNI and native methods at my disposal as
well and this will probably be the life-saver.

FYI, I've rid my example of all references to String and it is
faster again by a factor of two. My latest version is below. Still far
from the "C" version in speed, but I think it is good enough.

Thx

Larry (a.k.a. the Java Hound)



import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.util.Date;

class t1
{
static byte[] prompt = "This is line ".getBytes();
static byte[] prompt2 = "\n".getBytes();

static public void main(String[] argv)
{
int lim = Integer.parseInt(argv[0]);
int nbench = Integer.parseInt(argv[1]);
int b;
for (b=0; b < nbench; b++) {
System.err.println("Bench " + b);
Date start = new Date();

try {
mytest(lim);
}
catch ( Exception e) {
System.err.println("Exception occurred");
System.err.println(e.toString());
}

Date now = new Date();
System.err.println("Took " + ((now.getTime() -
start.getTime())/1000) + " seconds");
}
}

static public void mytest(int lim) throws Exception
{
int i;
BufferedOutputStream bos = new
BufferedOutputStream(System.out, 1000000);
DataOutputStream dos = new DataOutputStream(bos);
for (i=0; i < lim; i++) {

// byte[] ibytes = new Integer(i).toString().getBytes();

writebytes(prompt, dos);
writebytes(iconv(i), dos);
writebytes(prompt2, dos);
}
dos.flush();
}

static public void writebytes(byte[] arr, DataOutputStream dos)
throws Exception
{
int n = arr.length;
int i;
for ( i=0; i < n; i++ ) {
dos.writeByte(arr[i]);
}
}

static byte[] iconv(int i)
{
byte[] digs = new byte[20];
int ndig = 0;
while ( i >= 10 ) {
digs[ndig] = (byte) (48 + i % 10);
ndig++;
i = i / 10;
}
digs[ndig] = (byte) (48 + i);
ndig++;

byte[] result = new byte[ndig];
int dig;
int j = 0;
for (dig=ndig-1; dig>= 0; dig--) {
result[j] = digs[dig];
j++;
}
return result;
}
}
 
Patricia Shanahan

Java Performance Expert wrote:
....
what I'm trying to do is convince management that there isn't anything
we cannot do as fast in Java as in "C" and this initial exercise was
a simple example. It is more about managing the risk of being "up
against a wall" with regard to the ability to speed up any arbitrary
portion of the system.
....

I'm afraid you have undertaken a mission that is doomed to failure. It
is vanishingly rare for programming language X to be at least as fast as
programming language Y for all applications.

In this case, I think part of the problem is a test I vaguely remember
seeing in a C runtime support library, back in the mists of the 1980's
when I was a compiler writer. The I/O library changed its buffering
strategy for standard out depending on whether it was going to a TTY or not.

Patricia
 
Mark Space

Bent said:
From reading PrintStream's javadoc, what the wrapping PrintStream
ought to be doing is handle its own buffer and its own (absence of)
autoflush so that whatever data goes through to the wrapped one comes
in much larger chunks. That the wrapped one then autoflushes these
large chunks shouldn't really matter.

Right. The new PrintStream doesn't autoflush. But the old one that it
wraps does. I don't think a PrintStream does any buffering itself, so
the first PrintStream will just pass the writes down to the wrapped
PrintStream, which will do auto-flushes.

Set this up in an IDE and then run it with the debugger. Poke the
internal variables (especially the inherited one "out") and you'll see.
It's actually quite simple and un-mystical.
Of course, this may not actually be what happens :)


The problem is that System.out's PrintStream autoflushes most things,
so the trick would be to wrap it into something that does /not/
autoflush but waits until some reasonably-sized buffer is full before
passing it on. My assumption was that a default PrintStream would do
this because it says that it defaults to not auto-flushing. A wrapping
BufferedOutputStream would basically do the same thing.

Yes. Somewhere above there's a discussion on using the FileDescriptors,
which I didn't know were available.

PrintStream out = new PrintStream( new BufferedOutputStream(
new FileOutputStream( FileDescriptor.out ) ), true );

This is how I think the System.out variable gets set up. Obviously you
can do the same thing with FileDescriptor.out just setting the auto
flush parameter to false.
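
As a rough sketch of that variant: the class name RawStdout and the 64 KB buffer size below are arbitrary choices for illustration, not anything the JVM itself uses. FileDescriptor.out has to go through a FileOutputStream first, since BufferedOutputStream only accepts an OutputStream.

```java
import java.io.BufferedOutputStream;
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.PrintStream;

class RawStdout {
    /** Builds a PrintStream on the raw stdout descriptor with autoflush off. */
    static PrintStream open(int bufferSize) {
        FileOutputStream fos = new FileOutputStream(FileDescriptor.out);
        return new PrintStream(new BufferedOutputStream(fos, bufferSize), false);
    }

    public static void main(String[] args) {
        int lim = args.length > 0 ? Integer.parseInt(args[0]) : 5;
        PrintStream out = open(64 * 1024);
        for (int i = 0; i < lim; i++) {
            out.print("This is line ");
            out.print(i);
            out.print('\n');
        }
        // With autoflush off, buffered output is only guaranteed to be
        // written once flush() (or close()) is called.
        out.flush();
    }
}
```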

However testing this, for me the above code (with auto-flush set to
false) was a tad slower than:

BufferedOutputStream os = new BufferedOutputStream( System.out );

I don't know why this is. Possibly the JVM is doing some optimization
tricks on its own PrintStream to make it faster. So using the special
optimized one is faster than double buffering all IO. Or it could be
bad testing on my part, other random factors, etc. Not really sure.

Also important to remember: println() is very, very slow. print() is a
little faster, even when printing newlines and with autoflush enabled.
All versions of write() are much faster by almost an order of magnitude,
and the fastest still is the three argument version, in some cases by a
factor of 2x over the other write()'s. I'm not sure why, but I think it
may have to do with virtual method overhead.

That's my one nickel summary of what's been discussed here.
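
For anyone who wants to repeat the print() versus three-argument write() comparison, here is a minimal sketch. It writes to an in-memory sink so the console doesn't dominate the numbers; the class name WriteVariants and the loop count are invented for the example, and the timings it prints are only indicative.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

class WriteVariants {
    static final byte[] MSG = "This is line\n".getBytes();

    /** Emits lim lines with print(String) on an autoflushing PrintStream. */
    static byte[] viaPrint(int lim) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream ps = new PrintStream(sink, true);
        for (int i = 0; i < lim; i++) {
            ps.print("This is line\n");
        }
        ps.flush();
        return sink.toByteArray();
    }

    /** Emits the same lines with the three-argument write(). */
    static byte[] viaWrite3(int lim) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream ps = new PrintStream(sink, true);
        for (int i = 0; i < lim; i++) {
            ps.write(MSG, 0, MSG.length);
        }
        ps.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        int lim = 1000000;
        long t0 = System.nanoTime();
        viaPrint(lim);
        long t1 = System.nanoTime();
        viaWrite3(lim);
        long t2 = System.nanoTime();
        System.err.println("print():  " + (t1 - t0) / 1000000 + " ms");
        System.err.println("write3(): " + (t2 - t1) / 1000000 + " ms");
    }
}
```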
 
Mark Space

Java said:
static byte[] iconv(int i)
{
byte[] digs = new byte[20];
int ndig = 0;
while ( i >= 10 ) {
digs[ndig] = (byte) (48 + i % 10);
ndig++;
i = i / 10;
}
digs[ndig] = (byte) (48 + i);
ndig++;

byte[] result = new byte[ndig];
int dig;
int j = 0;
for (dig=ndig-1; dig>= 0; dig--) {
result[j] = digs[dig];
j++;
}
return result;
}
}

Did this code buy you any speed improvements? I got very little
performance hit when I added converting an int to a string (and the
resultant encoding for going from a String to a byte[]) using the
regular Integer.toString(i).getBytes() method.

Just curious if it's worth it to use optimizations like this.
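
A small harness along these lines would answer the question on any given machine. It is a sketch only: IconvCheck, viaToString and time are made-up names, iconv is copied from the quoted post, and a single timed pass like this is crude because the JIT may not have warmed up yet.

```java
import java.nio.charset.StandardCharsets;

class IconvCheck {
    /** Hand-rolled conversion, copied from the quoted post. */
    static byte[] iconv(int i) {
        byte[] digs = new byte[20];
        int ndig = 0;
        while (i >= 10) {
            digs[ndig++] = (byte) (48 + i % 10);
            i = i / 10;
        }
        digs[ndig++] = (byte) (48 + i);
        byte[] result = new byte[ndig];
        for (int dig = ndig - 1, j = 0; dig >= 0; dig--, j++) {
            result[j] = digs[dig];
        }
        return result;
    }

    /** The library route: int to String to bytes. */
    static byte[] viaToString(int i) {
        return Integer.toString(i).getBytes(StandardCharsets.US_ASCII);
    }

    /** Times lim conversions and returns the elapsed nanoseconds. */
    static long time(boolean handRolled, int lim) {
        long total = 0;
        long start = System.nanoTime();
        for (int i = 0; i < lim; i++) {
            total += (handRolled ? iconv(i) : viaToString(i)).length;
        }
        long elapsed = System.nanoTime() - start;
        if (total < 0) throw new IllegalStateException(); // keeps the loop from being optimized away
        return elapsed;
    }

    public static void main(String[] args) {
        int lim = 5000000;
        System.err.println("iconv:    " + time(true, lim) / 1000000 + " ms");
        System.err.println("toString: " + time(false, lim) / 1000000 + " ms");
    }
}
```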
 
Mark Rafn

Funny username for this question ;)
I'm wondering if anyone can help me understand why my Java is being
very slow compared to an equivalent program written in "C".

Because the program is not equivalent.
I'm simply trying to print out the first N integers like
"This is line <nnnn>"
as a simple benchmark.

Too simple. There are different assumptions C and Java make about character
encoding, stdout buffering, and likely other things that mean the programs are
not equivalent.
My Java version is over 60 times slower than my "C" version and I
would like to establish a lower bound on how long the very fastest
Java version could take, by applying every possible performance
speedup available in the Java environment.

For the simple version of this program, I get about 10:1 when I replace
System.out with 'new PrintStream("/dev/null")' to get similar buffering. I
suspect I could get closer to parity if I wanted to deal with bytes instead of
unicode characters, as C does.
My client is threatening to implement in "C" and I am trying to talk
him out of it.

If your client is printing millions of lines of pure ASCII to /dev/null,
then C is probably the better choice. If she wants something else, then
a different evaluation might be in order.

If you want to show performance strength of Java, pick a benchmark that the
VM can optimize better. I like the recursive fibonacci calculator, where
the VM's ability to do dynamic optimization at runtime beats out gcc's
static compiler optimization. (Yes, I know the algorithm is silly, but so is
any micro-benchmark):

[dagon tmp]$ time java -server -cp . Fib 45
fibonacci 45 is 1134903170
real 0m8.729s
user 0m8.400s
sys 0m0.000s

[markrafn tmp]$ time ./Fib 45
fibonacci 45 is 1134903170
real 0m13.650s
user 0m13.530s
sys 0m0.000s


Fib.java:
public class Fib {
/** get fibonacci number N */
public static long fib(int fibNum) {
if (fibNum < 3)
return 1;
return (fib(fibNum-1) + fib(fibNum-2));
}

public static void main(String[] args) {
int fibNum = Integer.parseInt(args[0]);
System.out.println("fibonacci " + fibNum + " is " + fib(fibNum));
}
}

Fib.c:
#include <stdio.h>
#include <stdlib.h>

long fib(int num) {
if (num < 3)
return 1;
return (fib(num-1) + fib(num-2));
}

int main(int argc, char **argv, char **envv) {
int num = atoi(argv[1]);
printf("fibonacci %d is %ld\n", num, fib(num));
}
 
Mark Space

Java said:
static byte[] iconv(int i)
{
byte[] digs = new byte[20];

I poked around the Java sources and did some experimentation. Removing
object creation does seem to help. So does removing casting at runtime,
including casting primitives.

Here's my own version, based loosely on the Integer.toString(int) method
source.

/** Converts a NON-NEGATIVE integer to ASCII bytes, with an emphasis on
* speed.
*
* @param buff The start, length and ASCII values are stored in this
* buffer. buff.ascii must therefore be at least as long as the decimal
* representation of Integer.MAX_VALUE (10 digits).
* buff.start stores the offset of the first ASCII character.
* buff.slength stores the length of the ASCII character string.
* Digits are written to the end of the buff.ascii array.
* @param i MUST NOT BE NEGATIVE. This is not tested by the method. Stuff
* will explode in spectacular ways if you pass this routine a negative
* integer.
*/
static void fastItoS2( AsciiByteBuff buff, int i ) {
int index = buff.ascii.length - 1;
int q = i;
int r;
for(;;) {
r = q % 10;
q = q / 10;
buff.ascii[index--] = digits[r];
if( q == 0 )
break;
}
buff.start = (index + 1);
buff.slength = (buff.ascii.length - 1 - index);
}

private static class AsciiByteBuff {
public int start;
public int slength;
public byte [] ascii;
}

private static byte [] digits = { '0', '1', '2', '3', '4', '5', '6',
'7', '8', '9' };


Here is the driver for this routine:


static public void write3fastItoS2( String[] args ) throws
IOException {

int lim = DEFAULT_LINES;

if( args != null && args.length > 0 ) {
try {
lim = Integer.parseInt( args[0] );
} catch( NumberFormatException ex ) {
System.err.print( ex + "\nUsing default of " +
DEFAULT_LINES );
}
if( lim < 0 ) {
lim = DEFAULT_LINES;
System.err.println( "Lim less than 0.\n"
+ "Using default of " + DEFAULT_LINES );
}
}

BufferedOutputStream os = new BufferedOutputStream( System.out );

String message2 = "This is line ";
byte[] mbuff = message2.getBytes();
int mlength = mbuff.length;

AsciiByteBuff ibuff = new AsciiByteBuff();
ibuff.ascii = new byte
[Integer.toString(Integer.MIN_VALUE).length()
+ 2 ];

for( int i = 0; i < lim; i++ ) {
// os.write( mbuff );
os.write( mbuff, 0, mlength );
fastItoS2( ibuff, i );
os.write(ibuff.ascii, ibuff.start, ibuff.slength );
os.write('\n');
}
os.close();
}


I think that's right, I had to repair a couple of lines after the email
editor wrapped them.

Good luck.
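
Since the fragments above need a surrounding class and imports before they compile, here is a condensed, self-contained sketch of the same approach. It is not the code that was timed: the class name FastLines, the writeLines helper and the 10-byte scratch array are choices made for this example.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class FastLines {
    private static final byte[] DIGITS =
        { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' };
    private static final byte[] PREFIX = "This is line ".getBytes();

    /** Writes lines 0..lim-1 to target using only byte arrays. */
    static void writeLines(OutputStream target, int lim) {
        // Ten bytes hold the digits of any non-negative int.
        byte[] scratch = new byte[10];
        try {
            BufferedOutputStream os = new BufferedOutputStream(target);
            for (int i = 0; i < lim; i++) {
                os.write(PREFIX, 0, PREFIX.length);
                // Fill the scratch array from the right, least significant digit first.
                int index = scratch.length;
                int q = i;
                do {
                    scratch[--index] = DIGITS[q % 10];
                    q /= 10;
                } while (q != 0);
                os.write(scratch, index, scratch.length - index);
                os.write('\n');
            }
            os.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int lim = args.length > 0 ? Integer.parseInt(args[0]) : 10;
        writeLines(System.out, lim);
    }
}
```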
 
Mark Thornton

Bent said:
Trivially rewriting it to use a wrapped FileOutputStream yields:

kandidat:~/tmp bcd$ time java -server -classpath . TimePrin
Elapsed: 5.223 secs.

real 0m5.410s
user 0m4.679s
sys 0m0.521s

This value of 5.41s is reasonably close to C's 4.68s, but the Java
program needed to be written specifically to output to file whereas C
achieved this performance by simple redirection.

That time comparison is what I would expect. As for Java's default
buffering behaviour, it isn't unreasonable; it just produces better
results in different circumstances. The C code output, viewed in an IDE,
may arrive in 512 byte chunks, which can be just as annoying as
the slower speed of the Java default. I would say that the Java default
is less surprising than the C case. Both need changes when the default
is not what you want.

Mark Thornton
 
Steve Wampler

Mark said:
I poked around the Java sources and did some experimentation. Removing
object creation does seem to help. So does removing casting at runtime,
including casting primitives.

Here's my own version, based loosely on the Integer.toString(int) method
source.

Sweet! Here are the timings using this code (put into the Java benchmark
framework posted earlier by Java Performance Expert), on my machine,
for 50,000,000 lines of output. The timings for the C version are included
for comparison:

->time timepr 50000000 | cat >/dev/null
Elapsed: 16.930000 secs.
timepr 50000000 16.44s user 0.50s system 95% cpu 17.650 total
cat > /dev/null 0.10s user 0.57s system 3% cpu 17.649 total

->time java -server TimePrin3 50000000 1 | cat >/dev/null
Bench 0
Took 10.608 seconds
java -server TimePrin3 50000000 1 9.26s user 0.60s system 91% cpu 10.737 total
cat > /dev/null 0.17s user 0.68s system 7% cpu 10.699 total

C: gcc 4.1.1, -O4
Java: 1.6.0u3
CentOS4: dual 2GHz Opteron, 2GB ram
 
Lew

Steve said:
Sweet! Here are the timings using this code (put into the Java benchmark
framework posted earlier by Java Performance Expert), on my machine,
for 50,000,000 lines of output. The timings for the C version are included
for comparison:

->time timepr 50000000 | cat >/dev/null
Elapsed: 16.930000 secs.
timepr 50000000 16.44s user 0.50s system 95% cpu 17.650 total
cat > /dev/null 0.10s user 0.57s system 3% cpu 17.649 total

->time java -server TimePrin3 50000000 1 | cat >/dev/null
Bench 0
Took 10.608 seconds
java -server TimePrin3 50000000 1 9.26s user 0.60s system 91% cpu 10.737 total
cat > /dev/null 0.17s user 0.68s system 7% cpu 10.699 total

C: gcc 4.1.1, -O4
Java: 1.6.0u3
CentOS4: dual 2GHz Opteron, 2GB ram

Take that! You naysayers who assert, "Java can never be as fast as C!"
 
Crouchez

Why is Java so slow?

It's not usually that slow - it depends how you use it.

It's usually going to be a bit slower than C and C++ because it uses C and
C++ code to get access to the OS core libraries.

In the example you gave it is doing some peripheral stuff even before it
passes the bytes to the native display print so inherently it is going to
add an overhead.

Java is a bridge to the native OS. Crossing the bridge takes a small extra
amount of time, but that is negligible most of the time, and the portability
to other platforms makes up for it.
 
massiccio

Lew,
On my system, it takes 9.21 seconds (Java) and 1.1 seconds (C)

Any ideas?

thanks
JH

Try passing the "-Xcomp" option to the JVM (it compiles methods on
their first invocation; by default the VM starts out running in
interpreted mode). Furthermore, you should run the same test a few
times (let's say inside a for loop): at the beginning the JVM "warms up"
and optimizes the code it executes, then it runs at full speed.

Michele
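
A bare-bones way to watch the warm-up effect is to time the same call several times within one JVM run. In this sketch the workload (a recursive fib, borrowed from earlier in the thread), the class name WarmupDemo and the run count are all arbitrary choices; only the pattern of repeating the measurement matters.

```java
class WarmupDemo {
    /** Deliberately naive recursive Fibonacci, used only as a CPU workload. */
    static long fib(int n) {
        return n < 3 ? 1 : fib(n - 1) + fib(n - 2);
    }

    /** Runs fib(n) the given number of times and records each duration. */
    static long[] timeRuns(int n, int runs) {
        long[] nanos = new long[runs];
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            long result = fib(n);
            nanos[r] = System.nanoTime() - start;
            if (result <= 0) throw new IllegalStateException("unexpected result");
        }
        return nanos;
    }

    public static void main(String[] args) {
        long[] nanos = timeRuns(30, 10);
        for (int r = 0; r < nanos.length; r++) {
            System.err.println("Run " + r + ": " + nanos[r] / 1000 + " us");
        }
    }
}
```

Early runs typically include interpretation and compilation time, so later runs are the ones worth comparing against the C numbers.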
 
