I went ahead and tried that...unfortunately I can't get gcvt() to work
right in stdlib...maybe someone knows a better way to get the time out.
I changed your code around a bit, increased the nuimber of loops in the inner
loop, and re-arranged the code to allow the JIT to work without dependng on
on-stack replacement. I also added a case where I explicitly coded a
bound-check in addition to whatever the compiler provided for me (I won't post
code here for reasons which will become apparent).
Comparing the following compilers:
gcc is the MinGW version of gcc 3.4.2 running at -O3 (no
other optimisation-related options).
msvc is the C++ compiler from MS Visual Studio running
with all time-releted optimisations turned on, bounds checking
turned off, and optimising for a P4 processor.
java is the "server" JVM from JDK 1.5.0_6
I got times (averaged over the last 10 of 20 runs -- which allows the JIT to
get its act together):
gcc: 278.4 138.2
msvc: 210.3 0.0
java: 332.6 209.2
The first column is the time for the loop with explicit bounds checks, the
second column is the time without explict bounds checks.
Some things to note.
One is that I was surprised that only MSVC managed to optimise out the useless
loop -- normally java -server is inconveniently good at optimising away
benchmark code. However, it is clear that we need a better benchmark.
Another is that the (hypothetical) two consecutive sets of bounds checks in the
java version don't get conflated into one check, nor made to vanish by the
processors internal caching and pipelining.
There's another important point -- in this loop, we are not really doing
anything /with/ the data, and the next array access is dependent on the value
retreived in the previous step. This will not allow the processors pipeline to
work correctly, and is also extremely unrepresentative.
So I added a couple of lines of code to sum each value as it was pulled out of
the array, and then return the overall (meaningless) sum to the test harness
where it was used in a conditional. That not only keeps the optimiser honest,
but also provides something for the processors arithmetic units to do while it
is waiting for the next element of the array to be pulled out of (probably) L1
cache. Fortunately, the test array is small enough that we don't need to worry
about the random access into the array providing a seriously false load to the
L1 and L2 caches, or the benchmark would be completely useless.
I will append the C++ and Java code to the end of this message.
With the changes, the times I saw were:
gcc: 272.4 207.2
msvc: 342.6 205.2
java: 380.6 211.3
(Again, that's the average of the last 10 runs -- I'll append the raw data to
this message too)
Notice:
There is very little difference between the runtimes for the three compilers
when there are no explicit bounds checks.
Java is still not managing to fold the explicit and implicit bounds checks
together, and neither is the processor managing to do so. I find this very
surprising. Maybe the JIT is emitting code to tell the CPU to expect the
explicit tests to fail, and thus buggering up the speculative execution that it
would otherwise be able to do. The MS compiler seems as if it might be doing
something similar, but GNU isn't. But that is pure (and not very willl
informed, to be honest) speculation, so who knows...
Java without explicit bounds checks is about as fast in this test as in the
previous one, thus supporting my feeling that the previous version wasn't
making representative use of the CPUs pipelining and out-of-order execution.
So, broadly speaking, I think the effect of Java's bounds checking can be
characterised as somewhere between trivial and zero. Of course, any such
assertion must also note that this is only one micro-benchmark. I have
attempted to make it at least minimally realistic, but for real purposes you
would need to measure real code working on real data, with real access
patterns.
-- chris
============== TestClass.java =============--
public class TestClass
{
int
nextPointer(int[] array, int point)
{
return array[point];
}
int
timeitWithoutCheck(int[] array)
{
int pointer = 0;
int sum = 0;
for (int i = 0; i < 100000000; i++)
{
pointer = nextPointer(array, pointer);
sum += pointer;
}
return sum; // meaningless
}
int
timeitWithCheck(int[] array, int size)
{
int pointer = 0;
int sum = 0;
for (int i = 0; i < 100000000; i++)
{
if (pointer < 0) return 0;
if (pointer >= size) return 0;
pointer = nextPointer(array, pointer);
sum += pointer;
}
return sum; // meaningless
}
public static void
main(String[] args)
{
TestClass test = new TestClass();
int[] array = new int[3000];
for (int i = 0; i < 3000; i++)
array
= (int)(Math.random() * 3000d) % 3000;
for (int times = 0; times < 20; times++) // interp vs. comp
{
long start;
start = System.currentTimeMillis();
int s1 = test.timeitWithCheck(array, 3000);
long with = System.currentTimeMillis() - start;
start = System.currentTimeMillis();
int s2 = test.timeitWithoutCheck(array);
long without = System.currentTimeMillis() - start;
System.out.println(with + "\t" + without);
if (s1 != s2)
System.out.println("more fun fooling the optimiser");
}
}
}
============ TestClass.cpp ==================
#include <stdio.h>
#include <stdlib.h>
#include <sys/timeb.h>
int
nextPointer(int *array, int point)
{
return array[point];
}
int
timeitWithoutCheck(int *array)
{
int pointer = 0;
int sum = 0;
for (int i = 0; i < 100000000; i++)
{
pointer = nextPointer(array, pointer);
sum += pointer;
}
return sum; // meaningless
}
int
timeitWithCheck(int *array, int size)
{
int pointer = 0;
int sum = 0;
for (int i = 0; i < 100000000; i++)
{
if (pointer < 0) return 0;
if (pointer >= size) return 0;
pointer = nextPointer(array, pointer);
sum += pointer;
}
return sum; // meaningless
}
long
currentTimeMillis()
{
struct timeb times;
ftime(×);
return times.time * 1000 + times.millitm;
}
int
main(int argc, char **argv)
{
int* array = new int[3000];
for (int i = 0; i < 3000; i++)
array = (int)(rand()) % 3000;
for (int times = 0; times < 20; times++) // interp vs. comp
{
long start;
start = currentTimeMillis();
int s1 = timeitWithCheck(array, 3000);
long with = currentTimeMillis() - start;
start = currentTimeMillis();
int s2 = timeitWithoutCheck(array);
long without = currentTimeMillis() - start;
printf("%ld\t%ld\n", with, without);
if (s1 != s2)
printf("more fun fooling the optimiser\n");
}
}
================ Raw data =====================
GCC -O3:
450 211
270 261
310 200
281 200
280 201
280 200
271 210
270 211
280 200
281 200
271 210
270 210
270 210
271 210
270 201
280 200
281 200
270 211
270 210
271 210
MS VS2003 + all time opts
521 210
340 201
340 210
341 200
341 200
341 200
440 201
350 201
340 210
341 210
341 200
350 201
350 200
341 210
341 200
341 210
340 211
340 200
341 210
341 210
JAVA -server:
471 220
371 210
371 210
381 200
370 211
370 211
380 200
381 200
381 200
381 210
421 250
381 200
380 211
370 210
381 200
381 200
381 210
371 210
370 211
370 211
======================================