Kent Johnson
I rarely find myself acting as an apologist for Java, and I understand
that the point Alex is making is that Python's performance for this
operation is quite good, and that the OP should post some code, but this
is really too unfair a comparison for me not to say something.
There are two major differences between these two programs:
- The Java version is doing a character-by-character copy; the Python
program reads the entire file into a buffer in one operation.
- The Java program is converting the entire file to and from Unicode;
the Python program is copying the literal bytes.
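To make those two differences concrete, here is a rough sketch of both copy styles written in Python itself. It assumes Python 3 syntax (not the 2.4 beta used in this thread). The function names copy_chars and copy_bytes are made up for illustration, and latin-1 is an arbitrary choice of encoding that happens to round-trip every byte value. It only shows the shape of the two approaches and is not the code that was timed here.

```python
def copy_chars(src, dst):
    """Text-mode copy, one character per call (the shape of the FileReader loop)."""
    # Text mode decodes bytes to str on read and encodes them again on write.
    # newline="" turns off newline translation so the copy stays byte-exact.
    with open(src, "r", encoding="latin-1", newline="") as fin, \
         open(dst, "w", encoding="latin-1", newline="") as fout:
        while True:
            c = fin.read(1)   # one character at a time
            if not c:         # empty string means end of file
                break
            fout.write(c)


def copy_bytes(src, dst):
    """Binary copy, whole file in a single read (the shape of the Python script)."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fout.write(fin.read())   # no decoding, no per-character calls
```

Both functions produce the same output file; the first one simply does a decode, an encode, and a pair of method calls for every character along the way.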
Here is a much more comparable Java program (it will fail if the file
size is over 2^31-1 bytes, because the length is cast to an int to size
the buffer):
import java.io.*;
public class Copy {
    public static void main(String[] args) throws IOException {
        File inputFile = new File("/usr/share/dict/web2");
        int bufferSize = (int)inputFile.length();
        File outputFile = new File("/tmp/acopy");
        FileInputStream in = new FileInputStream(inputFile);
        FileOutputStream out = new FileOutputStream(outputFile);
        byte buffer[] = new byte[bufferSize];
        int len=bufferSize;
        while (true)
        {
            len=in.read(buffer,0,bufferSize);
            if (len<0 )
                break;
            out.write(buffer,0,len);
        }
        in.close();
        out.close();
    }
}
Here are the results I get with this program and Alex's Python program
on my G4-400 Mac:
kent% time java Copy
0.440u 0.320s 0:00.96 79.1% 0+0k 9+3io 0pf+0w
kent% time python Copy.py
0.100u 0.120s 0:00.31 70.9% 0+0k 2+4io 0pf+0w
The Python program is still substantially faster (about 3x in elapsed
time, 0.96s versus 0.31s), but with nowhere near the margin Alex saw
(roughly 24x, 7.058s versus 0.296s).
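On the 2^31-1 limit mentioned above: it comes from holding the whole file in one buffer, which both programs do. A fixed-size buffer with the same read/write loop avoids it. Here is a minimal sketch in Python 3 (an assumption; the thread used 2.4). The name copy_chunked and the 64 KiB chunk size are arbitrary choices for illustration, and I have not timed this version.

```python
CHUNK_SIZE = 64 * 1024  # arbitrary buffer size chosen for this sketch


def copy_chunked(src, dst, chunk_size=CHUNK_SIZE):
    """Copy src to dst through a fixed-size buffer; return the bytes copied."""
    total = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)   # at most chunk_size bytes
            if not chunk:                  # empty bytes object means end of file
                break
            fout.write(chunk)
            total += len(chunk)
    return total
```

Memory use stays bounded by the chunk size regardless of how large the file is, at the cost of more read and write calls than the single-buffer version.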
Kent
Alex said:
OK, could you provide a simple toy example that meets these conditions
-- does lot of identical disk-intensive I/O "in batch" -- and the
execution speed measured (and on what platform) for what Python and Java
implementations, please?
For example, taking a trivial Copy.java from somewhere on the net:
import java.io.*;
public class Copy {
    public static void main(String[] args) throws IOException {
        File inputFile = new File("/usr/share/dict/web2");
        File outputFile = new File("/tmp/acopy");
        FileReader in = new FileReader(inputFile);
        FileWriter out = new FileWriter(outputFile);
        int c;
        while ((c = in.read()) != -1)
            out.write(c);
        in.close();
        out.close();
    }
}
and I observe (on an iBook 800, MacOSX 10.3.5):
kallisti:~ alex$ java -version
java version "1.4.2_05"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_05-141.3)
Java HotSpot(TM) Client VM (build 1.4.2-38, mixed mode)
-r--r--r-- 1 root wheel 2486825 12 Sep 2003 /usr/share/dict/web2
kallisti:~ alex$ time java Copy
real 0m7.058s
user 0m5.820s
sys 0m0.390s
versus:
kallisti:~ alex$ time python2.4 Copy.py
real 0m0.296s
user 0m0.080s
sys 0m0.170s
with Python 2.4 beta 1 for the roughly equivalent:
inputFile = file("/usr/share/dict/web2", 'r')
outputFile = file("/tmp/acopy", 'w')
outputFile.write(inputFile.read())
inputFile.close()
outputFile.close()
which isn't all that far from highly optimized system commands:
kallisti:~ alex$ time cp /usr/share/dict/web2 /tmp/acopy
real 0m0.167s
user 0m0.000s
sys 0m0.040s
kallisti:~ alex$ time cat /usr/share/dict/web2 >/tmp/acopy
real 0m0.149s
user 0m0.000s
sys 0m0.090s
I'm sure the Java version can be optimized easily, too -- I just grabbed
the first thing I saw off the net. But surely this example doesn't
point to any big performance issue with Python disk I/O wrt Java. So,
unless you post concrete examples yourself, the smaller the better,
it's going to be pretty difficult to understand where your doubts are
coming from!
Alex