Java vs C++ speed (IO & Sorting)

Bo Persson

Razii said:
This time I used the new java.nio package for reading and writing.
Results (on my computer) are really pathetic for c++ :)

Also, since the program spends most of the time in reading and
writing the file, I removed sorting from both java and C++ version
(it's irrelevant to IO test anyway).

(for one bible.txt)

Time for reading and writing files: 94 ms (java)
Time for reading and writing files: 78 ms (java)
Time for reading and writing files: 63 ms (java)

Time for reading and writing file: 156 ms (c++)
Time for reading and writing file: 156 ms (c++)
Time for reading and writing file: 156 ms (c++)
===== C++ Version =======
#include <ctime>
#include <fstream>
#include <iostream>
int main(int argc,char *argv[])
{

std::ifstream src("bible.txt");
std::ofstream dst("output.txt");
clock_t start=clock();
dst << src.rdbuf();

If you want to double the speed here, replace the line above with:

while(src.good())
{
char Buffer[1000];

src.read(Buffer, sizeof Buffer);
dst.write(Buffer, src.gcount());
}

clock_t endt=clock();

std::cout <<"Time for reading and writing file: " <<
double(endt-start)/CLOCKS_PER_SEC * 1000 << " ms\n";
return 0;
}


Benchmarks are hard.


Bo Persson
 
Razii

I've tried with that codes and the results changed a bit:

Just a bit? :) It changed like 3 to 4 times faster for the java
version on your linux.
both programs via the linux "time" command the average
complete execution time was:

If there is a "time" command (or any freeware download) for Windows, let
me know. I am willing to test it. For now, the C++ version is two times
slower than java on my computer.
 
Razii

dst << src.rdbuf();

If you want to double the speed here, replace the line above with:

while(src.good())
{
char Buffer[1000];

src.read(Buffer, sizeof Buffer);
dst.write(Buffer, src.gcount());
}

Yes, it did improve the speed (even doubled for 3 meg file), but for
43 meg file it was still slower than java version.

And for 30x (119 meg) file, I got times like

Time for reading and writing file: 7390 ms (c++)
Time for reading and writing file: 5594 ms (c++)
Time for reading and writing file: 3969 ms (c++)

30x file (119 meg) file with java

Time for reading and writing files: 2219 ms (java)
Time for reading and writing files: 2156 ms (java)
Time for reading and writing files: 2250 ms (java)
Time for reading and writing files: 2453 ms (java)

What's the explanation? There is something wrong somewhere for c++
version.
 
Razii

For the basic operations carried out in this example code, are you seriously
suggesting that it would be better to have JDK 1.6.x rather than JDK 1.5.x?

I suggested nothing. You are putting words into my mouth. I gave him the
latest version number.
I needed to move up from, say, JDK 1.4 to try make _this_ Java program competitive.

Huh? Why? Isn't 1.4 4 or 5 years old? (I remember I had 1.3 when I was
here in 2001). Don't you think JIT technology would have
changed/improved in that many years?
 
Bo Persson

Razii said:
dst << src.rdbuf();

If you want to double the speed here, replace the line above with:

while(src.good())
{
char Buffer[1000];

src.read(Buffer, sizeof Buffer);
dst.write(Buffer, src.gcount());
}

Yes, it did improve the speed (even doubled for 3 meg file), but for
43 meg file it was still slower than java version.

And for 30x (119 meg) file, I got times like

Time for reading and writing file: 7390 ms (c++)
Time for reading and writing file: 5594 ms (c++)
Time for reading and writing file: 3969 ms (c++)

30x file (119 meg) file with java

Time for reading and writing files: 2219 ms (java)
Time for reading and writing files: 2156 ms (java)
Time for reading and writing files: 2250 ms (java)
Time for reading and writing files: 2453 ms (java)

What's the explanation? There is something wrong somewhere for c++
version.

Yes, obviously the default file buffering is not optimal. Let us "fix"
that:

char Cache[150000000];

int main()
{

std::ifstream src("bible.txt");
std::ofstream dst("output.txt");

dst.rdbuf()->pubsetbuf(Cache, sizeof Cache);

clock_t start=clock();

//etc.


This actually moves the physical disk write to the end of the main
function - after we have displayed the result. Doesn't happen in the
Java version, does it?


Bo Persson
 
Razii

Yes, obviously the default file buffering is not optimal. Let us "fix"
that:

char Cache[150000000];

int main()
{

std::ifstream src("bible.txt");
std::ofstream dst("output.txt");

dst.rdbuf()->pubsetbuf(Cache, sizeof Cache);

clock_t start=clock();

//etc.

For 119 meg file, I got times like

C:\>CopyFile
Time for reading and writing file: 3750 ms
C:\>CopyFile
Time for reading and writing file: 3718 ms
C:\>CopyFile
Time for reading and writing file: 3703 ms
C:\>CopyFile
Time for reading and writing file: 3703 ms
C:\>CopyFile
Time for reading and writing file: 3766 ms


for Java it was

Time for reading and writing files: 2219 ms (java)
Time for reading and writing files: 2156 ms (java)
Time for reading and writing files: 2250 ms (java)
Time for reading and writing files: 2453 ms (java)


The compiler options were C:\>cl /O2 CopyFile.cpp

Why the difference?

I used this

#include <ctime>
#include <fstream>
#include <iostream>

char Cache[150000000];

int main(int argc,char *argv[])
{

std::ifstream src("bible3.txt");
std::ofstream dst("output.txt");
clock_t start=clock();
dst.rdbuf()->pubsetbuf(Cache, sizeof Cache);
dst << src.rdbuf();
clock_t endt=clock();

std::cout <<"Time for reading and writing file: " <<
double(endt-start)/CLOCKS_PER_SEC * 1000 << " ms\n";
return 0;
}

Is that what you meant?
 
stan

Razii said:
Since you have a very thin skin, you will always find something that
bothers you and gets under your skin, whether I am here or not. I
suggest you learn the filters that come with your newsreader client.
Add me to ignore and live happy ever after. How about that?

Or, you could learn some manners and find an appropriate group to post
your wisdom into. Otherwise you'll just have to put up with people who
won't ignore your rudeness. For the record, you know nothing about me.
After 30 years in the military, my skin is anything but thin. But you
did make a funny; I will pass on your observation, as people who do
know me will find it pretty funny.
 
adramolek

Or, you could learn some manners and find an appropriate group to post
your wisdom into. Otherwise you'll just have to put up with people who
won't ignore your rudeness. For the record, you know nothing about me.
After 30 years in the military, my skin is anything but thin. But you
did make a funny; I will pass on your observation, as people who do
know me will find it pretty funny.

LOL @ STAN TAPPIN ON DA FISH TANK GLASSS!
 
Razii

#include <ctime>
#include <fstream>
#include <iostream>

char Cache[150000000];

int main(int argc,char *argv[])
{

std::ifstream src("bible3.txt");
std::ofstream dst("output.txt");
clock_t start=clock();
dst.rdbuf()->pubsetbuf(Cache, sizeof Cache);
dst << src.rdbuf();
clock_t endt=clock();

std::cout <<"Time for reading and writing file: " <<
double(endt-start)/CLOCKS_PER_SEC * 1000 << " ms\n";
return 0;
}

Another problem is that with a very large zip file (800 meg) the above
c++ version doesn't copy the file. The output was
C:\>CopyFile
Time for reading and writing file: 0 ms

though it worked for 119 meg txt file
 
Razii

Or, you could learn some manners and find an appropriate group to post
your wisdom into.

Or you can stop ruining the thread by continuously whining like a
10-year old.
Otherwise you'll just have to put up with people who won't ignore your
rudeness. For the record, you know nothing about me.

You don't bother me... keep on whining (even though the thread is
right on topic).
my skin is anything but thin.

No it isn't. Whining about a topic in an unmoderated USENET newsgroup
is something that a newbie with thin skin does. If you knew anything
about USENET, you would know you (or anyone else) have no control on
what is posted here. Only a fool will waste time by whining about a
thread instead of ignoring it.
 
Bo Persson

Razii said:
Yes, obviously the default file buffering is not optimal. Let us
"fix" that:

char Cache[150000000];

int main()
{

std::ifstream src("bible.txt");
std::ofstream dst("output.txt");

dst.rdbuf()->pubsetbuf(Cache, sizeof Cache);

clock_t start=clock();

//etc.

For 119 meg file, I got times like

C:\>CopyFile
Time for reading and writing file: 3750 ms
C:\>CopyFile
Time for reading and writing file: 3718 ms
C:\>CopyFile
Time for reading and writing file: 3703 ms
C:\>CopyFile
Time for reading and writing file: 3703 ms
C:\>CopyFile
Time for reading and writing file: 3766 ms


for Java it was

Time for reading and writing files: 2219 ms (java)
Time for reading and writing files: 2156 ms (java)
Time for reading and writing files: 2250 ms (java)
Time for reading and writing files: 2453 ms (java)


The compiler options were C:\>cl /O2 CopyFile.cpp

Why the difference?

I used this

#include <ctime>
#include <fstream>
#include <iostream>

char Cache[150000000];

int main(int argc,char *argv[])
{

std::ifstream src("bible3.txt");
std::ofstream dst("output.txt");
clock_t start=clock();
dst.rdbuf()->pubsetbuf(Cache, sizeof Cache);
dst << src.rdbuf();
clock_t endt=clock();

std::cout <<"Time for reading and writing file: " <<
double(endt-start)/CLOCKS_PER_SEC * 1000 << " ms\n";
return 0;
}

Is that what you meant?

No, now you removed the read buffer. :)

#include <ctime>
#include <fstream>
#include <iostream>

char Cache[150000000];

int main()
{

std::ifstream src("bible.txt");
std::ofstream dst("output.txt");

dst.rdbuf()->pubsetbuf(Cache, sizeof Cache);

clock_t start=clock();

// dst << src.rdbuf();
while(src.good())
{
char Buffer[1000];

src.read(Buffer, sizeof Buffer);
dst.write(Buffer, src.gcount());
}

clock_t endt=clock();

std::cout <<"Time for reading and writing file: " <<
double(endt-start)/CLOCKS_PER_SEC * 1000 << " ms\n";
return 0;
}

That gets me about 800 ms on my machine. It turns to 5800 ms if I add
a dst.close() before the final clock() call. Totally I/O-bound - has
nothing to do with the languages involved. If you have a faster hard
disk, I bet you will get 700 instead of 800 ms.

The problem with

// dst << src.rdbuf();

is that it reads the file character for character, looking for an EOF.
The read() and write() functions do not.


So, I shaved 50% off the execution time by using the Buffer and a more
efficient read(). Then got another 80% reduction by cheating in the
benchmark (moving the bulk of the work to the destructor). Note that I
wrote "fix" in the previous message.
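
To make the effect of the deferred write visible, here is a minimal sketch that stops the clock twice: once before and once after closing the output stream. It is not the code anyone in the thread ran. The names `CopyTimes` and `timed_copy` are made up for illustration, and it assumes a C++11 compiler for `std::chrono::steady_clock`, which measures elapsed time. As far as I know, `clock()` reports processor time on POSIX systems but elapsed time with Microsoft's runtime, which may explain some of the confusing numbers.

```cpp
#include <chrono>
#include <cstddef>
#include <fstream>
#include <string>

// Timings for one copy, in milliseconds (hypothetical helper type).
struct CopyTimes {
    double before_close_ms;  // clock stopped before dst.close()
    double after_close_ms;   // clock stopped after dst.close()
    std::size_t bytes;       // bytes handed to dst.write()
};

// Hypothetical helper: copies src_path to dst_path in fixed-size chunks
// and records the elapsed time on both sides of the close() call.
CopyTimes timed_copy(const std::string& src_path, const std::string& dst_path)
{
    using Clock = std::chrono::steady_clock;
    using Ms = std::chrono::duration<double, std::milli>;

    std::ifstream src(src_path, std::ios::binary);
    std::ofstream dst(dst_path, std::ios::binary);

    char buffer[4096];
    std::size_t total = 0;

    const Clock::time_point start = Clock::now();
    while (src) {
        src.read(buffer, sizeof buffer);
        const std::streamsize n = src.gcount();  // may be short at end of file
        dst.write(buffer, n);
        total += static_cast<std::size_t>(n);
    }
    const Clock::time_point before_close = Clock::now();
    dst.close();  // flushes whatever is still sitting in the stream buffer
    const Clock::time_point after_close = Clock::now();

    return CopyTimes{Ms(before_close - start).count(),
                     Ms(after_close - start).count(),
                     total};
}
```

With a large output buffer installed, one would expect the two numbers to differ a lot; with the default buffer they should be close. Even the second number says nothing about when the operating system actually commits the data to disk.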


Benchmarks are hard.


Bo Persson
 
James Kanze

This makes me curious: Could you elaborate?

Two big issues: the fact that you cannot easily implement
programming by contract in interfaces, and the fact that you
cannot easily separate the class definition from the definition
of the member functions.

For the first, you can use abstract classes instead of
interfaces, but the actual virtual functions still have to be
protected (private is better), and you lose the possibility of
multiple inheritance.
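
For readers unfamiliar with the idiom, this is roughly what programming by contract in an interface looks like on the C++ side: a public non-virtual function checks the contract and forwards to a private virtual function. It is only a sketch; `RootFinder`, `StdRoot` and the tolerance used in the postcondition are invented for the example.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical interface: the public function is non-virtual and owns
// the contract; derived classes can only customise the private hook.
class RootFinder {
public:
    virtual ~RootFinder() = default;

    double root(double x) const
    {
        assert(x >= 0.0);                       // precondition
        const double r = do_root(x);
        assert(std::fabs(r * r - x) <= 1e-9 * (1.0 + x));  // postcondition
        return r;
    }

private:
    virtual double do_root(double x) const = 0;  // implementation hook
};

// One possible implementation; it cannot bypass the checks above.
class StdRoot : public RootFinder {
private:
    double do_root(double x) const override { return std::sqrt(x); }
};
```

Because the checks live in the base class, every implementation is held to the same contract, and a class can still inherit from several such "interfaces".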

For the second, you can use interfaces to work around the
problem more or less well, and the C++ solution is also far from
ideal. (Much better are Ada or Modula-3.) But it's still more
awkward in Java than in C++.

The fact that you must dynamically link every class is also an
endless source of problems. In C++, of course, when you want
to be sure that the program works on site, you static link
almost everything---so that the version of the library used at
the deployment site cannot possibly be different from the one
you tested with. In Java, we even had a special name for this:
CLASSPATH hell.

There are also a lot of little things which get on one's nerves:
the fact that a constructor of a base class can call a function
in a derived class before the derived class' constructor has
even started, for example, has caused me grief in the past, and
the fact that there's absolutely no way to crash the program
when you know it's failed can be pretty frustrating as well (and
also means that you cannot use the language in critical
systems).
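
For contrast with the Java constructor behaviour described above, here is a small C++ sketch (the class names are made up). In C++, a virtual call made from a base class constructor resolves to the base class version, because the derived part of the object has not been constructed yet.

```cpp
#include <string>

struct Base {
    std::string seen;  // records which name() the constructor reached

    Base() { seen = name(); }  // virtual call during construction
    virtual ~Base() = default;

    virtual std::string name() const { return "Base"; }
};

struct Derived : Base {
    std::string label = "Derived";  // initialised only after Base() returns

    std::string name() const override { return label; }
};
```

Constructing a `Derived` leaves `seen` equal to "Base", so the override never gets to read the not yet initialised `label`. Once construction has finished, `name()` returns "Derived" as usual.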
 
James Kanze

Or their Internet connection was down for a couple of days:).
(Note the discussions concerning the relative merits of C++ or
Java are NOT the most important part of my activity.)
Maybe you should seek that information in a group where it's
on topic. People acting like adults don't feed the trolls,
just as they don't tap on fish tanks.

It's true that this thread started with something very much like
a troll :-). And that there have been some pretty outrageous
statements. My claim remains that Java is less effective than C++
in large applications. That doesn't mean that it doesn't have a
place, or that it's useless, nor that you can't do any good work
with it---there are specific types of applications where I would
choose it before C++. The fact remains that in large scale
server software (where I'm most active), it was tried and found
less effective than C++.

When I speak of "effective", of course, I'm talking about
programmer productivity. Total productivity, over the entire
life cycle. And in an organization with a good development
process. (Java is probably more tolerant with regards to a bad
process. But given a bad process, neither language gives
adequate quality in the domains I work in.)
 
James Kanze

Anyone making a statement that language X is better than Y is
an idiot, no language is always the best. However all
languages have certain characteristics that makes them more
suitable for some purpose than other languages, but in many
cases the best language for a task is the one you know best.

Or that the team knows best.

I'd reformulate your statement to the effect that every language
has some serious drawbacks. The importance of any particular
drawback very much depends on the application domain. And of
course, if the team has been using the language for a while,
they've developed the work-arounds for the specific weaknesses
which cause problems in their domain.

That doesn't mean that there's no point in discussing such
weaknesses.
When discussing the merits of different programming languages
I think that comp.programming might be a good place to start.
Asking in a group for a specific language is a sure way to be
told that the language discussed in the group is the best.

And cross-posting to both a C++ and a Java group is a good way
to start a flame war:).
 
James Kanze


[...]
Plus, before Java's bignum was reimplemented in pure Java it
was simply a wrapper for GMP (if I recall correctly), so it's
quite silly for a Java programmer/troll to ask whether that
functionality exists in C++.

The whole argument is a troll. There are advantages to having a
specific functionality as part of the standard (it's always
there, and always in the same fashion), and there are advantages
to having it as a separate third party add on (different
versions can exist, optimized for different uses). And of
course, how much an advantage any particular library is depends
on what you're doing---Java has a definite advantage with its
GUI library, but it doesn't concern my server applications
(which are started by a cronjob, and run without any terminal
attached).

All other things being equal, I'd generally prefer a standard
library. But they never are, so it always depends.

Another discussion can be had concerning what should be part of
the library, and what should be part of the language. Java
suffers with regards to threading, for example, because the
threading is part of the language, and not a third party library
(e.g. Posix).
 
Razii

So it's obvious there is some problem with the timing functions because
the c++ program is taking longer than 3 sec but less than 28, after
testing both programs via the linux "time" command the average
complete execution time was:

Adding the line dst.close();

before clock_t endt=clock(); for c++ fixes the timing issue.

Thanks to Bo Persson for the hint :)
 
Razii

That gets me about 800 ms on my machine. It turns to 5800 ms if I add
a dst.close() before the final clock() call.

the timing is not valid without dst.close(), so it should be

dst.close();
clock_t endt=clock();

In any case, though the speed problem is fixed, this version still
doesn't copy an 800 meg zip file.
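
The thread never diagnoses the zip failure. One possible cause, and this is only a guess, is that the streams are opened in text mode: on Windows a text-mode read can stop at the first 0x1A (Ctrl-Z) byte, which a binary file may contain very early. A minimal sketch that opens both streams with `std::ios::binary` and reports failure instead of silently printing a time (the name `copy_binary` is hypothetical):

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: byte-for-byte copy of src_path to dst_path.
// Returns false if either file cannot be opened or an I/O error occurs.
bool copy_binary(const std::string& src_path, const std::string& dst_path)
{
    std::ifstream src(src_path, std::ios::binary);
    if (!src) {
        return false;  // source missing or unreadable
    }
    std::ofstream dst(dst_path, std::ios::binary);
    if (!dst) {
        return false;  // destination cannot be created
    }

    char buffer[65536];
    // read() fails on the final short chunk, so also check gcount().
    while (src.read(buffer, sizeof buffer) || src.gcount() > 0) {
        dst.write(buffer, src.gcount());
    }
    dst.close();  // flush before judging success

    // Success means the input ended at end of file and no write failed.
    return src.eof() && !src.bad() && !dst.fail();
}
```

Checking the return value would at least distinguish "copied in 0 ms" from "nothing was copied".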
 
Steve Wampler

Arved said:
Note that I am not discounting the value of keeping up with new versions of
compilers and VMs...feature additions and modifications, bug fixes etc. But
I'd be pretty disappointed if I needed to move up from, say, JDK 1.4 to try
make _this_ Java program competitive.

I suspect that the performance improvements are due more to improvements in the
virtual machine and not so much in language improvements. Think of it this way.
Wouldn't you expect your C++ code to run faster on a real machine that is several
years newer than another? The JVM with JDK1.6 is much improved over older ones.

Someone who cares could probably test this using the 'old' class files with both the
old and new JVMs.
 
Razii

The whole argument is a troll.

How can an argument be a troll? A person is a troll, never an
argument. :)
There are advantages to having a
specific functionality as part of the standard (it's always
there, and always in the same fashion), and there are advantages
to having it as a separate third party add on (different
versions can exist, optimized for different uses)

You can have both: a standard version as well as many other versions.
Another discussion can be had concerning what should be part of
the library, and what should be part of the language. Java
suffers with regards to threading, for example, because the
threading is part of the language, and not a third party library

You make a lot of assertions without explanation. How does threading
being part of the language mean that the language suffers because of it?
 
