Java vs C++ speed (IO & Sorting)

Lew

There is nothing wrong with my installation. Server VM doesn't ship
with JRE. It only comes with full JDK download.

The following is from the readme file in the Sun JDK:

On Microsoft Windows platforms, the JDK includes both
the Java HotSpot(TM) Server VM and Java HotSpot Client VM.
However, the Java SE Runtime Environment for Microsoft Windows
platforms includes only the Java HotSpot Client VM. Those wishing
to use the Java HotSpot Server VM with the Java SE Runtime
Environment may copy the JDK's jre\bin\server folder to a
bin\server directory in the Java SE Runtime Environment. Software
vendors may redistribute the Java HotSpot Server VM with their
redistributions of the Java SE Runtime Environment.

Nevertheless, you will note that the "server VM" is not in your PATH, as I
mentioned. Your own output shows that. The statement that some sort of
"server VM" has to be in the PATH is false.

Only the JDK version of 'java' itself need be in the PATH. IOW, the JDK
version of 'java' is different from the JRE version (on Windows only!). The
JDK 'java' is equally capable of running as a client VM. The distinction is
not "server" vs. "client" but "JDK version" vs. "JRE version". Again, this is
only a Windows issue. On my Linux box the two versions are identical.

Again, "-server" is an option to the 'java' executable. It is the 'java'
executable that must be in the PATH, in this case the JDK version, not some
separate "server" or "client" version. Even on your box, the same JDK 'java'
version is in your PATH whether you run it with "-server" or "-client".
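
If you are not sure which 'java' and which VM you actually ended up with, you can ask the running VM itself. A minimal sketch, assuming nothing beyond the standard system properties "java.home" and "java.vm.name"; the class name WhichVm is made up for illustration, and the exact strings printed depend on the installation.

```java
public class WhichVm {
    // Name of the running VM, as reported by the standard system property.
    static String vmName() {
        return System.getProperty("java.vm.name");
    }

    public static void main(String[] args) {
        // java.home shows which installation (JDK or JRE) the launcher resolved to.
        System.out.println("java.home    = " + System.getProperty("java.home"));
        System.out.println("java.vm.name = " + vmName());
    }
}
```

Running it once as "java -server WhichVm" and once as "java -client WhichVm" should show whether the same executable on the PATH switches VMs.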
 
Steve Sobol

["Followup-To:" header set to comp.lang.java.programmer.]
Nevertheless, you will note that the "server VM" is not in your PATH, as I
mentioned. Your own output shows that. The statement that some sort of
"server VM" has to be in the PATH is false.

....which brings up a question. What's the diff between the "server" and
"client" VM's?
 
Mirek Fidler

I would try that if some Java expert (which I have never claimed to be)
gives me a hint on how to use TreeMap without calling Integer
constructor a zillion times :)

or maybe some other trick is there? let me think...

Hm, I guess this pretty much sums it up. Obviously, as long as you are
doing some simple things mostly implemented in library, Java can be
fast.... But when you are going to do anything serious, you can pretty
fast hit the limits...

Mirek
 
Lew

If you use Integer.valueOf() instead, it'll re-use cached values. Some lower
values will be cached anyway.
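
A small sketch of what the caching means in practice. The class and method names are made up; the only thing relied on is that Integer.valueOf() is required to return the same object for values from -128 to 127. Outside that range it may or may not hand back a cached object, so the sketch compares by value there.

```java
public class ValueOfDemo {
    // True if two valueOf() calls for n hand back the very same object.
    static boolean sameObject(int n) {
        Integer a = Integer.valueOf(n);
        Integer b = Integer.valueOf(n);
        return a == b; // reference comparison on purpose
    }

    public static void main(String[] args) {
        // Small values come from the cache, so no new object is created per call.
        System.out.println("100 cached: " + sameObject(100));

        // Larger values are not guaranteed to be cached; always compare with equals().
        Integer c = Integer.valueOf(100000);
        Integer d = Integer.valueOf(100000);
        System.out.println("100000 equal by value: " + c.equals(d));
    }
}
```

For a word count where most counts stay small this saves some allocations, but frequent words will still go past the cached range.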

Meanwhile, go ahead and benchmark Java with the construction. It's a fair
benchmark.
 
Steve Wampler

Razii said:
the contents of the String class cannot be changed. any method that
you might expect to modify the contents actually returns a new
instance.

if you don't like this behaviour you might think of sub-classing
String to provide methods that do alter the contents. but this isn't
possible because the class is "final".

Or you can use the StringBuilder class instead of String. You
should get a (very) slight improvement over the existing code by using
StringBuilder instead of a character array.
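
To make the difference concrete, here is a minimal sketch of the two ways to build a string character by character. The class and method names are made up for illustration. Both methods produce the same text, but the first one creates a new String on every pass through the loop.

```java
public class AppendDemo {
    // Repeated concatenation: every += builds a brand new String object.
    static String withConcat(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += (char) ('a' + i % 26);
        }
        return s;
    }

    // StringBuilder: appends into one growable buffer, converts once at the end.
    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append((char) ('a' + i % 26));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(withConcat(30).equals(withBuilder(30)));
    }
}
```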
 
Bo Persson

Razii said:
You mean String s;

forloop { s += 'char'; }

It will be way slower because String in java are immutable and with
each loop, a new string object will be created.

So there really ARE cases where Java is slower than C++?

GASP!


Bo Persson
 
Razii

Meanwhile, go ahead and benchmark Java with the construction. It's a fair
benchmark.

Or you don't have to use a Collection if speed is important. I've got
an idea how to do this without a map. Let me try.
 
abubakarm

Man stop writing code like this. You dont know c++. Do this: create a
website called javavscpp.com. Put a heading on the home page "A
challenge to all c++ devs/compilers in the world". Write a description
of the task to be achieved, for example as you have written in the
original post about the text sorting in bible.txt. Write your own java
code there, mention if you prefer what jvm should be used. Leave the
cpp part to the cpp devs. Now announce the prize. Tell people you
would never program in java again if anyone posts a code that is
faster than the java code on the same hardware/os specs. Well I think
no one would give a shit if you never use java again, so i think this
wont work. Announce a $5000 prize for any one who would do the same
task in c++ faster than java. Do that.

..ab

This topic was on these newsgroups 7 years ago :)

http://groups.google.com/group/comp.lang.c++/msg/695ebf877e25b287

I said then: "How about reading the whole Bible, sorting by lines, and
writing the sorted book to a file?"

Who remembers that from 7 years ago? It was one of the longest threads
on this newsgroup :)

The text file used for the bible is here: ftp://ftp.cs.princeton.edu/pub/cs126/markov/textfiles/bible.txt

Back to see if anything has changed

(downloaded whatever is latest version from sun.java.com)

Time for reading, sorting, writing: 359 ms (Java)
Time for reading, sorting, writing: 375 ms (Java)
Time for reading, sorting, writing: 375 ms (Java)

Visual C++ express and command I used was cl IOSort.cpp /O2

Time for reading, sorting, writing: 375 ms (c++)
Time for reading, sorting, writing: 390 ms (c++)
Time for reading, sorting, writing: 359 ms (c++)

The question still is (7 years later), where is the great speed
advantage you guys were claiming for c++?

------------------- Java Code -------------- (same as 7 years ago :)

import java.io.*;
import java.util.*;
public class IOSort
{
    public static void main(String[] arg) throws Exception
   {
            ArrayList ar = new ArrayList(5000);

            String line = "";

            BufferedReader in = new BufferedReader(
                new FileReader("bible.txt"));
            PrintWriter out  = new PrintWriter(new BufferedWriter(
                new FileWriter("output.txt")));

            long start = System.currentTimeMillis();
            while (true)
            {
                line = in.readLine();
                if (line == null)
                  break;
                if (line.length() == 0)
                 continue;
                ar.add(line);
            }

            Collections.sort(ar);
            int size = ar.size();
            for (int i = 0; i < size; i++)
            {
                out.println(ar.get(i));
            }
            out.close();
            long end = System.currentTimeMillis();
           System.out.println("Time for reading, sorting, writing: "+
(end - start) + " ms");
   }

}

--------- C++ Code ---------------

#include <fstream>
#include<iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <iterator>
#include <ctime>
using namespace ::std;

int main()
{
   vector<string> buf;
   string linBuf;
   ifstream inFile("bible.txt");
   clock_t start=clock();
   buf.reserve(50000);

   while(getline(inFile,linBuf)) buf.insert(buf.end(), linBuf);
   sort(buf.begin(), buf.end());
   ofstream outFile("output.txt");
   copy(buf.begin(),buf.end(),ostream_iterator<string>(outFile,"\n"));
   clock_t endt=clock();
   // clock() returns ticks, so convert to milliseconds before printing
   cout <<"Time for reading, sorting, writing: "
        << (endt-start) * 1000 / CLOCKS_PER_SEC << " ms\n";
   return 0;

}
 
Razii

Man stop writing code like this. You dont know c++.

You are a fool. I didn't write it. Pete Becker did. I just made a
mistake in copy and paste because he was using windows.h for time.

Do this: create a website called javavscpp.com.

Are you sending me money to pay for the site?

Leave the cpp part to the cpp devs. Now announce the prize.

You are an idiot.
 
stan

Razii said:
There is nothing wrong with my installation. Server VM doesn't ship
with JRE. It only comes with full JDK download.

The following is from the readme file in the Sun JDK:

On Microsoft Windows platforms, the JDK includes both
the Java HotSpot(TM) Server VM and Java HotSpot Client VM.
However, the Java SE Runtime Environment for Microsoft Windows
platforms includes only the Java HotSpot Client VM. Those wishing
to use the Java HotSpot Server VM with the Java SE Runtime
Environment may copy the JDK's jre\bin\server folder to a
bin\server directory in the Java SE Runtime Environment. Software
vendors may redistribute the Java HotSpot Server VM with their
redistributions of the Java SE Runtime Environment.

Actually, according to Sun, when you install the jdk you get 2 jre
setups. They call one private and install it in Program Files/jdk../jre/bin
while the other is called public and installed in Program Files/jre. In
addition the jre can install a java.exe in the system32 directory. So
the effect is that your path setup can be sensitive. If you put the jdk
after system32 in your path, the public java will be called and it
doesn't know where to find the server jvm.dll. If you put the jdk before
system32, the jdk java gets called and it knows where the server
jvm.dll is installed.

I guess Sun has their reasons for such a suboptimal and duplicated
installation, but I'm not sure (nor do I care) what they are. At any
rate I think you can simply copy the server directory from the jdk to
the jre directory and things should work but I don't know what happens
for updates. Or you could simply remove the java.exe in the system32
directory and let the system find the one in the jdk path. Or I guess
you could put the jdk earlier in your path than system32.

As for myself, I do most of my windows stuff from a cygwin rxvt terminal
running bash and I have an alias that calls the jdk java specifically.
 
stan

Razii said:
Or you don't have to use a Collection if speed is important. I've got
an idea how to do this without a map. Let me try.

Why not just write one, post it to the Java newsgroup and ask if anyone
has ideas on how to improve it?
 
Razii

Or you can use the StringBuilder class instead of String. You
should get a (very) slight improvement over the existing code by using
StringBuilder instead of a character array.

Yes, StringBuilder was faster by 200 ms on my comp.

C:\>java -server -Xmx256m Find zion
Number of zion: 95
Time: 735 ms


----------new version---
import java.util.*;

public class Find{

public static void main(String[] arg)
{
if (arg.length == 0)
{
System.out.println("Enter text on command line to search");
System.exit(-1);
}

final int len = 50000000;
String toSearch = arg[0];
StringBuilder s = new StringBuilder(len);

//create a string with 50 million chars from a-z

long start = System.currentTimeMillis();
randomString(s, len);
long end = System.currentTimeMillis();


int count = 0;
int index = s.indexOf(toSearch);
while (index != -1)
{
index++;
count++;
index = s.indexOf(toSearch, index);
}

System.out.println("Number of " + toSearch + ": " + count);

System.out.println("Time: " + (end - start) + " ms");
}


static void randomString(StringBuilder s, int len)
{
int n;
Random rd = new Random();
for( int i = 0 ; i+3 < len; i+=4 ){
n = rd.nextInt(2147483647);
s.append((char) (n % 26 + 97)); n /= 26;
s.append((char) (n % 26 + 97)); n /= 26;
s.append((char) (n % 26 + 97)); n /= 26;
s.append((char) (n % 26 + 97));
}

}
}
 
Razii

Hm, I guess this pretty much sums it up. Obviously, as long as you are
doing some simple things mostly implemented in library, Java can be
fast.... But when you are going to do anything serious, you can pretty
fast hit the limits...

By the way, your benchmark page doesn't say whether the output should
be sorted. It just says to count the words. Is sorted output a criterion?
 
Mirek Fidler

By the way, your benchmark page doesn't say whether the output should
be sorted. It just says to count the words. Is sorted output a criterion?

Well, you are right, it is not explicitly said in the original benchmark,
but every implementation produces sorted output, even if only C++/
STL gets it at no cost.

Which means either it is a requirement, or the D and U++ versions are
to be fixed :)

(Factual note: IMO, sorting requirement would only improve things for
Java...)

Mirek
 
Razii

Anyway, Razii, there is a nice benchmark at the end of this page:

http://www.digitalmars.com/d/2.0/cppstrings.html

Maybe you could create and benchmark effective Java implementation. I
would be glad to add such version to the comparison here:

First, time starts in main(). It's a short text file and if you
include VM load time, the test is invalid. The VM load time would be
longer than task itself. If not, use JET compiler so there is no VM
load time.

Also, how are you calling your page "Strings" when this supposed
benchmark spends 90% of the time in I/O reading and writing data to
disk? To reduce I/O factor, time counting ends before the output is
printed. I modified the c++ version so it has internal time counter.
The time will be printed at the end of log.txt file.

Here are the text files: "Alice in Wonderland." 160 kb
http://www.gutenberg.org/dirs/etext91/alice30.txt

bible.txt (3 meg)
http://www.cas.mcmaster.ca/~bill/strings/english/bible

And I made bible.txt into 40 meg, that is bible2.txt

c++ version compiled with VC++
cl /O2 /GL wc1.cpp /link /ltcg

C:\>java -server WordCount alice30.txt>log.txt
Time: 266 ms

C:\>java WordCount alice30.txt>log.txt
Time: 78 ms

C:\>wc1 alice30.txt>log2.txt
Time: 31 ms

For a short running program, java -server was much slower than java
client (due to load factor?). c++ version is 2 times faster than Java
client.

C:\>java -server WordCount bible.txt>log.txt
Time: 781 ms
C:\>java WordCount bible.txt>log.txt
Time: 625 ms
C:\>wc1 alice30.bible>log2.txt
Time: 578 ms

Time differences between java and c++ reduced with larger txt file,
bible.txt

C:\>java -server WordCount bible2.txt>log.txt
Time: 5297 ms
C:\>java WordCount bible2.txt>log.txt
Time: 5421 ms
C:\>wc1 alice30.bible2>log2.txt
Time: 5750 ms

C++ loses to both java client and server with 40 meg bible2.txt.

C:\>java -server WordCount alice30.txt bible.txt bible2.txt>log.txt
Time: 5687 ms
C:\>java WordCount alice30.txt bible.txt bible2.txt>log.txt
Time: 6218 ms
C:\>wc1 alice30.txt bible.txt bible2.txt>log2.txt
Time: 6531 ms

When all three files included together at command line, c++ is one sec
slower than java -server!


Both java and c++ versions are below.

== JAVA ==
Also, posted here in case you can't read it here
http://pastebin.com/f827de83

//counts the words in a text file...
import java.io.*;
import java.util.*;

public class WordCount {

static Map<String, Integer> dictionary =
new HashMap <String, Integer> (14000);
static int tWords = 0;
static int tLines = 0;
static long tBytes = 0;

public static void main(String[] args)
throws Exception {

System.out.println("Lines\tWords\tBytes\tFile\n");

//TIME STARTS HERE
long start = System.currentTimeMillis();

for (int i = 0; i < args.length; i++) {

File file = new File(args[i]);

if (!file.isFile()) {
continue;
}

int numLines = 0;
int numWords = 0;
long numBytes = file.length();
Integer I1 = new Integer(1);

BufferedReader input = new BufferedReader(new
InputStreamReader(new FileInputStream(args[i]),
"ISO-8859-1"));

StreamTokenizer st = new StreamTokenizer(input);
st.ordinaryChar('/'); st.ordinaryChar('.');
st.ordinaryChar('-'); st.ordinaryChar('"');
st.ordinaryChar('\''); st.eolIsSignificant(true);

String s;

while (st.nextToken() != StreamTokenizer.TT_EOF) {

if (st.ttype == StreamTokenizer.TT_EOL) {
numLines++;
}
else if (st.ttype == StreamTokenizer.TT_WORD) {
numWords++;
s = st.sval;

if (dictionary.containsKey(s)) {

Integer ii = dictionary.get(s);
dictionary.put(s, ++ii);
} else {
dictionary.put(s, I1);
}
}
}

System.out.println(
numLines + "\t" + numWords + "\t" + numBytes + "\t" +
args[i]);
tLines += numLines;
tWords += numWords;
tBytes += numBytes;
}

//only converting it to a TreeMap so the results appear
//ordered, I could have moved this part
//down to the printing phase (i.e. not included it in the time).

TreeMap<String, Integer> tp = new TreeMap<String, Integer>
(dictionary);

//TIME ENDS HERE
long end = System.currentTimeMillis();


System.out.println("---------------------------------------");

if (args.length > 1) {
System.out.println(
tLines + "\t" + tWords + "\t" + tBytes + "\tTotal");
System.out.println("---------------------------------------");
}

Iterator it = tp.entrySet().iterator();

while (it.hasNext()) {

Map.Entry pairs = (Map.Entry)it.next();
System.out.println(pairs.getValue() + "\t" + pairs.getKey());
}

System.out.println("Time: " + (end - start) + " ms");
}
}
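
For comparison, the containsKey/get/put sequence above can be collapsed into one call per word. This is only a sketch: it uses Map.merge, which was added in Java 8 and so was not available when the code above was written, and the class and method names are made up. The counting goes into a HashMap and is copied into a TreeMap only at the end, same as above, so the output comes out sorted.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class CountSketch {
    // Counts occurrences of each word, then returns them ordered by key.
    static Map<String, Integer> count(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            // Inserts 1 for a new word, otherwise adds 1 to the stored count.
            counts.merge(w, 1, Integer::sum);
        }
        return new TreeMap<>(counts);
    }

    public static void main(String[] args) {
        String[] words = {"the", "cat", "and", "the", "hat"};
        for (Map.Entry<String, Integer> e : count(words).entrySet()) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```

Whether this is actually faster than the version above would have to be measured; it mainly avoids hashing the same key two or three times per word.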

==C++===
If it doesn't work, try
http://pastebin.com/f6d921545

//Added time...originally by
//Newsgroups: comp.lang.c++.moderated
//From: "Vadim Ferderer" <[email protected]>

#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
#include <cstdio>
#include <map>
#include <ctime>

int main( int argc, char* argv[] )
{
int w_total = 0;
int l_total = 0;
int c_total = 0;
std::map< std::string, int > dictionary;


printf(" lines words bytes file\n" );

//TIME STARTS HERE
clock_t start=clock();

for ( int i = 1; i < argc; ++i )
{
std::ifstream input_file( argv[i] );
std::ostringstream buffer;
buffer << input_file.rdbuf();
std::string input( buffer.str() );


int w_cnt = 0;
int l_cnt = 0;
int c_cnt = 0;
bool inword = false;
int wstart = 0;
for ( unsigned int j = 0; j < input.length(); j++ )
{
char c = input[j];
if (c == '\n')
++l_cnt;
if (c >= '0' && c <= '9')
{
}
else if (c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z')
{
if (!inword)
{
wstart = j;
inword = true;
++w_cnt;
}
}
else if (inword)
{
std::string word = input.substr( wstart, j - wstart );
std::map< std::string, int >::iterator it = dictionary.find(
word );
if ( it == dictionary.end() )
dictionary[word] = 1;
else
++it->second;
inword = false;
}
++c_cnt;
}


if (inword)
{
std::string w = input.substr( wstart );
std::map< std::string, int >::iterator it = dictionary.find( w
);
if ( it == dictionary.end() )
dictionary[w] = 1;
else
++it->second;
}



printf("%d\t%d\t%d\t %s\n", l_cnt, w_cnt, c_cnt, argv[i]);
l_total += l_cnt;
w_total += w_cnt;
c_total += c_cnt;
}
//TIME ENDS HERE
clock_t end=clock();

if (argc > 2)
{
printf("--------------------------------------\n%d\t%d\t%d\t total",
l_total, w_total, c_total);
}

printf("--------------------------------------\n");
for( std::map< std::string, int >::const_iterator cit =
dictionary.begin(), cend_it = dictionary.end(); cit != cend_it; ++cit
)
printf( "%d %s\n", cit->second, cit->first.c_str() );

std::cout <<"Time: " <<
double(end-start)/CLOCKS_PER_SEC * 1000 << " ms\n";

}
 
Razii

c++ version compiled with VC++
cl /O2 /GL wc1.cpp /link /ltcg

C:\>java -server WordCount alice30.txt>log.txt
Time: 266 ms
C:\>java WordCount alice30.txt>log.txt
Time: 78 ms
C:\>wc1 alice30.txt>log2.txt
Time: 31 ms
C:\>java -server WordCount bible.txt>log.txt
Time: 781 ms
C:\>java WordCount bible.txt>log.txt
Time: 625 ms
C:\>wc1 alice30.bible>log2.txt
Time: 578 ms
C:\>java -server WordCount bible2.txt>log.txt
Time: 5297 ms
C:\>java WordCount bible2.txt>log.txt
Time: 5421 ms
C:\>wc1 alice30.bible2>log2.txt
Time: 5750 ms
C:\>java -server WordCount alice30.txt bible.txt bible2.txt>log.txt
Time: 5687 ms
C:\>java WordCount alice30.txt bible.txt bible2.txt>log.txt
Time: 6218 ms
C:\>wc1 alice30.txt bible.txt bible2.txt>log2.txt
Time: 6531 ms

Compare these results with Java's JET compiler...
http://www.excelsior-usa.com/jet.html

C:\>WordCount alice30.txt>log.txt
Time: 16 ms

Huh? that's TWO TIMES faster than C++

C:\>WordCount bible.txt>log.txt
Time: 500 ms

C:\>WordCount bible2.txt>log.txt
Time: 4735 ms

C:\>WordCount alice30.txt bible.txt bible2.txt>log.txt
Time: 5453 ms

All are faster than C++ version..

If you want to test this result and don't want JET compiler, I
uploaded my executable WordCount.exe here

http://www.yousendit.com/transfer.php?action=download&ufid=519C866F2E2E2BA7

If that doesn't work, try

http://www.filecrunch.com/file/~3qxt7n

Warning: it's 16 meg zipped since it needs some of JET (library?)
files to be in same directory.
 
Razii

See the other post. The Java version compiled with JET for Windows is
two times faster (for alice30.txt) than your C++ version.

Download the WordCount.exe (16 meg since some JET files are needed in
subdirectory for it to run).

http://www.yousendit.com/transfer.php?action=download&ufid=519C866F2E2E2BA7

If that doesn't work, here..

http://www.filecrunch.com/file/~3qxt7n

Oh, never mind. Now I am getting 16ms for both c++ and JET version for
alice30.txt. It's such a small file that results can vary this much
easily on each run.
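
Since a single run on a file this small is mostly noise, one way to get steadier numbers is to repeat the timed section and look at the fastest and the median run. A rough sketch, assuming the work can be wrapped in a Runnable; the class name, the method names and the placeholder task are all made up for illustration and are not the WordCount code.

```java
import java.util.Arrays;

public class RepeatTiming {
    // Runs the task 'runs' times and returns the elapsed times in ms, sorted ascending.
    static long[] timeRuns(Runnable task, int runs) {
        long[] ms = new long[runs];
        for (int i = 0; i < runs; i++) {
            long t0 = System.nanoTime();
            task.run();
            ms[i] = (System.nanoTime() - t0) / 1_000_000;
        }
        Arrays.sort(ms);
        return ms;
    }

    // Middle element of an already sorted array.
    static long median(long[] sorted) {
        return sorted[sorted.length / 2];
    }

    public static void main(String[] args) {
        // Placeholder workload: fill and sort an array of one million ints.
        Runnable task = () -> {
            int[] data = new int[1_000_000];
            for (int i = 0; i < data.length; i++) {
                data[i] = (i * 31) % 1000;
            }
            Arrays.sort(data);
        };
        long[] times = timeRuns(task, 9);
        System.out.println("fastest: " + times[0] + " ms, median: " + median(times) + " ms");
    }
}
```

The first few runs also give the JIT time to compile the hot code, which matters for the -server numbers.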
 
