Java vs C++ speed (IO & Sorting)

Bo Persson

Razii said:
If you can't take insults, don't insult others.


You are dumber than a stack of stones.

By the way, notice that this guy, Jerry Coffin, didn't post a result
with the 40x bible, since he figured out that -server is faster than
the C++ version.

Also, interesting to see that none of the C++ gurus posted any times. I
guess that says it all.

I guess they are just out of insults.


Bo Persson
 
Razii

Well, yes and no. I am trying to test every single benchmark I
encounter. If I find a deficiency, I try to fix it.

I wrote a third version that is faster than the second version :)

http://pastebin.com/f691e5e86


For 3 meg file

Time: 578 ms (First version)
Time: 422 ms (Second version)
Time: 360 ms (Third version)

Don't use -server with smaller files like the 3 meg one. The client
JVM is faster there.

Now with 40 meg file (using -server this time)

Time: 4922 ms (First version)
Time: 3422 ms (Second version)
Time: 2797 ms (Third version)

:) :) :) :) :)

VC++
Time: 531 ms (3 meg)
Time: 5296 ms (40)

Now that's slow ...

U++
Time: 78 ms (3 meg)
Time: 828 ms (40 meg)

Now that's really fast :)

Third version below

Also, posted here http://pastebin.com/f691e5e86


-----------------------

//counts the words in a text file...
//combined effort: wlfshmn from #java on IRC Undernet
//and RAZII
import java.io.*;
import java.util.*;
import java.nio.*;
import java.nio.channels.*;

public final class WordCount3
{
    //word -> one-element counter array, so a repeat hit needs no second put()
    private static final Map<String, int[]> dictionary =
        new HashMap<String, int[]>(800000);
    private static int tWords = 0;
    private static int tLines = 0;
    private static long tBytes = 0;

    public static void main(final String[] args) throws Exception
    {
        System.out.println("Lines\tWords\tBytes\tFile\n");

        //TIME STARTS HERE
        final long start = System.currentTimeMillis();
        for (String arg : args)
        {
            File file = new File(arg);
            if (!file.isFile())
            {
                continue;
            }

            int numLines = 0;
            int numWords = 0;
            long numBytes = file.length();

            //map the whole file into memory instead of reading it through a stream
            ByteBuffer in = new FileInputStream(arg).getChannel().map(
                FileChannel.MapMode.READ_ONLY, 0, file.length());

            StringBuilder sb = new StringBuilder();
            boolean inword = false;
            in.rewind();
            for (int i = 0; i < numBytes; i++)
            {
                char c = (char) in.get();

                if (c == '\n')
                    numLines++;
                else if (c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z')
                {
                    sb.append(c);
                    inword = true;
                }
                else if (inword)
                {
                    //first non-letter after a word: record the word
                    numWords++;
                    int[] count = dictionary.get(sb.toString());
                    if (count != null)
                    {
                        count[0]++;
                    }
                    else
                    {
                        dictionary.put(sb.toString(), new int[]{1});
                    }
                    sb.delete(0, sb.length());
                    inword = false;
                }
            }

            System.out.println(numLines + "\t" + numWords + "\t" + numBytes +
                "\t" + arg);
            tLines += numLines;
            tWords += numWords;
            tBytes += numBytes;
        }

        //only converting it to TreeMap so the result
        //appear ordered, I could have
        //moved this part down to printing phase
        //(i.e. not include it in time).
        TreeMap<String, int[]> sort = new TreeMap<String, int[]>(dictionary);

        //TIME ENDS HERE
        final long end = System.currentTimeMillis();

        System.out.println("---------------------------------------");
        if (args.length > 1)
        {
            System.out.println(tLines + "\t" + tWords + "\t" + tBytes +
                "\tTotal");
            System.out.println("---------------------------------------");
        }
        for (Map.Entry<String, int[]> pairs : sort.entrySet())
        {
            System.out.println(pairs.getValue()[0] + "\t" + pairs.getKey());
        }
        System.out.println("Time: " + (end - start) + " ms");
    }
}
 
Razii

P.S.: I hope you have compiled it as "OPTIMAL" :)

I chose MSC9 Speed from the menu. In any case, I am not sure what can
be faster than this anyway. It's 0ms for alice30.txt !!!
 
Razii

Time: 4922 ms (First version)
Time: 3422 ms (Second version)
Time: 2797 ms (Third version)


Actually, I forgot to use -server. The third version with -server is:

Time: 2306 ms
 
Mirek Fidler

I chose MSC9 Speed from the menu. In any case, I am not sure what can
be faster than this anyway. It's 0ms for alice30.txt !!!

Who knows. Two years ago I was pretty sure U++ Core is as fast as
possible. Since then, we have found ways how to optimize it to be
about twice as fast ;)

Mirek
 
Razii

Who knows. Two years ago I was pretty sure U++ Core is as fast as
possible. Since then, we have found ways how to optimize it to be
about twice as fast ;)

Are you on linux, windows? Can you do the benchmark with 40 meg text
file on your machine with D (move the sort up) vs the third version of
Java (WordCount3) that I posted here http://pastebin.com/f691e5e86

Just add the internal time counter with D like I did with U++.
 
Mirek Fidler

Are you on linux, windows? Can you do the benchmark with 40 meg

Both :) But if you want to benchmark state-of-the-art C++/U++
performance, GCC has recently become quite a bit better than VC++ at
optimizing the code. Also, the GCC standard library seems to be better
than the VC++ version as well now.

text
file on your machine with D (move the sort up) vs the third version of
Java (WordCount3) that I posted here http://pastebin.com/f691e5e86

Just add the internal time counter with D like I did with U++.

Sorry, right now I am a little bit short of time. I am looking forward
to playing with all this next week.

Mirek
 
Jerry Coffin

[ ... ]
Many products do give customers a choice to install the correct JVM
distribution. For example, the Glassfish server offers a variety of bundles
with and without the JDK.

<http://java.sun.com/javaee/downloads/index.jsp>

It's not complex, because it uses an installer. You're right, one doesn't
expect the customer to handle moving around auxiliary files; one automates
that part of it.

While there may be situations that would justify this, for trivial
utilities like counting words in a file, it sounds quite ridiculous (at
least to me).
 
Razii

I think the reason U++ is fast is probably the algorithm used for
hashing strings in VectorMap. How does VectorMap work? If I figure that
out, I bet I can match the speed :)
 
Sherman Pendley

I'm studying to update my C++ skills from long ago. Turbo Pascal & C++ were
actually my first OOP languages, but that was pre-standard and pre-STL days
in the early 90s. Yeah, I have the grays to prove it. :)

I've been using C variants all along though, so the general syntax is still
hard-wired into my fingers. I don't need a "c++ for dummies" introduction,
just a refresher course.

I'm a fan of O'Reilly. Any opinions on their "Practical C++ Programming,"
"C++ In a Nutshell," or "C++ Cookbook"? In general, I've found that I can
skip their "head first ..." or "learning ..." books, and go straight to
the "programming ..." books.

sherm--
 
Jerry Coffin

I'm studying to update my C++ skills from long ago. Turbo Pascal & C++ were
actually my first OOP languages, but that was pre-standard and pre-STL days
in the early 90s. Yeah, I have the grays to prove it. :)

I've been using C variants all along though, so the general syntax is still
hard-wired into my fingers. I don't need a "c++ for dummies" introduction,
just a refresher course.

For that, _Accelerated C++_ would probably work quite nicely. At some
point, you might want to look at _Exceptional C++_ and _More Exceptional
C++_ as well. It's probably also worth spending a bit of time with
_Effective C++_ and _More Effective C++_. _Accelerated C++_ would
definitely be the first one to study though.

I'm a fan of O'Reilly. Any opinions on their "Practical C++ Programming,"
"C++ In a Nutshell," or "C++ Cookbook"? In general, I've found that I can
skip their "head first ..." or "learning ..." books, and go straight to
the "programming ..." books.

_C++ in a Nutshell_ is almost a pure reference book, not really a course
in C++ (refresher or otherwise). I haven't looked at _C++ Cookbook_, so
I can't really comment on it.
 
Mirek Fidler

I think the reason U++ is fast is probably the algorithm used for
hashing strings in VectorMap. How does VectorMap work? If I figure that
out, I bet I can match the speed :)

Well, you are of course quite right.

Anyway, I do not think you can repeat this in Java - it simply lacks
required low-level facilities.

In any case, VectorMap, String and hashing are a long story to
explain... but if you are really interested, it is open-source after
all. And you have a working debugger :)

BTW, also notice how simple, almost "naive approach" U++ code is. And
how complex is your Java (although, of course, to me U++ looks like
the most natural thing in the world and Java is quite unfamiliar).

Why would I want to use Java if I can do my job in U++ a lot faster,
and the result will be a lot faster too? :)

Mirek
 
Razii

BTW, also notice how simple, almost "naive approach" U++ code is. And
how complex is your Java (although, of course, to me U++ looks like
the most natural thing in the world and Java is quite unfamiliar).

I think your library is designed specifically for this benchmark
(i.e., looking for words in a file). What if I change the criteria so
that instead of finding words, the program has to find quotes, i.e. all
words and sentences within " and '? In any case, in some situations the
Java version would be simpler to read and understand than U++
(probably in an application that has threading, networking and/or GUI).
My first two versions were as simple to read as wc2.cpp on the D page.
In the third version, I used nio and couldn't (for now) find a simple
way to make it work with StreamTokenizer.
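
For reference, here is a minimal sketch of what a StreamTokenizer-based word scan could look like. It is not the code of the first two versions, which is not shown in this thread. The class name TokenizerWordCount and the method count are made up for illustration, and the syntax table is reset so that only a-z and A-Z form words, to match the test used in WordCount3.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class, not part of the posted benchmark.
final class TokenizerWordCount {

    // Counts how often each run of ASCII letters occurs in the input.
    static Map<String, Integer> count(Reader reader) throws IOException {
        StreamTokenizer st = new StreamTokenizer(reader);
        st.resetSyntax();        // drop the defaults (numbers, quotes, comments)
        st.wordChars('a', 'z');  // only ASCII letters build a word token
        st.wordChars('A', 'Z');

        Map<String, Integer> counts = new TreeMap<>(); // sorted, like the posted output
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            if (st.ttype == StreamTokenizer.TT_WORD) {
                counts.merge(st.sval, 1, Integer::sum);
            }
            // every other character comes back as an ordinary token and is skipped
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        Map<String, Integer> counts = count(new StringReader("the cat and the dog"));
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```

The tokenizer needs a Reader or InputStream, which is probably why it does not combine easily with a mapped ByteBuffer: some adapter between the two would have to be written first.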
 
Razii

Anyway, I do not think you can repeat this in Java - it simply lacks
required low-level facilities.

Bug fix report.

I just changed one line in version 3 and it's twice as fast :)
http://www.pastebin.ca/964045

In fact with 6 args at command line (each file is 40 meg), Java
-server gets close to U++ :)

Have a look

C:\>WCUPP bible2.txt bible2.txt bible2.txt bible2.txt bible2.txt
bible2.txt

Time: 5046 ms

C:\>java -server WordCount3 bible2.txt bible2.txt bible2.txt
bible2.txt bible2.txt bible2.txt

Time: 6828 ms

Ah, only 1.8 sec difference :) Compared to my previous versions:

Time: 625 ms (version 1) (3 meg)
Time: 187 ms (version 3 with the fix) (3 meg)

40 meg file (java -server)
Time: 5297 ms (version 1)
Time: 1265 ms (version 3 with the fix)

1265 ms is not too far behind U++ (843 ms). You should be worried about
the 4th version :)

Visual C++ still at (Time: 5546 ms ) for 40 meg

The Updated version

-------------
http://www.pastebin.ca/964045

//counts the words in a text file...
//combined effort: wlfshmn from #java on IRC Undernet
//and RAZII
import java.io.*;
import java.util.*;
import java.nio.*;
import java.nio.channels.*;

public final class WordCount3
{
    private static final Map<String, int[]> dictionary =
        new HashMap<String, int[]>(16000);
    private static int tWords = 0;
    private static int tLines = 0;
    private static long tBytes = 0;

    public static void main(final String[] args) throws Exception
    {
        System.out.println("Lines\tWords\tBytes\tFile\n");

        //TIME STARTS HERE
        final long start = System.currentTimeMillis();
        for (String arg : args)
        {
            File file = new File(arg);
            if (!file.isFile())
            {
                continue;
            }

            int numLines = 0;
            int numWords = 0;
            long numBytes = file.length();

            ByteBuffer in = new FileInputStream(arg).getChannel().map(
                FileChannel.MapMode.READ_ONLY, 0, numBytes);

            StringBuilder sb = new StringBuilder();
            boolean inword = false;
            in.rewind();
            for (int i = 0; i < numBytes; i = i + 2)
            {
                char c = (char) in.get();
                if (c == '\n')
                    numLines++;
                else if (c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z')
                {
                    sb.append(c);
                    inword = true;
                }
                else if (inword)
                {
                    numWords++;
                    int[] count = dictionary.get(sb.toString());
                    if (count != null)
                    {
                        count[0]++;
                    }
                    else
                    {
                        dictionary.put(sb.toString(), new int[]{1});
                    }
                    sb.delete(0, sb.length());
                    inword = false;
                }
            }

            System.out.println(numLines + "\t" + numWords + "\t" + numBytes +
                "\t" + arg);
            tLines += numLines;
            tWords += numWords;
            tBytes += numBytes;
        }

        //only converting it to TreeMap so the result
        //appear ordered, I could have
        //moved this part down to printing phase
        //(i.e. not include it in time).
        TreeMap<String, int[]> sort = new TreeMap<String, int[]>(dictionary);

        //TIME ENDS HERE
        final long end = System.currentTimeMillis();

        System.out.println("---------------------------------------");
        if (args.length > 1)
        {
            System.out.println(tLines + "\t" + tWords + "\t" + tBytes +
                "\tTotal");
            System.out.println("---------------------------------------");
        }
        for (Map.Entry<String, int[]> pairs : sort.entrySet())
        {
            System.out.println(pairs.getValue()[0] + "\t" + pairs.getKey());
        }
        System.out.println("Time: " + (end - start) + " ms");
    }
}
 
Razii

Well, I am really disappointed with C++ people and especially VC++. I
fixed a minor bug in version three and it's now two times faster :)

Here is what I have now

3 meg file
Time: 625 ms (My version 1) (3 meg)
Time: 187 ms (My version 3 with the fix) (3 meg)

40 meg file (and java -server)
Time: 5297 ms (my version 1)
Time: 1265 ms (my version 3 with the fix)

What about C++ with standard library and VC++?

Time: 531 ms (3 meg)
Time: 5546 ms (for 40 meg)

Am I to believe that C++ with standard library is 4 TIMES SLOWER?

C++ IS FOUR TIMES SLOWER THAN JAVA WITH standard library?

This is really disappointing. I had high hopes.

The version 3 with bug fix is here
---------------

Also, posted here http://www.pastebin.ca/964045

 
Razii

Well, I am really disappointed with C++ people and especially VC++. I
fixed a minor bug in version three and it's now two times faster :)

I am disappointed all right, but totally ignore the last post. There
were no bug fixes :) That version was a fake one.
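
Why the retracted version looked twice as fast: its loop steps `i` by 2 while `in.get()` still advances the buffer by one byte per call, so it appears to scan only the first half of each file. A minimal sketch of that effect, assuming a small in-memory buffer instead of the mapped file (the class HalfReadDemo and the method countWords are made up for illustration):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of the posted benchmark.
final class HalfReadDemo {

    // Same scanning loop shape as WordCount3, with the loop increment as a parameter.
    static int countWords(ByteBuffer in, int step) {
        in.rewind();
        int numBytes = in.limit();
        int words = 0;
        boolean inword = false;
        for (int i = 0; i < numBytes; i += step) {
            char c = (char) in.get(); // relative get: moves forward exactly one byte
            if (c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z') {
                inword = true;
            } else if (inword) {
                words++;
                inword = false;
            }
        }
        return words;
    }

    public static void main(String[] args) {
        byte[] text = "one two three four five six \n".getBytes(StandardCharsets.US_ASCII);
        ByteBuffer buf = ByteBuffer.wrap(text);
        System.out.println("step 1: " + countWords(buf, 1) + " words");
        System.out.println("step 2: " + countWords(buf, 2) + " words");
    }
}
```

With a step of 2 the loop runs half as many iterations, so the word totals drop and the running time roughly halves, which matches the "missing words" described in the follow-up.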
 
Bo Persson

Razii said:
Well, I am really disappointed with C++ people and especially
VC++. I fixed a minor bug in version three and it's now two times
faster :)

Here is what I have now

3 meg file
Time: 625 ms (My version 1) (3 meg)
Time: 187 ms (My version 3 with the fix) (3 meg)

40 meg file (and java -server)
Time: 5297 ms (my version 1)
Time: 1265 ms (my version 3 with the fix)

What about C++ with standard library and VC++?

Time: 531 ms (3 meg)
Time: 5546 ms (for 40 meg)

Am I to believe that C++ with standard library is 4 TIMES SLOWER?

C++ IS FOUR TIMES SLOWER THAN JAVA WITH standard library?

This is really disappointing. I had high hopes.

The version 3 with bug fix is here
---------------

Also, posted here http://www.pastebin.ca/964045

//counts the words in a text file...
//combined effort: wlfshmn from #java on IRC Undernet
//and RAZII

ByteBuffer in = new FileInputStream(arg).getChannel().map(
FileChannel.MapMode.READ_ONLY, 0, numBytes);

Ok, if I get this you are using a memory mapped file in your Java
version.

std::ifstream input_file( argv );
std::ostringstream buffer;
buffer << input_file.rdbuf();
std::string input( buffer.str() );


While in the C++ version you use a high level operator<< to do a
bytewise copy from one stream to another, then copy the result to a
std::string.

So a memory mapped file is faster than copying the file content
multiple times. Surprise!


I thought we had already decided that when processing large files, the
I/O times dominate the test, and supplying proper buffering is
important. Redundant copying certainly is not!
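
The two read strategies being contrasted can be put side by side in one language. This is only a sketch under assumptions: the class ReadStrategies and its method names are hypothetical, the copying variant merely imitates the ifstream, ostringstream, std::string chain, and nothing here measures time.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class, not part of either benchmark.
final class ReadStrategies {

    // Strategy 1: map the file and scan the mapping directly.
    static long newlinesMapped(Path path) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            long n = 0;
            while (buf.hasRemaining()) {
                if (buf.get() == '\n') n++;
            }
            return n;
        }
    }

    // Strategy 2: stream -> growing buffer -> byte[] -> String, i.e. several full copies.
    static long newlinesCopied(Path path) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream is = Files.newInputStream(path)) {
            byte[] chunk = new byte[8192];
            int r;
            while ((r = is.read(chunk)) != -1) {
                out.write(chunk, 0, r);                       // copy 1: into the buffer
            }
        }
        byte[] all = out.toByteArray();                       // copy 2: buffer -> array
        String s = new String(all, StandardCharsets.ISO_8859_1); // copy 3: array -> String
        long n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '\n') n++;
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        Path p = Path.of(args[0]);
        System.out.println("mapped: " + newlinesMapped(p));
        System.out.println("copied: " + newlinesCopied(p));
    }
}
```

Both methods return the same count. The difference is only in how many times the file content is moved before the scan starts, which is the point being made about the C++ version.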


Bo Persson
 
Razii

Ok, if I get this you are using a memory mapped file in your Java
version.

Well, I said ignore that version since it was missing words (due to a
stupid typo). This is the one that is working

http://pastebin.com/d48680a60

Time for 40 meg file

Time: 2281 ms

for C++..

C:\>wc1 bible2.txt
Time: 5421 ms

In fact

C:\>java -server WordCount3 bible2.txt bible2.txt
Time: 4344 ms

Even with two bible2.txt, it's still faster than C++ with one
bible2.txt.

Ok, if I get this you are using a memory mapped file in your Java
version.

Well then change the following in C++ version to mapping the file...
std::ifstream input_file( argv );
std::ostringstream buffer;
buffer << input_file.rdbuf();
std::string input( buffer.str() );
 
ldv
