Java vs C++ speed (IO & Sorting)

Razii

You need to fix your D program. It's doing the sorting at the output
time. How were you benchmarking without the output when c++ is already
sorted in map and you are sorting D at output time?

Can you add the internal time counter in D, move the sorting so it
happens before the timer stops, and post that version here? I want to
test it myself.
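The rule being asked for here can be sketched in Java, the language of the other program in this thread: counting and sorting happen inside the timed region, and printing happens only after the clock stops. This is a minimal sketch only. The class name TimedCount, the method countAndSort and the sample words are hypothetical, and it sorts a key array once at the end instead of keeping a sorted map.

```java
import java.util.*;

public final class TimedCount {
    // Counts the words and returns the keys in sorted order. Both steps are
    // "processing", so both belong inside the timed region.
    static String[] countAndSort(List<String> words, Map<String, int[]> counts) {
        for (String w : words) {
            int[] c = counts.get(w);
            if (c != null) {
                c[0]++;
            } else {
                counts.put(w, new int[]{1});
            }
        }
        String[] keys = counts.keySet().toArray(new String[0]);
        Arrays.sort(keys); // sort once, after counting
        return keys;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("the", "bible", "the", "and", "the");
        Map<String, int[]> counts = new HashMap<String, int[]>();

        long start = System.nanoTime();                // timer starts
        String[] sorted = countAndSort(words, counts);
        long end = System.nanoTime();                  // timer stops BEFORE any output

        for (String k : sorted) {
            System.out.println(counts.get(k)[0] + "\t" + k);
        }
        System.out.println("Time: " + (end - start) / 1000000 + " ms");
    }
}
```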
 
Mirek Fidler

You need to fix your D program. It's doing the sorting at the output
time. How were you benchmarking without the output when c++ is already
sorted in map and you are sorting D at output time?

Also, the benchmark is invalid with such a small text file. How can
you call it a benchmark when the time is 16 ms in one run and 32 ms in
the next?

Use bible.txt (3 meg) and bible2.txt (40 meg, 10x bible)
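One way to make that kind of noise visible is to repeat the run several times and report the minimum, median and maximum instead of a single number. A minimal Java sketch follows. It uses System.nanoTime because the millisecond clock on Windows of that era usually advanced in steps of roughly 15 to 16 ms, which may explain readings like 16 ms and 32 ms for the same work. The names RepeatTimer and timeRuns, and the placeholder workload, are made up for illustration.

```java
import java.util.Arrays;

public final class RepeatTimer {
    // Runs the task `runs` times and returns the elapsed times in ms, sorted ascending.
    static long[] timeRuns(Runnable task, int runs) {
        long[] ms = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();   // typically finer-grained than currentTimeMillis
            task.run();
            ms[i] = (System.nanoTime() - start) / 1_000_000;
        }
        Arrays.sort(ms);
        return ms;
    }

    public static void main(String[] args) {
        // Placeholder workload; in practice this would be the word count itself.
        Runnable work = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 2_000_000; i++) sb.append(i % 10);
            if (sb.length() == 0) throw new IllegalStateException();
        };
        long[] ms = timeRuns(work, 9);
        System.out.println("min " + ms[0] + " ms, median " + ms[ms.length / 2]
                + " ms, max " + ms[ms.length - 1] + " ms");
    }
}
```

A large gap between min and max is a sign that the input is too small or the machine too noisy for a single run to mean much.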

Eh, first things first: I am not affiliated with D; it is not "my D
program".

In fact, I was benchmarking it to compare it with the U++ implementation
(which I am affiliated with). In the process, I simply commented that
part out in the D code (in the C++/U++ version there is a NOOUTPUT macro
for that).

And yes, I was benchmarking with a bigger file, in fact a roughly 2 meg
one, for the exact reason you suggest. Details are in the second link I
posted:

http://www.ultimatepp.org/www$uppweb$vsd$en-us.html

FYI, my overall position is that the C++ core language is fast as hell,
but the problem is the standard library: it is crap. Its design is
grossly outdated, and the average implementation is not very well
optimized.

I will show you numbers soon :)

Mirek
 
Mirek Fidler

You need to fix your D program. It's doing the sorting at the output
time. How were you benchmarking without the output when c++ is already
sorted in map and you are sorting D at output time?

Ah, now rereading this detail... Well, the U++ version is not using a
map, so it is not sorted; but I have kept the sorting because it is not
part of the output and should be counted as processing.

But now thinking about it, you are right that I perhaps commented out
the sorting in the D version along with the output part, thus
handicapping the U++ version :) I have to check; this could affect the
numbers a bit more...

Mirek
 
Razii

And yes, I was benchmarking with a bigger file, in fact a roughly 2 meg
one, for the exact reason you suggest. Details are in the second link I
posted:

Even 2 meg is too small for java -server. In such a short-running
program, the JIT will always be slower than a compiled executable.

(3 meg file and java -server)
Time: 704 ms

(3 meg file with Java compiled with JET)
Time: 500 ms

However, with the 40 meg file the times are the same:

Time: 4765 ms (JET Java)
Time: 4750 ms (java -server)
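The JIT effect can be observed inside a single JVM: when the same work is timed repeatedly, the first iterations include interpretation and compilation and are typically slower than later ones, which is also why a short run penalizes java -server. A minimal sketch; the workload (counting distinct words in a synthetic string) and the names WarmupDemo and countDistinct are hypothetical, and the actual numbers depend on the VM and machine.

```java
import java.util.HashMap;
import java.util.Map;

public final class WarmupDemo {
    // Hypothetical workload: count distinct space-separated words in a text.
    static int countDistinct(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : text.split(" ")) {
            counts.merge(w, 1, Integer::sum);
        }
        return counts.size();
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100000; i++) sb.append("w").append(i % 5000).append(' ');
        String text = sb.toString();

        // Early iterations include interpreter time and JIT compilation;
        // later ones usually run compiled code. A benchmark that only cares
        // about steady-state speed would discard the first few.
        for (int iter = 1; iter <= 10; iter++) {
            long start = System.nanoTime();
            int distinct = countDistinct(text);
            long us = (System.nanoTime() - start) / 1000;
            System.out.println("iteration " + iter + ": " + us + " us (" + distinct + " distinct)");
        }
    }
}
```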
 
Mirek Fidler

Even 2 meg is too small for java -server. In such a short-running
program, the JIT will always be slower than a compiled executable.

(3 meg file and java -server)
Time: 704 ms

(3 meg file with Java compiled with JET)
Time: 500 ms

~700ms was good enough to test it against D.
However, with the 40 meg file the times are the same:

Time: 4765 ms (JET Java)
Time: 4750 ms (java -server)

NP, I can test with 40 megs too...

Mirek
 
stan

Razii said:
Exactly, and that's what I said and suggested.

Yes, only you didn't appear to know why. I was trying to help the other
poster understand why the path might be a factor.

Out of curiosity, do you ever NOT crosspost?
 
stan

Lew said:
If you've been following this thread, you'll have found out that you need to
install some auxiliary JDK files into your Java installation, or simply use
the JDK version, which already has those auxiliary files installed.

This is, AFAIK, only true for Windows. It's not true for my Linux system.

In Windows, the runtime environment typically drops a copy of java.exe
in the system32 directory (to make it easier for malware to find it),
and the runtime environment typically doesn't install the server DLL.
The development kit (a separate installation on both Windows and Linux)
installs the server component, but in a separate directory. Most
Windows C++ people don't particularly want Java in the PATH ahead of
Windows, and that causes this issue. In Linux most user stuff is in
/usr/bin, so both the runtime and the development kit install to the
same place and there is typically no issue.
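To see which installation and VM a given command actually uses, a small Java program can print the relevant system properties. A minimal sketch, assuming a HotSpot JVM, whose java.vm.name typically contains "Server VM" or "Client VM"; the class name VmCheck and the isServerVm helper are hypothetical. Running it once through the PATH and once through the full JDK path shows the java.home and VM that each command picks up.

```java
public final class VmCheck {
    // HotSpot reports names such as "Java HotSpot(TM) Server VM"
    // or "Java HotSpot(TM) Client VM".
    static boolean isServerVm(String vmName) {
        return vmName != null && vmName.contains("Server VM");
    }

    public static void main(String[] args) {
        String name = System.getProperty("java.vm.name");
        System.out.println("java.home    = " + System.getProperty("java.home"));
        System.out.println("java.vm.name = " + name);
        System.out.println(isServerVm(name) ? "server VM" : "not the server VM");
    }
}
```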
 
Razii

Most Windows C++ people
don't particularly want Java in the PATH ahead of Windows, and that
causes this issue

I have explained this to Lew for how many days now? The guy keeps
posting nonsense. There is no issue here. The java.exe in JDK/bin (not
the one in System32) must be the one that runs for -server to work. If
you don't want to change the PATH, do something like this:

C:\>"Program Files\Java\SDK\jdk\bin\java.exe" -server WordCount
 
red floyd

Razii said:
I have explained this to Lew for how many days now? The guy keeps
posting nonsense. There is no issue here. The java.exe in JDK/bin (not
the one in System32) must be the one that runs for -server to work. If
you don't want to change the PATH, do something like this:

C:\>"Program Files\Java\SDK\jdk\bin\java.exe" -server WordCount

And why are we discussing how to set up java and the Windows PATH
environment variable here?
 
Razii

And why are we discussing how to set up java and the Windows PATH
environment variable here?

Huh? What is the topic of the thread? Why are you reading a thread
about a Java and C++ benchmark if you have no interest in it? Right-click
on the thread and click Ignore.
 
Razii

In fact, I was benchmarking it to compare it with U++ implementation
(which I am related to)

Post your U++ version here, with the timing statement included. Also,
explain exactly how I can make it work on my computer running XP.

And what's the delay? I thought you were all ready.
 
Razii

I have a new Java version that is much faster than the previous version!
Try this version for benchmarking.

My old version with the 40 meg file

C:\>java -server WordCount bible2.txt>log.txt
Time: 4797 ms

My new version with the 40 meg file

C:\>java -server WordCount2 bible2.txt>log.txt
Time: 3125 ms

:) :) :)

The C++ version with the 40 meg bible2.txt

C:\>wc1 bible2.txt>log.txt
Time: 5390 ms

Pardon me while I laugh :))

Ha ha ha ha ha

The new version is below.


-----
Also, if the following doesn't work, the
source can be found here too:
http://www.pastebin.ca/963017


//counts the words in a text file...
//combined effort: wlfshmn from #java on IRC
//Undernet and Razii

import java.io.*;
import java.util.*;

public final class WordCount2
{
    private static final Map<String, int[]> dictionary =
        new HashMap<String, int[]>(800000);
    private static int tWords = 0;
    private static int tLines = 0;
    private static long tBytes = 0;

    public static void main(final String[] args) throws Exception
    {
        System.out.println("Lines\tWords\tBytes\tFile\n");

        //TIME STARTS HERE
        final long start = System.currentTimeMillis();
        for (String arg : args)
        {
            File file = new File(arg);
            if (!file.isFile())
            {
                continue;
            }
            int numLines = 0;
            int numWords = 0;
            long numBytes = file.length();
            BufferedReader input = new BufferedReader(new InputStreamReader(
                new FileInputStream(arg), "ISO-8859-1"));
            StreamTokenizer st = new StreamTokenizer(input);
            //make these single-character tokens: by default '/' starts a
            //comment, '"' and '\'' quote strings, '.' and '-' are number parts
            st.ordinaryChar('/'); st.ordinaryChar('.');
            st.ordinaryChar('-'); st.ordinaryChar('"');
            st.ordinaryChar('\''); st.eolIsSignificant(true);

            while (st.nextToken() != StreamTokenizer.TT_EOF)
            {
                if (st.ttype == StreamTokenizer.TT_EOL)
                {
                    numLines++;
                }
                else if (st.ttype == StreamTokenizer.TT_WORD)
                {
                    numWords++;
                    //int[1] is a mutable counter, so no Integer re-boxing
                    int[] count = dictionary.get(st.sval);
                    if (count != null)
                    { count[0]++; }
                    else
                    { dictionary.put(st.sval, new int[]{1}); }
                }
            }
            input.close();
            System.out.println(numLines + "\t" + numWords + "\t" + numBytes +
                "\t" + arg);
            tLines += numLines;
            tWords += numWords;
            tBytes += numBytes;
        }

        //only converting it to a TreeMap so the results
        //appear ordered; I could have
        //moved this part down to the printing phase
        //(i.e. not included it in the time).
        TreeMap<String, int[]> sort = new TreeMap<String, int[]>(dictionary);

        //TIME ENDS HERE
        final long end = System.currentTimeMillis();

        System.out.println("---------------------------------------");
        if (args.length > 1)
        {
            System.out.println(tLines + "\t" + tWords + "\t" + tBytes +
                "\tTotal");
            System.out.println("---------------------------------------");
        }
        for (Map.Entry<String, int[]> pairs : sort.entrySet())
        {
            System.out.println(pairs.getValue()[0] + "\t" + pairs.getKey());
        }
        System.out.println("Time: " + (end - start) + " ms");
    }
}
 
Razii

~700ms was good enough to test it against D.

Well, ignore the old versions!!!

 
stan

red said:
And why are we discussing how to set up java and the Windows PATH
environment variable here?

Because our very own official C++ troll has a couple of C++ programmers
actually trying his meaningless attempts at benchmarking. Along the way,
our troll can't understand that what may be perfectly reasonable for a
Java programmer can be suboptimal or dangerous for people who aren't
Java programmers.

Sadly some have kept the troll well fed lately and he has proven to be
incapable of seeing how rude and immature trolling makes you appear to
adults.
 
Mirek Fidler

Post your U++ version here, with the timing statement included. Also,
explain exactly how I can make it work on my computer running XP.

And what's the delay? I thought you were all ready.

The U++ version is in the second link I posted.

If you download U++ 2008.1beta2, the benchmarking code is part of the
standard distribution. I recommend downloading the .deb version and
testing with Ubuntu...

Mirek
 
Razii

Sadly some have kept the troll well fed lately and he has proven to be
incapable of seeing how rude and immature trolling makes you appear to
adults.

By the way, what's your USCF rating?
 
Razii

If you download U++ 2008.1beta2, the benchmarking code is part of the
standard distribution. I recommend downloading the .deb version and
testing with Ubuntu...

I already tried it. I give up. That thing was damn fast. Maybe you
have specially optimized U++ for this kind of benchmark? In any case,
if you do post the Java benchmark, make sure it's version 2 that I
posted, not the old version. The link is here.

http://www.pastebin.ca/963177

That link won't expire.

C:\>WCUPP bible2.txt
Time: 843 ms

Gosh! that's fast. Just for fun, I tried this:

C:\>WCUPP bible2.txt bible2.txt bible2.txt bible2.txt bible2.txt bible2.txt

Time: 4937 ms

C:\>java -server WordCount2 bible2.txt bible2.txt bible2.txt bible2.txt bible2.txt

Time: 14375 ms

U++ is almost 3 times faster!

At least when compared to VC++

C:\>wc1 bible2.txt bible2.txt bible2.txt bible2.txt bible2.txt

Time: 28515 ms

My version is twice as fast :) I am waiting for the C++ gurus to show me
a C++ version that uses the standard library and is faster than my
version.
 
Razii

And I am on Windows and was using this. The only thing I changed was
adding an internal timer:

--------
#include <Core/Core.h>

using namespace Upp;

#include <ctime>

int main(int argc, const char *argv[])
{
    VectorMap<String, int> map;

    Cout() << " lines words bytes file\n";

    //TIME STARTS HERE
    clock_t start = clock();

    int total_lines = 0;
    int total_words = 0;
    int total_bytes = 0;

    for(int i = 1; i < argc; i++) {
        String f = LoadFile(argv[i]);
        int lines = 0;
        int words = 0;
        const char *q = f;
        for(;;) {
            int c = *q;
            if(IsAlpha(c)) {
                // a word starts with a letter and continues with letters/digits
                const char *b = q++;
                while(IsAlNum(*q)) q++;
                map.GetAdd(String(b, q), 0)++;
                words++;
            }
            else {
                if(!c) break; // the String data is zero-terminated
                if(c == '\n')
                    ++lines;
                q++;
            }
        }
        Cout() << Format("%8d%8d%8d %s\n", lines, words, f.GetCount(), argv[i]);
        total_lines += lines;
        total_words += words;
        total_bytes += f.GetCount();
    }

    Vector<int> order = GetSortOrder(map.GetKeys());

    //TIME ENDS HERE
    clock_t end = clock();

    Cout() << Format("--------------------------------------%8d%8d%8d total\n",
                     total_lines, total_words, total_bytes);

    for(int i = 0; i < order.GetCount(); i++)
        Cout() << map.GetKey(order[i]) << ": " << map[order[i]] << '\n';

    Cout() << "Time: " << double(end - start) / CLOCKS_PER_SEC * 1000 << " ms\n";

    return 0;
}
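For comparison, the U++ loop above translates fairly directly to Java: load the whole file into memory, treat a letter followed by letters or digits as a word, and count '\n' bytes as lines. This is a rough port for illustration, not code from either poster. The class ScanCount is hypothetical, bytes are read as ISO-8859-1, Character.isLetter/isLetterOrDigit stand in for U++'s IsAlpha/IsAlNum (so non-ASCII bytes may be classified differently), and the scan stops at the end of the array rather than at a terminating zero. No timings are claimed for it.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.*;

public final class ScanCount {
    // Scans `data`, adding each word to `map`; returns {lines, words}.
    static int[] scan(byte[] data, Map<String, int[]> map) {
        int lines = 0, words = 0;
        int q = 0;
        while (q < data.length) {
            char c = (char) (data[q] & 0xFF);  // ISO-8859-1: byte value == char value
            if (Character.isLetter(c)) {
                int b = q++;
                while (q < data.length && Character.isLetterOrDigit((char) (data[q] & 0xFF))) q++;
                String w = new String(data, b, q - b, StandardCharsets.ISO_8859_1);
                int[] count = map.get(w);
                if (count != null) count[0]++; else map.put(w, new int[]{1});
                words++;
            } else {
                if (c == '\n') lines++;
                q++;
            }
        }
        return new int[]{lines, words};
    }

    public static void main(String[] args) throws Exception {
        Map<String, int[]> map = new HashMap<>();
        for (String arg : args) {
            byte[] data = Files.readAllBytes(Paths.get(arg)); // whole file in memory, like LoadFile
            int[] lw = scan(data, map);
            System.out.println(lw[0] + "\t" + lw[1] + "\t" + data.length + "\t" + arg);
        }
        // Sort only the keys at the end, similar in spirit to GetSortOrder.
        String[] keys = map.keySet().toArray(new String[0]);
        Arrays.sort(keys);
        for (String k : keys) System.out.println(k + ": " + map.get(k)[0]);
    }
}
```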
 
Mirek Fidler

I already tried it. I give up. That thing was damn fast.

Hehe :) Thanks.

Maybe you have specially optimized U++ for this kind of benchmark?

Well, yes and no. I try to test every single benchmark I encounter. If
I find a deficiency, I try to fix it. This "wc" benchmark helped me
find some problems in the String implementation (about a year ago). But
of course, such optimizations are not benchmark-specific...

FYI, I tested the original "sort the bible" benchmark too, and it ended
just like this: I found a problem in U++'s Stream::GetLine method, so I
fixed it. But again, such a fix benefits all U++ code as well...

My version is twice as fast :) I am waiting for the C++ gurus to show me
a C++ version that uses the standard library and is faster than my
version.

IMO, you will not get one anytime soon. The C++ standard library is
flawed in many respects.

But as you see, if the C++ core language is used properly, nothing else
can touch it by a long shot...

Mirek

P.S.: I hope you have compiled it as "OPTIMAL" :)
 
