Passing a Method Name to a Method, Redux

G

Gene Wirchenko

Dear Java'ers:

I have completed my benchmarking. The code is below. Note that
the difference in the bodies between ParseSequentialSearch(),
ParseBinarySearch(), and ParseTreesetSearch() is but one line. I
really would have preferred not having to duplicate the code.

Oddly, the timings have a LOT of noise in them. In some runs, a
sequential search has out-performed a binary search. Occasionally, a
sequential search has beaten both a binary search and a Treeset
search. The times for sequential searching are only a bit worse than
for binary searching. Treeset searching is about 20% faster. Any
explanations?

I had to kludge this:
cIdent=""+CurrChar;
cIdent is a String, CurrChar is a char.
cIdent=CurrChar;
does not compile.

So how would you have written this benchmark?

***** Start of Code *****
// TimingTesting
// Timing Testing of Character Searching
// Last Modification: 2011-06-23



import java.util.*;



class TimingTesting
{

static String cParseString=
"//identifier//IDENTIFIER//a_b_c abc123
4b5%$__dbl;one;two;three;END";

static String IdentChars=
"0123456789"+
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"+
"_"+
"abcdefghijklmnopqrstuvwxyz"; // sorted order!

static SortedSet<Character> IdentCharsSet=new TreeSet<Character>();

static int nRepetitions=1000000;



static boolean SequentialSearch
(
char CurrChar
)
{
boolean fFound=false;
for (int i=0; i<IdentChars.length() && !fFound; i++)
fFound=IdentChars.charAt(i)==CurrChar;
return fFound;
}



static boolean BinarySearch
(
char CurrChar
)
{
int xLow=0;
int xHigh=IdentChars.length()-1;
int xTry;
boolean fFound=false;
while (xLow<=xHigh)
{
xTry=(xLow+xHigh)/2;
if (CurrChar==IdentChars.charAt(xTry))
return true;
if (CurrChar<IdentChars.charAt(xTry))
xHigh=xTry-1;
else
xLow=xTry+1;
}
return false;
}



static boolean TreesetSearch
(
char CurrChar
)
{
return IdentCharsSet.contains(CurrChar);
}



static void ParseSequentialSearch()
{
int xScan=0;
boolean fBuildingIdent=false;
boolean fInIdentChars;
String cIdent=""; // fussy init
while (xScan<cParseString.length())
{
char CurrChar=cParseString.charAt(xScan);
fInIdentChars=SequentialSearch(CurrChar);
if (SequentialSearch(CurrChar)) // different code
if (fBuildingIdent)
cIdent+=CurrChar;
else
{
fBuildingIdent=true;
cIdent=""+CurrChar;
}
else
if (fBuildingIdent)
{
fBuildingIdent=false;
if (nRepetitions==1)
System.out.println(cIdent);
}
else
{}
xScan++;
}
if (fBuildingIdent)
if (nRepetitions==1)
System.out.println(cIdent);
}



static void ParseBinarySearch()
{
int xScan=0;
boolean fBuildingIdent=false;
boolean fInIdentChars;
String cIdent=""; // fussy init
while (xScan<cParseString.length())
{
char CurrChar=cParseString.charAt(xScan);
fInIdentChars=SequentialSearch(CurrChar);
if (SequentialSearch(CurrChar)) // different code
if (fBuildingIdent)
cIdent+=CurrChar;
else
{
fBuildingIdent=true;
cIdent=""+CurrChar;
}
else
if (fBuildingIdent)
{
fBuildingIdent=false;
if (nRepetitions==1)
System.out.println(cIdent);
}
else
{}
xScan++;
}
if (fBuildingIdent)
if (nRepetitions==1)
System.out.println(cIdent);
}



static void ParseTreesetSearch()
{
int xScan=0;
boolean fBuildingIdent=false;
boolean fInIdentChars;
String cIdent=""; // fussy init
while (xScan<cParseString.length())
{
char CurrChar=cParseString.charAt(xScan);
fInIdentChars=SequentialSearch(CurrChar);
if (TreesetSearch(CurrChar)) // different code
if (fBuildingIdent)
cIdent+=CurrChar;
else
{
fBuildingIdent=true;
cIdent=""+CurrChar;
}
else
if (fBuildingIdent)
{
fBuildingIdent=false;
if (nRepetitions==1)
System.out.println(cIdent);
}
else
{}
xScan++;
}
if (fBuildingIdent)
if (nRepetitions==1)
System.out.println(cIdent);
}



public static void main(String[] args)
{
int i;

long StartTime;
long EndTime;
long Duration;

System.out.println("Timing Testing of Character Searching");
System.out.println();

// Initialise Set.
for (i=0; i<IdentChars.length(); i++)
IdentCharsSet.add(IdentChars.charAt(i));

// Character Sequential
System.out.print("Character Sequential Search");
StartTime=System.nanoTime();
for (i=1; i<=nRepetitions; i++)
ParseSequentialSearch();
EndTime=System.nanoTime();
Duration=EndTime-StartTime;
System.out.println(" Duration="+Duration);

// Character Binary Search
System.out.print("Character Binary Search ");
StartTime=System.nanoTime();
for (i=1; i<=nRepetitions; i++)
ParseBinarySearch();
EndTime=System.nanoTime();
Duration=EndTime-StartTime;
System.out.println(" Duration="+Duration);

// Character Treeset
System.out.print("Character Treeset Search ");
StartTime=System.nanoTime();
for (i=1; i<=nRepetitions; i++)
ParseTreesetSearch();
EndTime=System.nanoTime();
Duration=EndTime-StartTime;
System.out.println(" Duration="+Duration);
}

}
***** End of Code *****

Sincerely,

Gene Wirchenko
 
M

markspace

So how would you have written this benchmark?


Um, realistically? Is this really what you want to do?

static boolean TreesetSearch( char CurrChar ) {
return IdentCharsSet.contains( CurrChar );
}

All of the identifiers in your "language" are single characters?

I would have used actual strings, preferably from existing code so you
could test performance. Although I appreciate you making a
self-contained example for us'm here on usenet.
 
M

markspace

search. The times for sequential searching are only a bit worse than
for binary searching. Treeset searching is about 20% faster. Any
explanations?


Why does your TreeSetSearch() call SequentialSearch? Doesn't that
defeat the purpose of your timing comparisons? I'm also not following
the parsing you are doing at all. What is the goal of that method?

static void ParseTreesetSearch()
{
int xScan = 0;
boolean fBuildingIdent = false;
boolean fInIdentChars;
String cIdent = ""; // fussy init
while( xScan < cParseString.length() ) {
char CurrChar = cParseString.charAt( xScan );
fInIdentChars = SequentialSearch( CurrChar );
^^^^^^^^^^^^^^^^
if( TreesetSearch( CurrChar ) ) // different code


Odd call indicated above. Doesn't that just do the exact same thing
that TreesetSearch does?
 
M

markspace

Some other things I'd do:


1. Make life easier on the Mark I eyeball and print out your timing in
seconds:

System.out.println( " Duration=" + (Duration/1e9) );


2. Specify your command line arguments. I found this affected the
timing greatly (over 100% faster than in the IDE) .


C:\Users\Brenden\Dev\Test2\src>javac -g:none test\CallingTest.java

C:\Users\Brenden\Dev\Test2\src>
C:\Users\Brenden\Dev\Test2\src>java -server test.CallingTest
Timing Testing of Character Searching

Character Sequential Search Duration=22.251718931
Character Binary Search Duration=21.492850347
Character Treeset Search Duration=19.472623928
Hash Test Duration=0.098741719

C:\Users\Brenden\Dev\Test2\src>
 
J

Joshua Cranmer

Dear Java'ers:

I have completed my benchmarking. The code is below. Note that
the difference in the bodies between ParseSequentialSearch(),
ParseBinarySearch(), and ParseTreesetSearch() is but one line. I
really would have preferred not having to duplicate the code.

Oddly, the timings have a LOT of noise in them. In some runs, a
sequential search has out-performed a binary search. Occasionally, a
sequential search has beaten both a binary search and a Treeset
search. The times for sequential searching are only a bit worse than
for binary searching. Treeset searching is about 20% faster. Any
explanations?

Here is probably the best implementation:

static boolean[] isIdentifierChar;
static String slowSearch;
static void initIdentifierChars(String validChars) {
isIdentifierChar = new boolean[128];
for (char c : validChars.toCharArray())
if (c < 128)
isIdentifierChar[c] = true;
slowSearch = validChars;
}

static boolean isValidChar(char c) {
if (c >= 128)
return slowSearch.indexOf(c) >= 0;
return isIdentifieChar[c];
}

If it's really performance critical, you can just make the character
array the full 64K characters and skip the search, so testing for valid
identifiers is simply an index lookup.

There are other performance improvements:
1. Use StringBuilder. It's mutable, so it avoids copies (kind of like
ArrayList). Heck, that's what your code is doing, it just keeps copying
back and forth between strings.

2. It's probably better to throw out StringBuilder and use token extents
(i.e., from index 5 to 9 is this identifier). It saves copying and can
allow for better error messages in parsers by being able to pinpoint
line/column numbers accurately.

But yeah, I would have coded my benchmark like that. Since you seem to
be concerned with microoptimization here, anything which looks different
from what the real parser implementation would show up as having impact
in timing numbers.

Finally, I might add, if you're writing a full parser, it might be
better to just use an existing Java lexer rather than rolling your own
unless you have a *very* good reason not to.
 
G

Gene Wirchenko

Why does your TreeSetSearch() call SequentialSearch? Doesn't that

I goofed.
defeat the purpose of your timing comparisons? I'm also not following
the parsing you are doing at all. What is the goal of that method?

I am parsing for identifiers. In this test, I just throw them
away.

[snip]
Odd call indicated above. Doesn't that just do the exact same thing
that TreesetSearch does?

Cut and paste did me in. Thank you for catching this.

Sincerely,

Gene Wirchenko
 
G

Gene Wirchenko

Um, realistically? Is this really what you want to do?

static boolean TreesetSearch( char CurrChar ) {
return IdentCharsSet.contains( CurrChar );
}

Yes. I wanted a simple method call in the parser so I could
cut-and-paste. I did not know if I would need more than one call. I
am going to go with a Treeset so I will not have a separate method in
the implementation.
All of the identifiers in your "language" are single characters?

No. An identifier is a sequence of one or more characters that
are in IdentChars (or IdentCharsSet).
I would have used actual strings, preferably from existing code so you
could test performance. Although I appreciate you making a
self-contained example for us'm here on usenet.

I wanted an example. I have test files for when I have this
worked out.

Sincerely,

Gene Wirchenko
 
G

Gene Wirchenko

Some other things I'd do:
1. Make life easier on the Mark I eyeball and print out your timing in
seconds:

System.out.println( " Duration=" + (Duration/1e9) );

Yes, if this were not just test code.
2. Specify your command line arguments. I found this affected the
timing greatly (over 100% faster than in the IDE) .

There are none.
C:\Users\Brenden\Dev\Test2\src>javac -g:none test\CallingTest.java

C:\Users\Brenden\Dev\Test2\src>
C:\Users\Brenden\Dev\Test2\src>java -server test.CallingTest
Timing Testing of Character Searching

Character Sequential Search Duration=22.251718931
Character Binary Search Duration=21.492850347
Character Treeset Search Duration=19.472623928
Hash Test Duration=0.098741719
^^^^^^^^^
What is this one, please?

Sincerely,

Gene Wirchenko
 
M

markspace

Yes, if this were not just test code.

Test code is probably more important to get right than the actual
production code. You'll be using the test code more than the production. ;)

I have another question. This code here, most of this is debugging?
And you take the entire string and concatenate the characters together,
but you don't actually parse this into tokens:


while( xScan < cParseString.length() ) {
char CurrChar = cParseString.charAt( xScan );
fInIdentChars = SequentialSearch( CurrChar );
if( TreesetSearch( CurrChar ) ) // different code
{
if( fBuildingIdent ) {
cIdent += CurrChar;
} else {
fBuildingIdent = true;
cIdent = "" + CurrChar;
}
} else if( fBuildingIdent ) {
fBuildingIdent = false;
if( nRepetitions == 1 ) {
System.out.println( cIdent );
}
} else {
}
xScan++;
}

Your just double checking that SequentialSearch and TreesetSearch return
the same thing? And you concat the whole cParseString into cIdent, you
don't look for white space or anything and break cParseString into tokens?

Don't worry about Hash Test, I have something faster now.

BTW I refactored your test that you were copy-and-pasting around into
one method. Using techniques I mentioned in my first post to you on
this subject.

private static void time( TestCase r ) {
long StartTime = System.nanoTime();
for( int i = 1; i <= nRepetitions; i++ ) {
r.parse();
}
long EndTime = System.nanoTime();
long Duration = EndTime - StartTime;
System.out.println( " Duration=" + (Duration/1e9) );
}
 
L

lewbloch

markspace said:
...
BTW I refactored your test that you were copy-and-pasting around into
one method.  Using techniques I mentioned in my first post to you on
this subject.

     private static void time( TestCase r ) {
         long StartTime = System.nanoTime();
         for( int i = 1; i <= nRepetitions; i++ ) {
            r.parse();
         }
         long EndTime = System.nanoTime();
         long Duration = EndTime - StartTime;
         System.out.println( " Duration=" + (Duration/1e9) );
     }

You'll want to run each loop a bunch of times (10,000? 100,000? >=
1M?) before starting the timing loop in order to cancel the effects of
HotSpot warmup.

Unless your real-world scenario pretty much guarantees that HotSpot
won't be a factor.

Micro-benchmarks in Java are, at best, a dicey basis for any
performance conclusions.
 
M

markspace

You'll want to run each loop a bunch of times (10,000? 100,000?>=
1M?) before starting the timing loop in order to cancel the effects of
HotSpot warmup.


I was hoping that the -server flag would obviate most or all warm-up
(this is from another post in this tread):

C:\Users\Brenden\Dev\Test2\src>java -server test.CallingTest

No?
 
G

Gene Wirchenko

Test code is probably more important to get right than the actual
production code. You'll be using the test code more than the production. ;)

No. I expect that I will be using the resulting preprocessor for
years. The test code will be tossed shortly.
I have another question. This code here, most of this is debugging?
And you take the entire string and concatenate the characters together,
but you don't actually parse this into tokens:

Yes, I do. The only tokens that I am interested in are
identifiers.

[snip]
Your just double checking that SequentialSearch and TreesetSearch return
the same thing? And you concat the whole cParseString into cIdent, you
don't look for white space or anything and break cParseString into tokens?

No. I am just after the timing.

The only tokens that I am interested in are the identifiers. The
identifiers are defined as sequences of characters in IdentChars or
IdentCharsSet. The rest of the input will be echoed.
Don't worry about Hash Test, I have something faster now.

But *I* want something faster. That was the whole pioont of this
testing: to find out which way was fastest.

[snip]

Sincerely,

Gene Wirchenko
 
G

Gene Wirchenko

Here is probably the best implementation:

Nope. I want a configurable character set for identifiers

[snip]
There are other performance improvements:
1. Use StringBuilder. It's mutable, so it avoids copies (kind of like
ArrayList). Heck, that's what your code is doing, it just keeps copying
back and forth between strings.
Noted.

2. It's probably better to throw out StringBuilder and use token extents
(i.e., from index 5 to 9 is this identifier). It saves copying and can
allow for better error messages in parsers by being able to pinpoint
line/column numbers accurately.

I plan to for the real thing. I have other changes as well.
But yeah, I would have coded my benchmark like that. Since you seem to
be concerned with microoptimization here, anything which looks different
from what the real parser implementation would show up as having impact
in timing numbers.

Nah, I just do not have a sense of how fast various Java things
are. Benchmark to find out.
Finally, I might add, if you're writing a full parser, it might be
better to just use an existing Java lexer rather than rolling your own
unless you have a *very* good reason not to.

I am writing a *simple* parser. It is not for grovelling over
Java code. It is for a preprocessor for SQL Server for better code
management. I mean for it to be fairly language-agnostic.

Sincerely,

Gene Wirchenko
 
B

blmblm

[ snip ]
BTW I refactored your test that you were copy-and-pasting around into
one method. Using techniques I mentioned in my first post to you on
this subject.

private static void time( TestCase r ) {
long StartTime = System.nanoTime();
for( int i = 1; i <= nRepetitions; i++ ) {
r.parse();
}
long EndTime = System.nanoTime();
long Duration = EndTime - StartTime;
System.out.println( " Duration=" + (Duration/1e9) );
}

Gene --

Below is my revision, along similar lines to what markspace suggests
above but a complete program program based on your code. I tried not
to make gratuitous additional changes, but I admit that I did move the
declaration of the loop counter into the "for" construct because, well,
*one* gratuitous change to make the program look less Java-unidiomatic?
(Also I admit I ran the whole program through vim's "reindent", but
that should change only whitespace.)

Non-whitespace changes marked "// (blmblm)".


import java.util.*;



public class TimingTesting2
{

// (blmblm) added
interface Searcher {
boolean search(char CurrChar);
}

static String cParseString=
"//identifier//IDENTIFIER//a_b_c abc1234b5%$__dbl;one;two;three;END";

static String IdentChars=
"0123456789"+
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"+
"_"+
"abcdefghijklmnopqrstuvwxyz"; // sorted order!

static SortedSet<Character> IdentCharsSet=new TreeSet<Character>();

static int nRepetitions=1000000;



static boolean SequentialSearch
(
char CurrChar
)
{
boolean fFound=false;
for (int i=0; i<IdentChars.length() && !fFound; i++)
fFound=IdentChars.charAt(i)==CurrChar;
return fFound;
}



static boolean BinarySearch
(
char CurrChar
)
{
int xLow=0;
int xHigh=IdentChars.length()-1;
int xTry;
boolean fFound=false;
while (xLow<=xHigh)
{
xTry=(xLow+xHigh)/2;
if (CurrChar==IdentChars.charAt(xTry))
return true;
if (CurrChar<IdentChars.charAt(xTry))
xHigh=xTry-1;
else
xLow=xTry+1;
}
return false;
}



static boolean TreesetSearch
(
char CurrChar
)
{
return IdentCharsSet.contains(CurrChar);
}


// (blmblm) merge three separate Parse routines
static void Parse(Searcher searcher)
{
int xScan=0;
boolean fBuildingIdent=false;
boolean fInIdentChars;
String cIdent=""; // fussy init
while (xScan<cParseString.length())
{
char CurrChar=cParseString.charAt(xScan);
fInIdentChars=SequentialSearch(CurrChar);
if (searcher.search(CurrChar)) // (blmblm)
if (fBuildingIdent)
cIdent+=CurrChar;
else
{
fBuildingIdent=true;
cIdent=""+CurrChar;
}
else
if (fBuildingIdent)
{
fBuildingIdent=false;
if (nRepetitions==1)
System.out.println(cIdent);
}
else
{}
xScan++;
}
if (fBuildingIdent)
if (nRepetitions==1)
System.out.println(cIdent);
}


// (blmblm) factor out common code, as suggested by markspace
static void time(Searcher searcher) {
long StartTime;
long EndTime;
long Duration;

StartTime=System.nanoTime();
// (blmblm) gratuitous change (declare loop counter within "for")
for (int i=1; i<=nRepetitions; i++)
Parse(searcher);
EndTime=System.nanoTime();
Duration=EndTime-StartTime;
System.out.println(" Duration="+Duration);
}


public static void main(String[] args)
{
System.out.println("Timing Testing of Character Searching");
System.out.println();

// Initialise Set.
// (blmblm) gratuitous change (declare loop counter within "for")
for (int i=0; i<IdentChars.length(); i++)
IdentCharsSet.add(IdentChars.charAt(i));

// Character Sequential
System.out.print("Character Sequential Search");
// (blmblm) use common-code method
time(new Searcher() {
public boolean search(char CurrChar) {
return SequentialSearch(CurrChar);
}
});

// Character Binary Search
System.out.print("Character Binary Search ");
// (blmblm) use common-code method
time(new Searcher() {
public boolean search(char CurrChar) {
return BinarySearch(CurrChar);
}
});

// Character Treeset
System.out.print("Character Treeset Search ");
// (blmblm) use common-code method
time(new Searcher() {
public boolean search(char CurrChar) {
return TreesetSearch(CurrChar);
}
});
}

}
 
M

markspace

No. I expect that I will be using the resulting preprocessor for
years. The test code will be tossed shortly.

You have a couple of problems with your code, one organizational and the
other understanding the effeciencies.

The organizational one relates to the idea that you'll just toss your
tests away. Don't ever do that! The test code is part of the project,
and should remain with it. Test code is also put under code control,
and managed along with the projects. It's important because every time
you want to change your parser, you'll need to re-run the tests to make
sure everything is working.

Are you using an IDE? Most will auto generate a test framework for you.
It's very handy and you should be doing this regardless how you write
code. The IDE just makes it very handy.
No. I am just after the timing.

The other thing, efficiency, I'll show you right now. The
organizational stuff is actually probably a bigger deal, but I think
you'll be happy to see how to make code faster.

This line here is the biggest offender.
cIdent += CurrChar;

This is super inefficient inside a loop. To do this, the system has to
create a new string with one extra character, and then toss away the old
string. Making a new object and tossing an old one is bound to slow you
down.

final public void parse() {
StringBuilder sb1 = new StringBuilder( 255 );
for( int xScan = 0; xScan <
TimingTesting.cParseString.length(); xScan++ ) {
char c = TimingTesting.cParseString.charAt( xScan );
if( find( c ) ) {
sb1.append( c );
}
}
String ... = sb1.toString()

Here's my adaptation of your loop. Notice I make a StringBuilder once,
outside the loop, and call append() inside the loop, which is much much
faster. Then I call toString once outside the loop again, so I only
create a new String once, not each time inside the loop. Try to
refactor your code to do this, it will make it much faster.


One last thing for now: on splitting a string into tokens, look at this:

String[] tok = TimingTesting.cParseString.split( "[^a-zA-Z0-9]+" );
System.out.println( Arrays.toString( tok ) );
 
J

Jeff Higgins

I am writing a *simple* parser. It is not for grovelling over
Java code. It is for a preprocessor for SQL Server for better code
management. I mean for it to be fairly language-agnostic.
What means simple? JavaCC is the parser generator that I'm most familiar
with. <http://javacc.java.net/>
 
J

Joshua Cranmer

Nope. I want a configurable character set for identifiers

You can configure it. You configure it by calling the first method. If
you don't like it, you can have people configure it by setting the array
value themselves. Note that this is a similar implementation to your
"TreeSet" idea, just using a raw array instead of a TreeSet.

If you are really concerned about speeds, it is a higher-cost initial
overhead and an asymptotically and practically faster per-character
lookup time.
 
L

Lew

I was hoping that the -server flag would obviate most or all warm-up (this is
from another post in this tread):

C:\Users\Brenden\Dev\Test2\src>java -server test.CallingTest

No?

No. How could it?
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,969
Messages
2,570,161
Members
46,708
Latest member
SherleneF1

Latest Threads

Top