Question about Pattern.matcher performance differences between Java6 and Java7/8

zach.lists

Hello,

I've noticed a difference in performance between Java6 and Java7/8 (Oracle) on OS X with some toy code. Java6 runs much faster than the updated versions.

VisualVM is pointing me to java.util.regex.Pattern.matcher being the hotspot where all the time is now spent.

I'm still trying to slim down the offending code into a concise example. However, I was wondering whether anyone knows of any changes in java.util.regex.Pattern.matcher performance between Java6 and Java7?

I've read that named groups were introduced in Java7, but I haven't found anything mentioning performance tanking dramatically.

Thanks for any leads you folks may be able to point me to.

-Zach
 
Eric Sosman

zach.lists said:
Hello,

I've noticed a difference in performance between Java6 and Java7/8 (Oracle) on OS X with some toy code. Java6 runs much faster than the updated versions.

VisualVM is pointing me to java.util.regex.Pattern.matcher being the hotspot where all the time is now spent.

I'm still trying to slim down the offending code into a concise example. However, I was wondering whether anyone knows of any changes in java.util.regex.Pattern.matcher performance between Java6 and Java7?

I've read that named groups were introduced in Java7, but I haven't found anything mentioning performance tanking dramatically.

Thanks for any leads you folks may be able to point me to.

One suggestion: Don't fret about the performance of "toy code."

If the performance of "real code" suffers, that's another matter.
Note, though, that if Pattern.matcher() has suddenly become the hot
spot in real code it doesn't necessarily follow that Pattern.matcher()
has become slower. It might instead mean that something else is now
matching a lot more regexes than it used to, or is using a more
complex regex than before. If your car's fuel economy suddenly gets
worse it doesn't prove the engine's gone bad, it may instead have
something to do with the boat trailer you've hitched on ...
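
One way to tell "called more often" apart from "each call got slower" is to count the calls and time them separately. A minimal sketch, assuming you can route the suspect calls through a wrapper: the class `MatcherCallStats` and its method names are hypothetical, and the regex is only a placeholder. It avoids Java 7+ syntax so the same file can be run under both JVMs. For very cheap calls the `System.nanoTime()` overhead can dominate, so treat the numbers as coarse.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library: records how many times
// matcher() is called and how much wall-clock time those calls take.
public class MatcherCallStats {
    private long calls;       // number of matcher() invocations
    private long totalNanos;  // time spent inside matcher() only

    /** Delegates to Pattern.matcher() while recording count and time. */
    public Matcher timedMatcher(Pattern pattern, CharSequence input) {
        long start = System.nanoTime();
        Matcher m = pattern.matcher(input);
        totalNanos += System.nanoTime() - start;
        calls++;
        return m;
    }

    public long calls() { return calls; }

    public long totalNanos() { return totalNanos; }

    public double nanosPerCall() {
        return calls == 0 ? 0.0 : (double) totalNanos / calls;
    }

    public static void main(String[] args) {
        MatcherCallStats stats = new MatcherCallStats();
        Pattern p = Pattern.compile("\\d+"); // placeholder regex
        for (int i = 0; i < 100000; i++) {
            stats.timedMatcher(p, "abc" + i).find();
        }
        // Run under each JVM and compare both figures, not just the total.
        System.out.println(System.getProperty("java.version")
                + ": calls=" + stats.calls()
                + ", ns/call=" + stats.nanosPerCall());
    }
}
```

If the call count is the same on both JVMs but the time per call differs, the regex engine itself is the more likely suspect; if the count differs, look at the callers first.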
 
Kevin McMurtrie

Eric Sosman said:
One suggestion: Don't fret about the performance of "toy code."

If the performance of "real code" suffers, that's another matter.
Note, though, that if Pattern.matcher() has suddenly become the hot
spot in real code it doesn't necessarily follow that Pattern.matcher()
has become slower. It might instead mean that something else is now
matching a lot more regexes than it used to, or is using a more
complex regex than before. If your car's fuel economy suddenly gets
worse it doesn't prove the engine's gone bad, it may instead have
something to do with the boat trailer you've hitched on ...

Note that "real code" is difficult to accurately profile in Java.
Instrumentation causes extremely intrusive de-optimization. Sampling
can only see safepoints, which can be sparse or in odd locations.
Neither can determine real GC load. What ends up happening is that you
pull a chunk of suspect code out of your app and run it by itself in a
controlled environment to benchmark it. The alternative is using a
native profiler and trying to manually correlate CPU instructions with
Java source after HotSpot has moved everything around.
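
Pulling the suspect code out might look roughly like the following. This is only a sketch under stated assumptions: the class `MatcherBench`, the regex, and the inputs are placeholders, not the original poster's code, and a hand-rolled loop like this is much cruder than a proper benchmark harness. It runs a few untimed warm-up rounds so HotSpot has a chance to compile the loop, then prints the time per iteration for several measured rounds.

```java
import java.util.regex.Pattern;

// Hypothetical stand-alone benchmark; regex and inputs are placeholders.
public class MatcherBench {
    // Written on every run so the JIT cannot discard the loop as dead code.
    public static volatile int sink;

    /** Runs `iterations` matcher().find() calls; returns elapsed nanoseconds. */
    public static long timeRun(Pattern pattern, String[] inputs, int iterations) {
        int hits = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            if (pattern.matcher(inputs[i % inputs.length]).find()) {
                hits++;
            }
        }
        long elapsed = System.nanoTime() - start;
        sink += hits;
        return elapsed;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("(\\w+)@(\\w+)\\.com");
        String[] inputs = { "zach@example.com", "no match here", "a@b.com" };
        int iterations = 1000000;

        // Warm-up rounds: results are thrown away.
        for (int round = 0; round < 5; round++) {
            timeRun(p, inputs, iterations);
        }
        // Measured rounds: print each one so outliers stay visible.
        String version = System.getProperty("java.version");
        for (int round = 0; round < 5; round++) {
            long ns = timeRun(p, inputs, iterations);
            System.out.println(version + " round " + round + ": "
                    + (double) ns / iterations + " ns/iteration");
        }
    }
}
```

Running the same class file under each JVM and comparing the measured rounds gives a first, rough answer without any profiler attached.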


When I benchmark one of my large apps using sampling, Integer.hashCode()
sometimes makes the top ten CPU consumers. The call count is normal and
there's absolutely no code inside Integer.hashCode(). That's just where
HotSpot likes dropping a safepoint in my app.
 
Stefan Ram

Kevin McMurtrie said:
Neither can determine real GC load. What ends up happening is that you
pull a chunk of suspect code out of your app and run it by itself in a
controlled environment to benchmark it. The alternative is using a

Sometimes, one wants to compare two versions of suspect code,
one of which has some new »optimization«, to see whether that
optimization helps indeed as much as hoped for.

When one needs to optimize, the application must be observably
slow. That means: macroscopically slow, so slow that one can
measure it using a stop watch or at least detect the slowness
somehow.

In this case, one can compare the two versions in-place. That is:
run the app with the old version and then with the new version.
This will include »real GC load« and all.
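
A stop-watch comparison of that kind could be sketched as below. Everything here is illustrative: `StopwatchCompare`, `countOld`, and `countNew` are hypothetical names, and the "optimization" (reusing a precompiled `Pattern` instead of calling `String.matches()`, which compiles the regex on every call) is just a stand-in for whatever change is being evaluated. In a real application the timed workload would be the application's own entry point rather than a loop over generated strings.

```java
import java.util.regex.Pattern;

// Hypothetical A/B harness: pass "new" as the first argument to time the
// candidate version; anything else times the old one.
public class StopwatchCompare {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    /** "Old" version: String.matches() recompiles the regex on each call. */
    public static int countOld(String[] lines) {
        int n = 0;
        for (String s : lines) {
            if (s.matches("\\d+")) n++;
        }
        return n;
    }

    /** "New" version: reuses one precompiled Pattern. */
    public static int countNew(String[] lines) {
        int n = 0;
        for (String s : lines) {
            if (DIGITS.matcher(s).matches()) n++;
        }
        return n;
    }

    /** Times the whole workload end to end, GC pauses included. */
    public static long timeMillis(Runnable workload) {
        long start = System.nanoTime();
        workload.run();
        return (System.nanoTime() - start) / 1000000L;
    }

    public static void main(String[] args) {
        final boolean useNew = args.length > 0 && args[0].equals("new");
        final String[] lines = new String[200000];
        for (int i = 0; i < lines.length; i++) {
            lines[i] = (i % 2 == 0) ? Integer.toString(i) : "x" + i;
        }
        final int[] result = new int[1];
        long ms = timeMillis(new Runnable() {
            public void run() {
                result[0] = useNew ? countNew(lines) : countOld(lines);
            }
        });
        System.out.println((useNew ? "new" : "old") + " version: "
                + result[0] + " matches in " + ms + " ms");
    }
}
```

Each version runs in its own JVM invocation, so the two timings are not skewed by one run warming up the other.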
 
Robert Klemme

Stefan Ram said:
Sometimes, one wants to compare two versions of suspect code,
one of which has some new »optimization«, to see whether that
optimization helps indeed as much as hoped for.

When one needs to optimize, the application must be observably
slow. That means: macroscopically slow, so slow that one can
measure it using a stop watch or at least detect the slowness
somehow.

In this case, one can compare the two versions in-place. That is:
run the app with the old version and then with the new version.
This will include »real GC load« and all.

I'm not sure I agree with the "real GC load" point: if you artificially slow
the application down to a level where you can use a stop watch, you have
already changed its behavior dramatically and all bets are off.

Or did you mean it the other way round: you can only apply the stop
watch approach if the application is so slow?

Kind regards

robert
 
Eric Sosman

Eric Sosman said:
Hello,

I've noticed a difference in performance between Java6 and Java7/8 (Oracle)
on OS X with some toy code. [...]

One suggestion: Don't fret about the performance of "toy code."

If the performance of "real code" suffers, that's another matter.
[...]
Note that "real code" is difficult to accurately profile in Java.

So, too, is "toy code."
 
Arved Sandstrom

Kevin McMurtrie said:
Note that "real code" is difficult to accurately profile in Java.
Instrumentation causes extremely intrusive de-optimization. Sampling
can only see safepoints, which can be sparse or in odd locations.
Neither can determine real GC load. What ends up happening is that you
pull a chunk of suspect code out of your app and run it by itself in a
controlled environment to benchmark it. The alternative is using a
native profiler and trying to manually correlate CPU instructions with
Java source after HotSpot has moved everything around.


When I benchmark one of my large apps using sampling, Integer.hashCode()
sometimes makes the top ten CPU consumers. The call count is normal and
there's absolutely no code inside Integer.hashCode(). That's just where
HotSpot likes dropping a safepoint in my app.

My first (and second and third) instinct has been for a long time to use
profilers to get somewhat coarse timing information, and at the same
time to get somewhat coarse function/method usage frequency information.
And not really because of runtime optimizations either; much more
because the last-mile analysis through human code review is most
effective at analyzing performance issues if provided initial data.

AHS
 
