XPath speed.

Daniel Pitts

Hello everyone.
I've noticed that a lot of time in a few of my code-bases is spent in
XPath.evaluate or XPathExpression.evaluate. I've rewritten part of it
to use direct DOM navigation where feasible, with a huge speed increase.
I was thinking there must be a faster implementation of XPath than the
default com.sun implementation. Does anyone have any experience with this?

Thanks,
Daniel.
 
Stanimir Stamenkov

Tue, 08 Jul 2008 17:55:18 -0700, /Daniel Pitts/:
> I've noticed that a lot of time in a few of my code-bases is spent in
> XPath.evaluate or XPathExpression.evaluate. I've rewritten part of it
> to use direct DOM navigation where feasible, with a huge speed increase.
> I was thinking there must be a faster implementation of XPath than the
> default com.sun implementation. Does anyone have any experience with this?

I haven't used the XPath classes (directly), but I believe the com.sun
classes must be a fork of the Xalan [1] implementation. Have you tried
plugging in the latest Xalan (2.7.1) release? You may get more help
with it on the Xalan user mailing list [2].

[1] http://xml.apache.org/xalan-j/
[2] http://mail-archives.apache.org/mod_mbox/xml-xalan-j-users/
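For trying that out: Xalan-J 2.7.x ships its own JAXP XPathFactory, org.apache.xpath.jaxp.XPathFactoryImpl. The sketch below (the class XPathFactories and method preferXalan are illustrative names) uses it when the Xalan jar is on the classpath and otherwise falls back to the JDK default. Setting the system property javax.xml.xpath.XPathFactory:http://java.sun.com/jaxp/xpath/dom to the same class name should have the same effect on plain XPathFactory.newInstance() calls. Whether Xalan is actually faster for a given workload is something to measure rather than assume.

```java
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathFactoryConfigurationException;

/** Illustrative helper: prefer Xalan-J's XPathFactory when its jar is on the classpath. */
final class XPathFactories {
    // JAXP XPathFactory implementation class shipped in Xalan-J 2.7.x.
    static final String XALAN_FACTORY = "org.apache.xpath.jaxp.XPathFactoryImpl";

    static XPathFactory preferXalan() {
        try {
            Class.forName(XALAN_FACTORY); // is the Xalan jar present at all?
            // A null class loader means "use the current thread's context class loader".
            return XPathFactory.newInstance(
                    XPathFactory.DEFAULT_OBJECT_MODEL_URI, XALAN_FACTORY, null);
        } catch (ClassNotFoundException | XPathFactoryConfigurationException e) {
            // Xalan missing or unusable: fall back to the JDK's built-in implementation.
            return XPathFactory.newInstance();
        }
    }

    public static void main(String[] args) {
        // Prints which implementation was picked.
        System.out.println(preferXalan().getClass().getName());
    }
}
```

XPath objects created from the returned factory are used exactly as before, so the rest of the code does not need to change to compare the two implementations.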
 
Tom Anderson

> I've noticed that a lot of time in a few of my code-bases is spent in
> XPath.evaluate or XPathExpression.evaluate. I've rewritten part of it
> to use direct DOM navigation where feasible, with a huge speed increase.
> I was thinking there must be a faster implementation of XPath than the
> default com.sun implementation. Does anyone have any experience with this?

Nope. But I'm working on an XPath-heavy app, so let us know if you
discover anything!

tom
 
Daniel Pitts

Tom said:
> Nope. But I'm working on an XPath-heavy app, so let us know if you
> discover anything!
>
> tom
Apparently this is a known issue with the way the Sun implementation
hides XPathContext, which (as I understand it) caches some important
information every time it's instantiated. XPath.evaluate and
XPathExpression.evaluate both instantiate a new XPathContext every time.

A co-worker of mine sent these references:

For my use case:
My XPath expressions (as Strings) were all being injected by Spring into
one of my beans. XPath was just one option, so I had an interface
(FieldNormalizer) and a few concrete options, XPathFieldNormalizer for
example.
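A minimal sketch of that arrangement, assuming FieldNormalizer has a single method that extracts a string from a DOM node; only the names FieldNormalizer and XPathFieldNormalizer come from the post, while the method name normalize and the Xml helper are hypothetical. Because XPath and XPathExpression objects are not thread-safe, and a Spring singleton bean is normally shared between request threads, the sketch keeps one compiled copy of the expression per thread.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical interface shape: pull one string field out of a DOM node. */
interface FieldNormalizer {
    String normalize(Node context);
}

/** General-purpose implementation: evaluates any XPath expression as a string. */
final class XPathFieldNormalizer implements FieldNormalizer {
    private final String expression;
    // XPath/XPathExpression objects are not thread-safe and a singleton bean is shared
    // between request threads, so keep one compiled copy of the expression per thread.
    private final ThreadLocal<XPathExpression> compiled;

    XPathFieldNormalizer(String expression) { // e.g. injected by Spring from configuration
        this.expression = expression;
        this.compiled = ThreadLocal.withInitial(() -> {
            try {
                return XPathFactory.newInstance().newXPath().compile(expression);
            } catch (XPathExpressionException e) {
                throw new IllegalArgumentException("Bad XPath: " + expression, e);
            }
        });
        compiled.get(); // compile now so a malformed expression fails at startup
    }

    @Override
    public String normalize(Node context) {
        try {
            // Each call still runs the full JAXP evaluator, which (per this thread)
            // creates a new XPathContext every time.
            return compiled.get().evaluate(context);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Evaluating " + expression + " failed", e);
        }
    }
}

/** Small helper for trying the sketch out: parse a string into a DOM Document. */
final class Xml {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

Compiling once avoids re-parsing the expression string on every request, but it does not avoid the per-call evaluation overhead discussed above.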

The "average" XPath expression in my app was in one of these forms:
./Tag or /Tag or Tag (get child element called Tag)
.//Tag or //Tag (get descendant element called Tag)
'StringLiteral'
concat('StringLiteral', ./@attribute) (don't ask.)

All of which are "easy" to do "quickly" using normal DOM operations.

The fix (read "hack") I took was to write handlers for those special
cases.
I wrote a ChildElementNormalizer, DescendantElementNormalizer,
StringLiteralNormalizer, and StringLiteralPlusAttributeNormalizer.
I then replaced the XPathFieldNormalizer constructor with a factory
method that checks whether the expression matches one of the above forms
and, if so, uses the appropriate specialization.
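A sketch of such a factory method, assuming the same hypothetical normalize(Node) interface. Rather than separate ChildElementNormalizer-style classes it returns lambdas; that is a stylistic choice, not the post's code, and the names Normalizers, forExpression and parse are made up. The regular expressions recognise only the shapes listed above, with plain namespace-free names; anything else falls through to real XPath. Each shortcut returns the text of the first match in document order, or "" when nothing matches, which mirrors how XPath converts a node-set to a string (a missing attribute likewise yields ""). /Tag and //Tag are resolved from the owner document, as XPath does; for a Document context node that is the same as Tag and .//Tag.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical interface shape: pull one string field out of a DOM node. */
interface FieldNormalizer {
    String normalize(Node context);
}

final class Normalizers {
    // Plain, namespace-free names only (an assumption of this sketch).
    private static final String NAME = "([A-Za-z_][\\w.-]*)";
    private static final Pattern CHILD = Pattern.compile("(\\./|/)?" + NAME);     // ./Tag, /Tag, Tag
    private static final Pattern DESCENDANT = Pattern.compile("(\\.)?//" + NAME); // .//Tag, //Tag
    private static final Pattern LITERAL = Pattern.compile("'([^']*)'");          // 'StringLiteral'
    private static final Pattern LITERAL_PLUS_ATTR =                              // concat('Lit', ./@attr)
            Pattern.compile("concat\\('([^']*)',\\s*(?:\\./)?@" + NAME + "\\)");

    /** Factory method: a DOM shortcut for the known shapes, real XPath for anything else. */
    static FieldNormalizer forExpression(String xpath) {
        String x = xpath.trim();
        Matcher m;
        if ((m = CHILD.matcher(x)).matches()) {
            boolean absolute = "/".equals(m.group(1)); // XPath resolves /Tag from the document
            String name = m.group(2);
            return ctx -> childText(absolute ? documentOf(ctx) : ctx, name);
        }
        if ((m = DESCENDANT.matcher(x)).matches()) {
            boolean relative = m.group(1) != null;
            String name = m.group(2);
            return ctx -> descendantText(relative ? ctx : documentOf(ctx), name);
        }
        if ((m = LITERAL.matcher(x)).matches()) {
            String literal = m.group(1);
            return ctx -> literal;
        }
        if ((m = LITERAL_PLUS_ATTR.matcher(x)).matches()) {
            String literal = m.group(1);
            String attr = m.group(2);
            return ctx -> literal + (ctx instanceof Element ? ((Element) ctx).getAttribute(attr) : "");
        }
        // Fallback: real XPath, compiled once per thread (XPathExpression is not thread-safe).
        ThreadLocal<XPathExpression> compiled = ThreadLocal.withInitial(() -> {
            try {
                return XPathFactory.newInstance().newXPath().compile(x);
            } catch (XPathExpressionException e) {
                throw new IllegalArgumentException("Bad XPath: " + x, e);
            }
        });
        compiled.get(); // fail fast on a malformed expression
        return ctx -> {
            try {
                return compiled.get().evaluate(ctx);
            } catch (XPathExpressionException e) {
                throw new IllegalStateException(e);
            }
        };
    }

    private static Node documentOf(Node n) {
        return n.getNodeType() == Node.DOCUMENT_NODE ? n : n.getOwnerDocument();
    }

    // Text of the first child element called `name`, or "" if there is none.
    private static String childText(Node ctx, String name) {
        for (Node n = ctx.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && name.equals(n.getNodeName())) {
                return n.getTextContent();
            }
        }
        return "";
    }

    // First descendant called `name` (getElementsByTagName is in document order); ctx is an Element or Document.
    private static String descendantText(Node ctx, String name) {
        NodeList hits = ctx instanceof Document
                ? ((Document) ctx).getElementsByTagName(name)
                : ((Element) ctx).getElementsByTagName(name);
        return hits.getLength() == 0 ? "" : hits.item(0).getTextContent();
    }

    static Document parse(String xml) { // helper for trying the sketch out
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

Because the dispatch happens once, when the bean is built, each request for one of the common shapes costs a plain DOM walk with no XPath parsing or evaluation context involved.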

That gained my particular application about a 10-fold improvement in request performance.

The moral is: It is important to analyze your common cases carefully
when optimizing. A profiler tool is a must for this :).

Hope this helps some of you out there facing similar problems.
 
Tom Anderson

> Apparently this is a known issue with the way the Sun implementation
> hides XPathContext, which (as I understand it) caches some important
> information every time it's instantiated. XPath.evaluate and
> XPathExpression.evaluate both instantiate a new XPathContext every time.

Aha. I don't think my library is reusing XPathContexts; I should look
into this.

At the moment, we don't have a performance problem, so this would be
premature. However, there are plans afoot to use the code to do something
a lot more performance-critical, and so if XPath queries prove to be a
significant part of the workload, easy speedups would be good to know
about.
> For my use case:
> My XPath expressions (as Strings) were all being injected by Spring into one
> of my beans. XPath was just one option, so I had an interface
> (FieldNormalizer) and a few concrete options, XPathFieldNormalizer for
> example.
>
> The "average" XPath expression in my app was in one of these forms:
> ./Tag or /Tag or Tag (get child element called Tag)
> .//Tag or //Tag (get descendant element called Tag)
> 'StringLiteral'
> concat('StringLiteral', ./@attribute) (don't ask.)
>
> All of which are "easy" to do "quickly" using normal DOM operations.
>
> The fix (read "hack") I took was to write handlers for those special cases.
> I wrote a ChildElementNormalizer, DescendantElementNormalizer,
> StringLiteralNormalizer, and StringLiteralPlusAttributeNormalizer.
> I then replaced the XPathFieldNormalizer constructor with a factory method
> that checks whether the expression matches one of the above forms and, if
> so, uses the appropriate specialization.
>
> That gained my particular application about a 10-fold improvement in request performance.

Wow. Our case is similar - almost all of the expressions look like:

//markertag/chain/of/child/tags

Which could be implemented very quickly indeed with specialised code like
yours.
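As a hypothetical illustration of that kind of specialised code, the class below handles only paths of the shape //first/second/.../last with plain element names: it finds every first element with getElementsByTagName, then steps through direct children one name at a time. The names ChildChain, select and parse are made up. The result should be the same elements XPath would return; if marker elements are nested inside one another, the same elements come back but not necessarily in document order.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hand-written stand-in for XPath paths of the shape //first/second/.../last. */
final class ChildChain {
    private final String first;   // the "marker" element, found anywhere in the document
    private final String[] steps; // then one direct-child step per remaining name

    ChildChain(String path) {
        // Plain, namespace-free names only; no predicates, axes or wildcards.
        if (!path.matches("//[A-Za-z_][\\w.-]*(/[A-Za-z_][\\w.-]*)*")) {
            throw new IllegalArgumentException("Not a //a/b/c path: " + path);
        }
        String[] names = path.substring(2).split("/");
        this.first = names[0];
        this.steps = Arrays.copyOfRange(names, 1, names.length);
    }

    List<Element> select(Document doc) {
        // getElementsByTagName walks the whole tree once, in document order.
        NodeList markers = doc.getElementsByTagName(first);
        List<Element> current = new ArrayList<>();
        for (int i = 0; i < markers.getLength(); i++) {
            current.add((Element) markers.item(i));
        }
        // Each step keeps only the direct children carrying the next name.
        for (String step : steps) {
            List<Element> next = new ArrayList<>();
            for (Element parent : current) {
                for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
                    if (n.getNodeType() == Node.ELEMENT_NODE && step.equals(n.getNodeName())) {
                        next.add((Element) n);
                    }
                }
            }
            current = next;
        }
        return current;
    }

    static Document parse(String xml) { // helper for trying the sketch out
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

For example, new ChildChain("//markertag/chain/of/child/tags").select(doc) should return the same elements as evaluating that expression with XPathConstants.NODESET, without going through the XPath engine.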

I imagine you could implement XPath expressions by code generation, as
Sun's JVM does for reflection. That could potentially be as fast as custom
code, while retaining all the flexibility of XPath. It would have monstrous
up-front overhead, but there are many situations where that wouldn't matter.

tom
 
