Mike
Related to another topic I just posted, I wanted to discuss ways to optimize
the validation of very large (>100MB) XML documents.
First, I have no idea if something like this already exists; it may even be
the typical implementation for all I know.
At any rate, it occurs to me that the set of business rules being validated
against an XML document involves only a limited set of nodes at any given
time (while parsing through the document). For example, if there is a
parent<->child node dependency, then only the pertinent information related
to those nodes needs to be kept in memory. Once the dependency has been
resolved (by validating the rule), the memory associated with those nodes
could then be freed. In this way, large documents could be validated
efficiently, by only storing information related to dependencies, and
immediately freeing memory once the dependency is resolved.
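As a concrete sketch of what I mean (the element names `order`, `item`, `total`, and the rule itself are just made-up examples, not anything standard), a SAX handler could hold only the state a pending rule needs and drop it the moment the rule resolves:

```python
import xml.sax

class OrderRuleHandler(xml.sax.ContentHandler):
    """Toy streaming validator: checks that each <order>'s declared total
    equals the sum of its <item price="..."> children. The state for an
    order is discarded as soon as its end tag resolves the rule."""
    def __init__(self):
        self.pending = None   # state for the order currently being checked
        self.errors = []      # ids of orders that failed the rule

    def startElement(self, name, attrs):
        if name == "order":
            # keep only what the rule needs: id, declared total, running sum
            self.pending = {"id": attrs["id"],
                            "total": float(attrs["total"]),
                            "sum": 0.0}
        elif name == "item" and self.pending is not None:
            self.pending["sum"] += float(attrs["price"])

    def endElement(self, name):
        if name == "order":
            if abs(self.pending["sum"] - self.pending["total"]) > 1e-9:
                self.errors.append(self.pending["id"])
            self.pending = None   # rule resolved: free the state

def validate(xml_text):
    handler = OrderRuleHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.errors

doc = """<orders>
  <order id="1" total="3.0"><item price="1.0"/><item price="2.0"/></order>
  <order id="2" total="9.9"><item price="5.0"/><item price="4.0"/></order>
</orders>"""
# order 2's items sum to 9.0, not the declared 9.9
```

Memory use here is bounded by one pending rule at a time, regardless of document size; the hard part is exactly the case where many rules overlap and the state bookkeeping gets application-specific.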
I don't have a lot of practical XML experience. But I've read, for example,
that using a SAX parser can be difficult in cases where you need to maintain
a lot of "state" information. So what I'm asking is: is there a general
solution to this problem, rather than writing application-specific code to
handle the "state" of dependencies?
It seems to me that rule dependencies could be represented by a "graph",
similar in some ways to the object graph the Java garbage collector
traverses. And, like the garbage
collector, memory could be freed once there are no more "references" to a
particular dependency. The dependencies themselves would be something like
"threads" that connect nodes. Larger threads would require more memory.
Further optimization might be achieved by determining if the dependency
threads are better suited for a depth-first or breadth-first traversal, or
some combination.
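The reference-counting idea could be sketched like this (again a hypothetical helper, not an existing library): node data stays buffered only while at least one unresolved rule still references it.

```python
class DependencyTracker:
    """Sketch of reference-counted buffering: a node's data is retained
    only while at least one unresolved rule still references it."""
    def __init__(self):
        self.refcount = {}   # node_id -> number of unresolved rules
        self.buffer = {}     # node_id -> retained node data

    def retain(self, node_id, data):
        """A rule declares interest in a node; keep its data around."""
        self.refcount[node_id] = self.refcount.get(node_id, 0) + 1
        self.buffer[node_id] = data

    def release(self, node_id):
        """A rule involving this node has been resolved."""
        self.refcount[node_id] -= 1
        if self.refcount[node_id] == 0:   # no rule needs it any more
            del self.refcount[node_id]
            del self.buffer[node_id]      # "garbage collect" the node

tracker = DependencyTracker()
tracker.retain("n1", {"price": 5.0})   # two rules depend on node n1
tracker.retain("n1", {"price": 5.0})
tracker.release("n1")                  # first rule resolved
assert "n1" in tracker.buffer          # still referenced by the second rule
tracker.release("n1")                  # second rule resolved
assert "n1" not in tracker.buffer      # memory freed
```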
In my other post, I ask whether XML Schema can be used to validate rules
like this, or whether there are other solutions. In the context of this
post, does XML Schema or any other method support any of the concepts I talk
about above?
If XML Schema is unable to handle rules like this, and there is no other
available solution, does it make sense that something based on XPath might
work? I'm wondering if the XPath expressions could be used to represent the
dependencies (as in what to keep in memory), and then something else would
actually evaluate the dependency.
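For what it's worth, Python's standard library already supports something close to the "keep only what the rules need, then free it" pattern via `xml.etree.ElementTree.iterparse` plus `Element.clear()`. The sketch below doesn't evaluate real XPath (a full solution might compile each XPath expression into a keep-set like this); the tag names and the rule are the same invented example as above:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical: the tags (or compiled XPath paths) the rules declare they need.
KEEP = {"order"}

def stream_totals(xml_text):
    """Walk a large document with iterparse, evaluate a rule on each kept
    element, then clear() its subtree so memory stays roughly bounded."""
    bad = []
    for event, elem in ET.iterparse(io.StringIO(xml_text), events=("end",)):
        if elem.tag in KEEP:
            declared = float(elem.get("total"))
            summed = sum(float(i.get("price")) for i in elem.iter("item"))
            if abs(summed - declared) > 1e-9:
                bad.append(elem.get("id"))
            elem.clear()   # dependency resolved: free the subtree
    return bad

doc = """<orders>
  <order id="1" total="3.0"><item price="1.0"/><item price="2.0"/></order>
  <order id="2" total="9.9"><item price="5.0"/><item price="4.0"/></order>
</orders>"""
```

Note this is only a sketch: `clear()` empties each order's subtree, but a truly memory-tight version would also detach cleared elements from the root so nothing accumulates there.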
Thanks for any help/suggestions/comments,
Mike