Asger said:
"I'm just an old C/C++ guy" to "Desperate Perl Hacker" is a little more
than I can understand.
Sorry for the shorthand. "Desperate perl hacker" is a bit of slang used
in the XML community to refer to folks -- no matter what language
they're working in -- who are just looking for a quickie solution to a
particular XML task rather than one that actually follows the full XML
architecture. As such, it's a good description of the approach you're
taking.
Nothing wrong with it, within its limits, but it does have limits. And
it isn't a criticism of you, your skills, or even of Perl, but an
observation that you're taking an approach which has built-in
limitations. If the benefits you're gaining are worth accepting those
limits, go for it; that's the difference between computer science and
software engineering (and I'm very much an engineer myself).
For instance, empty tags like these:
<TagName></TagName>
<TagName/>
are not allowed in the files that I am working with. So I catch these as
errors and report them, which a normal parser wouldn't do.
For what it's worth, most of us would implement that in the application
layer rather than the parser. Same result, slightly different
partitioning, probably about the same performance.
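To make that concrete, here is a rough sketch of what an application-layer check might look like, using Python's standard xml.etree.ElementTree rather than anything from your code. The function name find_empty_elements is made up for illustration, and I'm assuming that an element containing only whitespace should count as empty too; adjust that if your files say otherwise.

```python
import xml.etree.ElementTree as ET


def find_empty_elements(xml_text):
    """Return the tag names of elements that have no child elements and
    no non-whitespace text, in document order.

    Hypothetical helper: the stock parser accepts <TagName></TagName> and
    <TagName/> as well-formed, so the "no empty elements" rule is enforced
    here, after parsing, instead of inside the parser.
    """
    root = ET.fromstring(xml_text)
    empty = []
    for elem in root.iter():
        has_children = len(elem) > 0
        # Assumption: whitespace-only content is treated as empty.
        has_text = bool(elem.text and elem.text.strip())
        if not has_children and not has_text:
            empty.append(elem.tag)
    return empty


if __name__ == "__main__":
    sample = "<Doc><TagName></TagName><TagName/><Other>text</Other></Doc>"
    for tag in find_empty_elements(sample):
        print("error: empty element <%s> is not allowed" % tag)
```

The parser stays generic and the rule lives in one small function you can change without touching the parsing code. Note that by the time the tree is built, the two spellings of an empty element are indistinguishable, which is fine if both are errors for you.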
As I said: De gustibus non disputandum est. Your solution isn't the one
I'd take given what you've told us. That doesn't mean it's the wrong
one; it does mean that I know my advice to you isn't going to be a good
fit for what you're trying to do and the way you've chosen to do it.
Customized XML parsers can indeed yield a performance gain; IBM has
demonstrated (and patented) some optimizing technology that
automatically produces a highly tuned parser for a particular set of
expected documents. So if you really do know this is going to be the
critical bottleneck in your system, by all means hack it as necessary.