Element name length & performance implications

Tom Kerigan

I know that longer element names increase the size of an XML document,
ultimately resulting in a larger amount of data at parse-time. Is there
anything else, specifically related to an element name and its length,
that can impact the performance of an XML parser?

The bulk of our XML parsing uses the latest and greatest version of
Apache Xerces.
 
Jürgen Kahrs

Tom said:
ultimately resulting in a larger amount of data at parse-time. Is there
anything else, specifically related to an element name and its length,
that can impact the performance of an XML parser?

The number of attributes of an element may have
an influence. I have seen a parser (I think it
was xmllint) which seemed to have runtime O(n^2)
where n=number of attributes. This became unbearable
in some unlikely situations (more than 1000 attributes).
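
For readers wondering where quadratic behaviour like this can come from: XML requires attribute names to be unique within a start tag, so a parser has to check each new attribute against the ones already seen. The sketch below is only an illustration of that idea in Python; it is not taken from xmllint or Xerces, and the function names are made up for this example.

```python
def find_duplicate_linear_scan(names):
    """Return the first repeated attribute name, or None.

    Each new name is compared against a plain list of earlier names,
    so n attributes cost on the order of n^2 comparisons in total.
    """
    seen = []
    for name in names:
        if name in seen:  # list membership is a linear scan
            return name
        seen.append(name)
    return None


def find_duplicate_hashed(names):
    """Same check, but with a hash set: roughly O(n) for n attributes."""
    seen = set()
    for name in names:
        if name in seen:  # set membership is a hash lookup
            return name
        seen.add(name)
    return None
```

Both functions give the same answer; only the cost differs, and the difference only starts to matter once an element carries hundreds or thousands of attributes.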
 
Peter Flynn

I'm not sure that that in itself would significantly affect the performance,
as reading bytes (which is all it's doing at that stage) is a relatively
low-level occupation. If the lexer is tokenising element type names and
storing them in some array-like data structure, big names will affect I/O
but not much else. But I'm happy to be proved wrong on that.
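
To make the "array-like data structure" idea concrete, here is a minimal sketch of a symbol table that stores each distinct element type name once and hands back a small integer for it. This is an assumption about how a lexer might do it, not a description of any particular parser; `SymbolTable`, `intern` and `name` are hypothetical names.

```python
class SymbolTable:
    """Map each distinct name to a small integer, storing the text once."""

    def __init__(self):
        self._ids = {}    # name -> integer id
        self._names = []  # integer id -> name

    def intern(self, name):
        """Return the id for name, allocating a new one on first sight."""
        sym = self._ids.get(name)
        if sym is None:
            sym = len(self._names)
            self._ids[name] = sym
            self._names.append(name)
        return sym

    def name(self, sym):
        """Look the original name back up from its id."""
        return self._names[sym]
```

With a table like this, a 500-character name still has to be read and hashed each time it occurs, which is work proportional to its length, but everything after the lexer (matching end tags, content-model checks) can work with the integers.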

Depth can have an effect, especially in mixed content. I have relatively
small documents (4-5 MB) which are marked up very densely in TEI, with
deeply-nested structures such as variant readings of a manuscript or
linguistic (part-of-speech) markup in mixed content such that the character
data can be 15-20 levels below the root element. Nevertheless, onsgmls
rips through these in 5-8 seconds on a Dell 4150 running FC4/KDE/Emacs.
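
If you want to know how deeply nested your own documents are before worrying about it, a rough measurement is easy to script. The sketch below uses Python's standard `xml.parsers.expat` binding rather than onsgmls or Xerces, so it says nothing about their speed; `max_element_depth` is a hypothetical helper name.

```python
import xml.parsers.expat


def max_element_depth(xml_text):
    """Return the deepest element nesting level (root element = 1)."""
    depth = 0
    deepest = 0

    def start_element(name, attrs):
        nonlocal depth, deepest
        depth += 1
        deepest = max(deepest, depth)

    def end_element(name):
        nonlocal depth
        depth -= 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return deepest
```

For example, `max_element_depth("<a><b><c>x</c></b><d/></a>")` returns 3.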

I have seen some truly ludicrous examples of data-oriented e-commerce XML
with element type names machine-generated from concatenated
database-table-field-relation[-field-relation]*-value names which ran to
400-500 characters, but the files were very small (40-50 KB) so I'm not
sure what effect the names had on the parser (apart from the initial I/O).

Jürgen said:
The number of attributes of an element may have
an influence. I have seen a parser (I think it
was xmllint) which seemed to have runtime O(n^2)
where n=number of attributes. This became unbearable
in some unlikely situations (more than 1000 attributes).

Number of attributes could probably affect it, but anyone who "designs"
a document type with elements bearing 1000 attributes deserves all they
get, IMHO.

///Peter
 
