Element name length & performance implications

Tom Kerigan · Oct 25, 2005

I know that longer element names increase the size of an XML document,
ultimately resulting in a larger amount of data at parse-time. Is there
anything else, specifically related to an element name and its length,
that can impact the performance of an XML parser?

The bulk of our XML parsing uses the latest and greatest version of
Apache Xerces.

Jürgen Kahrs · Oct 25, 2005

Tom said:
ultimately resulting in a larger amount of data at parse-time. Is there
anything else, specifically related to an element name and its length,
that can impact the performance of an XML parser?

The number of attributes of an element may have
an influence. I have seen a parser (I think it
was xmllint) which seemed to have runtime O(n^2)
where n=number of attributes. This became unbearable
in some unlikely situations (more than 1000 attributes).
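One plausible source of that quadratic behaviour is the well-formedness rule that attribute names must be unique within an element. The sketch below is only an illustration of how the cost of that check depends on the data structure; it is not taken from xmllint or any other parser, and the function names are made up for the example.

```python
def has_duplicate_naive(names):
    """Pairwise check: up to n*(n-1)/2 comparisons when no duplicate exists."""
    comparisons = 0
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            comparisons += 1
            if names[i] == names[j]:
                return True, comparisons
    return False, comparisons


def has_duplicate_hashed(names):
    """Set-based check: one membership lookup per name, roughly linear overall."""
    seen = set()
    lookups = 0
    for name in names:
        lookups += 1
        if name in seen:
            return True, lookups
        seen.add(name)
    return False, lookups


if __name__ == "__main__":
    attrs = [f"a{i}" for i in range(1000)]   # 1000 distinct attribute names
    print(has_duplicate_naive(attrs))    # (False, 499500)
    print(has_duplicate_hashed(attrs))   # (False, 1000)
```

With 1000 distinct names the pairwise version does 499500 comparisons against 1000 lookups for the set-based one, which would fit the observation that things only become painful at unusually high attribute counts.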

Peter Flynn · Oct 25, 2005

I'm not sure that that in itself would significantly affect the performance,
as reading bytes (which is all it's doing at that stage) is a relatively
low-level occupation. If the lexer is tokenising element type names and
storing them in some array-like data structure, big names will affect I/O
but not much else. But I'm happy to be proved wrong on that.

Depth can have an effect, especially in mixed content. I have relatively
small documents (4-5MB) which are marked up very densely in TEI, with
deeply-nested structures such as variant readings of a manuscript or
linguistic (part-of-speech) markup in mixed content such that the character
data can be 15-20 levels below the root element. Nevertheless, onsgmls
rips through these in 5-8 seconds on a Dell 4150 running FC4/KDE/Emacs.
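For anyone who wants to try the depth question on their own data, here is a rough sketch that builds a deeply nested mixed-content document and streams it through a parser while tracking nesting. It assumes Python's bundled expat bindings rather than onsgmls or Xerces, so any timings will not carry over directly, and the helper names are invented for the example.

```python
import xml.parsers.expat


def build_nested(depth, text="word"):
    """Wrap `text` in `depth` nested elements, with character data at every level."""
    opening = "".join(f"<l{i}>t{i} " for i in range(depth))
    closing = "".join(f"</l{i}>" for i in reversed(range(depth)))
    return f"<root>{opening}{text}{closing}</root>"


def max_depth(doc):
    """Stream the document through expat and return the deepest element nesting seen."""
    current = 0
    deepest = 0

    def start(name, attrs):
        nonlocal current, deepest
        current += 1
        deepest = max(deepest, current)

    def end(name):
        nonlocal current
        current -= 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(doc, True)   # True marks this as the final chunk
    return deepest


if __name__ == "__main__":
    doc = build_nested(20)
    print(max_depth(doc))   # 21: the root element plus 20 nested levels
```

Wrapping the `max_depth` call in a timer and varying `depth` gives a first impression of how much nesting alone costs with a streaming parser.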

I have seen some truly ludicrous examples of data-oriented e-commerce XML
with element type names machine-generated from concatenated
database-table-field-relation[-field-relation]*-value names which ran to
400-500 characters, but the files were very small (40-50KB) so I'm not
sure what effect the names had on the parser (apart from the initial I/O).
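The effect of name length is easy enough to measure rather than guess at. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` (which sits on expat, not Xerces); `build_doc` and `time_parse` are hypothetical helpers, and the numbers will differ between parsers and machines.

```python
import time
import xml.etree.ElementTree as ET


def build_doc(name_len, count):
    """Return a document with `count` sibling elements whose name is `name_len` characters long."""
    name = "e" * name_len
    body = "".join(f"<{name}>x</{name}>" for _ in range(count))
    return f"<root>{body}</root>"


def time_parse(doc):
    """Parse once and return (elapsed seconds, number of child elements of the root)."""
    start = time.perf_counter()
    root = ET.fromstring(doc)
    elapsed = time.perf_counter() - start
    return elapsed, len(root)


if __name__ == "__main__":
    # Same element count each time; only the name length (and so the byte count) changes.
    for name_len in (5, 50, 500):
        doc = build_doc(name_len, 20000)
        elapsed, n = time_parse(doc)
        print(f"name_len={name_len} chars={len(doc)} elements={n} time={elapsed:.3f}s")
```

Comparing the time per character across the three runs shows whether long names cost anything beyond the extra bytes read.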

Jürgen said:
The number of attributes of an element may have
an influence. I have seen a parser (I think it
was xmllint) which seemed to have runtime O(n^2)
where n=number of attributes. This became unbearable
in some unlikely situations (more than 1000 attributes).

Number of attributes could probably affect it, but anyone who "designs"
a document type with elements bearing 1000 attributes deserves all they
get, IMHO.

///Peter
