I'm not sure that that in itself would significantly affect the performance,
as reading bytes (which is all it's doing at that stage) is a relatively
low-level occupation. If the lexer is tokenising element type names and
storing them in some array-like data structure, big names will affect I/O
but not much else. But I'm happy to be proved wrong on that.
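For what it's worth, scanning a name token is just a linear walk over the
bytes; a minimal sketch in Python (purely illustrative -- the names and the
loop are my own invention, not how onsgmls or any real parser is written):

    # Illustrative name-token scan: cost is proportional to the number of
    # bytes in the name, so long names cost I/O and little else.
    # (ASCII name characters only, to keep the sketch short.)
    NAME_CHARS = set("abcdefghijklmnopqrstuvwxyz"
                     "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                     "0123456789.-_:")

    def scan_name(buf, pos):
        """Scan an element type name starting at buf[pos]."""
        start = pos
        while pos < len(buf) and buf[pos] in NAME_CHARS:
            pos += 1
        return buf[start:pos], pos

    names = []                      # array-like store for the names seen
    doc = "<teiHeader><fileDesc><titleStmt>"
    i = 0
    while i < len(doc):
        if doc[i] == "<":
            name, i = scan_name(doc, i + 1)
            names.append(name)
        else:
            i += 1
    print(names)                    # ['teiHeader', 'fileDesc', 'titleStmt']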
Depth can have an effect, especially in mixed content. I have relatively
small documents (4-5MB) which are marked up very densely in TEI, with
deeply-nested structures such as variant readings of a manuscript or
linguistic (part-of-speech) markup in mixed content such that the character
data can be 15-20 levels below the root element. Nevertheless, onsgmls
rips through these in 5-8 seconds on a Dell 4150 running FC4/KDE/Emacs.
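Anyone who wants to check the depth question on their own machine can time
it directly; here's a rough sketch using Python's expat bindings (a
stand-in for onsgmls, and the generated document is artificial, so treat
the numbers as indicative only):

    # Rough timing sketch: does nesting depth alone slow a stream parser?
    import time
    import xml.parsers.expat

    def nested_doc(depth, repeats):
        """Build a document nested to the given depth, with character data
        inside the innermost element."""
        open_tags = "".join(f"<e{d}>" for d in range(depth))
        close_tags = "".join(f"</e{d}>" for d in reversed(range(depth)))
        body = "some character data " * repeats
        return (open_tags + body + close_tags).encode("utf-8")

    for depth in (2, 10, 20, 40):
        doc = nested_doc(depth, 50000)            # about 1 MB of text
        parser = xml.parsers.expat.ParserCreate()
        t0 = time.perf_counter()
        parser.Parse(doc, True)
        print(f"depth {depth:2d}: {time.perf_counter() - t0:.3f}s "
              f"for {len(doc) / 1e6:.1f} MB")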
I have seen some truly ludicrous examples of data-oriented e-commerce XML
with element type names machine-generated from concatenated
database-table-field-relation[-field-relation]*-value names which ran to
400-500 characters, but the files were very small (40-50KB), so I'm not
sure what effect the names had on the parser (apart from the initial I/O).
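The name-length question is easy to isolate if anyone cares to: keep the
overall document size roughly constant and vary only the length of the
element type names. Again a sketch with Python's expat bindings, and the
generated names are mine, not the real e-commerce ones:

    # Sketch: hold document size roughly constant, vary element-name
    # length, and see whether the parser cares beyond the extra bytes
    # it has to read.
    import time
    import xml.parsers.expat

    def doc_with_names(name_len, total_bytes):
        name = "n" * name_len
        element = f"<{name}>x</{name}>"
        count = max(1, total_bytes // len(element))
        return ("<root>" + element * count + "</root>").encode("utf-8")

    for name_len in (5, 50, 500):
        doc = doc_with_names(name_len, 5_000_000)   # about 5 MB each
        parser = xml.parsers.expat.ParserCreate()
        t0 = time.perf_counter()
        parser.Parse(doc, True)
        print(f"name length {name_len:3d}: {time.perf_counter() - t0:.3f}s")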
The number of attributes of an element may have an influence. I have seen
a parser (I think it was xmllint) which seemed to have runtime O(n^2)
where n = number of attributes. This became unbearable in some unlikely
situations (more than 1000 attributes).
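That would be easy enough to test: generate a single element with n
attributes, double n a few times, and see whether the time quadruples. A
rough sketch, again with Python's expat bindings standing in (feeding the
same generated files to xmllint itself would be the fairer test):

    # Does attribute handling grow quadratically with the attribute count?
    import time
    import xml.parsers.expat

    def doc_with_attrs(n):
        attrs = " ".join(f'a{i}="v{i}"' for i in range(n))
        return f"<e {attrs}/>".encode("utf-8")

    for n in (250, 500, 1000, 2000, 4000):
        doc = doc_with_attrs(n)
        parser = xml.parsers.expat.ParserCreate()
        t0 = time.perf_counter()
        parser.Parse(doc, True)
        # Quadratic behaviour shows up as the time roughly quadrupling
        # each time n doubles.
        print(f"{n:5d} attributes: {time.perf_counter() - t0:.4f}s")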
Number of attributes could probably affect it, but anyone who "designs"
a document type with elements bearing 1000 attributes deserves all they
get, IMHO.
///Peter