One more observation: There are a heck of a lot of characters that are
valid in element names (just about any alphanumeric in just about any
language, plus some punctuation), since XML's defined in terms of
Unicode. Simply checking whether all the characters in an element name
are legal is something of a pain; figuring out what to replace the
(many!) other Unicode characters with is going to be (ahem) interesting.
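For what it's worth, here is a rough sketch of that legality check in Python. It assumes the Name production from XML 1.0 Fifth Edition (earlier editions used a different, longer table), and `is_xml_name` is a made-up helper name, not part of any library. Colons pass the check because the base spec allows them, even though namespace-aware code treats them specially.

```python
import re

# Character ranges for NameStartChar, as given in XML 1.0 (Fifth Edition).
_START = (
    r":A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    r"\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    r"\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
# NameChar adds '-', '.', digits, the middle dot, and combining marks.
_REST = _START + r"\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"

_NAME_RE = re.compile(f"[{_START}][{_REST}]*")


def is_xml_name(candidate: str) -> bool:
    """Return True if candidate matches the XML 1.0 (5th ed.) Name production."""
    return _NAME_RE.fullmatch(candidate) is not None
```

Even with the check in hand, you still have to decide what to do with every character that fails it.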
The simplest solution would probably be to invent some sort of escaping
syntax (and then, as usual with such things, also escape the
escape-introduction sequence so the conversion is reliably unique and
reversible).
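One possible shape for such an escaping scheme, sketched in Python. Everything here is an assumption made for illustration: the `_xHEX_` syntax, the helper names, and the deliberately conservative choice to pass through only ASCII letters, digits, '.' and '-'. Since '_' introduces an escape, it is always escaped itself, which is what keeps the mapping reversible.

```python
import re


def escape_name(raw: str) -> str:
    """Map an arbitrary non-empty string to a string that is a legal XML name."""
    if not raw:
        raise ValueError("an XML name cannot be empty")
    out = []
    for i, ch in enumerate(raw):
        is_letter = ch.isascii() and ch.isalpha()
        # Digits, '.' and '-' are legal in a name, but not as its first character.
        is_tail_ok = i > 0 and ch.isascii() and (ch.isdigit() or ch in ".-")
        if is_letter or is_tail_ok:
            out.append(ch)
        else:
            # Everything else, including '_' itself, becomes _xHEX_.
            out.append(f"_x{ord(ch):X}_")
    return "".join(out)


def unescape_name(name: str) -> str:
    """Invert escape_name. Only safe on names that escape_name produced."""
    return re.sub(r"_x([0-9A-F]+)_", lambda m: chr(int(m.group(1), 16)), name)


original = "2nd item"
escaped = escape_name(original)
assert unescape_name(escaped) == original
```

A name that already looks like an escape, such as `a_x20_b`, survives the round trip because its underscores get escaped too.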
Unless you control ALL names in the document, that does introduce the
risk that a name created by someone else will contain something that
looks like an escape sequence.
BUT... frankly, you really don't *WANT* element names being made up on
the fly, since they're what describes the structure of your document.
Consider putting your non-XML descriptor in _content_, e.g. an attribute
value, rather than an element name. Among other things, XML already has
the ability to escape characters in text content and attribute values.
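A small sketch of the attribute approach using Python's standard `xml.etree.ElementTree`. The element name `entry` and the attribute name `name` are placeholders I picked for illustration; the point is that the element name stays fixed and the serializer takes care of escaping the descriptor.

```python
import xml.etree.ElementTree as ET


def make_entry(descriptor: str, text: str) -> ET.Element:
    """Build a fixed-name element that carries the descriptor as an attribute."""
    entry = ET.Element("entry")    # fixed, schema-friendly element name
    entry.set("name", descriptor)  # arbitrary descriptor lives in content
    entry.text = text
    return entry


descriptor = '2nd item <"draft" & notes>'
serialized = ET.tostring(make_entry(descriptor, "hello"), encoding="unicode")

# The serializer escaped '<', '&' and '"'; parsing restores the original value.
parsed = ET.fromstring(serialized)
assert parsed.get("name") == descriptor
```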
(You still won't be able to use every possible character, even after
escaping it, if you're working in XML 1.0. XML 1.1 -- which is rarely
used -- expanded the legal character set to allow most control characters
(as character references; NUL is still out), but you may not want to
make support for 1.1 a prerequisite. The alternative is to fall back to
inventing your own escaping mechanism, e.g. by Base64-encoding the UTF-8
data.)
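The Base64 fallback might look roughly like this in Python. The helper names and the `name-b64` attribute are invented for the example. Note that the Base64 alphabet includes '+', '/' and '=', which are fine in attribute values but not in element names, so this only helps once the descriptor lives in content.

```python
import base64
import xml.etree.ElementTree as ET


def encode_descriptor(raw: str) -> str:
    """Base64-encode the UTF-8 bytes so any character can be carried."""
    return base64.b64encode(raw.encode("utf-8")).decode("ascii")


def decode_descriptor(encoded: str) -> str:
    """Reverse encode_descriptor."""
    return base64.b64decode(encoded).decode("utf-8")


# Control characters like these cannot appear in an XML 1.0 document.
raw = "bell\x07 and nul\x00"

entry = ET.Element("entry")
entry.set("name-b64", encode_descriptor(raw))

parsed = ET.fromstring(ET.tostring(entry))
assert decode_descriptor(parsed.get("name-b64")) == raw
```

The cost is that the attribute is no longer human-readable, so it's only worth it if you genuinely need those characters.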
In other words: What problem are you really trying to solve, and is the
rather ugly kluge you proposed really necessary and/or sufficient?