Function to return a valid element name

adurth

Hi!
Is there any function that converts a string containing characters
that are invalid for use in an element name to a valid one?

Thanks,
Andreas

Joe Kesselman

> Aah yes, sorry, I have not been precise. I am looking for an XML
> function like translate() or replace().

In that case, I believe the answer is... translate(), or implement your
own recursive string processing if single-character substitutions aren't
sufficient for you. There's nothing standardized for this purpose, since
it isn't something commonly done.
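For reference, XPath 1.0 translate(s, from, to) replaces each character of s that appears in from with the character at the same position in to, and deletes it when to is shorter. The Python function below is only a sketch that mimics those rules so they are easy to try out; xpath_translate is a made-up name. In an XSLT stylesheet the equivalent call would sit in something like <xsl:element name="{translate(., ' ()', '_')}">.

```python
def xpath_translate(s: str, from_chars: str, to_chars: str) -> str:
    """Mimic XPath 1.0 translate(s, from_chars, to_chars).

    A character of s found in from_chars becomes the character at the same
    position in to_chars, or is deleted if to_chars is shorter. If a character
    occurs more than once in from_chars, only its first occurrence counts.
    """
    table = {}
    for i, ch in enumerate(from_chars):
        if ord(ch) in table:
            continue  # first occurrence wins
        table[ord(ch)] = to_chars[i] if i < len(to_chars) else None  # None = delete
    return s.translate(table)


print(xpath_translate("bar", "abc", "ABC"))             # BAr
print(xpath_translate("--aaa--", "abc-", "ABC"))        # AAA
print(xpath_translate("net price (EUR)", " ()", "_"))   # net_price_EUR
```

Anything beyond one-character-for-one-character substitution (for example turning "&" into "and") needs the recursive template approach in XSLT 1.0, or the regex-based replace() in XPath 2.0.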

adurth

Okay, thank you anyway.

Joe Kesselman

One more observation: There are a heck of a lot of characters that are
valid in element names (just about any alphanumeric in just about any
language, plus some punctuation), since XML's defined in terms of
Unicode. Simply checking whether all the characters in an element name
are legal is something of a pain; figuring out what to replace the
(many!) other Unicode characters with is going to be (ahem) interesting.
The simplest solution would probably be to invent some sort of escaping
syntax (and then, as usual with such things, also escape the
escape-introduction sequence so the conversion is reliably unique and
reversible).

Unless you control ALL names in the document, that does introduce the
risk that a name created by someone else will contain something that
looks like an escape sequence.
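A minimal Python sketch of the escaping idea, under several assumptions: the character tables are taken from the Name production of XML 1.0 (Fifth Edition); the escape format (an underscore, the hex code point, another underscore) is a hypothetical choice; and ':' is escaped as well so it cannot be mistaken for a namespace prefix separator. Because '_' introduces an escape, a literal '_' is always escaped too, which is what keeps the mapping unique and reversible. Parsers following editions before the fifth use narrower name tables and may reject some names this check accepts.

```python
# Character ranges from the Name production of XML 1.0 (Fifth Edition).
NAME_START_RANGES = [
    (0x3A, 0x3A), (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D),
    (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]
# Extra characters allowed after the first position ('-', '.', digits, ...).
NAME_EXTRA_RANGES = [
    (0x2D, 0x2E), (0x30, 0x39), (0xB7, 0xB7),
    (0x300, 0x36F), (0x203F, 0x2040),
]

ESC = "_"  # hypothetical escape introducer; a literal '_' is escaped too


def _in_ranges(ch: str, ranges) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_name_start_char(ch: str) -> bool:
    return _in_ranges(ch, NAME_START_RANGES)


def is_name_char(ch: str) -> bool:
    return is_name_start_char(ch) or _in_ranges(ch, NAME_EXTRA_RANGES)


def is_valid_name(name: str) -> bool:
    """True if name matches the XML 1.0 (5th ed.) Name production."""
    return (bool(name) and is_name_start_char(name[0])
            and all(is_name_char(c) for c in name[1:]))


def encode_name(text: str) -> str:
    """Reversibly map a non-empty string to a valid XML element name."""
    if not text:
        raise ValueError("an element name cannot be empty")
    out = []
    for i, ch in enumerate(text):
        legal = is_name_start_char(ch) if i == 0 else is_name_char(ch)
        # ':' is escaped as well, to stay clear of namespace prefixes.
        if legal and ch not in (ESC, ":"):
            out.append(ch)
        else:
            out.append(f"{ESC}{ord(ch):X}{ESC}")  # e.g. ' ' -> '_20_'
    return "".join(out)


def decode_name(name: str) -> str:
    """Invert encode_name(): turn each '_HEX_' back into its character."""
    out = []
    i = 0
    while i < len(name):
        if name[i] == ESC:
            end = name.index(ESC, i + 1)  # closing '_' of the escape
            out.append(chr(int(name[i + 1:end], 16)))
            i = end + 1
        else:
            out.append(name[i])
            i += 1
    return "".join(out)


print(encode_name("net price (EUR)"))  # net_20_price_20__28_EUR_29_
```

decode_name() turns net_20_price_20__28_EUR_29_ back into the original string. It also illustrates the risk with names you don't control: decode_name("a_b_c") happily returns "a\x0bc", because _b_ looks like an escape.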


BUT... frankly, you really don't *WANT* element names being made up on
the fly, since they're what describes the structure of your document.
Consider putting your non-XML descriptor in _content_, e.g. an attribute
value, rather than an element name. Among other things, XML already has
the ability to escape characters in text content.

(You still won't be able to use every possible character, even after
escaping it, if you're working in XML 1.0. I believe XML 1.1, which is
rarely used, expanded the legal character set, but you may not want to
make support for 1.1 a prerequisite. The alternative is to fall back on
inventing your own escaping mechanism, e.g. base64-encoding the UTF-8
data.)
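A short Python sketch of the content-based alternative, using the standard xml.etree.ElementTree module: the element name stays fixed and the arbitrary label travels as an attribute value, which the serializer escapes. For characters that XML 1.0 cannot carry at all, even as character references (U+0007 here), the value is base64-encoded from its UTF-8 bytes. The names results, entry, label and enc are hypothetical.

```python
import base64
import xml.etree.ElementTree as ET


def b64_text(s: str) -> str:
    """Base64 of the UTF-8 bytes; output uses only A-Z, a-z, 0-9, '+', '/', '='."""
    return base64.b64encode(s.encode("utf-8")).decode("ascii")


def from_b64_text(t: str) -> str:
    """Invert b64_text()."""
    return base64.b64decode(t).decode("utf-8")


# Fixed, hypothetical element names; the arbitrary text is content, not markup.
root = ET.Element("results")

# Markup characters in an attribute value are escaped by the serializer.
ET.SubElement(root, "entry", label='size <avg> & "max"')

# U+0007 is not allowed in XML 1.0 even as &#7;, so carry it base64-encoded.
raw = ET.SubElement(root, "entry", enc="base64")
raw.text = b64_text("bell\x07")

xml_text = ET.tostring(root, encoding="unicode")
parsed = ET.fromstring(xml_text)
first, second = parsed.findall("entry")
print(first.get("label"))                # size <avg> & "max"
print(repr(from_b64_text(second.text)))  # 'bell\x07'
```

Every character base64 produces is legal in XML text, so the encoded form always survives a round trip; the price is that it is no longer human-readable and is roughly a third larger.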


In other words: What problem are you really trying to solve, and is the
rather ugly kluge you proposed really necessary and/or sufficient?

adurth

Hi!
Thank you for your extended thoughts on this. As you might have
guessed, I'm pretty new to XML. In my case, one tool in our toolchain
can export its results as an XML file. Until now this feature has not
been used, but now we want to use it to import the results into another
tool. As you can imagine, the output is not compatible with what the
second tool can import, so I'm currently writing an XSL transformation.
In that transformation, some element values will become element names
in the output XML.

In the meantime I have found that the problem I was facing when I
posted this was not characters that are illegal in XML (apart from some
spaces), but the fact that the second tool doesn't accept a whole bunch
of characters used in the source XML. Consequently, translate() seems
to be my choice. If you would advise otherwise, please tell me!

Regards,
Andreas
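Since translate() is lossy, two different source values can end up with the same element name (with spaces and hyphens both mapped to '_', "a b" and "a-b" both become a_b). A rough Python sketch of the planned mapping with a collision check follows; the ALLOWED set, the '_' replacement and the leading-underscore rule for values starting with a digit are hypothetical stand-ins for whatever the second tool actually requires.

```python
import string

# Hypothetical: the characters the target tool accepts in element names.
ALLOWED = set(string.ascii_letters + string.digits + "_")


def tool_name(value: str, replacement: str = "_") -> str:
    """translate()-style: replace every character the tool rejects."""
    name = "".join(c if c in ALLOWED else replacement for c in value.strip())
    if not name:
        raise ValueError(f"no usable characters in {value!r}")
    if name[0].isdigit():
        name = "_" + name  # XML names may not start with a digit
    return name


def tool_names(values):
    """Map values to names, refusing two different values that collide."""
    mapping = {}
    for value in values:
        name = tool_name(value)
        if mapping.get(name, value) != value:
            raise ValueError(
                f"{value!r} and {mapping[name]!r} both become {name!r}")
        mapping[name] = value
    return mapping


print(tool_names(["net price", "2nd run"]))
# {'net_price': 'net price', '_2nd_run': '2nd run'}
```

If a collision ever shows up, that is the point where a reversible escaping scheme, or moving the value into content, becomes worth the extra effort.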
 
