ciaran.mchale
I used Google to find information about JAXB 2.0 and I ended up
downloading a document called "The Java Architecture for XML Binding
(JAXB) 2.0: Proposed Final Draft September 30, 2005".
Section D.2 (pages 331 to 334) gives an unambiguous explanation of the
mapping from an XML name into a Java identifier *if* the
underscoreHandling is "asWordSeparator" (which is the default). In
particular, Table D-1 uses regular expressions to clearly indicate what
causes word breaks.
Unfortunately, I found the discussion about the semantics of setting
underscoreHandling to "asCharInWord" to be overly brief and hence
ambiguous. Here are some examples to illustrate what I mean by the
ambiguity. In these examples, I used the "|" character to indicate word
breaks.
Does "_abc" map into "_abc" or "abc"? The former mapping is not a valid
Java identifier, and the latter mapping is possible only if "_" is
considered to be punctuation at the start of the XML name (but
"asCharInWord" states that "_" is not considered to be punctuation).
Does "abc_DEF" map into "abc_DEF", "abc_|DEF" or "abc|_DEF"? I have no
idea about this one because it is not stated if "_" is an uppercase or
a lowercase character.
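
To make the second ambiguity concrete, here is a minimal Java sketch of the kind of function I am trying to write. It is not the algorithm from the specification: it applies only a simplified "lowercase followed by uppercase" word-break rule, and the class name UnderscoreAmbiguity, the enum UnderscoreAs and the method markBreaks are names I made up for illustration. The point is only that each assumed classification of "_" yields a different one of the three candidate results.

```java
public class UnderscoreAmbiguity {
    /** Hypothetical ways to classify '_' when it is a character in a word. */
    public enum UnderscoreAs { LOWERCASE, UPPERCASE, UNCASED }

    // '_' has no case in Unicode, so its "case" here is purely an assumption.
    static boolean isLower(char c, UnderscoreAs mode) {
        return c == '_' ? mode == UnderscoreAs.LOWERCASE : Character.isLowerCase(c);
    }

    static boolean isUpper(char c, UnderscoreAs mode) {
        return c == '_' ? mode == UnderscoreAs.UPPERCASE : Character.isUpperCase(c);
    }

    /**
     * Returns xmlName with '|' inserted at each word break, using only a
     * simplified rule: break between a lowercase and an uppercase character.
     */
    public static String markBreaks(String xmlName, UnderscoreAs mode) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < xmlName.length(); i++) {
            char c = xmlName.charAt(i);
            if (i > 0 && isLower(xmlName.charAt(i - 1), mode) && isUpper(c, mode)) {
                out.append('|');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Print the result of each assumed classification for "abc_DEF".
        for (UnderscoreAs mode : UnderscoreAs.values()) {
            System.out.println(mode + ": " + markBreaks("abc_DEF", mode));
        }
    }
}
```

Treating "_" as lowercase gives "abc_|DEF", treating it as uppercase gives "abc|_DEF", and treating it as having no case gives "abc_DEF". The document does not tell me which of these is intended.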
The main reason I am interested in this ambiguity is that I sometimes
learn a new technology by implementing parts of it as a "hobby
project". In this case, I am trying to implement parts of JAXB as a way
for me to get up to speed with it. So I decided to write a function to
do the XML-to-Java mapping for identifiers. Straight away I ran into
this problem. Another reason I am interested is that this ambiguity
seems to be such a fundamental issue that part of my brain is saying
"If the people working on the JAXB 2.0 standard screwed up on defining
even this fundamental thing then how can I feel confident that the rest
of the standard isn't riddled with holes too?"
Regards,
Ciaran.