It seems that XPath does not distinguish between a nonexistent path and a null string?

Ramon F Herrera

Some experimentation indicates that if I have an XML field that ends up
looking like either of these:

<someword></someword>

or

<someword />

an XPath query returns nothing different for those than it does for a
completely made-up, nonexistent path.

Is that the case? How am I supposed to save an empty string?

-Ramon
 
Alain Ketterlin

Ramon F Herrera said:
Some experimentation indicates that if I have an XML field that ends up
looking like either of these:

<someword></someword>

or

<someword />

These two are exactly equivalent.

An XPath query returns nothing different for those than it does for a
completely made-up, nonexistent path.

Why should it? There is no node inside the someword element, the same as
there is no node inside a nonexistent path.

Is that the case? How am I supposed to save an empty string?

As you do above. If you want to test whether an element is empty, use
/path/to/the/element[count(node()) = 0] (absolutely no child node, of
whatever type). If you want "no child element", use count(*) = 0; for
"no child element and no attribute", use count(*|@*) = 0; for "no comment
or PI", use count(comment()|processing-instruction()) = 0; and so on.
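
For readers working outside an XPath engine, the same emptiness tests can be approximated in Python. This is only a sketch using the standard library's xml.dom.minidom; the helper names has_no_nodes and has_no_child_elements and the sample document are made up for illustration and are not part of the thread.

```python
from xml.dom import minidom


def has_no_nodes(elem):
    """Rough analogue of [count(node()) = 0]: no child nodes of any kind."""
    return not elem.hasChildNodes()


def has_no_child_elements(elem):
    """Rough analogue of [count(*) = 0]: text and comments allowed, no child elements."""
    return not any(c.nodeType == c.ELEMENT_NODE for c in elem.childNodes)


# Hypothetical sample: every child of <root> is an element (no whitespace between them).
doc = minidom.parseString(
    "<root><a></a><b/><c>text</c><d><!-- note --></d><e><x/></e></root>"
)

for elem in doc.documentElement.childNodes:
    print(elem.tagName, has_no_nodes(elem), has_no_child_elements(elem))
```

In this sample, a and b (the two spellings of an empty element) give identical answers, while d is non-empty under the first test because its comment counts as a child node.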

-- Alain.
 
Bjoern Hoehrmann

* Ramon F Herrera wrote in comp.text.xml:
Some experimentation indicates that if I have an XML field that ends up
looking like either of these:

<someword></someword>

or

<someword />

an XPath query returns nothing different for those than it does for a
completely made-up, nonexistent path.

Is that the case? How am I supposed to save an empty string?

The XPath data model does not capture the difference between the two
expressions, much like other software might not capture the difference
between "1.000" and "1.0000". You could, in theory, use the difference
between the two instances to indicate an "empty string", but many XML
tools would not be able to tell them apart, so you would either have to
constrain yourself to a specific set of tools (that might break with
the next upgrade) or decide on some alternative representation, say, to
omit the element if the value is an empty string.
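
A quick way to see that the two spellings collapse to the same thing after parsing is sketched below with Python's standard xml.etree.ElementTree. The describe helper is hypothetical, and ElementTree's object model is only an approximation of the XPath data model, so treat this as an illustration rather than a proof.

```python
import xml.etree.ElementTree as ET


def describe(xml_text):
    """Return what the parser kept for the root element: tag, text, attributes, child count."""
    elem = ET.fromstring(xml_text)
    return (elem.tag, elem.text, dict(elem.attrib), len(elem))


long_form = describe("<someword></someword>")
short_form = describe("<someword />")

# Both spellings yield the same parsed element, with no text and no children.
print(long_form)
print(short_form)
print(long_form == short_form)
```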
 
Joe Kesselman

<someword></someword>
or
<someword />

Per the definition of the XML data model, there is no difference between
the two.

Is that the case? How am I supposed to save an empty string??

If you need to distinguish between null and empty, XML Schema uses an
attribute (xsi:nil) to flag the former. (See the description of nillable
values.) I'd recommend adopting that approach.

Or simply not having <someword/> present at all...?
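
As a rough illustration of both suggestions, the sketch below reads a value that may be empty, flagged as nil, or missing altogether. It assumes Python's xml.etree.ElementTree and the xsi:nil attribute from the XML Schema instance namespace. The document layout (records, record, id) and the read_value helper are invented for the example, and mapping both "nil" and "absent" to None is just one possible convention.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
NIL_ATTR = f"{{{XSI_NS}}}nil"  # ElementTree spells qualified names as {namespace}local


def read_value(record, tag):
    """Return None for a nil or absent element, otherwise its text ('' when empty)."""
    elem = record.find(tag)
    if elem is None:
        return None  # element left out entirely
    if elem.get(NIL_ATTR) in ("true", "1"):
        return None  # element present but flagged as nil
    return elem.text or ""  # element present, possibly with empty content


# Hypothetical sample document covering the three cases.
doc = ET.fromstring(
    f'<records xmlns:xsi="{XSI_NS}">'
    '<record id="empty"><someword/></record>'
    '<record id="nil"><someword xsi:nil="true"/></record>'
    '<record id="absent"/>'
    '</records>'
)

for record in doc:
    print(record.get("id"), repr(read_value(record, "someword")))
```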


--
Joe Kesselman,
http://www.love-song-productions.com/people/keshlam/index.html

{} ASCII Ribbon Campaign | "may'ron DaroQbe'chugh vaj bIrIQbej" --
/\ Stamp out HTML mail! | "Put down the squeezebox & nobody gets hurt."
 
mayeul.marguet

Some experimentation indicates that if I have an XML field that ends up
looking like either of these:

<someword></someword>

or

<someword />

an XPath query returns nothing different for those than it does for a
completely made-up, nonexistent path.

Indeed. If you convert to text, XPath does not distinguish between empty
text and a nonexistent path.

Is that the case? How am I supposed to save an empty string?

Don't convert to text, or at least not yet. Get a list of elements
first. If the list is empty, the path does not exist. If the list
contains something, the path exists. Get your text from there.
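
The "list of elements first" approach might look like this in Python. It is a minimal sketch with the standard xml.etree.ElementTree, which supports only a subset of XPath; the lookup helper and the config document are hypothetical.

```python
import xml.etree.ElementTree as ET


def lookup(root, path):
    """Return (exists, text): check for matching elements before reading any text."""
    matches = root.findall(path)
    if not matches:
        return (False, None)  # empty list: the path does not exist
    return (True, matches[0].text or "")  # the path exists; text may be empty


root = ET.fromstring("<config><someword></someword><other>value</other></config>")

print(lookup(root, "someword"))      # existing element with empty content
print(lookup(root, "other"))         # existing element with text
print(lookup(root, "no/such/path"))  # nonexistent path
```

Only the first step (an empty versus non-empty match list) separates "empty" from "nonexistent"; once the result is reduced to a string, both cases look alike.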
 
japisoft

Well,

Once your document is parsed, there is no difference in the final result.
But from the schema/DTD point of view there is a difference, because

<!ELEMENT someword EMPTY>

and

<!ELEMENT someword (#PCDATA)>

are not the same declaration. So the statement that

<someword></someword>

is equal to

<someword/>

is only true if the content type is (#PCDATA), or

<xs:element name="someword" type="xs:string"/> in a W3C Schema.

Best wishes,

A.Brillant
http://www.editix.com - XML Editor
 
Alain Ketterlin

japisoft said:
Once your document is parsed, there is no difference in the final result.
But from the schema/DTD point of view there is a difference, because

<!ELEMENT someword EMPTY>

and

<!ELEMENT someword (#PCDATA)>

are not the same declaration. So the statement that

<someword></someword>

is equal to

<someword/>

is only true if the content type is (#PCDATA) or

Do you have a source for this claim? I think it's wrong. (Not the fact
that #PCDATA and EMPTY are different, the fact that there is a
difference between the two forms of empty elements.)

The XML Recommendation says: "If an element is empty, it must be
represented either by a start-tag immediately followed by an end-tag or
by an empty-element tag." (Section 3.1)

The only difference I know of is mentioned in the next paragraph: "For
interoperability, the empty-element tag must be used, and can only be
used, for elements which are declared EMPTY." And "For interoperability"
is defined earlier as non-binding.

-- Alain.
 
Richard Tobin

japisoft said:
<someword></someword>

is equal to

<someword/>

is only true if the content type is (#PCDATA), or

<xs:element name="someword" type="xs:string"/> in a W3C Schema.

No, there is no significant difference - they are just two ways of
writing the same XML document. They both represent an element called
"someword" with no children. <x/> is just shorthand for <x></x>.

-- Richard
 
japisoft

Well, this is just a supposition, based on what happens when you
serialize your XML: if the output is <myNode/> rather than
<myNode></myNode>, this is quite wrong if this is EMPTY. So I figure the
parser should keep track of which of the two forms was used, for the
output.

<myNode></myNode> => <myNode/>

BUT

<myNode/> != <myNode></myNode>

because <myNode></myNode> means the element is empty but could have had
text content. I read it as:

<myNode></myNode> => with empty text

<myNode/> => with nothing
 
Manuel Collado

On 14/06/2012 12:04, Alain Ketterlin wrote:
Do you have a source for this claim? I think it's wrong. (Not the fact
that #PCDATA and EMPTY are different, the fact that there is a
difference between the two forms of empty elements.)

The XML Recommendation says: "If an element is empty, it must be
represented either by a start-tag immediately followed by an end-tag or
by an empty-element tag." (Section 3.1)

The only difference I know of is mentioned in the next paragraph: "For
interoperability, the empty-element tag must be used, and can only be
used, for elements which are declared EMPTY." And "For interoperability"
is defined earlier as non-binding.

Yes, at the end of Section 1.2. And it means "... chances that XML
documents can be processed by the existing installed base of SGML
processors ...".

I.e., there is no preference at all (<someword></someword> vs.
<someword/>) if the XML document is intended to be processed by an XML
processor.
 
Joe Kesselman

Manuel Collado said:
I.e., there is no preference at all (<someword></someword> vs.
<someword/>) if the XML document is intended to be processed by an XML
processor.

Speaking as someone who has been working with XML for over a decade now,
and who was involved in the design of the DOM: Exactly. XML considers
this *ONLY* a syntactic difference, with no semantic meaning.
<someword/> is just shorthand for <someword></someword>.

If you need to distinguish between empty and null, see the other thread:
Either use an attribute to distinguish the two cases, or leave the
element out entirely.


 
Peter Flynn

Do you have a source for this claim?

It's been a long time, but I think you'll find the discussion in the
archives of the XML SIG at the time. I seem to remember we did it to
death and the consensus was that XML should not distinguish between the
two forms.

I think it's wrong. (Not the fact
that #PCDATA and EMPTY are different, the fact that there is a
difference between the two forms of empty elements.)

Officially there is no difference. In practice, if your parser has
access to a DTD or Schema, it would be possible for it to detect if
content was allowed or not.

The XML Recommendation says: "If an element is empty, it must be
represented either by a start-tag immediately followed by an end-tag or
by an empty-element tag." (Section 3.1)

The only difference I know of is mentioned in the next paragraph: "For
interoperability, the empty-element tag must be used, and can only be
used, for elements which are declared EMPTY." And "For interoperability"
is defined earlier as non-binding.

It is still regarded as good practice in the publishing field, where the
semantics of potential mixed content can be important. It also serves as
a reminder to those who examine the markup that the element type cannot
have any content, rather than the current element just being empty by
chance.

///Peter
 
Joe Kesselman

Officially there is no difference

Correct. If you doubt this, look at the W3C's XML Infoset document (the
official statement of what information an XML document represents),
which has absolutely no way to represent this distinction. Whether
<foo></foo> or <foo/>, the infoset represents it as an element with no
child nodes.

There are stylistic conventions which may cause one or the other to be
preferred by specific applications -- "for interoperability" with SGML
tools being one of those, though realistically there is now enough XML
tooling out there that very few people are still trying to put XML
through SGML tools. But that's strictly style, not substance.
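
One way to see that the choice is purely stylistic: many serializers let you pick the output form regardless of how the input was written. The sketch below uses Python's standard xml.etree.ElementTree and its short_empty_elements option; it shows the behaviour of that one library, not of XML tooling in general.

```python
import xml.etree.ElementTree as ET

# Parse the long form; the parsed element keeps no record of which spelling was used.
elem = ET.fromstring("<someword></someword>")

# The serializer, not the data, decides which spelling is written out.
short_form = ET.tostring(elem, encoding="unicode")
long_form = ET.tostring(elem, encoding="unicode", short_empty_elements=False)

print(short_form)  # an empty-element tag
print(long_form)   # a start-tag immediately followed by an end-tag

# Re-parsing either output gives an element with no text and no children.
for text in (short_form, long_form):
    reparsed = ET.fromstring(text)
    print(reparsed.tag, reparsed.text, len(reparsed))
```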

If you want to distinguish empty vs. null, XML lets you do so by adding
an attribute, or a child element, that your application recognizes as
signifying that the content is null. XML Schema suggests an attribute
for that purpose. Adopt it.

 
Manuel Collado

On 28/06/2012 6:23, Joe Kesselman wrote:
Correct. If you doubt this, look at the W3C's XML Infoset document (the
official statement of what information an XML document represents),
which has absolutely no way to represent this distinction. Whether
<foo></foo> or <foo/>, the infoset represents it as an element with no
child nodes.

There are stylistic conventions which may cause one or the other to be
preferred by specific applications -- "for interoperability" with SGML
tools being one of those, though realistically there is now enough XML
tooling out there that very few people are still trying to put XML
through SGML tools. But that's strictly style, not substance.

Just a minor hint: IIRC, if interoperability with SGML tools matters,
 
