xml syntax

Geoff

Hello,

I've been reading about xml, etc. but does xml have a bnf to go with it? If
not, how do the parsers work, if yes, can someone point me to where the bnf
might be?

Thanks.

-g
 
Andy Dingley

Geoff said:
I've been reading about xml, etc. but does xml have a bnf to go with it?

No, XML defines a syntax, which specifies what a "well formed" XML
document looks like. Then XML Schema or DTD on top of this allows the
developer to define the set of elements and their nesting rules
permitted within a set of documents. This specifies what a document
"valid according to that schema" can look like. DTD is an older
standard than Schema, and there's a lot of overlap -- neither can be
completely ignored though.

BNF is somewhere around the DTD level. However BNF is also far too
trivial, and EBNF is too inconsistently defined, to be useful here.
They're just not used in XML work.

It's also a fundamental part of XML (in contrast to SGML) that
"well-formed" is important and that being "valid" rather less so. It's
common practice for tools to work with XML by parsing it, even though
they don't have the DTD --- and if you don't know the DTD, then you
can't even tell if XML is valid or not. However it's still a practical
XML document and you can do useful work on it.
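
To make the well-formed versus valid distinction concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree, which checks well-formedness only and never validates against a DTD or schema. The helper name is_well_formed and the sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if text parses as well-formed XML. No validation is done."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# There is no DTD anywhere, yet the document is still usable.
doc = "<order><item qty='2'>widget</item></order>"
print(is_well_formed(doc))                      # True
print(is_well_formed("<order><item></order>"))  # False: <item> is never closed

root = ET.fromstring(doc)
print(root.find("item").get("qty"))             # 2
```

Whether qty is allowed on item, or whether item may appear inside order at all, is a validity question that this check cannot answer without a DTD or schema.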

If you're looking for interesting new schema languages to look at, in
preference to BNF, then look hard at Relax NG.
 
Andy Dingley

Magnus said:
The EBNF rules are defined in context in the Rec.

One has to question the formal value of any specification language that
has to be re-defined every time you use it! Nor is the EBNF included
in the XML specification anything like a complete statement of XML
syntax and all its rules.
 
Magnus Henriksson

Andy said:
One has to question the formal value of any specification language that
has to be re-defined every time you use it!

I don't know what you mean by this. The EBNF for XML sets the rules for
XML itself, not a particular XML vocabulary.

Andy said:
Nor is the EBNF included
in the XML specification anything like a complete statement of XML
syntax and all its rules.

No, there are further well-formedness and validity constraints that
cannot be expressed in EBNF. But it is pretty useful when writing an XML
parser, by hand or using a BNF parser.
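
As a rough illustration of both points, here is a hand-written recursive-descent sketch in Python for a heavily simplified subset of the element productions (no attributes, comments, entities, or whitespace inside tags). The grammar in the comments is a cut-down paraphrase of the spec's element, STag, ETag, and content rules, and the names TinyParser, NotWellFormed, and parse are invented for this example. The end-tag name check is the "Element Type Match" well-formedness constraint, which the EBNF alone does not express.

```python
import re

# Simplified stand-in; the real Name production allows far more characters.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


class NotWellFormed(Exception):
    pass


class TinyParser:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def expect(self, literal):
        if not self.text.startswith(literal, self.pos):
            raise NotWellFormed(f"expected {literal!r} at offset {self.pos}")
        self.pos += len(literal)

    def name(self):
        m = NAME.match(self.text, self.pos)
        if not m:
            raise NotWellFormed(f"expected a name at offset {self.pos}")
        self.pos = m.end()
        return m.group()

    def element(self):
        # element ::= '<' Name '/>' | '<' Name '>' content '</' Name '>'
        self.expect("<")
        tag = self.name()
        if self.text.startswith("/>", self.pos):
            self.pos += 2
            return (tag, [])
        self.expect(">")
        children = self.content()
        self.expect("</")
        end = self.name()
        # Constraint outside the grammar: end-tag name must equal start-tag name.
        if end != tag:
            raise NotWellFormed(f"end tag </{end}> does not match <{tag}>")
        self.expect(">")
        return (tag, children)

    def content(self):
        # content ::= (element | CharData)*
        items = []
        while self.pos < len(self.text) and not self.text.startswith("</", self.pos):
            if self.text[self.pos] == "<":
                items.append(self.element())
            else:
                end = self.text.find("<", self.pos)
                if end == -1:
                    end = len(self.text)
                items.append(self.text[self.pos:end])
                self.pos = end
        return items


def parse(text):
    parser = TinyParser(text)
    tree = parser.element()
    if parser.pos != len(text):
        raise NotWellFormed("trailing content after root element")
    return tree


print(parse("<a><b/>hi</a>"))  # ('a', [('b', []), 'hi'])
try:
    parse("<a><b></a></b>")
except NotWellFormed as err:
    print(err)                 # end tag </a> does not match <b>
```

Each method mirrors one production, which is why the EBNF is a convenient starting point, but the second input matches the grammar's shape and is rejected only by the extra name check.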


// Magnus
 
Richard Tobin

Andy said:
One has to question the formal value of any specification language that
has to be re-defined every time you use it!

Magnus said:
I don't know what you mean by this. The EBNF for XML sets the rules for
XML itself, not a particular XML vocabulary.

I think he was referring to the "E"-ing of the BNF: the extensions,
which each spec defines for itself.

-- Richard
 
Joe Kesselman

Geoff said:
I've been reading about xml, etc. but does xml have a bnf to go with it?

As others have said: If you want the official word, go to the W3C's
website. These days you need to not only read and understand the XML
spec, but the XML Namespaces spec... and possibly others (eg XML Schema)
depending on what you're trying to do.

Pure Backus-Naur Form is rarely used these days, because it tends to be
excessively verbose. Almost every spec which uses a BNF-like notation to
define its syntax winds up using some form of EBNF (Extended BNF). The
extensions are themselves pretty well standardized at this point, and
most BNF-driven parser generators actually accept an EBNF
themselves, though you may have to do some manual adaptation.

Re "How do the parsers work" -- the real answer here is "Well enough
that you probably don't want to write yet another one unless you have
some reason to believe yours will actually be better in some regard --
or unless your teachers or employers insist you do so as a learning or
clean-room exercise." But to give you the answer you were probably
looking for: Some are recursive descent, some are table-driven state
machines, some apply more complicated analysis. Since decent open-source
parsers are available (e.g. Apache Xerces), if you're really interested in
this topic you might want to spend some time studying their code.
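
For a feel of the table-driven style, here is a toy Python scanner driven by a transition table. It is only a sketch under heavy assumptions: it splits input into tags and text, and ignores attribute quoting, comments, CDATA sections, and processing instructions, so it is nowhere near a real XML tokenizer. The state names, the TABLE layout, and the tokenize function are all invented for this example and are not taken from Xerces or any other parser.

```python
TEXT, TAG = "TEXT", "TAG"


def char_class(ch):
    """Collapse every character into one of three classes."""
    return ch if ch in "<>" else "other"


# (current state, character class) -> (next state, action)
# A missing entry means the input is rejected in that state.
TABLE = {
    (TEXT, "<"): (TAG, "flush"),
    (TEXT, ">"): (TEXT, "keep"),
    (TEXT, "other"): (TEXT, "keep"),
    (TAG, ">"): (TEXT, "flush"),
    (TAG, "other"): (TAG, "keep"),
}


def tokenize(text):
    state, buf, tokens = TEXT, [], []
    for offset, ch in enumerate(text):
        key = (state, char_class(ch))
        if key not in TABLE:
            raise ValueError(f"unexpected {ch!r} at offset {offset}")
        next_state, action = TABLE[key]
        if action == "flush":
            # Emit what was buffered in the state we are leaving.
            if state == TAG:
                tokens.append(("tag", "".join(buf)))
            elif buf:
                tokens.append(("text", "".join(buf)))
            buf = []
        else:
            buf.append(ch)
        state = next_state
    if state == TAG:
        raise ValueError("unterminated tag at end of input")
    if buf:
        tokens.append(("text", "".join(buf)))
    return tokens


print(tokenize("<a>hi</a>"))  # [('tag', 'a'), ('text', 'hi'), ('tag', '/a')]
```

The loop never changes; supporting more of the syntax means adding states and table rows, which is the appeal of this approach compared with hand-coded branching.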
 
Geoff

Hello,

Some of the code I've seen so far, looks like the person kind of did things
by hand.

Eyeballing it and writing code is certainly one way to do it, but I probably
would have used one of the parser generators, fed the bnf (or ebnf) into it
and let it create the parser.

The parser could be used to check the well-formedness of an xml document.

-g
 
Joe Kesselman

Geoff said:
Some of the code I've seen so far, looks like the person kind of did things
by hand.

In some cases that's true, either initially or later to achieve
performance improvements.

Geoff said:
I probably would have used one of the parser generators, fed the bnf
(or ebnf) into it and let it create the parser.

That will get you about one-third of the way to a useful toolset -- it
will deal with syntax. You still have to deal with semantics, and then
with the APIs (probably SAX or DOM at least, plus -- if this is to be
practical -- a serializer from that form back to XML, which becomes more
complicated when you start trying to deal with the fine details of the
spec.)
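
To show what the API layer looks like from the caller's side, here is a short Python sketch using the standard library's xml.sax and xml.dom.minidom modules. The sample document and the ItemCounter handler are made up for illustration; the SAX half receives events as the parser reads, while the DOM half builds a tree, edits it, and serializes it back to XML.

```python
import xml.sax
from xml.dom import minidom

SAMPLE = b"<order><item qty='2'>widget</item><item qty='1'>gadget</item></order>"


class ItemCounter(xml.sax.ContentHandler):
    """SAX style: the parser pushes events and we keep only what we need."""

    def __init__(self):
        super().__init__()
        self.total = 0

    def startElement(self, name, attrs):
        if name == "item":
            self.total += int(attrs.get("qty", "0"))


counter = ItemCounter()
xml.sax.parseString(SAMPLE, counter)
print(counter.total)  # 3

# DOM style: the whole document becomes a tree that can be edited and written out.
dom = minidom.parseString(SAMPLE)
items = dom.getElementsByTagName("item")
items[0].setAttribute("qty", "5")
print(dom.documentElement.toxml())
```

The serialized output uses double quotes around attribute values even though the input used single quotes. That is harmless, but it is a small example of the serializer decisions that a grammar-generated parser does not make for you.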

An XML parser is a 90/10 problem. You can implement 90% of the spec for
10% of the effort. Making it complete, robust, and efficient takes the
other 90% of the time.

Geoff said:
The parser could be used to check the well-formedness of an xml document.

There are many existing parsers. That doesn't mean a new one can't be
created; it does mean you should think about what your new parser is
intended to do that existing ones don't. (If the answer is "give you an
excuse to practice writing parsers", that may be sufficient
justification, but that didn't seem to be where your first question was
starting from.)
 
