Geoff said:
I've been reading about xml, etc. but does xml have a bnf to go with it?
As others have said: If you want the official word, go to the W3C's
website. These days you need to read and understand not only the XML
spec but also the XML Namespaces spec... and possibly others (e.g. XML
Schema), depending on what you're trying to do.
Pure Backus-Naur Form is rarely used these days, because it tends to be
excessively verbose. Almost every spec which uses a BNF-like notation to
define its syntax winds up using some form of EBNF (Extended BNF). The
extensions are themselves pretty well standardized at this point, and
most BNF-driven parser generators actually accept some form of EBNF
directly, though you may have to do some manual adaptation.
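To illustrate the verbosity point, here is the Name production roughly
as the XML 1.0 spec writes it in EBNF, followed by one way to say the
same thing in pure BNF. The helper nonterminal NameChars is made up for
this sketch and does not appear in the spec.

```
EBNF (as in the XML spec):
    Name ::= NameStartChar (NameChar)*

Pure BNF (repetition spelled out as recursion):
    <Name>      ::= <NameStartChar> <NameChars>
    <NameChars> ::= "" | <NameChar> <NameChars>
```

Every "*", "+" or "?" in the EBNF costs an extra rule like this in
plain BNF, which adds up quickly across a whole spec.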
Re "How do the parsers work" -- the real answer here is "Well enough
that you probably don't want to write yet another one unless you have
some reason to believe yours will actually be better in some regard --
or unless your teachers or employers insist you do so as a learning or
clean-room exercise." But to give you the answer you were probably
looking for: Some are recursive descent, some are table-driven state
machines, and some apply more complicated analysis. Since decent open-source
parsers are available (e.g. Apache Xerces), if you're really interested in
this topic you might want to spend some time studying their code.
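
To give a flavor of the recursive-descent style, here is a minimal
Python sketch for a tiny XML-like subset: elements and text only, with
no attributes, entities, comments, or namespaces. The grammar in the
comments, the NAME pattern, and all class and function names are
assumptions made for this illustration; this is not how Xerces or any
other real parser is written, and it is nowhere near conformant.

```python
import re

# Toy grammar (a simplification, not the real XML productions):
#   element ::= '<' Name '/>' | '<' Name '>' content '</' Name '>'
#   content ::= (element | text)*
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")  # ASCII-only approximation


class ParseError(Exception):
    pass


class Parser:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def expect(self, literal):
        # Consume an exact literal or fail.
        if not self.text.startswith(literal, self.pos):
            raise ParseError(f"expected {literal!r} at offset {self.pos}")
        self.pos += len(literal)

    def name(self):
        m = NAME.match(self.text, self.pos)
        if m is None:
            raise ParseError(f"expected a name at offset {self.pos}")
        self.pos = m.end()
        return m.group()

    def element(self):
        # One method per grammar rule: this is the "descent".
        self.expect("<")
        tag = self.name()
        if self.text.startswith("/>", self.pos):
            self.pos += 2
            return (tag, [])
        self.expect(">")
        children = self.content()  # recursion handles nested elements
        self.expect("</")
        end_tag = self.name()
        if end_tag != tag:
            raise ParseError(f"<{tag}> closed by </{end_tag}>")
        self.expect(">")
        return (tag, children)

    def content(self):
        children = []
        while self.pos < len(self.text):
            if self.text.startswith("</", self.pos):
                break  # end tag belongs to the enclosing element
            if self.text[self.pos] == "<":
                children.append(self.element())
            else:
                end = self.text.find("<", self.pos)
                if end == -1:
                    end = len(self.text)
                children.append(self.text[self.pos:end])
                self.pos = end
        return children


def parse(text):
    parser = Parser(text)
    tree = parser.element()
    if parser.pos != len(text):
        raise ParseError(f"unexpected trailing input at offset {parser.pos}")
    return tree
```

Calling parse("<a><b>hi</b><c/></a>") returns a nested tuple, with each
element represented as (name, children). Each grammar rule maps to one
method, which is why this style is easy to write by hand; table-driven
parsers instead encode the same rules as state-transition data.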