Writing a parser

J

Jan Danielsson

Hello all,

I guess this is a question for people who have written a parser.

Does an XML parser ever need to be recursive? I mean like:

&fo&bar;o;

I know this particular example is in the XML specs, and it says that
it will not happen. But are there some really wild constructions that
are allowed, that would require recurive parsing?

Like.. <tag <!-- Comment <tag2 attr="<fo&ou<!-- comment!
-->ml;o/>"></tag2> -->></tag>

Please, don't start taking that a part, I know all the errors in it.
However, what I want to demonstrate is the level of complexity I'm
wondering about. Any case where recursion is needed?
 
G

Gerald Aichholzer

Hello,

Jan said:
I guess this is a question for people who have written a parser.

Does an XML parser ever need to be recursive? I mean like:

&fo&bar;o;

I know this particular example is in the XML specs, and it says that
it will not happen. But are there some really wild constructions that
are allowed, that would require recurive parsing?

Like.. <tag <!-- Comment <tag2 attr="<fo&ou<!-- comment!
-->ml;o/>"></tag2> -->></tag>

Please, don't start taking that a part, I know all the errors in it.
However, what I want to demonstrate is the level of complexity I'm
wondering about. Any case where recursion is needed?

I'm no expert, but AFAIK a XML parser will have to stop if the XML
file is not well-formed. The above example contains errors (you
said it), so it is not well-formed. There's no need for a parser
to accept the above construct. I even think that a parser is not
allowed to accept it.

Gerald
 
?

=?ISO-8859-1?Q?J=FCrgen_Kahrs?=

Jan said:
However, what I want to demonstrate is the level of complexity I'm
wondering about. Any case where recursion is needed?

Why do you worry about recursion ?
Recursive functions usually make parsers easier to implement.
If you *really* cant recurse in your implementation, use stacks
for holding the context.
 
R

Richard Tobin

Jan Danielsson said:
Does an XML parser ever need to be recursive? I mean like:

Yes, but not in the way your examples are.

Elements may contain other elements:

<foo>...<bar>...</bar>...</foo>

Even if you don't return this as a nested structure (for example,
a SAX parser just returns start and end tags), you need to maintain
a stack of open elements so you can detect errors like this:

<foo>...<bar>...</bar>...</wrong>

The replacement text of entities may contain references to other
entities:

<!ENTITY foo "some text">
<!ENTITY bar "contains this [ &bar; ] text">

So that a reference in the document to "&foo;" must be expanded
to "contains this [ some text ] text".

And similarly for external entities.

-- Richard
 
J

Jan Danielsson

Jürgen Kahrs said:
Why do you worry about recursion ?
Recursive functions usually make parsers easier to implement.
If you *really* cant recurse in your implementation, use stacks
for holding the context.

I'm sorry, but I was talking about recursive *expressions* in *XML*,
not as in "a function calling itself". I already have a stack based
parser, but I'm beginning to wonder it is worth the trouble, I haven't
actually seen any examples where I would actually need the stack based
design, and there'a much neater way to solve it, imho, but it would
make certain recursions *in* *XML* impossible.

Sorry for the confusion.
 
S

Soren Kuula

Richard said:
<!ENTITY foo "some text">
<!ENTITY bar "contains this [ &bar; ] text">

So that a reference in the document to "&foo;" must be expanded
to "contains this [ some text ] text".
Surely you mean
<!ENTITY bar "contains this [ &foo; ] text">

?

Soren
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,001
Messages
2,570,251
Members
46,851
Latest member
CristineKo

Latest Threads

Top