Ryan Stewart
I'm getting XHTML input that can be in a number of formats, and I'm
trying to get it into a consistent format for later use. "Consistent"
in this case means everything in the root/body is in either a p, table,
img, ol, or ul tag. I'm processing just the body text. There is no head
section or anything. So the body is the root of the tree that I'm
processing. I've got almost everything working except one thing. If I
get input like the following:
some text<br/>some more text
then I need that to become two paragraphs, like:
<p>some text</p>
<p>some more text</p>
That's easy enough. But if I get this input:
some text <a href="blah">link</a> some more text
that should all become one paragraph:
<p>some text <a href="blah">link</a> some more text</p>
And if a table, list, or image is encountered, that should be the end
of a paragraph if there is one:
some text<table> ... </table>some more text
becomes
<p>some text</p>
<table> ... </table>
<p>some more text</p>
Again, placing the text nodes inside p tags is simple, but a
problem arises if there is a link or other tag inside some of that
text. (At this point other tags don't actually matter because I'm
stripping them out, but links need to be passed through.)
Basically, my problem boils down to this:
1) I need to select any text node child of the root and surround it
with p tags, but
2) if an a element is a child of the root, it should be joined with any
adjacent text nodes and the whole thing should be surrounded with p
tags.
Can someone give me an example of how to do this with XSL?
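For reference, here is a rough sketch of the kind of grouping I have in mind. It is not a tested solution. It assumes an XSLT 2.0 processor (xsl:for-each-group is not available in 1.0), that the root element is named body, and that the elements are in no namespace; real XHTML input in the XHTML namespace would need prefixed names in the match patterns. Runs of adjacent text nodes and a elements are wrapped in one p, and br is dropped so that it only acts as a paragraph break.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <!-- Assumption: the root of the input is a body element in no namespace. -->
  <xsl:template match="body">
    <body>
      <!-- Group consecutive children by whether they are "inline"
           (a text node or an a element). Any other child, such as br,
           table, ol, ul or img, ends the current run. -->
      <xsl:for-each-group select="node()"
          group-adjacent="boolean(self::text() or self::a)">
        <xsl:choose>
          <!-- A run of text nodes and links becomes one paragraph. -->
          <xsl:when test="current-grouping-key()">
            <p><xsl:copy-of select="current-group()"/></p>
          </xsl:when>
          <!-- Everything else is handled by the templates below. -->
          <xsl:otherwise>
            <xsl:apply-templates select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </body>
  </xsl:template>

  <!-- br only separates paragraphs, so it produces no output. -->
  <xsl:template match="br"/>

  <!-- Block-level content passes through unchanged. -->
  <xsl:template match="p | table | ol | ul | img">
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

With this sketch, whitespace-only text nodes between block elements would still produce empty p elements, so they would need to be filtered out (for example with xsl:strip-space or a predicate on the select). Stripping other inline tags is not handled here.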