seven.reeds
Hi,
I'm new to parsing/using XML but this project seemed reasonable to cut
my teeth on. I have a few dozen "articles" that are local
announcements of interest for my group's customers. They have a
simple format of a title, zero or more static or hyper-linked images
and one or more paragraphs of text.
A "Title" will hold plain text. The "Text"s will hold plain or mixed
content. The "Image"s will need to know about the hyper-link URL (if
any), the image source URL and possibly "height" and "width"
attributes.
I have made a stab at a DTD:
<!ELEMENT article (title, image*, text+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT image (src, width?, height?, link?)>
<!ATTLIST src CDATA #REQUIRED>
<!ATTLIST link CDATA #IMPLIED>
<!ATTLIST width PCDATA #IMPLIED>
<!ATTLIST height PCDATA #IMPLIED>
<!ELEMENT text (#CDATA)>
A sample XML doc looks like:
<?xml version="1.0" ?>
<!DOCTYPE article SYSTEM "http://www.itg.uiuc.edu/publications/news/news.dtd">
<article>
<title> Applied Physics Letters Features ITG Image on Cover </title>
<image link="http://scitation.aip.org/dbt/dbt.jsp?KEY=APPLAB&Volume=90&Issue=21"
src="/images/apl_cover-130.jpg" />
<text> The cover for the <a href="http://scitation.aip.org/dbt/dbt.jsp?KEY=APPLAB&Volume=90&Issue=21">May 21, 2007
edition of Applied Physics Letters</a> features an image
produced in the ...
</text>
</article>
Now, I have spent time searching this group and a couple of others
related to the scripting language and the XML parser I am using. I
*know* what my problem is... what I don't know is why I have it.
My XML parser chokes on the first "&" (ampersand) in the "link"
attribute of the "image" tag. I know that being "well-formed" means
the amps should be escaped, but I thought that the "CDATA" bits in the
DTD meant that *ALL* characters are accepted in this context.
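For anyone who wants to reproduce this, here is a minimal sketch. It assumes Python's standard-library xml.etree.ElementTree as a stand-in (the parser I am actually using may report the error differently), and the RAW string is a trimmed copy of the sample above without the DOCTYPE line. The helper name try_parse is made up for the example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document: the "link" attribute contains raw "&".
RAW = (
    '<article><image '
    'link="http://scitation.aip.org/dbt/dbt.jsp?KEY=APPLAB&Volume=90&Issue=21" '
    'src="/images/apl_cover-130.jpg" /></article>'
)

# Same document with every ampersand written as the entity reference.
ESCAPED = RAW.replace("&", "&amp;")


def try_parse(doc):
    """Return the parsed root element, or the ParseError if parsing fails."""
    try:
        return ET.fromstring(doc)
    except ET.ParseError as err:
        return err


raw_result = try_parse(RAW)
escaped_result = try_parse(ESCAPED)

# The raw form is rejected as not well-formed; the escaped form parses,
# and the attribute value comes back with plain "&" characters.
print(type(raw_result).__name__)
print(escaped_result.find("image").get("link"))
```

With this stand-in parser the raw form fails before any DTD would come into play, while the escaped form yields the original URL as the attribute value.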
Is my DTD wrong for the xml I have? Is my parser/validator not
picking up on the DTD?
I know that I can pre-process the incoming XML file and change the
amps to the entity version (&amp;), but that feels wasteful if CDATA is
doing what I thought it should do.
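The pre-processing step I have in mind would look roughly like the sketch below. It is written in Python purely for illustration; the names BARE_AMP and escape_bare_ampersands are hypothetical, and the regular expression is a heuristic that leaves anything already shaped like an entity or character reference alone. It would also rewrite ampersands inside comments or CDATA sections, which my files do not contain.

```python
import re
import xml.etree.ElementTree as ET

# An "&" that is NOT already the start of a named entity (&name;),
# a decimal reference (&#123;) or a hex reference (&#x7B;).
BARE_AMP = re.compile(
    r"&(?!(?:[A-Za-z_][A-Za-z0-9_.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)"
)


def escape_bare_ampersands(text):
    """Replace bare ampersands with &amp; and leave existing references alone."""
    return BARE_AMP.sub("&amp;", text)


# Illustrative input: two bare ampersands in the URL, one that is already escaped.
sample = (
    '<text>See <a href="dbt.jsp?KEY=APPLAB&Volume=90&Issue=21">May 21</a>'
    ' &amp; more</text>'
)

fixed = escape_bare_ampersands(sample)
root = ET.fromstring(fixed)  # parses once the bare ampersands are escaped
print(fixed)
print(root.find("a").get("href"))
```

The already-escaped "&amp;" is not doubled, and the parsed href attribute carries the plain "&" characters again.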
Other than a clue, what am I missing?