Hello there,
I'm trying to build what is basically screen scraper software that
takes a URL as input and produces an XML file as output. I wanted to
introduce something like a "document definition" for the source URL,
i.e.
<document id="some_news_site_without_rss"
url="http://www.example.com/news.html">
<news repeat="true">
<article>
<title begin="somehtml" end="somehtml"/>
</article>
</news>
</document>
would produce something like
<document>
<news>
<article>
<title>Some title 1</title>
</article>
<article>
<title>Some title 2</title>
</article>
<article>
<title>Some title 3</title>
</article>
</news>
</document>
I hope you get the idea.
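To make the idea concrete, here is a minimal sketch in Python of how such a definition could drive the scraper. Everything in it is an assumption for illustration: the function names extract_all and scrape are hypothetical, the begin/end values are treated as literal HTML strings, each record is assumed to hold exactly one field, and the sample HTML is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical definition; the begin/end markers are escaped HTML snippets.
DEFINITION = """<document id="some_news_site_without_rss"
          url="http://www.example.com/news.html">
  <news repeat="true">
    <article>
      <title begin="&lt;h2&gt;" end="&lt;/h2&gt;"/>
    </article>
  </news>
</document>"""


def extract_all(html, begin, end):
    """Return every substring of html that sits between begin and end."""
    found, pos = [], 0
    while True:
        start = html.find(begin, pos)
        if start == -1:
            break
        start += len(begin)
        stop = html.find(end, start)
        if stop == -1:
            break
        found.append(html[start:stop])
        pos = stop + len(end)
    return found


def scrape(definition_xml, html):
    """Build the output tree described by the definition (one field per record)."""
    definition = ET.fromstring(definition_xml)
    out_root = ET.Element("document")
    for group in definition:              # e.g. <news repeat="true">
        out_group = ET.SubElement(out_root, group.tag)
        for record in group:              # e.g. <article>
            field = record[0]             # assumption: a single field element
            values = extract_all(html, field.get("begin"), field.get("end"))
            if group.get("repeat") != "true":
                values = values[:1]       # non-repeating groups keep one match
            for value in values:
                out_record = ET.SubElement(out_group, record.tag)
                ET.SubElement(out_record, field.tag).text = value
    return out_root


# Made-up page content standing in for the fetched URL.
SAMPLE_HTML = "<h2>Some title 1</h2><p>text</p><h2>Some title 2</h2>"
result = scrape(DEFINITION, SAMPLE_HTML)
print(ET.tostring(result, encoding="unicode"))
```

Fetching the page from the url attribute is left out on purpose; the sketch only shows the mapping from definition to output.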
My problem is that I've tried to describe this "definition language"
using a DTD, but as far as I can see a DTD can't express something like
"I want one fixed parent element, document; all the other elements are
user-specified (not known in advance), but they have to be closed and
have the following attributes...".
I'm not that deep into XML/SGML, so maybe I'm just missing something
basic.
Thanks,
Esad Hajdarevic