DOCTYPE + SAX

jdownie

I'm trying to get xml.sax to interpret a file that begins with…

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

After a while I get...

http://www.w3.org/TR/html4/loose.dtd:31:2: error in processing external entity reference

…although…

time curl http://www.w3.org/TR/html4/loose.dtd

…gives…

real 0m26.888s
user 0m0.006s
sys 0m0.013s

Is this a rookie mistake? Should I expect a Python SAX parser to
incorporate entities from a remote DTD into its parsing?
 
Alain Ketterlin

jdownie said:
I'm trying to get xml.sax to interpret a file that begins with…

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

After a while I get...

http://www.w3.org/TR/html4/loose.dtd:31:2: error in processing external entity reference

…although…

time curl http://www.w3.org/TR/html4/loose.dtd
[works]

You're mistaken. There is no problem fetching the file, but there is a
problem while parsing the file (at line 31, where you find a comment in
an entity declaration, which is not acceptable in XML).
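
To illustrate the point about comments inside declarations, here is a minimal sketch. The two documents are made-up examples, not the actual contents of line 31 of loose.dtd, and parses_cleanly is a hypothetical helper. The first document puts an SGML-style "-- ... --" comment inside an entity declaration, the way HTML 4 DTDs do; the second moves the comment outside the declaration in XML style.

```python
import xml.sax

# Illustrative document: the internal subset uses an SGML-style "-- ... --"
# comment inside an entity declaration, which XML does not allow.
SGML_STYLE = b"""<!DOCTYPE doc [
  <!ENTITY % ContentType "CDATA" -- media type -->
]>
<doc/>"""

# The same declaration with an XML comment placed outside it.
XML_STYLE = b"""<!DOCTYPE doc [
  <!-- media type -->
  <!ENTITY % ContentType "CDATA">
]>
<doc/>"""


def parses_cleanly(document):
    """Return True if xml.sax accepts the document, False on a parse error."""
    try:
        xml.sax.parseString(document, xml.sax.ContentHandler())
        return True
    except xml.sax.SAXParseException:
        return False


if __name__ == "__main__":
    print("SGML-style comment:", parses_cleanly(SGML_STYLE))
    print("XML-style comment: ", parses_cleanly(XML_STYLE))
```

When the failing declaration sits in an external DTD rather than in the document itself, the parser reports the failure as an error in processing the external entity reference, which matches the message quoted above.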

You're trying to use HTML's SGML DTD in an XML document. Point your
doctype at XHTML's DTD, and everything will be fine (hopefully).

BTW, your installation will probably let you use a locally cached copy
of the DTD, instead of fetching the file on every parse. How this works
depends on the parser you use.
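
With Python's xml.sax, one way to get a local copy is to install an EntityResolver. The sketch below is only an assumption about how that could look: LOCAL_DTDS, LocalDTDResolver, TextCollector and parse_with_local_dtd are hypothetical names, and the one-entity "DTD" is a stand-in for a real cached copy of the XHTML DTD on disk. Note that since Python 3.7.1 external entities are not processed unless feature_external_ges is switched on.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

# Hypothetical stand-in for a locally cached DTD; a real cache would hold
# the full XHTML DTD and its entity files.
LOCAL_DTDS = {
    "-//W3C//DTD XHTML 1.0 Transitional//EN": b'<!ENTITY nbsp "&#160;">',
}

DOCUMENT = (
    b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    b'"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n'
    b"<html><body><p>a&nbsp;b</p></body></html>"
)


class LocalDTDResolver(EntityResolver):
    """Serve known public IDs from LOCAL_DTDS instead of the network."""

    def __init__(self):
        self.requests = []  # (publicId, systemId) pairs the parser asked for

    def resolveEntity(self, publicId, systemId):
        self.requests.append((publicId, systemId))
        if publicId in LOCAL_DTDS:
            source = InputSource(systemId)
            source.setByteStream(io.BytesIO(LOCAL_DTDS[publicId]))
            return source
        # Unknown identifiers: let the parser open the system ID itself.
        return systemId


class TextCollector(ContentHandler):
    """Collect character data so entity expansion can be inspected."""

    def __init__(self):
        super().__init__()
        self.text = []

    def characters(self, content):
        self.text.append(content)


def parse_with_local_dtd(document):
    parser = xml.sax.make_parser()
    # Ask the parser to load external entities, including the external DTD.
    parser.setFeature(feature_external_ges, True)
    resolver = LocalDTDResolver()
    collector = TextCollector()
    parser.setEntityResolver(resolver)
    parser.setContentHandler(collector)
    parser.parse(io.BytesIO(document))
    return "".join(collector.text), resolver.requests


if __name__ == "__main__":
    text, requests = parse_with_local_dtd(DOCUMENT)
    print(repr(text))
    print(requests)
```

Keying the lookup on the public identifier means the document can keep its usual W3C system URL while the parser never contacts that server for DTDs the cache knows about.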

-- Alain.
 
jdownie

Excellent. I think I understand that. I'll look around for the XHTML
version of the html4/loose DTD and try what you suggest. Thanks very
much.
 
