Andreas Lobinger
Aloha,
I'm trying to write an XML filter that extracts some information about
an .xml document (with external entities), especially start elements and
external entities. The document is DocBook XML and, as far as I can see,
well formed; it passes our DocBook toolchain (dblatex etc.).
My parser is (very simple):
[115] scylla(scylla)> more pbxml.py
class xmlhandle:
    def __init__(self):
        self.parser_stack = []
        self.parser = None

    def se(self, name, attr):
        print "s", self.parser.CurrentLineNumber, name, attr

    def ex(self, context, baseid, n1, n2):
        print "x", context, n1, n2

def fromxml(fname):
    import xml.parsers.expat

    p = xml.parsers.expat.ParserCreate()
    xl = xmlhandle()
    p.StartElementHandler = xl.se
    p.ExternalEntityRefHandler = xl.ex
    xl.parser = p
    p.ParseFile(file(fname))
    return

if __name__ == "__main__":
    import sys
    fromxml(sys.argv[1])
My document (in two parts):
[116] scylla(scylla)> more s3.xml
<?xml version="1.0"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
"/usr/share/xml/docbook/xml/4.2/docbookx.dtd"
[
<!ENTITY bookinfo SYSTEM "bookinfo.xml">
]>
<book>
&bookinfo;
<chapter id="technicalDescription"><title>technical description</title>
<para>
This chapter includes specification of the main simulation loop.
</para>
</chapter>
</book>
[118] scylla(scylla)> more bookinfo.xml
<bookinfo>
<title>BookTitle</title>
<authorgroup>
<author>
<firstname>A</firstname>
<surname>B</surname>
</author>
</authorgroup>
</bookinfo>
The run produces:
[120] scylla(scylla)> python pbxml.py s3.xml
s 7 book {}
x bookinfo bookinfo.xml None
s 9 chapter {u'id': u'technicalDescription'}
s 9 title {}
s 10 para {}
Traceback (most recent call last):
File "pbxml.py", line 25, in ?
fromxml(sys.argv[1])
File "pbxml.py", line 20, in fromxml
p.ParseFile(file(fname))
TypeError: an integer is required
Does anyone have an idea where the error is produced?
And does anyone know how to debug this (whether it's really a bug or
my misunderstanding of expat)?
Hoping for an answer and wishing a happy day,
LOBI
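A possible explanation, offered as a sketch rather than a confirmed diagnosis: the xml.parsers.expat documentation says ExternalEntityRefHandler should return an integer (0 makes the parser report an error, any other value lets parsing continue). The ex method above returns nothing, i.e. None, which would be consistent with "TypeError: an integer is required". The variant below returns 1 and also parses the entity with ExternalEntityParserCreate. It uses Python 3 syntax, and the names XmlHandle, collect_events, MAIN_DOC and ENTITIES are made up for illustration; the in-memory byte strings merely stand in for s3.xml and bookinfo.xml.

```python
import xml.parsers.expat

# Hypothetical in-memory stand-in for s3.xml (external DTD reference omitted).
MAIN_DOC = b"""<?xml version="1.0"?>
<!DOCTYPE book [
<!ENTITY bookinfo SYSTEM "bookinfo.xml">
]>
<book>
&bookinfo;
<chapter id="technicalDescription"><title>technical description</title>
<para>text</para>
</chapter>
</book>
"""

# Hypothetical lookup table: system identifier -> entity content.
ENTITIES = {
    "bookinfo.xml": b"<bookinfo><title>BookTitle</title></bookinfo>",
}


class XmlHandle:
    def __init__(self, parser):
        self.parser = parser
        self.events = []

    def se(self, name, attrs):
        # Record every start element, including those inside entities.
        self.events.append(("s", name))

    def ex(self, context, base, system_id, public_id):
        self.events.append(("x", system_id))
        # A sub-parser created from the parent inherits its handlers.
        sub = self.parser.ExternalEntityParserCreate(context)
        sub.Parse(ENTITIES[system_id], True)
        # The handler must return an integer; nonzero means "continue".
        return 1


def collect_events(data):
    p = xml.parsers.expat.ParserCreate()
    h = XmlHandle(p)
    p.StartElementHandler = h.se
    p.ExternalEntityRefHandler = h.ex
    p.Parse(data, True)
    return h.events
```

In the original pbxml.py the smallest change along these lines would be to end ex with "return 1"; parsing the entity through a sub-parser is optional and only needed if the start elements inside bookinfo.xml should be reported too.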