expat error, help to debug?

Andreas Lobinger

Aloha,

I'm trying to write an XML filter that extracts some information about
an .xml document (with external entities), especially start elements and
external entities. The document is DocBook XML and, as far as I can see,
well-formed; it passes our DocBook toolchain (dblatex etc.).

My parser is (very simple):
[115] scylla(scylla)> more pbxml.py

class xmlhandle:
    def __init__(self):
        self.parser_stack = []
        self.parser = None

    def se(self, name, attr):
        print "s", self.parser.CurrentLineNumber, name, attr

    def ex(self, context, baseid, n1, n2):
        print "x", context, n1, n2

def fromxml(fname):
    import xml.parsers.expat
    p = xml.parsers.expat.ParserCreate()
    xl = xmlhandle()
    p.StartElementHandler = xl.se
    p.ExternalEntityRefHandler = xl.ex
    xl.parser = p
    p.ParseFile(file(fname))
    return

if __name__ == "__main__":
    import sys
    fromxml(sys.argv[1])

My document (in two parts):

[116] scylla(scylla)> more s3.xml
<?xml version="1.0"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
"/usr/share/xml/docbook/xml/4.2/docbookx.dtd"
[
<!ENTITY bookinfo SYSTEM "bookinfo.xml">
]>
<book>
&bookinfo;
<chapter id="technicalDescription"><title>technical description</title>
<para>
This chapter includes specification of the main simulation loop.
</para>
</chapter>
</book>

[118] scylla(scylla)> more bookinfo.xml
<bookinfo>
<title>BookTitle</title>
<authorgroup>
<author>
<firstname>A</firstname>
<surname>B</surname>
</author>
</authorgroup>
</bookinfo>

The run produces:

[120] scylla(scylla)> python pbxml.py s3.xml
s 7 book {}
x bookinfo bookinfo.xml None
s 9 chapter {u'id': u'technicalDescription'}
s 9 title {}
s 10 para {}
Traceback (most recent call last):
  File "pbxml.py", line 25, in ?
    fromxml(sys.argv[1])
  File "pbxml.py", line 20, in fromxml
    p.ParseFile(file(fname))
TypeError: an integer is required

Anyone any idea where the error is produced?
Anyone any idea how to debug this (if it's really a bug and not a
misunderstanding of expat on my part)?

Hoping for an answer and wishing a happy day,
LOBI
 
Lawrence D'Oliveiro

Andreas Lobinger said:
Anyone any idea where the error is produced?

Do you want to try adding an EndElementHandler as well, just to get more
information on where the error might be happening?
 
Andreas Lobinger

Aloha,
Lawrence D'Oliveiro said:
Do you want to try adding an EndElementHandler as well, just to get more
information on where the error might be happening?

I do.

With an EndElementHandler added (left as an exercise for the reader), the
output looks like this:
[42] scylla(scylla)> python pbxml.py s3.xml
s 7 book {}
x bookinfo bookinfo.xml None
s 9 chapter {u'id': u'technicalDescription'}
s 9 title {}
e title
s 10 para {}
e para
e chapter
e book
Traceback (most recent call last):
  File "pbxml.py", line 29, in ?
    fromxml(sys.argv[1])
  File "pbxml.py", line 24, in fromxml
    p.ParseFile(file(fname))
TypeError: an integer is required

which shows me that the error is raised after parsing the closing </book>
tag, BUT still within p.ParseFile (expat internals), so I can't look
into it.

The example here may be misleading. It was stripped down from
a fairly large DocBook document, and there the error happened in the
middle of the document, not at the end.
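
For anyone following along, here is a minimal sketch of the kind of start/end tracing described above. It is written for Python 3 rather than the Python 2 used in this thread, and the names `trace_elements` and `SAMPLE` are made up for illustration. The sample document deliberately contains no external entity, so the trace runs to completion.

```python
import xml.parsers.expat

# Hypothetical sample input, not the s3.xml from this thread.
SAMPLE = """<book>
<chapter id="c1"><title>t</title>
<para>p</para>
</chapter>
</book>
"""

def trace_elements(text):
    """Return the start/end element events of `text` in document order."""
    events = []
    parser = xml.parsers.expat.ParserCreate()

    def start(name, attrs):
        # CurrentLineNumber is the line on which the current event begins.
        events.append(("s", parser.CurrentLineNumber, name))

    def end(name):
        events.append(("e", name))

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(text, True)  # True marks this as the final chunk of input
    return events

if __name__ == "__main__":
    for event in trace_elements(SAMPLE):
        print(*event)
```

Collecting the events in a list instead of printing them directly makes it easier to compare runs when narrowing down where a parse goes wrong.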

Wishing a happy day,
LOBI
 
Andreas Lobinger

Aloha,


... to share my findings with you:

    def ex(self, context, baseid, n1, n2):
        print "x", context, n1, n2
        return 1

The registered ExternalEntityRefHandler has to return an (integer) value.
A handler without a return statement returns None, hence the TypeError.
It would have been nice if this had been mentioned in the documentation.
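
A self-contained sketch of the fix, again for Python 3 and with illustrative names (`collect_events`, `DOC`) that are not from the original script. It parses from a string instead of a file and uses a cut-down DOCTYPE without the DocBook DTD reference; the only point is that the external-entity handler returns a non-zero integer.

```python
import xml.parsers.expat

# Hypothetical stand-in for s3.xml, trimmed to the parts that matter here.
DOC = """<?xml version="1.0"?>
<!DOCTYPE book [
<!ENTITY bookinfo SYSTEM "bookinfo.xml">
]>
<book>
&bookinfo;
<chapter id="technicalDescription"><title>technical description</title>
</chapter>
</book>
"""

def collect_events(text):
    """Parse `text` and return start-element and external-entity events."""
    events = []
    parser = xml.parsers.expat.ParserCreate()

    def start_element(name, attrs):
        events.append(("s", name))

    def external_entity(context, base, system_id, public_id):
        events.append(("x", context, system_id))
        # Must be an integer: non-zero lets parsing continue, while 0 makes
        # expat report an external-entity-handling error.
        return 1

    parser.StartElementHandler = start_element
    parser.ExternalEntityRefHandler = external_entity
    parser.Parse(text, True)
    return events

if __name__ == "__main__":
    for event in collect_events(DOC):
        print(*event)
```

Note that this handler only records the reference; the content of bookinfo.xml itself is not parsed.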

Wishing a happy day,
LOBI
 
Andreas Lobinger

Aloha,

Andreas said:
The registered Handler has to return a (integer) value.
Would have been nice if this had been mentioned in the documentation.

Scratch that last line; it is mentioned in the documentation.
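
The same part of the documentation also says the handler is responsible for creating a sub-parser with ExternalEntityParserCreate(context) if the entity's content should be parsed too. A rough Python 3 sketch of that, under some assumptions: the entity text comes from an in-memory dict (`ENTITIES`) instead of the files on disk, and `start_elements`, `attach` and `external` are made-up names.

```python
import xml.parsers.expat

MAIN = """<?xml version="1.0"?>
<!DOCTYPE book [
<!ENTITY bookinfo SYSTEM "bookinfo.xml">
]>
<book>
&bookinfo;
<chapter><title>technical description</title></chapter>
</book>
"""

# Hypothetical in-memory stand-in for the entity files on disk.
ENTITIES = {"bookinfo.xml": "<bookinfo><title>BookTitle</title></bookinfo>"}

def start_elements(text, entities):
    """Return start-element names, descending into external entities."""
    names = []

    def start(name, attrs):
        names.append(name)

    def external(current, context, system_id):
        # The child parser must be created from the parser that reported
        # the reference, using the opaque context value it passed in.
        child = current.ExternalEntityParserCreate(context)
        attach(child)
        child.Parse(entities[system_id], True)
        return 1  # non-zero: continue parsing the referring document

    def attach(p):
        # Set the handlers explicitly on every parser, parent or child.
        p.StartElementHandler = start
        p.ExternalEntityRefHandler = (
            lambda context, base, system_id, public_id:
                external(p, context, system_id))

    parser = xml.parsers.expat.ParserCreate()
    attach(parser)
    parser.Parse(text, True)
    return names

if __name__ == "__main__":
    print(start_elements(MAIN, ENTITIES))
```

In a real filter the dict lookup would be replaced by opening the file named by the system identifier, resolved relative to the referring document.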
 
