Diez B. Roggisch
Hi, I have an XML file which contains entries of the form:
<idlist>
<myID>1</myID>
<myID>2</myID>
....
<myID>10000</myID>
</idlist>
Currently, I have written a SAX based handler that will read in all the
<myID></myID> entries and return a list of the contents of these
entries. However this is not scalable and for my purposes it would be
better if I could iterate over the list of <myID> nodes. Something
like:
for myid in getMyIDList(document):
    print(myid)
I realize that I can do this with generators, but I can't see how I can
incorporate generators into my handler class (which is a subclass of
xml.sax.ContentHandler).
Any pointers would be appreciated
Use ElementTree. Or one of the other packages that implement its very
pythonic interface, lxml or cElementTree.
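A minimal sketch of what that could look like with the standard library's xml.etree.ElementTree and its iterparse function. The name iter_my_ids and the sample data are made up for illustration, and the code assumes Python 3:

```python
import io
import xml.etree.ElementTree as ET

def iter_my_ids(source):
    """Yield the text of each <myID> element as soon as it is fully parsed."""
    # "end" events fire once an element's closing tag has been read.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "myID":
            yield elem.text
            # Drop the element's text so memory use stays small; the emptied
            # element itself is still attached to its parent.
            elem.clear()

# Hypothetical sample document in the shape described in the question.
sample = io.BytesIO(b"<idlist><myID>1</myID><myID>2</myID><myID>10000</myID></idlist>")
for myid in iter_my_ids(sample):
    print(myid)
```

iterparse also accepts a filename instead of a file object, so the same generator should work directly on a file on disk.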
Otherwise, you don't have much chance of using SAX to create a generator,
short of reading the whole document into memory (which somewhat defeats the
purpose of SAX in the first place) or creating a separate thread that
feeds an iterable through a queue.
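For completeness, the thread-and-queue variant might be sketched roughly as follows. The names MyIDHandler, iter_my_ids_sax and the maxsize default are hypothetical, and the code assumes Python 3. The SAX parser runs in a worker thread and pushes each ID into a bounded queue, while the generator pulls from it:

```python
import io
import queue
import threading
import xml.sax

_DONE = object()  # sentinel that marks the end of parsing

class MyIDHandler(xml.sax.ContentHandler):
    """Collect the text of each <myID> element and put it on a queue."""

    def __init__(self, out_queue):
        super().__init__()
        self._queue = out_queue
        self._chunks = None  # None means "not inside <myID>"

    def startElement(self, name, attrs):
        if name == "myID":
            self._chunks = []

    def characters(self, content):
        # characters() may be called several times for one text node.
        if self._chunks is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "myID":
            self._queue.put("".join(self._chunks))
            self._chunks = None

def iter_my_ids_sax(source, maxsize=100):
    """Yield <myID> contents while a worker thread runs the SAX parser."""
    q = queue.Queue(maxsize=maxsize)  # bounded, so the parser cannot run far ahead

    def worker():
        try:
            xml.sax.parse(source, MyIDHandler(q))
        except Exception as exc:  # hand parse errors over to the consumer
            q.put(exc)
        finally:
            q.put(_DONE)

    # Daemon thread: if the consumer stops early, the blocked worker
    # will not keep the interpreter alive.
    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = q.get()
        if item is _DONE:
            return
        if isinstance(item, Exception):
            raise item
        yield item

sample = io.BytesIO(b"<idlist><myID>1</myID><myID>2</myID></idlist>")
for myid in iter_my_ids_sax(sample):
    print(myid)
```

It works, but compared with the ElementTree version it is a lot of machinery for the same result, and a consumer that abandons the loop early leaves the worker thread blocked on the queue.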
Alternatively, there are parsers out there that implement a PULL style of
parsing instead of the PUSH style SAX uses. But before you start with these,
take ElementTree.
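One pull-style parser that ships with Python is xml.dom.pulldom. A rough sketch of how it might be used here, with iter_my_ids_pull being a made-up name and Python 3 assumed:

```python
import io
from xml.dom import pulldom

def iter_my_ids_pull(source):
    """Yield the text of each <myID> element using a pull parser."""
    events = pulldom.parse(source)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "myID":
            # Pull the rest of this element (its children) into the node.
            events.expandNode(node)
            yield "".join(
                child.data
                for child in node.childNodes
                if child.nodeType == child.TEXT_NODE
            )

sample = io.BytesIO(b"<idlist><myID>1</myID><myID>2</myID></idlist>")
for myid in iter_my_ids_pull(sample):
    print(myid)
```

Here the calling code drives the parser by asking for the next event, which is why it fits into a generator without any extra thread.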
Diez