Using TreeBuilder in an ElementTree-like way

Greg Aumann

I am trying to write some Python code for a library that reads an
XML-like language from a file into elementtree data structures. I then
want to be able to read and/or modify the structure and write it out
either as XML or in the original format. I really want the API for the
XML-like language to be the same as the elementtree API, to reduce
confusion and make it easier to learn.
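
To make the "write it out either way" goal concrete, here is a rough sketch of the output side. It assumes the modern standard-library module xml.etree.ElementTree rather than the standalone elementtree package, and the to_markers helper and the backslash-marker line syntax ("\+tag" opens a block, "\-tag" closes it, "\tag value" is a plain field) are hypothetical stand-ins for the real format.

```python
import xml.etree.ElementTree as ET


def to_markers(elem, lines=None):
    # Hypothetical serializer: turn an Element into backslash-marker lines.
    if lines is None:
        lines = []
    text = (elem.text or "").strip()
    if len(elem):
        # An element with children becomes a "+tag" ... "-tag" block.
        lines.append(("\\+%s %s" % (elem.tag, text)).rstrip())
        for child in elem:
            to_markers(child, lines)
        lines.append("\\-%s" % elem.tag)
    else:
        # A childless element becomes a single "tag value" field.
        lines.append(("\\%s %s" % (elem.tag, text)).rstrip())
    return lines


root = ET.Element("settings")
ET.SubElement(root, "name").text = "demo"
font = ET.SubElement(root, "font")
ET.SubElement(font, "size").text = "12"

marker_text = "\n".join(to_markers(root))          # original-style output
xml_text = ET.tostring(root, encoding="unicode")   # plain XML output
```

One limitation of this sketch: a block that happens to have no children would be written as a plain field, so a real implementation would need some way to remember which tags are blocks.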

In reading the elementtree documentation I found the
ElementTree.TreeBuilder class which it says can be used to create
parsers for XML-like languages. So I wrote the code below. The code is
working but I am not sure that this is really the intended way to use
the ElementTree.TreeBuilder class.

Essentially I was trying to implement the following advice from Fredrik
Lundh (Wed, Sep 8 2004 12:54 am):
> by the way, it's trivial to build trees from arbitrary SAX-style sources.
> just create an instance of the ElementTree.TreeBuilder class, and call
> the "start", "end", and "data" methods as appropriate.
>
> builder = ElementTree.TreeBuilder()
> builder.start("tag", {})
> builder.data("text")
> builder.end("tag")
> elem = builder.close()

But in another post he wrote (Wed, May 21 2003 2:56 am):
> usage:
>
> from elementtree import ElementTree, HTMLTreeBuilder
>
> # file is either a filename or an open stream
> tree = ElementTree.parse(file, parser=HTMLTreeBuilder.TreeBuilder())
> root = tree.getroot()
>
> or
>
> from elementtree import HTMLTreeBuilder
>
> parser = HTMLTreeBuilder.TreeBuilder()
> parser.feed(data)
> root = parser.close()

This second one makes me think I should have implemented a parser class
using TreeBuilder. Also, when I used return builder.close() in the code
below, it returned an _ElementInterface rather than an ElementTree structure.
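
A minimal sketch of that observation, assuming the modern standard-library module xml.etree.ElementTree (in the old elementtree package the element class appears to have been named _ElementInterface): close() hands back the root element, and an ElementTree wrapper has to be created around it explicitly.

```python
import xml.etree.ElementTree as ET

builder = ET.TreeBuilder()
builder.start("tag", {})
builder.data("text")
builder.end("tag")

root = builder.close()        # the root element that was built, not a tree object
tree = ET.ElementTree(root)   # wrapper that adds tree-level methods such as write()
```

With the wrapper in place, tree.getroot() returns the same element that close() produced.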

So my question is really about how I should structure the code so that
using this XML-like format is as similar as possible to using XML itself
in elementtree.
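
One possible structure, sketched under assumptions rather than confirmed as the intended usage: a parser class exposing feed() and close(), which is the interface the second quoted example uses, so that it can be passed as the parser argument of parse(). The sketch targets the standard-library xml.etree.ElementTree; the MarkerParser class and its line syntax ("\+tag" opens a block, "\-tag" closes it, "\tag value" is a plain field) are hypothetical and do not use ShoeboxFile.

```python
import codecs
import io
import xml.etree.ElementTree as ET


class MarkerParser:
    # Hypothetical feed()/close() parser for backslash-marker lines.

    def __init__(self, encoding="utf-8"):
        self._builder = ET.TreeBuilder()
        self._decoder = codecs.getincrementaldecoder(encoding)()
        self._buffer = ""

    def feed(self, data):
        # parse() may pass bytes (file opened by name) or str (text stream).
        if isinstance(data, bytes):
            data = self._decoder.decode(data)
        self._buffer += data
        lines = self._buffer.split("\n")
        self._buffer = lines.pop()  # keep a trailing partial line for later
        for line in lines:
            self._handle(line)

    def _handle(self, line):
        line = line.strip()
        if not line.startswith("\\"):
            return  # this sketch ignores blank and non-marker lines
        mkr, _, value = line[1:].partition(" ")
        if mkr.startswith("+"):
            self._builder.start(mkr[1:], {})
            self._builder.data(value)
        elif mkr.startswith("-"):
            self._builder.end(mkr[1:])
        else:
            self._builder.start(mkr, {})
            self._builder.data(value)
            self._builder.end(mkr)

    def close(self):
        if self._buffer:
            self._handle(self._buffer)
            self._buffer = ""
        return self._builder.close()


sample = "\\+settings\n\\name demo\n\\+font\n\\size 12\n\\-font\n\\-settings\n"
tree = ET.parse(io.StringIO(sample), parser=MarkerParser())
root = tree.getroot()
```

Used this way, the calling code looks the same as for XML: parse() returns an ElementTree, and find(), getroot() and write() behave as usual.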

from elementtree import ElementTree
from nltk_lite.corpora.shoebox import ShoeboxFile

class Settings(ShoeboxFile):
    def __init__(self):
        super(Settings, self).__init__()

    def parse(self, encoding=None):
        builder = ElementTree.TreeBuilder()
        for mkr, value in self.fields(encoding, unwrap=False):
            # A leading "+" opens a block and a leading "-" closes it;
            # any other marker is a plain field.
            block = mkr[0]
            if block in ("+", "-"):
                mkr = mkr[1:]
            else:
                block = None
            if block == "+":
                builder.start(mkr, {})
                builder.data(value)
            elif block == "-":
                builder.end(mkr)
            else:
                builder.start(mkr, {})
                builder.data(value)
                builder.end(mkr)
        # Wrap the root element so callers get an ElementTree back.
        return ElementTree.ElementTree(builder.close())
 
