xml file structure for use with ElementTree?

  • Thread starter Stewart Midwinter

Stewart Midwinter

I want to parse a file with ElementTree. My file has the following
format:
<!-- file population.xml -->
<?xml version='1.0' encoding='utf-8'?>
<population>
<person><name="joe" sex="male" age="49"></person>
<person><name="hilda" sex="female" age="33"></person>
<person><name="bartholomew" sex="male" age="17">
</person>
</population>
note that the population can have more than one person.

I've created the following script to read and parse this file:
<!-- file et1.py -->
from elementtree import ElementTree
tree = ElementTree.parse("population.xml")
root = tree.getroot()
# ...manipulate tree...
tree.write("outfile.xml")

This script works if I have only one <person> record, but with more
than one record, it fails with the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "D:\PYTHON23\elementtree\ElementTree.py", line 864, in parse
tree.parse(source, parser)
File "D:\PYTHON23\elementtree\ElementTree.py", line 588, in parse
parser.feed(data)
File "D:\PYTHON23\elementtree\ElementTree.py", line 1132, in feed
self._parser.Parse(data, 0)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 3,
column 13

What am I doing wrong? Do I need to describe the xml structure in
some way?

2nd question. Assuming I can read and parse this file, I can create
and add an element to the tree. But how would I go about deleting
from the tree the <person> record for, say, name='joe'?

Is there a source of tutorial information anywhere for the use of
ElementTree? I'm looking for a few more examples than those contained
on the effbot.org site - none of which seem to address the question of
input file structure.

thanks
S
 

Andrew Dalke

Stewart said:
<?xml version='1.0' encoding='utf-8'?>
<population>
<person><name="joe" sex="male" age="49"></person>
<person><name="hilda" sex="female" age="33"></person>
<person><name="bartholomew" sex="male" age="17">
</person>
</population>
This script works if I have only one <person> record, but with more
than one record, it fails with the following error: ...
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 3,
column 13

What am I doing wrong? Do I need to describe the xml structure in
some way?

This says your XML is not well-formed. The problem is on
line 3 column 13, which is the "=". A tag name cannot be
followed directly by "="; attributes go inside the tag.
Legal XML would look like

<person name="joe" sex="male" age="49"></person>

or, more concisely for empty elements,

<person name="joe" sex="male" age="49" />
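To see the difference for yourself, here is a minimal sketch that feeds both forms to the parser. It assumes a modern Python where ElementTree ships in the standard library as xml.etree.ElementTree (the thread uses the older standalone elementtree package), and the variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The original, malformed record: "=" directly after the tag name.
bad = '<population><person><name="joe" sex="male" age="49"></person></population>'
# The corrected record: name, sex and age are attributes of <person>.
good = '<population><person name="joe" sex="male" age="49"/></population>'

try:
    ET.fromstring(bad)
    bad_parsed = True
except ET.ParseError as err:
    # The parser reports the position of the offending token.
    bad_parsed = False
    print("rejected:", err)

root = ET.fromstring(good)
person = root.find("person")
print(person.attrib["name"], person.get("age"))
```

The attributes come back as a plain dictionary of strings, so the age would still need int() if you want a number.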

2nd question. Assuming I can read and parse this file, I can create
and add an element to the tree. But how would I go about deleting
from the tree the <person> record for, say, name='joe'?

from elementtree import ElementTree

tree = ElementTree.parse("tmp.xml").getroot()

for person in tree.findall("person"):
    if person.attrib["name"] == "joe":
        tree.remove(person)
        break
else:
    raise AssertionError("Where's joe?")

<population>
<person age="33" name="hilda" sex="female" />
<person age="17" name="bartholomew" sex="male">
Is there a source of tutorial information anywhere for the use of
ElementTree? I'm looking for a few more examples than those contained
on the effbot.org site - none of which seem to address the question of
input file structure.

The effbot site, and its links to articles by Uche and David,
is the best source for documentation.

Given what you've shown, you need a reference to XML
and not ElementTree. The latter assumes you understand
the former. I don't have one handy.

Andrew
(e-mail address removed)
 

Stewart Midwinter

Andrew Dalke said:
Legal XML would look like
<person name="joe" sex="male" age="49"></person>

sigh... after a good night's sleep I discovered that myself this
morning. It's obvious, of course.
for person in tree.findall("person"):
    if person.attrib["name"] == "joe":
        tree.remove(person)
        break
else:
    raise AssertionError("Where's joe?")

That's the ticket! Unfortunately at the moment when I run this code I
get the following error:
ElementTree instance has no attribute 'remove'
but I'll try to work through that.
Given what you've shown, you need a reference to XML
and not ElementTree. The latter assumes you understand
the former. I don't have one handy.

that's a polite way of saying I'm clueless about XML, which is true!
The main appeal of ElementTree was so I could avoid having to learn a
whole lot about XML in order to parse a simple file, but I am coming
to the conclusion that ElementTree is only simple if you already have
an understanding about XML.

thanks again,
S
 

Andrew Dalke

Stewart said:
That's the ticket! Unfortunately at the moment when I run this code I
get the following error:
ElementTree instance has no attribute 'remove'
but I'll try to work through that.

Perhaps you need a newer version of ElementTree? I don't
know when 'remove' was added.
The main appeal of ElementTree was so I could avoid having to learn a
whole lot about XML in order to parse a simple file, but I am coming
to the conclusion that ElementTree is only simple if you already have
an understanding about XML.

The problem is that you also need to generate the XML file.
You could use ElementTree to do that, but I've not used
it that way yet.
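For what it's worth, here is a rough sketch of generating the file with ElementTree itself, so the syntax is right by construction. It assumes the standard-library xml.etree.ElementTree module of a modern Python; the records list is just the sample data from earlier in the thread.

```python
import xml.etree.ElementTree as ET

# Sample data taken from the population file discussed in this thread.
records = [("joe", "male", 49), ("hilda", "female", 33), ("bartholomew", "male", 17)]

root = ET.Element("population")
for name, sex, age in records:
    # Attribute values must be strings, so the age is converted explicitly.
    ET.SubElement(root, "person", name=name, sex=sex, age=str(age))

# Serialize to text; ET.ElementTree(root).write("population.xml") would write a file instead.
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)

# Round trip: the generated text parses back into the same records.
reparsed = ET.fromstring(xml_text)
```

Building the tree in code and letting the library serialize it avoids hand-typing tags, which is where the original error came from.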

XML syntax isn't that hard. The summary is that everything
looks pretty much like this

<tagname attrib="val">Content goes here</tagname>

The <tagname> is called an opening tag and the </tagname>
is called a closing tag. The whole thing is called an
element.

There can be 0 or more attributes in the opening tag, but
none in the closing tag. So the following are valid
opening tags

<tagname>
<person name="Andrew">
<person name="Andrew" city="Santa Fe">

There are ways to escape special characters in the
values for an attribute. Only some characters are
allowed as tag names and attribute names. The ':'
is the only special one. It's used for namespaces.
That's more complicated and you'll need to look
elsewhere for details on that. Duplicate attribute names
within a single tag are not allowed; a parser will reject
them as not well-formed.
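As an illustration of the escaping, here is a small sketch showing that a serializer handles the special characters for you. It assumes the standard-library xml.etree.ElementTree; the attribute value is an invented example.

```python
import xml.etree.ElementTree as ET

# An attribute value containing characters that are special in XML.
tricky = 'Tom & "Jerry" <cats>'

person = ET.Element("person", name=tricky)
serialized = ET.tostring(person, encoding="unicode")
print(serialized)  # the special characters appear as entity references

# Parsing the serialized form gives the original value back.
round_tripped = ET.fromstring(serialized).get("name")
```

If you write the file by hand instead, you have to type the entity references (such as &amp; for "&") yourself.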

The contents of an element can contain text and other
elements. This is what makes it an element tree.
So the following is also valid

<person><name>Andrew</name><city>Santa Fe</city></person>

It's a matter of some preference about whether to put
data into attributes or as contents of an element.
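Either layout is easy to read back; only the lookup call differs. A minimal sketch, assuming the standard-library xml.etree.ElementTree and the two example layouts shown in this post:

```python
import xml.etree.ElementTree as ET

as_attributes = '<person name="Andrew" city="Santa Fe"/>'
as_children = '<person><name>Andrew</name><city>Santa Fe</city></person>'

p1 = ET.fromstring(as_attributes)
p2 = ET.fromstring(as_children)

# Data stored in attributes is read with get() (or the attrib dict).
city_from_attribute = p1.get("city")
# Data stored as element content is read with findtext() (or find(...).text).
city_from_child = p2.findtext("city")

print(city_from_attribute, "|", city_from_child)
```

One practical difference: attributes hold a single string each, while child elements can repeat and can contain further structure.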

As a shortcut, if there is no content then ending
the tag with a '/>' makes it both an opening tag and
a closing tag, so the following is a complete element.

<person name="Andrew"/>

The first line of your XML document can contain
a special construct called the XML declaration, which
looks like a processing instruction. If present, it must
come at the very start of the file. It tells the XML
parser how to process the rest of the document. It looks
like this

<?xml version='1.0' encoding='utf-8'?>

Besides describing which XML definition is used
(there's only one I know about), this tells the
processor to interpret bytes as the UTF-8 encoding
of Unicode characters. I believe the first few
bytes are also used to determine the byte ordering
in case the text is stored as big-endian or little-endian
"wide" unicode characters.
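To make the encoding point concrete, here is a sketch that writes a document with a declaration and a non-ASCII name, then reads it back. It assumes the standard-library xml.etree.ElementTree; the name "Zoë" and the in-memory buffer are only for illustration.

```python
import io
import xml.etree.ElementTree as ET

root = ET.Element("people")
ET.SubElement(root, "person", name="Zoë")  # contains a non-ASCII character

# Write to an in-memory buffer; a filename would work the same way.
buffer = io.BytesIO()
ET.ElementTree(root).write(buffer, encoding="utf-8", xml_declaration=True)
data = buffer.getvalue()
print(data.decode("utf-8"))

# Parsing the bytes lets the parser pick up the declared encoding.
reparsed = ET.fromstring(data)
name = reparsed.find("person").get("name")
```

Handing the parser raw bytes rather than already-decoded text is what lets the declaration do its job.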

One final note. Only one top-level element is
allowed in an XML file. For example, this is allowed


<?xml version='1.0' encoding='utf-8'?>
<people>
<person name="Andrew"/>
<person name="Fred"/>
</people>

while this is not

<?xml version='1.0' encoding='utf-8'?>
<person name="Andrew"/>
<person name="Fred"/>

In other words there is only one root to the
element tree.
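A quick way to check the single-root rule is to hand both variants to a parser. This is a sketch assuming the standard-library xml.etree.ElementTree; the XML declaration is left out of the strings to keep the example short.

```python
import xml.etree.ElementTree as ET

one_root = '<people><person name="Andrew"/><person name="Fred"/></people>'
two_roots = '<person name="Andrew"/><person name="Fred"/>'

# A single top-level element parses fine.
people = ET.fromstring(one_root)
print([p.get("name") for p in people])

# Two top-level elements are rejected by the parser.
try:
    ET.fromstring(two_roots)
    two_roots_ok = True
except ET.ParseError as err:
    two_roots_ok = False
    print("rejected:", err)
```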

Andrew
(e-mail address removed)
 

Fredrik Lundh

Andrew said:
Perhaps you need a newer version of ElementTree? I don't
know when 'remove' was added.

remove is available on Element instances. it's not available on ElementTree
wrappers.
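The distinction can be sketched as follows, assuming the standard-library xml.etree.ElementTree of a modern Python and the sample population data from earlier in the thread. The wrapper is built from a string here only so the example needs no file.

```python
import xml.etree.ElementTree as ET

doc = """<population>
<person name="joe" sex="male" age="49"/>
<person name="hilda" sex="female" age="33"/>
<person name="bartholomew" sex="male" age="17"/>
</population>"""

tree = ET.ElementTree(ET.fromstring(doc))  # the ElementTree wrapper
root = tree.getroot()                      # the root Element

# The wrapper has no remove(); the Element does.
wrapper_has_remove = hasattr(tree, "remove")
element_has_remove = hasattr(root, "remove")

# Remove joe by calling remove() on the parent Element.
for person in root.findall("person"):
    if person.get("name") == "joe":
        root.remove(person)
        break

remaining = [p.get("name") for p in root.findall("person")]
print(remaining)
```

Since the wrapper still refers to the same root, a later tree.write("outfile.xml") would save the document without joe.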

</F>
 

Andrew Dalke

/F:
remove is available on Element instances. it's not available on ElementTree
wrappers.

Ahh, didn't catch that in the post. I wish the OP had
given a normal traceback instead of summarizing it. Would
have made it easier to see that the "this code" in the
phrase "when I run this code" didn't actually refer to
the code I posted.

Andrew
(e-mail address removed)
 

Stewart Midwinter

Andrew Dalke said:
XML syntax isn't that hard. The summary is that everything
looks pretty much like this
.... snip ...


Thanks Andrew, that was a great summary of XML syntax. It will come
in handy when I take the next step, editing a larger, more complex
file.

My ElementTree method is working properly now; I posted a little
sample app on Fredrik's discussion board on quicktopic.com.

cheers
Stewart
 
