Parse XML using Python

anilby

Hi,

I wanted to write a script that will read the file below:

<abcd label="ABC">
..
<efg label="EFGA">
.....
<decg label="ABDG">
...
</decg>
...

</efg>

...
<mon1 label="MON">
...
</mon1>
...
</abcd>
..
..
<xyz label="A1">
..
<eg1 label="FGA">
.....
<dg label="BG">
...

</dg>

...

</eg1>

</xyz>

...
and so on

The output of the script should be:

ABC
...EFGA
.....ABDG
...MON

A1
...FGA
.....BG

Please help me write a Python script for this task.
Regards,
Anil.
 
Diez B. Roggisch

Anil said:
> Could you please tell me how to achieve the below.
> I am interested in getting the output like:
>
> ABC
> EFGA --> child of ABC
> ABDG --> child of EFGA
> MON --> child of ABC
> A1
> FGA --> child of A1
> BG --> child of FGA

print """
ABC
EFGA --> child of ABC
ABDG --> child of EFGA
MON --> child of ABC
A1
FGA --> child of A1
BG --> child of FGA
"""

Unless you tell us what _input_ is to be processed to yield that
output, I doubt anybody can be of more help...
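As Diez says, the input matters. Assuming it is the same kind of labelled-element file as in Anil's first post, and that each label should be reported together with the label of its nearest labelled ancestor, a minimal Python 3 sketch with the standard-library `xml.etree.ElementTree` could look like this. The function name `parent_lines`, the synthetic `<root>` wrapper and the trimmed sample data are illustrative assumptions, not anything from the thread.

```python
import xml.etree.ElementTree as ET

def parent_lines(elem, parent_label=None):
    """Yield 'LABEL' or 'LABEL --> child of PARENT' for each labelled element."""
    label = elem.get("label")
    if label:
        yield label if parent_label is None else f"{label} --> child of {parent_label}"
        parent_label = label  # a labelled element becomes the parent of its descendants
    for child in elem:
        yield from parent_lines(child, parent_label)

# Assumed sample data: two top-level elements, so wrap them in a synthetic root.
data = """
<abcd label="ABC">
  <efg label="EFGA"><decg label="ABDG">...</decg></efg>
  <mon1 label="MON">...</mon1>
</abcd>
<xyz label="A1">
  <eg1 label="FGA"><dg label="BG">...</dg></eg1>
</xyz>
"""
root = ET.fromstring("<root>" + data + "</root>")
for top in root:
    for line in parent_lines(top):
        print(line)
```

Passing the parent label down the recursion (rather than looking parents up afterwards) keeps this a single pass, since ElementTree elements do not store a reference to their parent.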
 
Jeremy Bowers

> Could you please tell me how to achieve the below.
> I am interested in getting the output like:

Anil, "use sax" is all you are likely to get as a starting point. If
someone just hands you a solution, what have you learned?

Start with SAX (or some other parser; personally, I'd recommend
ElementTree for this, google it), and if you have trouble, post
the exact code you have, the exact input you are using, what happens, and
what you expected to happen.

It is unlikely that repeated appeals, in the absence of evidence that you
tried to solve it yourself, will get you anywhere.
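To make the "use sax" hint concrete, here is a minimal Python 3 sketch with the standard-library `xml.sax` module. It assumes the input has already been wrapped in a single synthetic root element (Fredrik's reply below explains why that is needed) and mirrors the ".." per level indentation used in the thread; the class name `LabelPrinter` and the sample data are hypothetical.

```python
import xml.sax

class LabelPrinter(xml.sax.ContentHandler):
    """Collect each element's label, indented by '..' per nesting level."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # current element depth; the synthetic root is depth 1
        self.lines = []

    def startElement(self, name, attrs):
        self.depth += 1
        label = attrs.get("label")
        if label:
            # depth 2 is a top-level element of the original data: no dots
            self.lines.append(".." * (self.depth - 2) + label)

    def endElement(self, name):
        self.depth -= 1

# Assumed sample data with the same nesting as Anil's example.
data = """<abcd label="ABC"><efg label="EFGA"><decg label="ABDG"/></efg>
<mon1 label="MON"/></abcd>
<xyz label="A1"><eg1 label="FGA"><dg label="BG"/></eg1></xyz>"""

handler = LabelPrinter()
xml.sax.parseString(("<root>" + data + "</root>").encode("utf-8"), handler)
print("\n".join(handler.lines))
```

SAX never builds a tree, so the handler has to track depth itself by counting start and end tags; that is the main difference from the ElementTree approach.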
 
Fredrik Lundh

> <abcd label="ABC">
> </abcd>
> .
> .
> <xyz label="A1">
> </xyz>
>
> ..
> and so on

an XML document can only have a single root element, but your example
has at least two top-level elements (abcd and xyz).
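A quick way to see the single-root rule in practice, assuming Python 3 and the standard-library `xml.etree.ElementTree`: parsing two top-level elements raises a parse error, while the same elements wrapped in one element parse fine. The sample string is a trimmed, hypothetical version of Anil's data.

```python
import xml.etree.ElementTree as ET

two_roots = '<abcd label="ABC"/><xyz label="A1"/>'

try:
    ET.fromstring(two_roots)
except ET.ParseError as err:
    # the parser rejects anything after the first document element
    print("not well-formed:", err)

wrapped = ET.fromstring("<root>" + two_roots + "</root>")
print([child.get("label") for child in wrapped])  # ['ABC', 'A1']
```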

here is some elementtree code that handles this by wrapping your data in
a "root" element.

from elementtree import ElementTree

p = ElementTree.XMLTreeBuilder()

p.feed("<root>")
p.feed(open("mydocument.xml").read())
p.feed("</root>")

root = p.close()

def printelem(elem, prefix=""):
    label = elem.get("label")
    if label:
        if not prefix:
            print
        print prefix + label
    for child in elem:
        printelem(child, prefix + "..")

for elem in root:
    printelem(elem)

# end

the elementtree library can be found here:

http://effbot.org/zone/element-index.htm

</F>
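Fredrik's code targets the separate `elementtree` package and Python 2's `print` statement. ElementTree has been in the standard library as `xml.etree.ElementTree` since Python 2.5, so a minimal Python 3 sketch of the same feed-a-synthetic-root approach might look like this. It uses the documented `XMLParser` class in place of `XMLTreeBuilder`, and an in-memory string (assumed sample data) stands in for `mydocument.xml`.

```python
import xml.etree.ElementTree as ET

def printelem(elem, prefix=""):
    """Print elem's label (if any) with a '..' prefix per level, then recurse."""
    label = elem.get("label")
    if label:
        if not prefix:
            print()              # blank line before each top-level label
        print(prefix + label)
    for child in elem:
        printelem(child, prefix + "..")

# Assumed sample data, standing in for open("mydocument.xml").read()
data = """<abcd label="ABC">..<efg label="EFGA">...<decg label="ABDG">...</decg></efg>
<mon1 label="MON">...</mon1></abcd>
<xyz label="A1"><eg1 label="FGA"><dg label="BG">...</dg></eg1></xyz>"""

parser = ET.XMLParser()
parser.feed("<root>")            # synthetic single root element
parser.feed(data)
parser.feed("</root>")
root = parser.close()            # close() returns the parsed root Element

for elem in root:
    printelem(elem)
```

Running it prints the labels indented with "..", with a blank line before ABC and before A1, as in Fredrik's version.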
 
William Park

Take a look at the "Expat XML" section towards the end of
http://home.eol.ca/~parkw/park-january.html
Translating it to Python is left as homework.
In essence,

    indent=..
    start () {
        local "${@:2}"
        echo "${indent|*XML_ELEMENT_DEPTH-1}$label"
    }
    xml -s start "`< file.xml`"

which prints

    ..ABC
    ....EFGA
    ......ABDG
    ....MON
    ..A1
    ....FGA
    ......BG

with the input modified, i.e. the XML pieces wrapped in a single root element.
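William's snippet is not stock bash (the `xml` builtin and the `${indent|*N}` expansion come from his extended shell), so here is one way to do the "homework" in Python 3 with the standard-library expat binding, `xml.parsers.expat`. The `depth` counter plays the role of `XML_ELEMENT_DEPTH`; as in his version, the input is wrapped in a single root element, so top-level labels get one ".." prefix. The function name `label_lines` and the sample data are hypothetical.

```python
import xml.parsers.expat

def label_lines(xml_text, indent=".."):
    """Return labels indented by element depth, like the shell 'start' handler."""
    lines = []
    depth = 0                        # analogue of XML_ELEMENT_DEPTH

    def start(name, attrs):          # called for every start tag
        nonlocal depth
        depth += 1
        label = attrs.get("label")
        if label:
            lines.append(indent * (depth - 1) + label)

    def end(name):                   # called for every end tag
        nonlocal depth
        depth -= 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(xml_text, True)     # True: this is the final chunk
    return lines

data = """<abcd label="ABC"><efg label="EFGA"><decg label="ABDG"/></efg>
<mon1 label="MON"/></abcd>
<xyz label="A1"><eg1 label="FGA"><dg label="BG"/></eg1></xyz>"""

print("\n".join(label_lines("<root>" + data + "</root>")))
```

This prints the same seven lines as the shell version above, from "..ABC" down to "......BG".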
 
Anil

Thanks everyone for the responses. I will try the above solutions.
 
Uche Ogbuji

This is a neat solution. You can parse any well-formed general
entity (e.g. Anil's document with multiple root nodes) in 4Suite
1.0a4:

from Ft.Xml.Domlette import EntityReader
s = """
<spam1>eggs</spam1>
<spam2>more eggs</spam2>
"""
docfrag = EntityReader.parseString(s, 'http://foo/test/spam.xml')

docfrag is now ready for processing using DOM methods.
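4Suite's `EntityReader` is not part of the standard library. Where 4Suite is not available, a rough stdlib analogue (a swapped-in technique, not 4Suite's API) is to wrap the fragment in a synthetic element, parse it with `xml.dom.minidom`, and then work with that element's children through the usual DOM methods. The wrapper name `wrapper` is an arbitrary assumption.

```python
from xml.dom import minidom

s = """
<spam1>eggs</spam1>
<spam2>more eggs</spam2>
"""

# Wrap the multi-root fragment so it becomes a well-formed document.
doc = minidom.parseString("<wrapper>" + s + "</wrapper>")
wrapper = doc.documentElement

# Keep only element children; the whitespace between them is Text nodes.
elements = [n for n in wrapper.childNodes if n.nodeType == n.ELEMENT_NODE]
for el in elements:
    print(el.tagName, el.firstChild.data)
```

Unlike a true document fragment, the top-level elements here hang off an extra element, so any code that serializes the result has to skip the wrapper.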

--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://4Suite.org http://fourthought.com
Use CSS to display XML -
http://www.ibm.com/developerworks/edu/x-dw-x-xmlcss-i.html
Location, Location, Location -
http://www.xml.com/pub/a/2004/11/24/py-xml.html
The State of Python-XML in 2004 -
http://www.xml.com/pub/a/2004/10/13/py-xml.html
Be humble, not imperial (in design) -
http://www.adtmag.com/article.asp?id=10286
XMLOpen and more XML Hacks -
http://www.ibm.com/developerworks/xml/library/x-think27.html
A survey of XML standards -
http://www-106.ibm.com/developerworks/xml/library/x-stand4/
 
