Using XML w/ Python...

Jay · Dec 11, 2005

OK, I have this XML doc. I don't know much about XML, but what I want
to do is take certain parts of the doc, such as <title> blah
</title>, and put just that into a text file. Then the same thing
for the <body> part. That's about it. I checked out some of the xml
modules but don't understand how to use them, and I don't get parsing, so
could you please explain working with XML and Python to me? Email me at
(e-mail address removed)

Aim- jayjay08balla
MSN- (e-mail address removed)
Yahoo- raeraefad72

Thx

James · Dec 11, 2005

XPath is the least painful way of doing it.

Here are some samples with various libraries for XPath
http://www.oreillynet.com/pub/wlg/6225

Read XPath basics here
http://www.w3schools.com/xpath/default.asp
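To make the XPath suggestion concrete, here is a minimal sketch using the limited XPath subset offered by `xml.etree.ElementTree` in the standard library of current Python 3 (it was not yet bundled with the Python versions discussed in this thread). The sample document and the helper name `texts_for` are made up for illustration and are not taken from either linked page.

```python
import xml.etree.ElementTree as ET

# Made-up sample document standing in for the poster's XML file.
SAMPLE = """<note>
  <title>Reminder</title>
  <body>Don't forget the meeting.</body>
</note>"""


def texts_for(xml_text, path):
    """Return the text of every element matched by an ElementTree path."""
    root = ET.fromstring(xml_text)
    # ".//title" means: any <title> element below the root, at any depth.
    return [(elem.text or "").strip() for elem in root.findall(path)]


titles = texts_for(SAMPLE, ".//title")
bodies = texts_for(SAMPLE, ".//body")
print(titles, bodies)
```

Full XPath (attributes as results, functions, and so on) needs one of the third-party libraries from the first link; ElementTree only covers simple paths like these.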

It is not practical, and perhaps not polite, to expect people to write
tutorials just for you and send them by email. There are a lot of tutorials
on the web about this. Just use Google.

Jay · Dec 11, 2005

Yes, I know. I did check out a couple, but I could never understand them.
They were confusing for me, and I wasn't hoping for a full typed
tutorial, just some help with exactly what I'm trying to do, not
the whole module... but whatever, thanks a lot for the feedback.

Ivan Herman · Dec 11, 2005

Look at the standard python library reference

http://docs.python.org/lib/dom-example.html

the handleSlide function almost does what you want, except that you should use
'parse' and not 'parseString'.
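A minimal sketch of that approach, written for current Python 3 rather than the 2.x of this thread: `minidom.parse` reads a file, `getElementsByTagName` finds the elements, and the text nodes under each one are joined. The file names, the helpers `element_text` and `extract`, and the sample XML are all invented so that the snippet runs on its own.

```python
import os
import tempfile
from xml.dom import minidom


def element_text(node):
    """Concatenate the text nodes directly under a DOM element."""
    return "".join(
        child.data for child in node.childNodes
        if child.nodeType == child.TEXT_NODE
    )


def extract(xml_path, tag_names):
    """Map each tag name to the text of its first occurrence in the file."""
    dom = minidom.parse(xml_path)  # parse() takes a file name or file object
    result = {}
    for tag in tag_names:
        found = dom.getElementsByTagName(tag)
        if found:
            result[tag] = element_text(found[0]).strip()
    return result


# Invented sample file so the sketch is self-contained.
workdir = tempfile.mkdtemp()
xml_path = os.path.join(workdir, "sample.xml")
with open(xml_path, "w", encoding="utf-8") as handle:
    handle.write("<doc><title>blah</title><body>Some body text</body></doc>")

parts = extract(xml_path, ["title", "body"])

# Write the extracted pieces to a plain text file, one per line.
out_path = os.path.join(workdir, "out.txt")
with open(out_path, "w", encoding="utf-8") as handle:
    for tag in ("title", "body"):
        handle.write("%s: %s\n" % (tag, parts[tag]))
```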


Steve Holden · Dec 11, 2005

Jay said:
Yes i know, i did check out a couple but i could never understand it.
They were confusing for me and i wasnt hoping for a full typed
tutorial, just like some help with excactly wat im trying to do, not
the whole module... but watever, Thx alot for the feedbak.

Well I don't want to hold this up as an example of best practice (it was
a quick hack to get some book graphics for my web site), but this
example shows you how you can extract stuff from XML, in this case
returned from Amazon's web services module.

Sorry about any wrapping that mangles the code.

regards
Steve

#!/usr/bin/python
#
# getbooks.py: download book details from Amazon.com
#
# hwBuild: database-driven web content management system
# Copyright (C) 2005 Steve Holden - (e-mail address removed)
#
# This program is free software; you can redistribute it
# and/or modify it under the terms of the GNU General
# Public License as published by the Free Software
# Foundation; either version 2 of the License, or (at
# your option) any later version.
#
# This program is distributed in the hope that it will be
# useful, but WITHOUT ANY WARRANTY; without even the implied
# warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
# PURPOSE. See the GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public
# License along with this program; if not, write to the
# Free Software Foundation, Inc., 59 Temple Place, Suite 330,
# Boston, MA 02111-1307 USA
#
import urllib
import urlparse
import os
import re
from xml.parsers import expat
from config import Config

picindir = os.path.join(Config['datadir'], "pybooks")
for f in os.listdir(picindir):
    os.unlink(os.path.join(picindir, f))

filpat = re.compile(r"\d+")

class myParser:
    def __init__(self):
        self.parser = expat.ParserCreate()
        self.parser.StartElementHandler = self.start_element
        self.parser.EndElementHandler = self.end_element
        self.parser.CharacterDataHandler = self.character_data
        self.processing = 0
        self.count = 0

    def parse(self, f):
        self.parser.ParseFile(f)
        return self.count

    def start_element(self, name, attrs):
        if name == "MediumImage":
            self.processing = 1
            self.imgname = ""
        if self.processing == 1 and name == "URL":
            self.processing = 2

    def end_element(self, name):
        if self.processing == 2 and name == "URL":
            self.processing = 1
            print "Getting:", self.imgname
            scheme, loc, path, params, query, fragment = \
                urlparse.urlparse(self.imgname)
            itemno = filpat.match(os.path.basename(path))
            fnam = itemno.group()
            u = urllib.urlopen(self.imgname)
            img = u.read()
            outfile = file(os.path.join(picindir, "%s.jpg" % fnam), "wb")
            outfile.write(img)
            outfile.close()
            self.count += 1
        if self.processing == 1 and name == "MediumImage":
            self.processing = 0

    def character_data(self, data):
        if self.processing == 2:
            self.imgname += data

def main(search=None):
    print "Search:", search
    count = 0
    for pageNum in range(1,5):
        f = urllib.urlopen("http://webservices.amazon.com/onca/...oup=Images&type=lite&Version=2004-11-10&f=xml"
            % (urllib.quote(search or Config['book-search']), pageNum))
        fnam = os.path.join(picindir, "bookdata.txt")
        file(fnam, "w").write(f.read())
        f = file(fnam, "r")
        p = myParser()
        n = p.parse(f)
        if n == 0:
            break
        count += n
    return count

if __name__ == "__main__":
    import sys
    search = None
    if len(sys.argv) > 1:
        search = sys.argv[1]
    n = main(search)
    print "Pictures found:", n
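The same event-driven pattern, reduced to the original question of pulling out `title` and `body` text, might look like the following sketch. It targets Python 3 (the script above is Python 2). The class name `TextGrabber` and the sample document are invented for illustration; only the three expat handler attributes are the ones the script above uses.

```python
from xml.parsers import expat


class TextGrabber:
    """Collect the character data found inside the wanted elements."""

    def __init__(self, wanted):
        self.wanted = set(wanted)
        self.current = None   # name of the wanted element we are inside
        self.found = {}       # element name -> list of collected strings
        self.parser = expat.ParserCreate()
        self.parser.StartElementHandler = self.start_element
        self.parser.EndElementHandler = self.end_element
        self.parser.CharacterDataHandler = self.character_data

    def start_element(self, name, attrs):
        if name in self.wanted:
            self.current = name
            self.found.setdefault(name, []).append("")

    def end_element(self, name):
        if name == self.current:
            self.current = None

    def character_data(self, data):
        # expat may deliver one text run in several chunks, so append.
        if self.current is not None:
            self.found[self.current][-1] += data

    def parse(self, text):
        self.parser.Parse(text, True)  # True marks this as the final chunk
        return self.found


sample = "<doc><title>blah</title><body>first</body><body>second</body></doc>"
grabbed = TextGrabber(["title", "body"]).parse(sample)
print(grabbed)
```

As in the script above, the character data handler accumulates text rather than assigning it, because the parser is free to split one run of text across several calls.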

Dieter Verfaillie · Dec 11, 2005

OK, I have this XML doc, i dont know much about XML, but what i want
to do is take certain parts of the XML doc

The simplest module I've found for that is xmltramp, from
http://www.aaronsw.com/2002/xmltramp/

for example:

#!/usr/bin/env python
import xmltramp
note = xmltramp.load('http://www.w3schools.com/xml/note.xml')
print note.body

uche.ogbuji · Dec 11, 2005

Jay:
"""
OK, I have this XML doc, i dont know much about XML, but what i want
to do is take certain parts of the XML doc, such as </title> blah
</title> and take just that and put onto a text doc. Then same thing
doe the </body> part. Thats about it, i checked out some of the xml
modules but dont understand how to use them. Dont get parsing, so if
you could please explain working with XML and python to me.
"""

Someone already mentioned

http://www.oreillynet.com/pub/wlg/6225

I do want to update the Amara example given there. As of recent releases
it's as simple as

import amara
doc = amara.parse("foo.opml")
for url in doc.xpath("//@xmlUrl"):
    print url.value

Besides the XPath option, Amara [1] provides Python API options for
unknown elements, such as

node.xml_child_elements
node.xml_attributes

This is all covered with plenty of examples in the manual [2]

[1] http://uche.ogbuji.net/tech/4suite/amara/
[2] http://uche.ogbuji.net/uche.ogbuji.net/tech/4suite/amara/manual-dev

Jay · Dec 11, 2005

Some great suggestions.
OK, I am now understanding some of parsing and how to use it, and
nodes, things like that. But say I wanted to take the title of
http://www.digg.com/rss/index.xml

and xmltramp seemed the simplest to understand.

Would the path be something like this?

import xmltramp
rssDigg = xmltramp.load("http://www.digg.com/rss/index.xml")
print note.rss.channel.item.title

I think that's what I'm having the most confusion about now: how to point
to the path that I want...
Suggestions?

uche.ogbuji · Dec 12, 2005

"""
Ok, i am now understanding some of parseing and how to use it and
nodes, things like that. But say i wanted to take the title of
http://www.digg.com/rss/index.xml

and XMLTramp seemed the most simple to understand.

would the path be something like this?

import xmltramp
rssDigg = xmltramp.load("http://www.digg.com/rss/index.xml")
print note.rss.channel.item.title

I think thats wat im having the most confusion on now, is how to direct
to the path that i want...
"""

I suggest you read at least the front page information for the tools
you are using. It's quite clear from the xmltramp Web site (
http://www.aaronsw.com/2002/xmltramp/ ) that you want something like
(untested: the least homework you can do is to refine the example
yourself):

print rssDigg[rss.channel][item][title]

BTW, in Amara, the API is pretty much exactly what you guessed, and the
output is:

Video: Conan O'Brien iPod Ad Parody
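For comparison, the same walk from the root `rss` element down to an item title can be sketched with `xml.etree.ElementTree` from the standard library of current Python 3 (this is neither xmltramp nor Amara). The feed below is a made-up stand-in for the Digg feed so that the snippet does not need network access.

```python
import xml.etree.ElementTree as ET

# Invented miniature RSS feed; the real feed would be downloaded instead.
FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First story</title></item>
    <item><title>Second story</title></item>
  </channel>
</rss>"""

root = ET.fromstring(FEED)  # root is the <rss> element itself
# Paths are relative to the root, so they start at "channel", not "rss".
first_title = root.find("channel/item/title").text
all_titles = [t.text for t in root.findall("channel/item/title")]
print(first_title)
print(all_titles)
```

The point that tends to confuse is the same in every library: the object returned by the parser already is the outermost element, so the path begins with its children.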

Jay · Dec 12, 2005

OK, I'm convinced that I need to get Amara. I just installed 4Suite
and have now installed Amara. It still doesn't work because, like I said
before, I use ActivePython from

http://www.activestate.com/Products/ActivePython/

And the requirement for Amara is Python 2.4, so... that's where we have
a problem: I need Amara for ActivePython, and I would like to keep
working in ActivePython without downloading Python 2.4.

Dennis Lee Bieber · Dec 12, 2005

Ok, im convinced to that i need to get Amara, I just installed 4Suite
and now installed Amara. Still doesnt work because like i said before,
i use ActivePython from

http://www.activestate.com/Products/ActivePython/

And the requirements for Amara is Python 2.4 so.... Thats where we have
a problem, i need Amara for ActivePython. And i would like to keep
working on ActivePython w/o downloading Python 2.4.

Pardon? So far as I recall, the ActiveState release IS built on top
of the plain Python -- basically adding, for Windows, the PythonWin IDE
and associated DLLs. It should be transparent.

You say what you installed doesn't work, but have you shown anyone
/exactly/ what error conditions you are encountering?

Granted, I'm still running the 2.3 version from ActiveState -- but I
actually installed that over a version of 2.3 from a Zope/Plone
installation (I was getting problems between two installs of the same
Python version).

Jay · Dec 12, 2005

Ummm, my error conditions.....
PythonWin 2.3.5 (#62, Feb 9 2005, 16:17:08) [MSC v.1200 32 bit (Intel)] on win32.
Portions Copyright 1994-2004 Mark Hammond (e-mail address removed) -
see 'Help/About PythonWin' for further copyright information.
Traceback (most recent call last):
  File "<interactive input>", line 1, in ?
ImportError: No module named amara

Pretty straightforward....
As for "it should work since they're both transparent": umm, well, it
doesn't.
But what would help is if you knew the install dir for
ActivePython, so maybe I can install Amara standalone into the
ActivePython installation dir?? Maybe

Dennis Lee Bieber · Dec 12, 2005

Pretty straight forward....
As far as it should work since their both transparent, umm, well its
not.
But what would be a help would be if u knew the install dir for
ActivePython so maybe i can install amara stand alone into the
ActivePython installation dir. ?? Maybe

My install wouldn't help -- as I overrode the ActiveState default
directory to stuff it under Plone...

However, Python import statements are case sensitive, even though
the Windows file system isn't... Could you perhaps need to change the
case on the import?
Traceback (most recent call last):


James · Dec 12, 2005

ActivePython is the same as the standard Python distribution, but with a
few extras.

"As far as it should work since their both transparent, umm, well its
not."

Why do you think it is not transparent? Did you try installing it on
both?
I have ActivePython 2.4 here and it loads amara fine.

"
Traceback (most recent call last):
File "<interactive input>", line 1, in ?
ImportError: No module named amara
"

That means you did not manage to install it properly. Are you new to
installing Python modules from the command line? If you need more
hand-holding, try the Python IRC channel on freenode. The responses there
will be closer to real time. You probably need that, since you seem to have
more than one thing to learn about.

Jay · Dec 12, 2005

No, when i said
"As far as it should work since their both transparent, umm, well its
not."

I meant that only mine isn't; maybe yours is, but for some reason mine isn't.
And you said Amara works fine for you. OK, then could you tell me what
package to install...

I have installed Amara 1.1.6 for Python 2.4 and it works on Python 2.4
only.
Now, which package should I download for it to work at any Python
prompt:
Allinone
Standalone
Or something else?

uche.ogbuji · Dec 12, 2005

"""
No, when i said
"As far as it should work since their both transparent, umm, well its
not."

I meant that only mine isnt, maybe urs is but for some reason it isnt.
And you said amara works fine for you, ok, then could you tell me what
package to install...

I have installed Amara 1.1.6 for Python 2.4 and it works on python 2.4
only.
Now, which package should i download for it to work on any python
prompt:
Allinone
Standalone
Or something else
"""

I've never used ActivePython. I don't know of any special gotchas for
it. But Amara works in Python 2.3 or 2.4. The only difference
between the Allinone and standalone packages is that Allinone includes
4Suite. Do get at least version 1.1.6.

If you're still having trouble with the ActivePython setup, the first
thing I'd ask is how you installed Amara. Did you run a Windows
installer? Next I'd check the library path for ActivePython. What is
the output of

python -c "import sys; print sys.path"

where you replace "python" above with whatever way you invoke
ActivePython.
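The same check can be made from inside the interpreter. The sketch below is for current Python 3, where `importlib.util.find_spec` reports where a module would be loaded from without importing it; the helper name `where_is` is invented here. Note that the `python -c ...` line above is a shell command, meant for the operating system's command prompt rather than the Python prompt.

```python
import importlib.util
import sys


def where_is(module_name):
    """Return the file a module would load from, or None if not importable."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    return spec.origin


# Directories this particular interpreter searches for modules.
for entry in sys.path:
    print(entry)

# "amara" is the module from this thread; it may well not be installed here.
for name in ("xml.dom.minidom", "amara"):
    print(name, "->", where_is(name))
```

If a package was installed with one interpreter but is being imported from another, its directory will be missing from the list printed by the second one.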

Jay · Dec 12, 2005

Hmmmm, I tried the same thing earlier today and it didn't work, but
now it does. I downloaded the standalone package and now it works in
ActivePython, when it didn't before even though I tried the same thing.

And yes, last time I did type python setup.py install.

Thanks anyway.

Jay · Dec 12, 2005

Spoke too soon. I get this error when running Amara in ActivePython:

Traceback (most recent call last):
  File "<interactive input>", line 1, in ?
  File "C:\Python23\Lib\site-packages\amara\__init__.py", line 50, in parse
    if IsXml(source):
NameError: global name 'IsXml' is not defined

So I'm guessing there's an error with one of the files...

uche.ogbuji · Dec 12, 2005

"""
Spoke too soon, i get this error when running amara in ActivePython

Traceback (most recent call last):
File "<interactive input>", line 1, in ?
File "C:\Python23\Lib\site-packages\amara\__init__.py", line 50, in
parse
if IsXml(source):
NameError: global name 'IsXml' is not defined

So im guessing theres an error with one of the files...
"""

IsXml is imported conditionally, so this is an indicator that something
about your module setup is still not agreeing with ActivePython. What
do you see as the output of:

python -c "import amara; print dir(amara)"

? I get:

['InputSource', 'IsXml', 'Uri', 'Uuid', '__builtins__', '__doc__',
'__file__', '__name__', '__path__', '__version__', 'bindery',
'binderytools', 'binderyxpath', 'create_document', 'dateutil_standins',
'domtools', 'os', 'parse', 'pushbind', 'pushdom', 'pyxml_standins',
'saxtools']
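How a conditional import can surface later as a `NameError` is sketched below. This is a generic illustration in Python 3, not Amara's actual source: the module name `hypothetical_missing_dependency` and the function `is_xml` are invented.

```python
# If the optional dependency is missing, the import is skipped silently and
# the name is_xml is never bound in this module.
try:
    from hypothetical_missing_dependency import is_xml
except ImportError:
    pass


def parse(source):
    # The failure only shows up here, when the missing name is first used.
    if is_xml(source):
        return "parsed as XML"
    return "not XML"


try:
    parse("<doc/>")
    error_message = None
except NameError as exc:
    error_message = str(exc)

print(error_message)
```

The import of the package itself succeeds in this situation, which is why the problem only appears once `parse` is called.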

Jay · Dec 12, 2005

When typing exactly what you gave, I got:

Traceback (  File "<interactive input>", line 1
    python -c "import amara; print dir(amara)"
                                              ^
SyntaxError: invalid syntax

When doing it separately, I got:

['__builtins__', '__doc__', '__file__', '__name__', '__path__',
'__version__', 'binderytools', 'os', 'parse']
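Comparing the two listings makes the gap explicit. The short sketch below (Python 3) simply copies both lists from the posts above and prints the names present in the working install but absent from the failing one; `IsXml` is among them.

```python
# dir(amara) as reported for the working installation.
working = ['InputSource', 'IsXml', 'Uri', 'Uuid', '__builtins__', '__doc__',
           '__file__', '__name__', '__path__', '__version__', 'bindery',
           'binderytools', 'binderyxpath', 'create_document',
           'dateutil_standins', 'domtools', 'os', 'parse', 'pushbind',
           'pushdom', 'pyxml_standins', 'saxtools']

# dir(amara) as reported under ActivePython 2.3.
failing = ['__builtins__', '__doc__', '__file__', '__name__', '__path__',
           '__version__', 'binderytools', 'os', 'parse']

# Names the working install exposes that the failing one does not.
missing = sorted(set(working) - set(failing))
print(missing)
```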
