What XML lib to use?

Kalle Anke

I'm confused, I want to read/write XML files but I don't really understand
what library to use.

I've used DOM-based libraries in other languages, is PyXML the library to
use?
 
Ramdas

You can try xml.dom and xml.sax. Both are built into the Python
standard library. You can read and write XML files with these very
easily. There are a number of third-party modules for Python that
manipulate XML, but the above are the basic ones.
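Here is a minimal sketch of the event-based approach, using only the standard library's xml.sax. The TagCounter class and count_tags helper are hypothetical names chosen for illustration. The parser calls the handler once per opening tag instead of building a tree in memory.

```python
import xml.sax
from collections import Counter


class TagCounter(xml.sax.ContentHandler):
    """Hypothetical handler: counts how often each element name occurs."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def startElement(self, name, attrs):
        # Called once per opening tag; no tree is kept in memory.
        self.counts[name] += 1


def count_tags(xml_bytes):
    """Parse a byte string and return a Counter of element names."""
    handler = TagCounter()
    xml.sax.parseString(xml_bytes, handler)
    return handler.counts


counts = count_tags(b"<library><book/><book/><magazine/></library>")
print(counts["book"])  # 2
```

For writing, xml.sax.saxutils.XMLGenerator produces XML output from the same kind of start/end/characters calls.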
 
mekstran

Kalle Anke said:
I'm confused, I want to read/write XML files but I don't really
understand what library to use.

I've used DOM-based libraries in other languages, is PyXML the
library to use?

PyXML will do the job. I'm currently using it in one of my projects.
4suite also has its cDomlette, which provides a high-speed
lightweight DOM implementation. However, if you don't need
canonicalization, XPath, or those kinds of extensions, you can probably
get by with minidom and the other code included in the standard Python
distribution, and avoid the need to install additional libraries.
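Here is a minimal sketch of reading and then extending a document with xml.dom.minidom alone. It assumes a small made-up <config> format, and the element and attribute names are hypothetical.

```python
from xml.dom import minidom

# Hypothetical sample document.
SOURCE = """<config>
  <option name="verbose">false</option>
  <option name="retries">3</option>
</config>"""

doc = minidom.parseString(SOURCE)

# Read: getElementsByTagName returns matching elements in document order.
options = {}
for el in doc.getElementsByTagName("option"):
    # The text lives in a child Text node, not on the element itself.
    options[el.getAttribute("name")] = el.firstChild.data

# Write: build a new element and append it under the root element.
new_opt = doc.createElement("option")
new_opt.setAttribute("name", "timeout")
new_opt.appendChild(doc.createTextNode("30"))
doc.documentElement.appendChild(new_opt)

xml_out = doc.toxml()
print(options)  # {'verbose': 'false', 'retries': '3'}
```

minidom.parse() also accepts a filename or open file, and doc.writexml(f) writes the tree to an open file.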

I have also heard excellent things about ElementTree; I haven't used it
myself though (largely because I can't find any resources on doing XML
canonicalization with it).

-Michael
 
Michael Hoffman

mekstran said:
I have also heard excellent things about ElementTree; I haven't used it
myself though (largely because I can't find any resources on doing XML
canonicalization with it).

ElementTree/cElementTree is really easy to use and Pythonic.
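Here is a sketch of a read/modify/write cycle with the ElementTree API. It assumes the xml.etree.ElementTree module that later Python versions ship in the standard library. When this thread was written, ElementTree was a separate download. The <library>/<book> data is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
SOURCE = """<library>
  <book year="2003"><title>Alpha</title></book>
  <book year="2005"><title>Beta</title></book>
</library>"""

root = ET.fromstring(SOURCE)

# Read: findall() matches direct children, iter() walks all descendants,
# findtext() returns a child's text, and get() reads an attribute.
titles = [book.findtext("title") for book in root.findall("book")]
years = [int(book.get("year")) for book in root.iter("book")]

# Write: SubElement creates and attaches a child in one step.
new_book = ET.SubElement(root, "book", year="2006")
ET.SubElement(new_book, "title").text = "Gamma"
xml_out = ET.tostring(root, encoding="unicode")

print(titles, years)  # ['Alpha', 'Beta'] [2003, 2005]
```

For files, ET.parse(path) returns a tree whose getroot() gives the same kind of element, and tree.write(path) saves it back.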
 
uche.ogbuji

"""
I'm confused, I want to read/write XML files but I don't really
understand
what library to use.

I've used DOM-based libraries in other languages, is PyXML the library
to
use?
"""

There are many options (some say too many):

http://www.xml.com/pub/a/2004/10/13/py-xml.html

Try out Amara Bindery, if you like:

http://uche.ogbuji.net/tech/4suite/amara/

Browsing the manual should let you know whether you like the API:

http://uche.ogbuji.net/tech/4suite/amara/manual

BTW, a lot of Python/XML processing is covered in my column, including
other options besides Amara:

http://www.xml.com/pub/at/24
 
Robert Kern

mekstran said:
I have also heard excellent things about ElementTree; I haven't used it
myself though (largely because I can't find any resources on doing XML
canonicalization with it).

You can use lxml, which is an implementation of the ElementTree API using
libxml2 and libxslt under the covers for greater standards compliance,
including c14n. I've been using it extensively recently and highly
recommend it.

http://codespeak.net/lxml
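lxml is a third-party package, so as a stand-in this sketch uses the standard library's xml.etree.ElementTree.canonicalize() (C14N 2.0, available from Python 3.8). It does not use lxml's own API. It shows what canonicalization buys you: two documents that differ only in attribute order, quote style, and empty-element syntax serialize to identical text.

```python
import xml.etree.ElementTree as ET

# Two serializations of the same document that differ only in attribute
# order, quote style, whitespace between attributes, and empty-element syntax.
doc_a = '<order id="7" status="new"><note/></order>'
doc_b = "<order status='new'  id='7'><note></note></order>"

# canonicalize() returns a str when no `out` stream is given (Python 3.8+).
canon_a = ET.canonicalize(doc_a)
canon_b = ET.canonicalize(doc_b)

print(canon_a)             # <order id="7" status="new"><note></note></order>
print(canon_a == canon_b)  # True
```

With lxml itself, equivalent output is available through an element tree's write_c14n() method.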

--
Robert Kern
(e-mail address removed)

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter
 
Edvard Majakari

Kalle Anke said:
I'm confused, I want to read/write XML files but I don't really understand
what library to use.

I've used DOM-based libraries in other languages, is PyXML the library to
use?

It depends. Just as there's no best car - "best" is very dependent on the use of
the vehicle concerned, in addition to personal preferences - there's no best XML
module either. Some do seem very good in many respects, though :)

I recommend using EffBot's ElementTree. It's very simple to use (you get to do
stuff without thinking about the delicacies of parsing/generating), and it is
_fast_. Let me repeat the last part: normally speed is of no concern
with the computers we have nowadays, but if you use e.g. xml.dom.minidom to
process files larger than 10 MB, your system might get very sluggish unless
you are quite careful in traversing the parse tree (and maybe even then).
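For the large-file case, one common approach with ElementTree is iterparse(), which yields elements as they are completed so each record can be processed and then cleared. Here is a minimal sketch, assuming a hypothetical <log> of <record><amount> entries generated in memory.

```python
import io
import xml.etree.ElementTree as ET


def sum_amounts(source):
    """Stream <record> elements from `source` and total their <amount> values."""
    count = 0
    total = 0
    # An "end" event fires once an element and all of its children are parsed.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "record":
            total += int(elem.findtext("amount"))
            count += 1
            # Discard the finished record's children and text; the emptied
            # element itself stays attached to the root.
            elem.clear()
    return count, total


# Hypothetical sample input: 1000 records with amounts 1..1000.
data = (
    b"<log>"
    + b"".join(b"<record><amount>%d</amount></record>" % i for i in range(1, 1001))
    + b"</log>"
)
count, total = sum_amounts(io.BytesIO(data))
print(count, total)  # 1000 500500
```

The emptied <record> elements remain children of the root. For truly huge files, you may also want to remove each one from its parent once it has been processed.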

Using a SAX / fully compliant DOM parser could be good for learning things,
though. As I said, it depends a lot.

--
# Edvard Majakari Software Engineer
# PGP PUBLIC KEY available Soli Deo Gloria!

$_ = '456476617264204d616a616b6172692c20612043687269737469616e20'; print
join('',map{chr hex}(split/(\w{2})/)),uc substr(crypt(60281449,'es'),2,4),"\n";
 
Paul Boddie

Kalle said:
I've used DOM-based libraries in other languages, is PyXML the library to
use?

I would start off with minidom; a tutorial I once wrote can be found
here:

http://www.boddie.org.uk/python/XML_intro.html

That should demonstrate some minor differences between PyXML-style DOMs
and those for languages like Java. Should you need a faster DOM
implementation, you might want to look at libxml2dom:

http://www.boddie.org.uk/python/libxml2dom.html

It's a pure Python module that uses the lower levels of libxml2's own
Python bindings, so if you already have libxml2 plus bindings
installed, it should be very convenient. Although libxml2dom isn't by
any means complete, I do use it myself and would welcome any feedback
which would make it better.

Paul
 
paron

One more vote for Amara! I think it's unmatched for ease of use, if you
already know Python.

Ron
 
Fredrik Lundh

Edvard said:
Using a SAX / fully compliant DOM parser could be good for learning things,
though. As I said, it depends a lot.

since there are no *sane* reasons to use SAX or DOM in Python, that's mainly
a job security issue...

</F>
 
Paul Boddie

Fredrik said:
since there are no *sane* reasons to use SAX or DOM in Python, that's mainly
a job security issue...

While I doubt that anyone would really recommend exclusive DOM API
usage for significant XML processing tasks (or for anything other than
educational purposes), I think you're overstating some case or other
here. Interoperability is a pretty sane argument for using DOM-based
technologies, whether that be skills interoperability (possibly related
to job security) or just using many different technologies together.
For example, PyQt and PyKDE expose various DOMs of the purest
"non-Pythonic" kind; Mozilla exposes DOMs for XML and HTML; adding a
layer of PyXML varnish to any of these isn't a huge job. Using
different technologies with the same foundations shouldn't have to
involve breaking open yet another API for the "fun" of it.

Paul
 
Fredrik Lundh

Paul said:
For example, PyQt and PyKDE expose various DOMs of the purest
"non-Pythonic" kind; Mozilla exposes DOMs for XML and HTML

I didn't see anything about manipulating an application's internal
data structures in the original post, but I might have missed
something.

For stand-alone XML manipulation in Python, my point still stands:
programs using DOM and SAX are more bloated and slower than
the alternatives.

</F>
 
Magnus Lycka

Fredrik said:
since there are no *sane* reasons to use SAX or DOM in Python, that's mainly
a job security issue...

I can see two reasons (sane or not):
- You're familiar with those APIs and use them in e.g. C++.
- You don't want to rely on third party libraries unless you must.

In many cases, xml.dom.minidom etc. will do fine...

Having said this, I must admit that I much prefer Fredrik's ElementTree.

Although Fredrik (did you know that?) had a part in Carmen's move to
Python around five years ago, ElementTree isn't installed by default
on our machines, and I think our CM people are happier if we use as
few third party libraries as possible...
 
Paul Boddie

Fredrik said:
Paul Boddie wrote:

[On interoperability]
I didn't see anything about manipulating an application's internal
data structures in the original post, but I might have missed
something.

Well, manipulating documents in Mozilla and KHTML is just an example, as
I pointed out, and whilst I'd agree that the in-process restrictions
Mozilla appears to place on full participants in its component system
do kind of mean that the Mozilla DOM is an "application's internal
data structure", the opportunities are more open for KHTML in that one
isn't limited to just automating some application. Moreover, the XML
APIs exposed by PyQt are also available for general XML processing,
whether you regard them as performant or not.

Fredrik said:
For stand-alone XML manipulation in Python, my point still stands:
programs using DOM and SAX are more bloated and slower than
the alternatives.

Your point was that "there are no *sane* reasons to use SAX or DOM in
Python", which actually isn't true. Sure, processing a 100GB XML
document using the DOM isn't a sensible strategy (with this generation
of hardware!), and SAX isn't necessarily the most elegant way of
expressing the processing logic, and other tools and APIs exist to
perform such tasks more efficiently and elegantly, but then "to
read/write XML files" leaves the questioner's field of endeavour pretty
much open to interpretation. Somewhere amongst the many fields of
endeavour there are places where the DOM (whilst not as "Pythonic" as
some might like) certainly is a valid choice, possibly because it's the
only choice - all thanks to interoperability, as I said. ;-)

Paul
 
Fredrik Lundh

Paul said:
Your point was that "there are no *sane* reasons to use SAX or DOM in
Python", which actually isn't true.

I replied in the context of this thread. If you chose to ignore the
context, or if you're not capable of reading and understanding the
posts you're replying to, that's your problem.

</F>
 
Robert Kern

Fredrik said:
I replied in the context of this thread. If you chose to ignore the
context, or if you're not capable of reading and understanding the
posts you're replying to, that's your problem.

His interpretation of your words is a perfectly valid one even in the
context of this thread. "in Python" explicitly provides a context for
the rest of the sentence. In English, at least, it is perfectly
reasonable to presume that explicit contexts override implicit ones.

 
Fredrik Lundh

Robert said:
His interpretation of your words is a perfectly valid one even in the
context of this thread. "in Python" explicitly provides a context for
the rest of the sentence.

Exactly. "in Python", not "in an application with an existing API".

(also, if the OP had been forced to use an existing API by external
constraints, don't you think he would have mentioned it?)

Robert said:
In English, at least, it is perfectly reasonable to presume that explicit
contexts override implicit ones.

Letting a part of a sentence override the context of the discussion is
perhaps popular in certain tabloid journalist circles, and among Slashdot
editors and US political bloggers, but most people do in fact have
a context buffer memory that can hold more than a few words. (how
come you're so sure I wasn't talking about, say, the Python Lisp compiler?
or the Monty Python sketch with the sadistic Belgian instrument-making
monk? or a Harry Potter book?)

I know what I meant. You know what I meant. Paul knows what I
meant. If you still want to play the "but there is a way to interpret
this in another way" game, file a bug report against the python.org
"what is python?" summary page.

</F>
 
Paul Boddie

Fredrik said:
Exactly. "in Python", not "in an application with an existing API".

Well, if you're still not convinced that DOMs exist outside monolithic
applications... ;-)

Fredrik said:
Letting a part of a sentence override the context of the discussion is
perhaps popular in certain tabloid journalist circles, and among Slashdot
editors and US political bloggers, but most people do in fact have
a context buffer memory that can hold more than a few words.

I don't really see how an absolutely qualified, complete sentence...

Q> since there are no *sane* reasons to use SAX or DOM in Python,
Q> that's mainly a job security issue...

...can somehow be qualified by the preceding discussion, when the only
ambiguous context is "that" == "good for learning things". It's an
absolute statement of opinion! (And yes, I think we all agree on what
"Python" is.)

Fredrik said:
I know what I meant. You know what I meant. Paul knows what I
meant.

I actually do know what you mean, but that doesn't mean that the
statement in question wasn't misleading, especially to people who
aren't familiar with or accustomed to discovering this missing context.
It's like saying "there are no *sane* reasons to drive a Volvo",
possibly in a follow-up to a discussion about how bad Volvos are
compared to Saabs. There may well be a sane reason to drive a Volvo,
but the statement doesn't allow for the possibility, unless in the hunt
for the missing context you're willing to take the term "significant
whitespace" to a whole new level.

Paul
 
Giovanni Bajo

Fredrik said:
since there are no *sane* reasons to use SAX or DOM in Python, that's
mainly a job security issue...


One sane reason is that ElementTree is not part of the standard library. There
are cases where you write a simple Python script of 400 lines and you want it
to stay single-file. While ElementTree is very easy to distribute (for basic
features it's just a single file), it still won't fit some scenarios.

So why hasn't it made it into the standard library yet, given that it's so much
better than the alternatives?
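For reference, ElementTree was later added to the standard library as xml.etree.ElementTree, starting with Python 2.5. A script that still has to run where only the standalone package is installed could use an import fallback. This is a sketch, and the root_tag helper is a hypothetical example.

```python
try:
    # Standard-library location, Python 2.5 and later.
    import xml.etree.ElementTree as ET
except ImportError:
    # Older interpreters: fall back to the standalone ElementTree package.
    from elementtree import ElementTree as ET


def root_tag(text):
    """Hypothetical helper: return the tag of a document's root element."""
    return ET.fromstring(text).tag


print(root_tag("<feed><entry/></feed>"))  # feed
```

Because the rest of the script only refers to ET, it does not need to know which of the two imports succeeded.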
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,264
Messages
2,571,320
Members
48,004
Latest member
KelseyFors

Latest Threads

Top