DOM implementation

Emanuele D'Arrigo · May 13, 2009

Hi everybody,

I just spent the past hour or so trying to have a better understanding
of how the various DOM-supporting libraries (xml.dom, xml.dom.minidom)
work. I've used etree and lxml successfully before but I wanted to
understand how close I can get to the W3C DOM standards. Ok, I think
more or less I got it all. A few questions emerged:

1) Classes in xml.dom.minidom (e.g. Element) seem to be old-style
classes. Is there a good reason they are kept that way, or has
nobody simply had the time or will to update the library to use
new-style classes?

2) For a lightweight implementation, xml.dom.minidom comes with a lot
of methods that aren't part of the W3C standards. I'm referring to
toxml, toprettyxml, writexml and the _get_* family. Would it be better
if there were a package offering W3C-faithful classes only, on top of
which convenience and compatibility methods are added by another
package (or two!) through subclassing?

Manu

Paul Boddie · May 13, 2009

I just spent the past hour or so trying to have a better understanding
of how the various DOM-supporting libraries (xml.dom, xml.dom.minidom)
work. I've used etree and lxml successfully before but I wanted to
understand how close I can get to the W3C DOM standards.

You might want to look at pxdom if you want a high level of compliance
with W3C DOM standards:

http://www.doxdesk.com/software/py/pxdom.html

Ok, I think more or less I got it all. A few questions emerged:

1) Classes in xml.dom.minidom (e.g. Element) seem to be old-style
classes. Is there a good reason they are kept that way, or has
nobody simply had the time or will to update the library to use
new-style classes?

I imagine that no-one bothered to update the code. The built-in
modules like minidom do get maintenance, but not much further
development. (PyXML, which seemed to accumulate code from 4Suite,
possibly contributed code to the standard library, but it doesn't seem
to be actively maintained or developed any more.)

2) For a lightweight implementation, xml.dom.minidom comes with a lot
of methods that aren't part of the W3C standards. I'm referring to
toxml, toprettyxml, writexml and the _get_* family. Would it be better
if there were a package offering W3C-faithful classes only, on top of
which convenience and compatibility methods are added by another
package (or two!) through subclassing?

Those methods probably don't add that much weight, considering the
weight that the W3C facilities already necessitate. I attempted to
make a somewhat W3C-compliant implementation with the libxml2dom
package (http://pypi.python.org/pypi/libxml2dom), although I felt that
providing PyXML-like conveniences (similar to those you describe) was
beneficial: some of the W3C APIs for parsing and serialisation are
baroque, and although I've tried to implement some of those, too, I
feel that it isn't a good use of my time.
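
To make the distinction concrete, here is a minimal sketch (written for Python 3, not the Python 2 of this thread) that puts a couple of W3C DOM Core calls next to the minidom-specific serialisation helpers mentioned above. The sample document and variable names are only illustrative.

```python
import io
from xml.dom import minidom

doc = minidom.parseString("<root><child id='a'>text</child></root>")
child = doc.getElementsByTagName("child")[0]

# Calls defined by the W3C DOM Core interfaces.
w3c_value = child.getAttribute("id")
w3c_parent = child.parentNode.tagName

# Convenience methods specific to minidom (not part of W3C DOM Core).
compact = child.toxml()                  # serialise a node to a string
pretty = doc.toprettyxml(indent="  ")    # indented serialisation
buffer = io.StringIO()
child.writexml(buffer)                   # serialise to a file-like object

print(compact)
print(pretty)
```

The helpers sit directly on the node classes, which is why a strictly W3C-only package would need a second layer (subclasses or separate functions) to offer the same conveniences.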

Paul

Emanuele D'Arrigo · May 14, 2009

Thank you Paul for your reply!

I'm looking into pxdom right now and it looks very good and useful!

Thank you again!

Manu

Emanuele D'Arrigo · May 15, 2009

Hey Paul,

would you mind continuing this thread on Python + DOM? I'm trying to
implement a DOM Events-like set of classes and I could use another
brain that has some familiarity with the DOM to bounce ideas off. If
you are too busy, never mind. Also, I thought of keeping the discussion
here rather than via email, for the benefit of current and future
readers.

Manu

Paul Boddie · May 15, 2009

Hey Paul,

would you mind continuing this thread on Python + DOM? I'm trying to
implement a DOM Events-like set of classes and I could use another
brain that has some familiarity with the DOM to bounce ideas off. If
you are too busy, never mind. Also, I thought of keeping the discussion
here rather than via email, for the benefit of current and future
readers.

Sure! Just keep your observations coming! I've made a very lazy
attempt at DOM Events support in libxml2dom, since it looked as if it
might be necessary when providing elementary SVG Tiny support (which
also isn't finished), although I find these things quite hard to
figure out with the usual vagueness of the specifications on certain
crucial implementation-related details (and that there's a mountain of
specifications that one has to navigate).

One of my tests tries to exercise the code, but I might be doing it
all completely wrong:

https://hg.boddie.org.uk/libxml2dom/file/91c0764ac7c6/tests/svg_events.py

It occurs to me that various PyQt- and PyKDE-related bindings might
also provide some exposure to DOM Events, although I had heard that
WebKit, which should have support for lots of DOM features, exposes
some pretty useless interfaces to languages like Python, currently.
The situation with Mozilla and PyXPCOM may well be similar.

Paul

Emanuele D'Arrigo · May 15, 2009

Hi Paul, thank you for your swift reply!

Sure! Just keep your observations coming! I've made a very lazy
attempt at DOM Events support in libxml2dom,

I just had a look at libxml2dom, in particular its events.py file.
Given that we are working from a standard, your implementation is
exceedingly similar to mine, and had I known before I started writing
my own classes I would have started from it instead! =)
Browsing through the code, the EventTarget class docstring reads:

The listeners for a node are accessed through the global object. This
common collection is consequently accessed by all nodes in a document,
meaning that distinct objects representing the same node can still
obtain the set of listeners registered for that node. In contrast, any
attempt to directly store listeners on particular objects would result
in the specific object which registered the listeners holding the
record of such objects, whereas other objects obtained independently
for the same node would hold no such record.

Naively, I implemented my EventTarget class storing its own listeners
rather than global ones. Nevertheless, I'm not quite understanding
this issue. Why shouldn't the listeners be stored directly on the
EventTarget? I have a glimpse of understanding that if the
DOMImplementation keeps EventTarget and Nodes (or Elements? which
entity is supposed to support Events?) separate this might be
necessary. But besides the fact that it's just a fuzzy and potentially
incorrect intuition, I seem to think that the appropriate way to
proceed would be for the DOMImplementation to provide a Node class
that also inherits from EventTarget. In so doing the listeners would
be immediately accessible as soon as one has a handle to a Node.

Furthermore, your code finds the bubbling route with the line:

bubble_route = target.xpath("ancestor::*")

That xpath method is a libxml method, right?

(...) although I find these things quite hard to
figure out with the usual vagueness of the specifications on certain
crucial implementation-related details (and that there's a mountain of
specifications that one has to navigate).

Indeed there is some vagueness in the W3C recommendations and the
various documents offer very little redundancy with each other but
require you to be knowledgeable about them all! I'm managing to put
together the pieces of the puzzle only after a couple of days of
in-depth reading of DOM, DOM Events and a little bit of XML Events
to see how it all works in practice. XML Events is also what's
prompting me to think that the Node/Element classes of the
implementation should also inherit from EventTarget, as they can all
be event targets.

One of my tests tries to exercise the code, but I might be doing it
all completely wrong:

https://hg.boddie.org.uk/libxml2dom/file/91c0764ac7c6/tests/svg_event...

Before I can comment I'd like to better understand what you are aiming
for with libxml2dom. It seems to be providing some kind of conversion
services from the xml structure generated by libxml to a dom-like
structure (implemented by pxdom?).
Is that correct?

It occurs to me that various PyQt- and PyKDE-related bindings might
also provide some exposure to DOM Events, although I had heard that
WebKit, which should have support for lots of DOM features, exposes
some pretty useless interfaces to languages like Python, currently.
The situation with Mozilla and PyXPCOM may well be similar.

PyKDE is off-limits because it's Unix-only while I'm trying to be
cross-platform. PyQt is interesting. Very. Further investigation is
required. =)

Manu

Paul Boddie · May 15, 2009

I just had a look at libxml2dom, in particular its events.py file.
Given that we are working from a standard, your implementation is
exceedingly similar to mine, and had I known before I started writing
my own classes I would have started from it instead! =)

Another implementation is probably a good thing, though, since I don't
trust my own interpretation of the specifications. ;-)

Browsing through the code, the EventTarget class docstring reads:

[Long docstring cut]

Naively, I implemented my EventTarget class storing its own listeners
rather than global ones. Nevertheless, I'm not quite understanding
this issue. Why shouldn't the listeners be stored directly on the
EventTarget?

One reason for this might well be due to the behaviour of libxml2 and
libxml2dom: if I visit the same node in a document twice, obtaining a
node instance each time, these two instances will be different;
therefore, storing listeners on such instances is not very helpful
because the expectation that you will automatically see previously
added listeners on a node will not generally be fulfilled. With pxdom,
it may be a different situation, but libxml2dom is constrained by the
behaviour of libxml2: I don't attempt to check node equivalence and
then expose the structures representing a single node using a single
object; I generally try and instantiate as few Python objects,
wrapping libxml2 structures, as I can.
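
The wrapper problem can be shown with a small, self-contained sketch. None of the names below come from libxml2 or libxml2dom: RawNode, NodeWrapper and the registry functions are hypothetical stand-ins, and id() is used only as a placeholder for whatever stable node identity a real binding would expose.

```python
class RawNode:
    """Hypothetical stand-in for a low-level node owned by the parser library."""

    def __init__(self, name):
        self.name = name


class NodeWrapper:
    """Hypothetical Python object created afresh each time a node is visited."""

    def __init__(self, raw):
        self._raw = raw
        self.own_listeners = []  # only visible on this particular wrapper


# Shared registry keyed by the underlying node, not by the wrapper object.
# Assumption: id() stands in for a stable identity of the underlying node.
_listeners = {}


def add_listener(wrapper, event_type, callback):
    _listeners.setdefault((id(wrapper._raw), event_type), []).append(callback)


def get_listeners(wrapper, event_type):
    return _listeners.get((id(wrapper._raw), event_type), [])


def on_click(event):
    pass


raw = RawNode("g")
first = NodeWrapper(raw)    # first visit to the node
second = NodeWrapper(raw)   # second visit: a different Python object

first.own_listeners.append(on_click)    # stored on the wrapper itself
add_listener(first, "click", on_click)  # stored in the shared registry

print(second.own_listeners)             # the second wrapper sees nothing
print(get_listeners(second, "click"))   # the registry still finds the listener
```

With an implementation that always hands back the same Python object for a given node, the per-object list would be enough and the registry would not be needed.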

I have a glimpse of understanding that if the
DOMImplementation keeps EventTarget and Nodes (or Elements? which
entity is supposed to support Events?) separate this might be
necessary. But besides the fact that it's just a fuzzy and potentially
incorrect intuition, I seem to think that the appropriate way to
proceed would be for the DOMImplementation to provide a Node class
that also inherits from EventTarget. In so doing the listeners would
be immediately accessible as soon as one has a handle to a Node.

The libxml2dom.svg module has classes which inherit from EventTarget.
What I've tried to do is to make submodules to address particular
formats and document models.

Furthermore, your code finds the bubbling route with the line:

bubble_route = target.xpath("ancestor::*")

That xpath method is a libxml method, right?

I use libxml2's XPath support exposed via libxml2dom.Node.
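
For implementations without XPath support, a roughly equivalent route can be collected by walking parentNode. This is only a sketch using the standard library's minidom rather than libxml2dom; the helper name ancestor_elements is made up, and returning the nearest ancestor first is an assumption about what is convenient for bubbling, not a statement about the order an XPath engine returns.

```python
from xml.dom import Node, minidom


def ancestor_elements(target):
    """Collect the element ancestors of target, nearest ancestor first."""
    route = []
    node = target.parentNode
    # Stop at the Document node, which is not an element.
    while node is not None and node.nodeType == Node.ELEMENT_NODE:
        route.append(node)
        node = node.parentNode
    return route


doc = minidom.parseString("<svg><g><rect/></g></svg>")
rect = doc.getElementsByTagName("rect")[0]

bubble_route = ancestor_elements(rect)
names = [node.tagName for node in bubble_route]
print(names)
```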

Indeed there is some vagueness in the W3C recommendations and the
various documents offer very little redundancy with each other but
require you to be knowledgeable about them all! I'm managing to put
together the pieces of the puzzle only after a couple of days of
in-depth reading of DOM, DOM Events and a little bit of XML Events
to see how it all works in practice. XML Events is also what's
prompting me to think that the Node/Element classes of the
implementation should also inherit from EventTarget, as they can all
be event targets.

I think that if I were to expose an event-capable DOM, other than that
provided for SVG, I would just have a specific submodule for that
purpose.

Before I can comment I'd like to better understand what you are aiming
for with libxml2dom. It seems to be providing some kind of conversion
services from the xml structure generated by libxml to a dom-like
structure (implemented by pxdom?).
Is that correct?

Yes. The aim is to provide a PyXML DOM API on top of libxml2
documents.

Paul

Emanuele D'Arrigo · May 19, 2009

Hello Paul, sorry for the long delay, I was trying to wrap my mind
around DOM and Events implementations...

Another implementation is probably a good thing, though, since I don't
trust my own interpretation of the specifications. ;-)

Tell me about it. In general I like the work the W3C is doing, but
some things could use a little less freedom and a little more clarity.
=) But then again, maybe it's for the best to leave things as they are
so that we can figure it out for ourselves.

One reason for this might well be due to the behaviour of libxml2 and
libxml2dom: if I visit the same node in a document twice, obtaining a
node instance each time, these two instances will be different;

Mmmm.... I don't know the specifics of libxml... are you saying that
once the object tree is created out of an XML file, requesting the
same node object twice -does not- result in a pointer to the same
instance in memory? How's that possible?

The libxml2dom.svg module has classes which inherit from EventTarget.

And what does the EventTarget inherit from? Or are those classes
inheriting from both Nodes and EventTargets?

What I've tried to do is to make submodules to address particular
formats and document models.

I think the issue to consider there is that the DOM does not restrict
a document from being a mash-up of multiple formats. I.e. it should be
possible to have XHTML and SVG tags in the same document. As long as
those modules work at element/tag level and do not obstruct each other
I think you are on the right track!

I think that if I were to expose an event-capable DOM, other than that
provided for SVG, I would just have a specific submodule for that
purpose.

Ultimately I found it moderately easier to modify pxdom with the
intention of releasing "pxdome", a fork of pxdom. Creating a separate
module that monkey-patches pxdom seemed to be a little too tricky and
prone to error.

I had a more in-depth look after having spent the weekend trying to
wrap my head around all sorts of implementation issues.

My understanding, also after a few exchanges on the (e-mail address removed)
mailing list, is that initialization of an event can happen wherever
you feel like doing it, except in Document.createEvent(). I.e. it
could be a method on the event itself or an external function. In your
code, however, I believe the initialization method should be
initMouseEventNS() rather than initEventNS() and the namespace for DOM
3 Events should be -None-. Between the two implementations the first
one seems to be more aligned with the DOM documentation.

The way I'm doing it is that I invoke Document.createEvent(eventType),
I initialize the resulting event in part manually and in part with
type-related default settings and I finally use
Document.pxdomTriggerEvent(event) to create a propagation path and
iterate through its targets. I.e.:

def _trigger_DOMSubtreeModified(target):

    # Node types for which a DOMSubtreeModified event is fired.
    relevantTargetTypes = (Node.DOCUMENT_NODE,
                           Node.DOCUMENT_FRAGMENT_NODE,
                           Node.ELEMENT_NODE, Node.ATTRIBUTE_NODE)

    if target.nodeType not in relevantTargetTypes:
        return

    if target.ownerDocument:
        event = target.ownerDocument.createEvent("MutationEvent")

        event._target = target

        # Apply the type-related defaults, then build the propagation
        # path and dispatch the event along it.
        target.ownerDocument.pxdomEventDefaultInitNS(None,
            "DOMSubtreeModified", event)
        target.ownerDocument.pxdomTriggerEvent(event)

Notice that I'm currently keeping this function as a loose function
but it could very well be placed as a method in the Document class or
in each relevant class. I'm not sure why one option would be better
than the others and the DOM doesn't specify it.

The dispatch of the event to each target on the propagation path is
also a matter of implementation. In the discussion in www-dom three
options have emerged:

1) the Document node establishes the propagation path and iterates
through the targets listed to dispatch the event to each;
2) an unspecified, external object does the same job;
3) the propagation path is established, stored on the event and each
event target is responsible for recursively dispatching the event to
the next target if propagation hasn't been stopped.

Apparently an earlier version of Mozilla's Gecko used option 3 but they
eventually switched to option 1. Again, it's unclear in what
circumstances to use one option or the other.
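
As a rough illustration of option 1, here is a self-contained sketch in which one central routine builds the propagation path and walks it through the capture, target and bubble phases. It is not pxdom or libxml2dom code: the Event and Target classes and the dispatch function are hypothetical, and treating the target phase as "non-capturing listeners only" is a simplification chosen for the sketch.

```python
class Event:
    def __init__(self, type_, bubbles=True):
        self.type = type_
        self.bubbles = bubbles
        self.target = None
        self.currentTarget = None
        self.stopped = False

    def stopPropagation(self):
        self.stopped = True


class Target:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.listeners = []  # (event type, callback, use_capture) tuples

    def addEventListener(self, type_, callback, use_capture=False):
        self.listeners.append((type_, callback, use_capture))

    def _run(self, event, capture_phase):
        event.currentTarget = self
        for type_, callback, use_capture in list(self.listeners):
            if type_ == event.type and use_capture == capture_phase:
                callback(event)


def dispatch(event, target):
    """Option 1: a central routine owns the path and the iteration."""
    path = []  # ancestors of the target, nearest first
    node = target.parent
    while node is not None:
        path.append(node)
        node = node.parent
    event.target = target

    for node in reversed(path):  # capture phase: outermost ancestor first
        if event.stopped:
            return
        node._run(event, True)
    if event.stopped:
        return
    target._run(event, False)    # target phase (simplified)
    for node in path:            # bubble phase: nearest ancestor first
        if event.stopped or not event.bubbles:
            return
        node._run(event, False)


log = []
document = Target("document")
svg = Target("svg", document)
rect = Target("rect", svg)

document.addEventListener("click", lambda e: log.append("doc-capture"), True)
rect.addEventListener("click", lambda e: log.append("rect-target"))
svg.addEventListener("click", lambda e: log.append("svg-bubble"))
document.addEventListener("click", lambda e: log.append("doc-bubble"))

dispatch(Event("click"), rect)
print(log)
```

Under option 3 the loop would instead live in each target, with the path stored on the event; the listener bookkeeping would stay the same.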

What I don't know at this time is how to merge all this with the
specific file formats such as SVG and HTML. I.e. in an SVG example, do
I create a GroupElement(Element) class and I override the
Document.createElement() method to create an instance of it any time a
<g> element is found in the input file? Or do I first create an
application-neutral DOM tree out of the input file and I then
instantiate a parallel application-specific structure, holding the
objects that provide methods to actually draw and group shapes? If I
get an answer from www-dom I'll report it here...
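
For what it's worth, the first alternative (overriding createElement) might look like the sketch below. It uses the standard library's minidom rather than pxdom, on Python 3; GroupElement, SVGDocument, element_classes and describe() are hypothetical names. It only covers elements created programmatically, and it makes no assumption about whether a given parser would go through the overridden method.

```python
from xml.dom import minidom


class GroupElement(minidom.Element):
    """Hypothetical application-specific class for SVG <g> elements."""

    def describe(self):
        # Placeholder for real behaviour such as drawing or grouping shapes.
        return "group with %d children" % len(self.childNodes)


class SVGDocument(minidom.Document):
    # Hypothetical registry mapping tag names to specialised classes.
    element_classes = {"g": GroupElement}

    def createElement(self, tagName):
        cls = self.element_classes.get(tagName, minidom.Element)
        element = cls(tagName)
        element.ownerDocument = self
        return element


doc = SVGDocument()
group = doc.createElement("g")
rect = doc.createElement("rect")
doc.appendChild(group)
group.appendChild(rect)

print(type(group).__name__, type(rect).__name__)
print(group.describe())
print(doc.documentElement.toxml())
```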

Manu
