DOM text

Richard Lewis

Hello Pythoners,

I'm currently writing some Python to manipulate a semi-structured XML
document. I'm using DOM (minidom) and I've got working code for
transforming the document to HTML files and for adding the 'structured'
elements which populate the higher regions of the tree (i.e. near the
root).

What I have to do next is write some code for working with the 'less
structured' elements towards the 'leaf ends' of the tree. These are
rather like little sub-documents and contain a mixture of text with
inline formatting (for links, font styles, headings, paragraphs etc.)
and objects (images, media files etc.).

I admit I haven't tried very much code yet, but I'm not sure how I'm
going to handle situations like: the user wants to insert a link in the
middle of a paragraph. How can I use the DOM to insert a node into the
middle of some text? Am I right in thinking that the DOM will reference
a whole text node but nothing smaller?

Any thoughts or suggestions would be very welcome!

Cheers,
Richard
 
Diez B. Roggisch

Richard said:
I admit I haven't tried very much code yet, but I'm not sure how I'm
going to handle situations like: the user wants to insert a link in the
middle of a paragraph. How can I use the DOM to insert a node into the
middle of some text? Am I right in thinking that the DOM will reference
a whole text node but nothing smaller?

You have to split the text node and add the two resulting nodes,
together with the new link node (or whatever node you want there; it can
be a whole tree), to the parent of the two nodes in the correct order. If
you're unsure what that means, create two simple documents and parse them
to DOM to see how that works.
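
As a rough illustration of the split-and-insert step described above, here is a minimal sketch using minidom's `Text.splitText()`, which cuts a text node in two at a character offset and leaves both halves attached to the parent. The sample markup, the offset and the link target are made up for the example; only the minidom calls are real.

```python
from xml.dom import minidom

# Hypothetical sample document; only the structure matters here.
doc = minidom.parseString("<p>Hello world</p>")
p = doc.documentElement
text = p.firstChild

# Split the text node at offset 6: 'text' keeps "Hello ",
# the returned node holds "world" and becomes the next sibling.
tail = text.splitText(6)

# Build the node to insert (an element with its own text child).
link = doc.createElement("a")
link.setAttribute("href", "http://example.org/")
link.appendChild(doc.createTextNode("big "))

# Place the new node between the two halves.
p.insertBefore(link, tail)

result = p.toxml()
```

Because `splitText()` modifies the original node in place rather than replacing it, any existing reference to that node stays valid after the split.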

Diez
 
Richard Lewis

Diez said:
You have to split the text node and add the two resulting nodes,
together with the new link node (or whatever node you want there; it can
be a whole tree), to the parent of the two nodes in the correct order. If
you're unsure what that means, create two simple documents and parse them
to DOM to see how that works.

Thanks. I was kind of worried it might be like that!

I'm implementing a Cursor class now which keeps track of the current
parent Element, text node and character position so that I can easily (I
hope ;-) work out where the splitting and inserting needs to occur. Wish
me luck!!

Cheers,
Richard
 
Richard Lewis

Richard said:
I'm implementing a Cursor class now which keeps track of the current
parent Element, text node and character position so that I can easily (I
hope ;-) work out where the splitting and inserting needs to occur. Wish
me luck!!

Sorry to revive this thread, but there's something else that's causing me
confusion now!

My cursor class is going quite well and I can insert text and element
nodes. It also has methods to 'move' the 'cursor' forward and backward
by a node at a time. It keeps the current_node in an instance variable
which is initially assigned an element from a DOM tree instance created
elsewhere.

The problem I've come up against is when I use the next_node() method,
and the current_node is a (leaf) Text node, the nextSibling property of
current_node is None, where I know (from the document structure) that it
shouldn't be. To make matters more confusing, if I manually create an
instance of my DOM tree (interactively) and check the nextSibling of the
same Text node, it is the correct value (another Element node) while the
nextSibling property of the SectionCursor instance's current_node
property (referring to the same node) is None. I *think* it only applies
to leaf Text nodes.

Here is the *complete* code for my SectionCursor class:
(note that 'sections' are large(ish) document fragments from the main
document)
==========================================
from xml.dom import Node  # provides the Node.TEXT_NODE constant used below


class SectionCursor:
    def __init__(self, section_element):
        """Create a SectionCursor instance using the 'section_element' as
        the parent element."""
        self.section_element = section_element
        self.current_node = self.section_element.firstChild
        self.char_pos = 0

    def forward(self, skip=1):
        """Move the cursor forward 'skip' character positions."""
        if self.current_node.nodeType == Node.TEXT_NODE:
            self.char_pos += skip
            if self.char_pos > len(self.current_node.data):
                self.next_node()
        else:
            self.next_node()

    def backward(self, skip=1):
        """Move the cursor backward 'skip' character positions."""
        if self.current_node.nodeType == Node.TEXT_NODE:
            self.char_pos -= skip
            if self.char_pos < 0:
                self.previous_node()
        else:
            self.previous_node()

    def next_node(self):
        """Move the cursor to the next node; either the first child or next
        sibling."""
        if self.current_node.hasChildNodes():
            self.current_node = self.current_node.firstChild
        elif self.current_node.nextSibling is not None:
            self.current_node = self.current_node.nextSibling
        else:
            return False
        self.char_pos = 0
        return True

    def previous_node(self):
        """Move the cursor to the previous node; either the previous sibling
        or the parent."""
        if self.current_node.previousSibling is not None:
            self.current_node = self.current_node.previousSibling
        elif self.current_node.parentNode != self.section_element:
            self.current_node = self.current_node.parentNode
        else:
            return False
        if self.current_node.nodeType == Node.TEXT_NODE:
            self.char_pos = len(self.current_node.data) - 1
        else:
            self.char_pos = 0
        return True

    def jump_to(self, node, char_pos=0):
        """Jump to a node and character position."""
        self.current_node = node
        self.char_pos = char_pos

    def insert_node(self, ref_doc, new_node):
        """Insert a node (new_node); ref_doc is an instance of the Document
        class."""
        if self.current_node.nodeType == Node.TEXT_NODE:
            parent_node = self.current_node.parentNode
            text_node = self.current_node
            next_node = text_node.nextSibling

            preceeding_portion = ref_doc.createTextNode(
                text_node.data[:self.char_pos])
            proceeding_portion = ref_doc.createTextNode(
                text_node.data[self.char_pos:])

            parent_node.replaceChild(preceeding_portion, text_node)
            parent_node.insertBefore(new_node, next_node)
            parent_node.insertBefore(proceeding_portion, next_node)
            # where is the cursor?
        else:
            parent_node = self.current_node.parentNode
            parent_node.insertBefore(new_node, self.current_node)
            # where is the cursor?

    def append_child_node(self, ref_doc, new_node):
        pass

    def insert_element(self, ref_doc, tag_name, attrs=None):
        """Insert an element called tag_name and with the attributes in the
        attrs dictionary; ref_doc is an instance of the Document class."""
        new_element = ref_doc.createElement(tag_name)
        if attrs is not None:
            for name, value in attrs.items():
                new_element.setAttribute(name, value)
        self.insert_node(ref_doc, new_element)

    def insert_text(self, ref_doc, text):
        """Insert the text in 'text'; ref_doc is an instance of the Document
        class."""
        new_text = ref_doc.createTextNode(text)
        self.insert_node(ref_doc, new_text)

    def remove_node(self):
        """Remove the current node."""
        condemned_node = self.current_node
        if not self.next_node():
            self.previous_node()
        parent_node = condemned_node.parentNode
        old_child = parent_node.removeChild(condemned_node)
        old_child.unlink()

    def remove_text(self, ref_doc, count=None):
        """Remove count (or all) characters from the current cursor
        position."""
        if self.current_node.nodeType != Node.TEXT_NODE:
            return False

        text = self.current_node.data
        new_text = text[:self.char_pos]
        if count is not None:
            new_text += text[self.char_pos + count:]

        new_text_node = ref_doc.createTextNode(new_text)
        parent_node = self.current_node.parentNode
        self.current_node = parent_node.replaceChild(new_text_node,
                                                     self.current_node)
        #self.char_pos = 0
==========================================

I've noticed that when you print any minidom node (except a Text node)
it shows the node's memory address. But it doesn't do this with Text
nodes. Does anyone know why this is? If I assign a Text node from one
DOM tree to a variable, I don't get a copy do I? I hope I just get
another reference to the original node.
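
On the copy-versus-reference question, a small check along these lines should settle it. This is only a sketch with a made-up one-element document: plain assignment in Python binds another name to the same node object, and a copy only appears if you ask for one, for example with `cloneNode()`. As far as I can tell, the missing memory address is just a matter of Text nodes having their own repr that shows the start of the text instead.

```python
from xml.dom import minidom

# Hypothetical sample document.
doc = minidom.parseString("<p>some text</p>")
text_node = doc.documentElement.firstChild

# Plain assignment: 'alias' is a second name for the same node object.
alias = text_node
alias.data = "changed"
same_object = alias is text_node

# The change made through 'alias' is visible through the tree as well.
seen_through_tree = doc.documentElement.firstChild.data

# An explicit copy is a different object and is not attached to the tree.
copy = text_node.cloneNode(False)
```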

Cheers,
Richard
 
Richard Lewis

Richard said:
Here is the *complete* code for my SectionCursor class:

In case anyone's interested, I've just noticed a logical error in the
next_node() method:
=================================
def next_node(self):
    if self.current_node.hasChildNodes():
        self.current_node = self.current_node.firstChild
    elif self.current_node.nextSibling is not None:
        self.current_node = self.current_node.nextSibling
    else:
        while self.current_node.parentNode.nextSibling is None\
                and self.current_node != self.section_element:
            self.current_node = self.current_node.parentNode
        if self.current_node != self.section_element:
            self.current_node = self.current_node.parentNode.nextSibling
        else:
            return False
    self.char_pos = 0
    return True
=================================

which doesn't solve the original problem. Though I think it may be
causing a (related) problem: it says the self.current_node.parentNode is
of NoneType. If there is a problem with assigning parts of an existing
DOM tree to other variables, might this be another symptom?
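
For comparison, here is one way the "climb until an ancestor has a next sibling" step could be written so that it never looks at the parent of the section element itself. It is a sketch only: `next_in_document_order` and the sample markup are hypothetical names invented for this example, and the function is a standalone helper rather than a drop-in method for the class.

```python
from xml.dom import minidom


def next_in_document_order(node, root):
    """Return the node that follows 'node' in document order inside
    'root', or None when 'node' is the last one."""
    if node.hasChildNodes():
        return node.firstChild
    # Climb towards 'root', stopping at the first node with a next sibling.
    while node is not None and node is not root:
        if node.nextSibling is not None:
            return node.nextSibling
        node = node.parentNode
    return None


# Hypothetical sample section.
doc = minidom.parseString("<s><p>one<b>two</b></p><p>three</p></s>")
root = doc.documentElement

order = []
node = root.firstChild
while node is not None:
    if node.nodeType == node.ELEMENT_NODE:
        order.append(node.nodeName)
    else:
        order.append(node.data)
    node = next_in_document_order(node, root)
```

The loop tests whether it has reached `root` before touching `parentNode` again, so the walk ends with `None` instead of an attribute error on a `NoneType`.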

Cheers,
Richard
 
Richard Lewis

Richard said:
In case anyone's interested, I've just noticed a logical error in the
next_node() method:

OK, I'm beginning to wish I hadn't mentioned this now; I've changed the
insert_node() method as well:
================================
def insert_node(self, ref_doc, new_node):
    if self.current_node.nodeType == Node.TEXT_NODE:
        parent_node = self.current_node.parentNode
        text_node = self.current_node
        next_node = text_node.nextSibling

        preceeding_portion = ref_doc.createTextNode(
            text_node.data[:self.char_pos])
        proceeding_portion = ref_doc.createTextNode(
            text_node.data[self.char_pos:])

        parent_node.replaceChild(preceeding_portion, text_node)
        if next_node is None:
            parent_node.appendChild(new_node)
            parent_node.appendChild(proceeding_portion)
        else:
            parent_node.insertBefore(new_node, next_node)
            parent_node.insertBefore(proceeding_portion, next_node)
        # where is the cursor?
    else:
        parent_node = self.current_node.parentNode
        next_node = self.current_node.nextSibling
        if next_node is None:
            parent_node.appendChild(new_node)
        else:
            parent_node.insertBefore(new_node, self.current_node)
        # where is the cursor?
================================

I've done some more testing and it seems that, after a call to
insert_node() when the current_node is a Text node, current_node's
parentNode, nextSibling and firstChild properties become None (assuming
they weren't None before, which firstChild was).
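
One possible explanation for this symptom, offered as a sketch rather than a diagnosis of the full program: `replaceChild()` detaches the node it replaces, so a variable that still refers to the old Text node will report `None` for `parentNode` and `nextSibling` afterwards. The snippet below reproduces only that one step on a made-up document, with a plain variable named `current_node` standing in for the cursor's attribute.

```python
from xml.dom import minidom

# Hypothetical sample document.
doc = minidom.parseString("<p>some text<b>bold</b></p>")
p = doc.documentElement

# Stand-in for the cursor's current_node: a reference to the live Text node.
current_node = p.firstChild
next_before = current_node.nextSibling      # the <b> element

# The same kind of step insert_node() performs on a Text node.
preceeding_portion = doc.createTextNode(current_node.data[:5])
p.replaceChild(preceeding_portion, current_node)

# current_node still names the old Text node, which is now detached.
detached_parent = current_node.parentNode
detached_next = current_node.nextSibling

# Pointing the reference at the node that took its place restores the links.
current_node = preceeding_portion
```

If this is what is happening, re-pointing `current_node` at one of the new nodes after the replacement (or splitting the existing node in place instead of replacing it) would keep the cursor attached to the tree.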

Hmm. Um...er, yeah. I don't think anyone's following me anyway....

I'll keep fiddling with it.
 
