D
David Hirschfield
I've written a tree-like data structure that stores arbitrary python
objects. The objective was for the tree structure to allow any number of
children per node, and any number of root nodes...and for it to be
speedy for trees with thousands of nodes.
At its core, the structure is just a list of lists arranged so that if
node A has children B and C and node B has child D the data looks like:
A = [<A node data>, B, C]
B = [<B node data>, D]
C = [<C node data>]
where B, C and D are all lists with similar structures to A. I am
holding references to the individual nodes so that access to individual
nodes by reference is quick. Access by "tree path" is done by giving a
tuple of integers indicating where in the tree the node you want lies.
The path (1,2,5) indicates the 6th child of the 3rd child of the 2nd
root node. Everything involving basic tree functions works fine. Finding
any particular node this way is just a function of the depth of the node
in the tree, so it's very quick unless you have some degenerate tree
structure where nodes end up miles deep.
Here's my problem: I need to allow the tree to "hide" the roots, so that
the top-level appears to the outside world to be the children under the
root nodes, not the root nodes themselves. That means a path of (5,2)
indicates the 3rd child of the 6th "pseudo-root" node. Unfortunately, in
a tree with many root nodes, each containing many children (a common use
case for me) finding the 6th pseudo-root is going to be slow, and the
ways I've thought of to make it fast all require slow bookkeeping when
new nodes are inserted or removed at the pseudo-root level.
I'm running out of ideas, so if anyone has any thoughts on how to deal
with fudging which nodes are the roots efficiently...I'm all ears.
Thanks in advance,
-David
objects. The objective was for the tree structure to allow any number of
children per node, and any number of root nodes...and for it to be
speedy for trees with thousands of nodes.
At its core, the structure is just a list of lists arranged so that if
node A has children B and C and node B has child D the data looks like:
A = [<A node data>, B, C]
B = [<B node data>, D]
C = [<C node data>]
where B, C and D are all lists with similar structures to A. I am
holding references to the individual nodes so that access to individual
nodes by reference is quick. Access by "tree path" is done by giving a
tuple of integers indicating where in the tree the node you want lies.
The path (1,2,5) indicates the 6th child of the 3rd child of the 2nd
root node. Everything involving basic tree functions works fine. Finding
any particular node this way is just a function of the depth of the node
in the tree, so it's very quick unless you have some degenerate tree
structure where nodes end up miles deep.
Here's my problem: I need to allow the tree to "hide" the roots, so that
the top-level appears to the outside world to be the children under the
root nodes, not the root nodes themselves. That means a path of (5,2)
indicates the 3rd child of the 6th "pseudo-root" node. Unfortunately, in
a tree with many root nodes, each containing many children (a common use
case for me) finding the 6th pseudo-root is going to be slow, and the
ways I've thought of to make it fast all require slow bookkeeping when
new nodes are inserted or removed at the pseudo-root level.
I'm running out of ideas, so if anyone has any thoughts on how to deal
with fudging which nodes are the roots efficiently...I'm all ears.
Thanks in advance,
-David