kD-Trees implementation


Marcin Raczkowski

Hello.

I started playing with a Ruby kD-Tree implementation, any thoughts?


# to be extended to allow for searching / parsing nodes
class KDTree
  attr_reader :root

  def initialize(points)
    @root = KDNode.new.parse(points)
  end

  def add_node(point)
    @root.add(point)
  end
end

# pretty standard tree node
class KDNode
  attr_reader :left, :right
  attr_reader :location

  def initialize(location=nil, left=nil, right=nil)
    @location = location
    @left = left
    @right = right
  end

  # parse a list of points and build the nodes
  def parse(points, depth = 0)
    @depth = depth            # save depth for this node, used by #remove
    @dim = points[0].length   # dimension taken from the points themselves
    axis = depth % @dim

    points = points.sort_by{|point| point[axis]} # sort by the current axis
    half = points.length / 2  # the algorithm suggests using the median,
                              # but in reality we don't need the median value,
                              # we just need to cut the list in half

    @location = points[half]  # here we could store just an index
    @left = KDNode.new.parse(points[0..half-1], depth+1) unless half < 1
    @right = KDNode.new.parse(points[half+1..-1], depth+1) unless half+1 >= points.length
    self
  end

  # add a leaf under this node; repeated adds will make the tree unbalanced
  def add(point, depth = 0)
    @dim = point.length
    axis = depth % @dim

    if point[axis] < @location[axis]   # smaller values go left, matching #parse
      @left ? @left.add(point, depth+1) : @left = KDNode.new(point)
    else
      @right ? @right.add(point, depth+1) : @right = KDNode.new(point)
    end
  end

  # remove this node by rebuilding the subtree from its children
  def remove
    parse(@left.to_a + @right.to_a, @depth)
    # @depth (not @depth + 1): the rebuilt subtree takes this node's place
  end

  # list all points in this subtree, including this node's
  def to_a
    @left.to_a + [@location] + @right.to_a
  end
end

I'm thinking about storing only indexes in the leaves, and keeping the
point list in the kD-Tree itself.
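A minimal sketch of that idea (a hypothetical IndexedKDTree, not the code above): the tree owns the point list and the nodes hold only integer indexes into it.

class IndexedKDTree
  attr_reader :points, :root

  def initialize(points)
    @points = points
    @dim = points.first.length
    @root = build((0...points.length).to_a, 0)
  end

  private

  # a node is just [point_index, left_subtree, right_subtree]
  def build(indexes, depth)
    return nil if indexes.empty?
    axis = depth % @dim
    indexes = indexes.sort_by { |i| @points[i][axis] }
    half = indexes.length / 2
    [indexes[half],
     build(indexes[0...half], depth + 1),
     build(indexes[half + 1..-1], depth + 1)]
  end
end

tree = IndexedKDTree.new([[2, 3], [5, 4], [9, 6], [4, 7], [8, 1]])
p tree.root.first                # index of the root's point
p tree.points[tree.root.first]   # the point itself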

any suggestions?
 

Marcin Raczkowski

New code - it lets you print trees and find points within a given range in
k dimensions (at least I hope so ;))

require 'pp'

class KDTree
  attr_reader :root
  attr_reader :points

  def initialize(points, dim)
    @dim = dim
    @root = KDNode.new(dim).parase(points)
  end

  def add_node(point)
    @root.add(point)
  end

  # range search: pass one Range (or [min, max] array) per dimension
  def find(*range)
    range = range.collect{ |pa| pa.kind_of?(Array) ? Range.new(pa.first, pa.last) : pa }
    @points = []
    query(range, @root)
    @points
  end

  def query(range, node)
    axis = node.axis
    median = node.location[axis]

    # skip the left subtree if its maximum (the median) is below the range
    query(range, node.left) if node.left && median >= range[axis].begin
    # skip the right subtree if its minimum (the median) is above the range
    query(range, node.right) if node.right && median <= range[axis].end
    # collect this node's point if it lies inside the range on every axis
    if (0..@dim-1).all?{ |ax| range[ax].include? node.location[ax] }
      @points << node.location
    end
  end

  def print
    @root.print
  end
end

class KDNode
  attr_reader :left, :right
  attr_reader :location
  attr_reader :axis

  def initialize(dim, location=nil, left=nil, right=nil, axis=0)
    @dim = dim
    @location = location
    @left = left
    @right = right
    @axis = axis
  end

  def parase(points, depth = 0)
    @axis = depth % @dim

    points = points.sort_by{|point| point[@axis]}
    half = points.length / 2

    @location = points[half]
    @left = KDNode.new(@dim).parase(points[0..half-1], depth+1) unless half < 1
    @right = KDNode.new(@dim).parase(points[half+1..-1], depth+1) unless half+1 >= points.length
    self
  end

  # add a leaf under this node; smaller values go left, matching #parase
  def add(point)
    child_axis = (@axis + 1) % @dim
    if point[@axis] < @location[@axis]
      @left ? @left.add(point) : @left = KDNode.new(@dim, point, nil, nil, child_axis)
    else
      @right ? @right.add(point) : @right = KDNode.new(@dim, point, nil, nil, child_axis)
    end
  end

  # remove this node by rebuilding the subtree from its children;
  # passing @axis works because parase only uses the depth modulo @dim
  def remove
    parase(@left.to_a + @right.to_a, @axis)
  end

  def to_a
    @left.to_a + [@location] + @right.to_a
  end

  def print(l=0)
    @left.print(l+1) if @left
    puts(" "*l + @location.inspect)
    @right.print(l+1) if @right
  end
end

a = []

10.times do |x|
  10.times do |y|
    10.times do |z|
      a << [x, y, z]
    end
  end
end

tree = KDTree.new(a, 3)
tree.print

puts " -------------- "

tree.find([0,4], [1,4], [5,7])

pp tree.points.sort_by{|po| po[0]}
 

Clifford Heath

Marcin said:
class KDNode .....
def parase(points, depth = 0) .....
points = points.sort_by{|point| point[@axis]}

Is this the standard way to build a kdtree?

Seems awfully wasteful to re-sort the values for
each level. Can't you build @dim copies of the
original array, each one sorted on its own dim, then
build the kdtree over the partitions of those?

Clifford Heath.
 

Marcin Raczkowski

Clifford said:
Marcin said:
class KDNode ....
def parase(points, depth = 0) ....
points = points.sort_by{|point| point[@axis]}

Is this the standard way to build a kdtree?

Seems awfully wasteful to re-sort the values for
each level. Can't you build @dim copies of the
original array, each one sorted on its own dim, then
build the kdtree over the partitions of those?

Clifford Heath.

The problem is more complex than you think.
Solutions that keep @dim copies are O(n log n), but
if I have @dim copies sorted by axis, then after the first split I have to
somehow mark the other copies, so the next two splits will operate only on
elements that belong to the proper branch.

Maybe I understood it wrong and there's a better solution, but for now
I'm focusing on writing a correct algorithm; then I might think about how
to optimize it.
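For illustration only, a rough sketch of where that marking could happen (my own hypothetical split_presorted helper, assuming distinct points): each of the @dim presorted copies is partitioned with a membership set at every split, so each copy stays sorted without re-sorting.

require 'set'

# sorted_by_axis: an array of dim copies of the point list, copy a sorted on axis a
def split_presorted(sorted_by_axis, axis, dim)
  by_axis = sorted_by_axis[axis]
  half    = by_axis.length / 2
  median  = by_axis[half]

  left_set  = Set.new(by_axis[0...half])       # "mark" the points that went left
  right_set = Set.new(by_axis[half + 1..-1])   # ...and the ones that went right

  left_copies  = (0...dim).map { |a| sorted_by_axis[a].select { |p| left_set.include?(p) } }
  right_copies = (0...dim).map { |a| sorted_by_axis[a].select { |p| right_set.include?(p) } }

  [median, left_copies, right_copies]
end

Each select pass is linear per level, so whether this actually beats re-sorting in Ruby is an open question; it is only meant to show where the marking would go.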
 

Clifford Heath

Marcin said:
The problem is more complex than you think.

Yes, ok, I see that now, @dim copies don't help much.
I'm still interested to know what better method there
might be, if any.

for now I'm focusing on writing a correct algorithm; then I might
think about how to optimize it.

Fair enough!
 

Marcin Raczkowski

Clifford said:
Yes, ok, I see that now, @dim copies don't help much.
I'm still interested to know what better method there
might be, if any.


Fair enough!
There are research papers that describe O(n log n) instead of O(n log^2 n)
algorithms, but they are usually full of complex math and 20-50 pages long.
Also, the one that I could understand is based on pointers and interlinked
lists, and it's kinda hard to implement in Ruby.
 

Clifford Heath

Marcin said:
There are reaserch papers that descripbe O(nlogn) instead of (nlog^2n)
algorithms but They are usually full of complex math and 20-50 pages
long. Also one that i could understand Is based n pointers and
interlinked lists and it's kinda hard to implement in Ruby

The Wikipedia page for kd-trees references an O(n) median-finding algorithm
in a book compiled by Cormen and others. I have the book, and the algorithm
doesn't look too scary. I might have a go at implementing it. But in any
case, the exact median is only needed if you demand an exactly-balanced tree.
Subsequent insertions will unbalance the tree anyhow in most cases, so it's
probably not necessary.

The algorithm for finding the i'th lowest entry given in the book is as follows.
(copyright, fair use provisions allow posting):

The SELECT algorithm determines the ith smallest of an input array of n > 1
elements by executing the following steps. (If n = 1, then SELECT merely
returns its only input value as the ith smallest.)

1. Divide the n elements of the input array into ⌊n/5⌋ groups of 5 elements
   each and at most one group made up of the remaining n mod 5 elements.
2. Find the median of each of the ⌈n/5⌉ groups by first insertion sorting the
   elements of each group (of which there are at most 5) and then picking the
   median from the sorted list of group elements.
3. Use SELECT recursively to find the median x of the ⌈n/5⌉ medians found in
   step 2. (If there are an even number of medians, then by our convention,
   x is the lower median.)
4. Partition the input array around the median-of-medians x using the modified
   version of PARTITION. Let k be one more than the number of elements on the
   low side of the partition, so that x is the kth smallest element and there
   are n - k elements on the high side of the partition.
5. If i = k, then return x. Otherwise, use SELECT recursively to find the ith
   smallest element on the low side if i < k, or the (i - k)th smallest element
   on the high side if i > k.

That's it, not too scary.
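A rough Ruby sketch of those five steps, just to make them concrete (select_ith is a made-up name, and it uses a simple three-way partition instead of the book's PARTITION):

def select_ith(array, i)
  return array.first if array.length <= 1

  # 1-2. split into groups of at most 5 and take each group's (lower) median
  medians = array.each_slice(5).map { |g| g.sort[(g.length - 1) / 2] }

  # 3. median of medians (lower median when the count is even)
  x = select_ith(medians, (medians.length + 1) / 2)

  # 4. three-way partition around x (elements equal to x kept separate)
  low   = array.select { |e| e < x }
  equal = array.select { |e| e == x }
  high  = array.select { |e| e > x }

  # 5. recurse into the side that contains the ith smallest element
  if i <= low.length
    select_ith(low, i)
  elsif i <= low.length + equal.length
    x
  else
    select_ith(high, i - low.length - equal.length)
  end
end

For example, select_ith([7, 2, 9, 4, 1], 3) should return 4, the third smallest element.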

Clifford Heath.
 

Marcin Raczkowski

Clifford said:
The Wikipedia page for kd-trees references an O(n) median-finding algorithm
in a book compiled by Cormen and others. I have the book, and the algorithm
doesn't look too scary. I might have a go at implementing it. But in any
case, the exact median is only needed if you demand an exactly-balanced tree.
Subsequent insertions will unbalance the tree anyhow in most cases, so it's
probably not necessary.

[SELECT algorithm steps snipped; quoted in full above]

That's it, not too scary.

Clifford Heath.

I didn't say it was scary; I'm not scared of complex math, but since
finals are coming soon I don't have enough time.

I didn't have this book, and other sources were pretty confusing
(research papers tend to be that way) - after I understood the algorithm,
writing the Ruby implementation was a piece of cake.

Anyway, thanks for posting this piece of the article, I'll try to implement
it ASAP.

About finding the median - my algorithm doesn't find it either; the problem
is to find a fast way to select (with some degree of approximation) the
elements that were assigned to the current branch.
 

Clifford Heath

Marcin said:
Anyway, thanks for posting this piece of the article,

Not a problem - thanks for sharing your kdtree implementation!

About finding the median - my algorithm doesn't find it either

But I think it does - you sort N elements and use the N/2'th
element, which is the median. That guarantees a balanced tree.

The r-tree (and priority r-tree) looks like a useful structure
too - the equivalent of a b-tree for spatial data.

Clifford Heath.
 

Marcin Raczkowski

About finding the median - my algorithm doesn't find it either

But I think it does - you sort N elements and use the N/2'th
element, which is the median. That guarantees a balanced tree.

Well, I guess it's close enough - it doesn't guarantee a balanced tree -
you have to take the N+1 cases into consideration.
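For example, with an even number of points the split used in the code above is only nearly even:

points = [1, 2, 3, 4]        # already sorted on the current axis
half   = points.length / 2   # => 2
points[half]                 # => 3      (stored in the node)
points[0..half-1]            # => [1, 2] (left subtree gets two points)
points[half+1..-1]           # => [4]    (right subtree gets one)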

The r-tree (and priority r-tree) looks like a useful structure
too - the equivalent of a b-tree for spatial data.

Clifford Heath.

Well, I started implementing it as a brain-teaser and a platform to test my
optimization techniques - so the plan is to implement the full set of
operations, and then start to optimize it as much as I can ;)

One property of kd-trees is important to me - fast finding of
neighbours in k dimensions.
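As a sketch of that neighbour search (hypothetical nearest and squared_distance helpers, not part of the implementation above; they only assume the KDNode fields location, axis, left and right):

def squared_distance(a, b)
  a.zip(b).map { |x, y| (x - y) ** 2 }.reduce(:+)
end

def nearest(node, target, best = nil)
  return best if node.nil?

  # keep the closer of the current best and this node's point
  if best.nil? || squared_distance(node.location, target) < squared_distance(best, target)
    best = node.location
  end

  axis = node.axis
  diff = target[axis] - node.location[axis]
  near, far = diff < 0 ? [node.left, node.right] : [node.right, node.left]

  best = nearest(near, target, best)
  # the far side can only matter if the splitting plane is closer than the best so far
  best = nearest(far, target, best) if diff ** 2 < squared_distance(best, target)
  best
end

Given a tree built as in the earlier example, nearest(tree.root, [3, 4, 6]) should return the stored point closest to [3, 4, 6].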
 
