Explaining Implementing a Binary Search Tree.

B

Bart Kastermans

I wrote a binary search tree in python, explaining as I was doing it
how and why I did it. I am very interested in receiving comments on
the code, process, and anything else that will improve my coding or
writing.

I wrote this all up in my blog at:

http://kasterma.wordpress.com/2008/06/15/implementing-a-binary-search-tree/

The code of the class has been copied below, but the description of
the process (mostly an attempt at approaching test driving development
for as far as I understand the term) has not been copied.

Any and all comments are appreciated.

Best,
Bart

*********** python code ************************


import re

class BSTree:
def __init__ (self, value = None):
self.value = value
self.left = None
self.right = None

def __str__ (self):
string = "("
if not self.value == None:
string += str (self.value)

if not (self.left == None and self.right == None):
if self.left != None:
string += str (self.left)
else:
string += "()"

if self.right != None:
string += str (self.right)
else:
string += "()"

string += ")"

return string

def FromString (self, string):
""" parses the string and builds the corresponding tree.

If the parsing fails we print an error message and make no
guarantees about the state of the tree.

"""

def readNumber (string, index):
""" reads the number starting at index.

Returns the number (as a string) that starts at index and
the index that is just after the number. It expects the
index to point to the first numberal.

"""

isdigit = re.compile ("\d")
number = ""
while isdigit.match (string [index]):
number += string[index]
index += 1

return number, index

def FromString_Help (string, index):
""" The function that actually does the parsing of the
string.

Variable index indicates where in the string we start
working,
it should be pointing to an open paren, and in returning
point
to just after the corresponding closing paren.

"""

if not string [index] == "(":
print "FromString_Help: ERROR: no opening paren",
print string, index

tree = BSTree ()

tree.value, index = readNumber (string, index + 1)

if string [index] == "(":
tree.left, index = FromString_Help (string, index)
tree.right, index = FromString_Help (string, index)

if not string[index] == ")":
print "FromString_Help: ERROR: no closing paren"
print string, index

if tree.value == '' and tree.left == None and tree.right
== None:
return None, index + 1
else:
tree.value = int (tree.value)

return tree, index + 1

index = 0
tree, index = FromString_Help (string, index)

if tree == None:
print "FromString: ERROR: empty tree passed"
return

self.value = tree.value
self.left = tree.left
self.right = tree.right

def inOrder (self):
""" gives a generator that runs through the tree inOrder """

values = []

if self.left != None:
values += self.left.inOrder ()

if self.value != None:
values.append (self.value)

if self.right != None:
values += self.right.inOrder ()

return values

def Insert (self, value):
""" add value to the tree in BSTree fashion.

We add the value so that the values of the tree will be in
order. We do not duplicate values in the tree.

"""
if self.value == None: # first value added to this tree
self.value = value

if self.value < value:
if self.right == None:
self.right = BSTree (value)
else:
self.right.Insert (value)

if self.value > value:
if self.left == None:
self.left = BSTree (value)
else:
self.left.Insert (value)

def Delete (self, value, parent = None, RL = None):
""" remove value from the tree in BSTree fashion.

Note: we might end up with an empty tree.

"""
if self.value == value:
if self.left == None and self.right == None:
if parent != None:
if RL == "right":
parent.right = None
else:
parent.left = None
else:
self.value = None # created an empty tree, note
only
# happens at the root.
return
if self.left == None:
self.value = self.right.value
self.left = self.right.left
self.right = self.right.right
elif self.left.right == None:
self.value = self.left.value
self.left = self.left.left
else:
moveup = self.left
while moveup.right != None:
# note that by the elif above at least one step is
taken
# we are here looking for the node to move up to
the root
moveup_parent = moveup
moveup = moveup.right
moveup_parent.right = moveup.left
self.value = moveup.value
return

if self.value < value:
self.right.Delete (value, self, "right")

if self.value > value:
self.left.Delete (value, self, "left")

def Search (self, value):
if self.value == value:
return "yes"
if self.value < value:
if self.right == None:
return "no"
else:
return self.right.Search (value)
if self.value > value:
if self.left == None:
return "no"
else:
return self.left.Search (value)
 
T

Terry Reedy

|I wrote a binary search tree in python, explaining as I was doing it
| how and why I did it. I am very interested in receiving comments on
| the code, process, and anything else that will improve my coding or
| writing.
|
| I wrote this all up in my blog at:
|
|
http://kasterma.wordpress.com/2008/06/15/implementing-a-binary-search-tree/
|
| The code of the class has been copied below, but the description of
| the process (mostly an attempt at approaching test driving development
| for as far as I understand the term) has not been copied.
|
| Any and all comments are appreciated.
|
| Best,
| Bart
|
| *********** python code ************************
|
|
| import re
|
| class BSTree:
| def __init__ (self, value = None):
| self.value = value
| self.left = None
| self.right = None

There are two possible approaches.
1. Define 1 tree class that is also used for subtrees -- what you did.
Complication: self.value of root node can be none, so you constantly
have to check self.value for None even though only possible for root node.
2. Define tree class and node class. This had other complications, but
removes that above and makes str easier. tree.str = '(' str(rootnode) ')'
and node.str= self.value '(' str(self.left) ')' '(' str(self.right) ')'.

If use '' instead of None, no conditionals are needed. (This might apply
partly to your version as well.) Or define class NullTree with a singleton
instance with no attributes and a str method returning '' and an inOrder
method yielding nothing. This would also eliminate the condifionals in the
inOrder method. Not sure what is best. With a good test suite, it is easy
to make sure alternative implementations 'work' before testing for speed.

| def __str__ (self):

string appending is an O(n**2) operations. The usual idiom, applied here,
would be slist = ['('], slist.append(str(self.value)), .... return
''.join(slist). In other words, collect list of pieces and join at end.

| string = "("
| if not self.value == None:
| string += str (self.value)
|
| if not (self.left == None and self.right == None):
| if self.left != None:
| string += str (self.left)
| else:
| string += "()"
|
| if self.right != None:
| string += str (self.right)
| else:
| string += "()"
|
| string += ")"
|
| return string
|
| def FromString (self, string):
| """ parses the string and builds the corresponding tree.
|
| If the parsing fails we print an error message and make no
| guarantees about the state of the tree.
|
| """
|
| def readNumber (string, index):
| """ reads the number starting at index.
|
| Returns the number (as a string) that starts at index and
| the index that is just after the number. It expects the
| index to point to the first numberal.
|
| """
|
| isdigit = re.compile ("\d")
| number = ""
| while isdigit.match (string [index]):
| number += string[index]
| index += 1
|
| return number, index
|
| def FromString_Help (string, index):
| """ The function that actually does the parsing of the
| string.
|
| Variable index indicates where in the string we start
| working,
| it should be pointing to an open paren, and in returning
| point
| to just after the corresponding closing paren.
|
| """
|
| if not string [index] == "(":
| print "FromString_Help: ERROR: no opening paren",
| print string, index
|
| tree = BSTree ()
|
| tree.value, index = readNumber (string, index + 1)
|
| if string [index] == "(":
| tree.left, index = FromString_Help (string, index)
| tree.right, index = FromString_Help (string, index)
|
| if not string[index] == ")":
| print "FromString_Help: ERROR: no closing paren"
| print string, index
|
| if tree.value == '' and tree.left == None and tree.right
| == None:
| return None, index + 1
| else:
| tree.value = int (tree.value)
|
| return tree, index + 1
|
| index = 0
| tree, index = FromString_Help (string, index)
|
| if tree == None:
| print "FromString: ERROR: empty tree passed"
| return
|
| self.value = tree.value
| self.left = tree.left
| self.right = tree.right
|
| def inOrder (self):
| """ gives a generator that runs through the tree inOrder """

Nope. Returns an inorder list of values.

For an actual generator, which I recommend, *yield* values.

|
| values = []

[delete this'


| if self.left != None:
| values += self.left.inOrder ()

for value in self.left.inOrder: yield value

| if self.value != None:
| values.append (self.value)

yield self.value

| if self.right != None:
| values += self.right.inOrder ()

for value in self.right.inOrder: yield value

| return values

[delete this]

The above with result in a stack of generators as deep as the tree. It
would probably be faster to eliminate the recursion with an explicit stack
of trees, but I cannot currently write that off the top of my head. (This
is part of my current writing project.) But again, a test suite eases
exploration of alternatives.

| def Insert (self, value):
| """ add value to the tree in BSTree fashion.
|
| We add the value so that the values of the tree will be in
| order. We do not duplicate values in the tree.
|
| """
| if self.value == None: # first value added to this tree
| self.value = value
|
| if self.value < value:
| if self.right == None:
| self.right = BSTree (value)
| else:
| self.right.Insert (value)
|
| if self.value > value:
| if self.left == None:
| self.left = BSTree (value)
| else:
| self.left.Insert (value)
|
| def Delete (self, value, parent = None, RL = None):
| """ remove value from the tree in BSTree fashion.
|
| Note: we might end up with an empty tree.
|
| """
| if self.value == value:
| if self.left == None and self.right == None:
| if parent != None:
| if RL == "right":
| parent.right = None
| else:
| parent.left = None
| else:
| self.value = None # created an empty tree, note
| only
| # happens at the root.
| return
| if self.left == None:
| self.value = self.right.value
| self.left = self.right.left
| self.right = self.right.right
| elif self.left.right == None:
| self.value = self.left.value
| self.left = self.left.left
| else:
| moveup = self.left
| while moveup.right != None:
| # note that by the elif above at least one step is
| taken
| # we are here looking for the node to move up to
| the root
| moveup_parent = moveup
| moveup = moveup.right
| moveup_parent.right = moveup.left
| self.value = moveup.value
| return
|
| if self.value < value:
| self.right.Delete (value, self, "right")
|
| if self.value > value:
| self.left.Delete (value, self, "left")
|
| def Search (self, value):
| if self.value == value:
| return "yes"
| if self.value < value:
| if self.right == None:
| return "no"
| else:
| return self.right.Search (value)
| if self.value > value:
| if self.left == None:
| return "no"
| else:
| return self.left.Search (value)
| --
| http://mail.python.org/mailman/listinfo/python-list
|
 
B

Bart Kastermans

Summary: can't verify big O claim, how to properly time this?

|I wrote a binary search tree in python, explaining as I was doing it
| how and why I did it. I am very interested in receiving comments on
| the code, process, and anything else that will improve my coding or
| writing.
|
| I wrote this all up in my blog at:
|
|http://kasterma.wordpress.com/2008/06/15/implementing-a-binary-search...
|
| The code of the class has been copied below, but the description of
| the process (mostly an attempt at approaching test driving development
| for as far as I understand the term) has not been copied.
|
| Any and all comments are appreciated.
|
| Best,
| Bart
|
| *********** python code ************************
|
|
| import re
|
| class BSTree:
| def __init__ (self, value = None):
| self.value = value
| self.left = None
| self.right = None

There are two possible approaches.
1. Define 1 tree class that is also used for subtrees -- what you did.
Complication: self.value of root node can be none, so you constantly
have to check self.value for None even though only possible for root node.
2. Define tree class and node class. This had other complications, but
removes that above and makes str easier. tree.str = '(' str(rootnode) ')'
and node.str= self.value '(' str(self.left) ')' '(' str(self.right) ')'.

If use '' instead of None, no conditionals are needed. (This might apply
partly to your version as well.) Or define class NullTree with a singleton
instance with no attributes and a str method returning '' and an inOrder
method yielding nothing. This would also eliminate the condifionals in the
inOrder method. Not sure what is best. With a good test suite, it is easy
to make sure alternative implementations 'work' before testing for speed.

Thanks for the idea. I would expect the separation to lead to
somewhat more
code, but all the "checking the root" code would be separated out in
the
tree class. The node class would be very smooth. I'll try this when
I have
some time (today I spend my "alloted" programming time on what is
below).

(also the comment about inOrder returning a generator was because I
tried to
figure it out, failed, and then got enough by doing it without yield.
I forgot
to bring my comment in line with my code. A generator
would certainly be nicer, and I'll work at understanding your
suggestion for
it.)
| def __str__ (self):

string appending is an O(n**2) operations. The usual idiom, applied here,
would be slist = ['('], slist.append(str(self.value)), .... return
''.join(slist). In other words, collect list of pieces and join at end.

This is interesting. I had never attempted to verify a big O
statement
before, and decided that it would be worth trying. So I wrote some
code to
collect data, and I can't find that it goes quadratic. I have the
graph
at

http://kasterma.wordpress.com/2008/06/16/complexity-of-string-concatenation/

It looks piecewise linear to me.

The code I used to collect the data is as follows:

*************************************************************************

import time

NUMBER = 100 # number of strings to concatenate at given length
JUMP = 500 # jump (and start length) of length of strings
END = 100001 # longest length string considered

def randomString (length):
""" returns a random string of letters from {a,b,c,d} of length
"""

string = ""

for i in range (0,length):
string += choice ("abcd")

return string

def randomStrings (number, length):
""" returns an array of number random strings all of length """

array = []

for i in range (0, number):
array.append (randomString (length))

return array

TimingData = []

for length in range (JUMP, END, JUMP):
array1 = randomStrings (NUMBER, length)
array2 = randomStrings (NUMBER, length)

starttime = time.clock ()
for i in range (0, NUMBER):
string = array1 + array2
endtime = time.clock ()
print "length", length, "done"

TimingData.append ([length, 1000* (endtime-starttime)])
# to get a better looking graph multiply by 1000

sagefile = open ('stringConcatGraph.sage', "w")
sagefile.write ("points =" + str (TimingData) + "\n")
sagefile.write ("graph = line (points)\n")
sagefile.write ("graph.show ()\n")
sagefile.close ()
 
I

Ian Kelly

This is interesting. I had never attempted to verify a big O
statement
before, and decided that it would be worth trying. So I wrote some
code to
collect data, and I can't find that it goes quadratic. I have the
graph
at

http://kasterma.wordpress.com/2008/06/16/complexity-of-string-concatenation/

It looks piecewise linear to me.

I don't think there's any question that it's quadratic. Just look at
the C code, and you'll see that every time you concatenate the entire
string has to be copied. Remember that the copy code uses memcpy from
the C library, though, so it's quite fast. If you're not seeing it
for relatively small strings, it's probably that the major time
component is the Python looping construct, not the copy operation.

I tried it at lengths of multiples of 10, and it remained linear up
until 1E6. At 1E7, it appears to turn quadratic. I tried 1E8, but it
hasn't finished yet. I expect it to take the better part of an hour.
20.320074199374631

Even though it doesn't become quadratic until 1E7, it's still up to an
order of magnitude faster to use str.join for smaller strings, though:
b = Timer("''.join(['a'] * 1000000)")
b.timeit(1)
0.044094989726346512

-Ian
 
G

Gabriel Genellina

Summary: can't verify big O claim, how to properly time this?

This is interesting. I had never attempted to verify a big O
statement
before, and decided that it would be worth trying. So I wrote some
code to
collect data, and I can't find that it goes quadratic.

In your test code, you're concatenating only two strings, that's a *single* operation, and takes time proportional to the total length.
The quadratic behavior appears when you do *several* concatenations in a row (like in your original code, where += was used several times to build a result).
If you want to verify it, try joining N strings of size M (for varying values of N and M), and plot total time vs. N (for a given M value), and total time vs. M (for a given N value) and finally total time vs. (N*M), see what happens and post your findings again.
 
B

Bart Kastermans

In your test code, you're concatenating only two strings, that's a *single* operation, and takes time proportional to the total length.
The quadratic behavior appears when you do *several* concatenations in a row (like in your original code, where += was used several times to build a result).
If you want to verify it, try joining N strings of size M (for varying values of N and M), and plot total time vs. N (for a given M value), and total time vs. M (for a given N value) and finally total time vs. (N*M), see what happens and post your findings again.

I did the work and found that for trees it does not go quadratic at
all, from the computations you suggest it is easy to get quadratic
behavior though. Also this does not depend in any way I can see on
the implementation by Python. Actual time might certainly improve by
doing it differently (changing the constants), but the big O results
won't.

For pictures and math I have put it up on my blog again see:
http://kasterma.wordpress.com/2008/06/21/complexity-of-string-concatenation-ii/

Thanks to everyone who commented so far on this subject. I had some
good fun with it so far.

Best,
Bart
 
B

Bart Kastermans

|    def __str__ (self):
string appending is an O(n**2) operations.  The usual idiom, applied here,
would be slist = ['('], slist.append(str(self.value)), .... return
''.join(slist).  In other words, collect list of pieces and join at end.

I did some timing of operations involved. Doing this I found that
the operation below to get a string representation for trees was
in fact not quadratic.

The final nail in the coffin of this comment is now at:
http://kasterma.wordpress.com/2008/06/30/strings-versus-lists/
(I put it there because the main part is a graph of timing data)

Appending for lists is slower than appending the strings.

This means that the operation using strings is faster.

Again, thanks for all the comments, I enjoyed working this out. Even
better would be to point out any mistakes in my arguments or code.

Best,
Bart
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,955
Messages
2,570,117
Members
46,705
Latest member
v_darius

Latest Threads

Top