A note on heapq module

B

bearophileHUGS

In few minutes I have just written this quite raw class, it lacks
doctring (the same of the functions of the heapq module), it may
contain bugs still, I haven't tested it much. It's just a simple
wrapper around some of the functions of the heapq module (nsmallest and
nlargest are missing). Usually I use classes only when I need then, I
like functional programming enough, and this class seems to just slow
down the functions of the heapq module. But I think an improved class
similar to this one may be useful (added, replacing isn't necessary)
into the heapq module, because it can avoid cetain errors: I know what
Heaps are and how they work, but some time ago I have written a bug
into small program that uses the functions of the heapq module, because
I have added an item to the head of the heap using a normal list
method, breaking the heap invariant. With a simple class like this one
all the methods of your object keep the heap invariant, avoiding that
kind of bugs. (If you are interested I can improve this class some,
it't not difficult.)


from heapq import heapify, heappush, heappop, heapreplace

class Heap(object):
def __init__(self, init=None):
if init is None:
self.h = []
elif isinstance(init, Heap):
self.h = init.h[:]
else:
self.h = list(init)
heapify(self.h)
def smallest(self):
return self.h[0]
def sort(self, cmp=None, key=None):
self.h.sort(cmp=cmp, key=key)
def heappush(self, item):
heappush(self.h, item)
def heappop(self):
return heappop(self.h)
def heapreplace(self, item):
return heapreplace(self.h, item)
def clear(self):
del self.h[:]
def __contains__(self, item):
return item in self.h
def __hash__(self):
raise TypeError("Heap objects are unhashable.")
def __iter__(self):
return self.h.__iter__()
def __eq__(self, other):
return isinstance(other, Heap) and self.h == other.h
def __ne__(self, other):
return not isinstance(other, Heap) or self.h != other.h
def __len__(self):
return len(self.h)
def __nonzero__(self):
return bool(self.h)
def __str__(self):
return str(self.h)
def __repr__(self):
return "Heap(%s)" % self.h if self.h else "Heap()"

Bye,
bearophile
 
S

Steven Bethard

some time ago I have written a bug
into small program that uses the functions of the heapq module, because
I have added an item to the head of the heap using a normal list
method, breaking the heap invariant.

I know I've had similar bugs in my code before.
from heapq import heapify, heappush, heappop, heapreplace

class Heap(object):
[snip implementation]

+1 for adding a Heap object like this to the collections module. If you
can make a little time to add some tests (you could probably steal a lot
of what you need from test_heapq.py) you should definitely submit it as
a patch for 2.6.

STeVe
 
B

bearophileHUGS

I think the sort has to be simplified, otherwise it can't keep the heap
invariant...

def sort(self):
self.h.sort()

Bye,
bearophile
 
J

Jussi Salmela

(e-mail address removed) kirjoitti:
I think the sort has to be simplified, otherwise it can't keep the heap
invariant...

def sort(self):
self.h.sort()

Bye,
bearophile
And __repr__ should be something like this:
=========
def __repr__(self):
if self.h:
return "Heap(%s)" % self.h
else:
return "Heap()"
==========
to make it compatible with versions earlier than 2.5

Cheers,
Jussi
 
S

Scott David Daniels

In few minutes I have just written this quite raw class, ....
I'd suggest some changes. It is nice to have Heaps with equal
contents equal no matter what order the inserts have been done.
Consider how you want Heap([1, 2, 3]) and Heap([3, 1, 2]) to behave.
Similarly, it is nice to have str and repr produce canonical
representations (I would skip the __str__ code, myself, though).
Also, subclasses should get their identities tweaked as so:
class Heap(object):
...
def sort(self, cmp=None, key=None):
self.h.sort(cmp=cmp, key=key) I'd remove this method.
...
def __eq__(self, other):
return isinstance(other, Heap) and self.h == other.h
return isinstance(other, self.__class__) and sorted(
self.h) == sorted(other.h)
def __ne__(self, other):
return not isinstance(other, Heap) or self.h != other.h
return not isinstance(other, self.__class__) and sorted(
self.h) != sorted(other.h)
...
def __str__(self):
return str(self.h)
return str(sorted(self.h))
def __repr__(self):
return "Heap(%s)" % self.h if self.h else "Heap()"
return "%s(%s)" % (self.__class__.__name__, sorted(self.h))
And I'd add:
def __lt__(self, other):
raise TypeError('no ordering relation is defined for %s'
% self.__class__.__name__)
__gt__ = __le__ = __ge__ = __lt__

--Scott David Daniels
(e-mail address removed)
 
S

Scott David Daniels

Scott David Daniels wrote:

Sorry, I blew the __ne__:
return not isinstance(other, self.__class__) and sorted(
self.h) != sorted(other.h)
Should be:
return not isinstance(other, self.__class__) or sorted(
self.h) != sorted(other.h)

--Scott David Daniels
(e-mail address removed)
 
B

bearophileHUGS

Scott David Daniels:
I'd suggest some changes. It is nice to have Heaps with equal
contents equal no matter what order the inserts have been done.
Consider how you want Heap([1, 2, 3]) and Heap([3, 1, 2]) to behave.
Similarly, it is nice to have str and repr produce canonical
representations (I would skip the __str__ code, myself, though).
Also, subclasses should get their identities tweaked as so:

My version was a *raw* one, just an idea, this is a bit better:
http://rafb.net/p/nLPPjo35.html
I like your changes. In the beginning I didn't want to put __eq__
__ne__ methods at all, because they are too much slow, but being them
O(n ln n) I think your solution is acceptable.

Some methods may have a name different from the heap functions:
def smallest(self):
def push(self, item):
def pop(self):
def replace(self, item):

Two things left I can see: I think the __init__ may have a boolean
inplace parameter to avoid copying the given list, so this class is
about as fast as the original heapify function (but maybe such thing is
too much dirty for a Python stdlib):

def __init__(self, sequence=None, inplace=False):
if sequence is None:
self._heap = []
elif isinstance(sequence, self.__class__):
self._heap = sequence._heap[:]
else:
if inplace and isinstance(sequence, list):
self._heap = sequence
else:
self._heap = list(sequence)
heapify(self._heap)

The second thing, I don't like much the __iter__ because it yields
unsorted items:

def __iter__(self):
return self._heap.__iter__()

Is this good? I think this can be accepted.

Thank you,
bye,
bearophile
 
A

Antoon Pardon

In few minutes I have just written this quite raw class, it lacks
doctring (the same of the functions of the heapq module), it may
contain bugs still, I haven't tested it much. It's just a simple
wrapper around some of the functions of the heapq module (nsmallest and
nlargest are missing). Usually I use classes only when I need then, I
like functional programming enough, and this class seems to just slow
down the functions of the heapq module. But I think an improved class
similar to this one may be useful (added, replacing isn't necessary)
into the heapq module, because it can avoid cetain errors: I know what
Heaps are and how they work, but some time ago I have written a bug
into small program that uses the functions of the heapq module, because
I have added an item to the head of the heap using a normal list
method, breaking the heap invariant. With a simple class like this one
all the methods of your object keep the heap invariant, avoiding that
kind of bugs. (If you are interested I can improve this class some,
it't not difficult.)

[ Heap class based on heapq ]

For me, your class has the same drawback as the heappush, heappop
procedurers: no way to specify a comparision function. Somewhere
in my experimental code I work with line segments in two dimensions.
Now in one place I want from the available segments the one in which the
first point is farthest to the left. In a second place I want from the
available segments the one in which the second point is farthest to the
left. Both can be done with a heap, but currently I can't get both
behaviours while using the same class and using the heapq module or
your Heap class.
 
N

Neil Cerutti

Scott David Daniels:
I'd suggest some changes. It is nice to have Heaps with equal
contents equal no matter what order the inserts have been
done. Consider how you want Heap([1, 2, 3]) and Heap([3, 1,
2]) to behave. Similarly, it is nice to have str and repr
produce canonical representations (I would skip the __str__
code, myself, though). Also, subclasses should get their
identities tweaked as so:

My version was a *raw* one, just an idea, this is a bit better:
http://rafb.net/p/nLPPjo35.html
I like your changes. In the beginning I didn't want to put
__eq__ __ne__ methods at all, because they are too much slow,
but being them O(n ln n) I think your solution is acceptable.

Some methods may have a name different from the heap functions:
def smallest(self):
def push(self, item):
def pop(self):
def replace(self, item):

Two things left I can see: I think the __init__ may have a
boolean inplace parameter to avoid copying the given list, so
this class is about as fast as the original heapify function
(but maybe such thing is too much dirty for a Python stdlib):

One more idea, cribbed from the linked list thread elsewhere: it
might be nice if your Heap could optionally use an underlying
collections.deque instead of a list.

I don't know how excellent Python's deque is, but it's possible a
deque would provide a faster heap than a contiguous array. C++'s
std::deque is the default implementation of C++'s
std::priority_queue for that reason (unless I'm confused again).
 
S

Steven Bethard

Antoon said:
For me, your class has the same drawback as the heappush, heappop
procedurers: no way to specify a comparision function.

Agreed. I'd love to see something like ``Heap(key=my_key_func)``.

STeVe
 
B

bearophileHUGS

Neil Cerutti:
One more idea, cribbed from the linked list thread elsewhere: it
might be nice if your Heap could optionally use an underlying
collections.deque instead of a list.
I don't know how excellent Python's deque is, but it's possible a
deque would provide a faster heap than a contiguous array. C++'s
std::deque is the default implementation of C++'s
std::priority_queue for that reason (unless I'm confused again).

If you have some minutes you can do few speed tests and show us the
code and the timing results...

Bye,
bearophile
 
N

Neil Cerutti

Neil Cerutti:

If you have some minutes you can do few speed tests and show us
the code and the timing results...

I'll see what I can come up with. I'll have to test using the
native Python implementation, which should work with deque,
rather than the optimized C implementation, which doesn't.

Unfortunately the inability to take advantage of the C
implementation of heapq might swamp any possible advantage of
using deque instead of list in a Heap class.
 
N

Neil Cerutti

Neil Cerutti:

If you have some minutes you can do few speed tests and show us
the code and the timing results...

collections.deque is the loser.

Here's the output of the program from my last run:

list: 5.81679827554
deque: 6.40347742423
C heapq: 2.24028186815

Here's the program. You can customize it somewhat to attempt to
model a real program. It builds up a heap of random integers to
some size, performs random pushes and pops for a while, and then
pops the heap down until it's empty.

import random
import timeit

OPCOUNT = 5000
HEAP_SIZE = 100

# Create a random sequence of pushes and pops.
pushes = 0
ops = []
for i in xrange(OPCOUNT):
x = random.randint(0, 1)
if x == 0 or pushes < HEAP_SIZE:
pushes += 1
ops.append(0)
else:
pushes -= 1
ops.append(1)
# Pop off the rest
for i in xrange(pushes):
ops.append(1)

def test(mod, cont):
for op in ops:
if op:
mod.heappop(cont)
else:
mod.heappush(cont, random.randint(0, 150))

# heapqd is the pure Python heapq module without the C implementation.
t1 = timeit.Timer("test(heapqd, list())",
"from __main__ import test; import heapqd")
t2 = timeit.Timer("test(heapqd, deque())",
"from __main__ import test; "\
"from collections import deque; "\
"import heapqd")
t3 = timeit.Timer("test(heapq, list())",
"from __main__ import test; import heapq")
print "list:", t1.timeit(100)
print "deque:", t2.timeit(100)
print "C heapq:", t3.timeit(100)
 
S

Steven Bethard

Steven Bethard:

It can be done, but the code becomes more complex and hairy:
http://rafb.net/p/iCCmDz16.html

Cool!

The current code fails when using unbound methods however::
Traceback (most recent call last):
File "<interactive input>", line 1, in <module>
AttributeError: type object 'Heap' has no attribute 'push'

I would have written the code like::

def smallest(self):
result = self._heap[0]
if self._key is not None:
result = result[2]
return result

def push(self, item):
if self._key is not None:
item = self._key(item), self._itemid, item
self._itemid += 1
heappush(self._heap, item)

def pop(self):
result = heappop(self._heap)
if self._key is not None:
result = result[2]
return result

def popn(self, n):
result = [heappop(self._heap) for _ in xrange(n)]
if self._key is not None:
result = [item[2] for item in result]
return result

def replace(self, item):
if self._key is not None:
item = self._key(item), self._itemid, item
self._itemid += 1
result = heapreplace(self._heap, item)
if self._key is not None:
result = result[2]
return result

This allows unbound methods to work, and substantially lowers the code
duplication. It does add a few extra if statements, but since they're
``is`` tests, they should be pretty fast.

STeVe
 
B

bearophileHUGS

Steven said:
The current code fails when using unbound methods however::

I don't like your solution, this class was already slow enough. Don't
use unbound methods with this class :)
Maybe there's a (better) solution to your problem: to make Heap a
function (or classmethod) that return sone of two possibile objects
created by one of two different classes that have different methods...

Beside that, I think __eq__ method needs more tests, because comparing
a Heap with key against another Heap without key may give some
problems...

I'll think about such problems/things.

Bye,
bearophile
 
S

Steven Bethard

I don't like your solution, this class was already slow enough. Don't
use unbound methods with this class :)
Maybe there's a (better) solution to your problem: to make Heap a
function (or classmethod) that return sone of two possibile objects
created by one of two different classes that have different methods...

That's probably viable. Maybe you should just have two separate
classes: Heap and KeyedHeap. The inplace= parameter doesn't really make
sense for a KeyedHeap anway, so you could just have::

Heap(sequence=None, inplace=False)
KeyedHeap(key, sequence=None)

Of course, this approach ends up with a bunch of code duplication again.

STeVe
 
P

Paul Rubin

Steven Bethard said:
Heap(sequence=None, inplace=False)
KeyedHeap(key, sequence=None)
Of course, this approach ends up with a bunch of code duplication again.

Maybe there's a way to use a metaclass that can make either type of
heap but they'd share most methods.
 
B

bearophileHUGS

bearophile:
I don't like your solution, this class was already slow enough. Don't
use unbound methods with this class :)

Sorry for raising this discussion after so much time. Another possibile
solution is to use the normal methods for the normal case, and replace
them only if key is present (this reduces problems with unbound methods
but you may use the wrong method).

Example:

def __init__(self, sequence=None, key=None, inplace=False):
...
if key is None:
...
else:
...
self.pop = self.__pop_key

...
def pop(self):
return heappop(self._heap)
def __pop_key(self):
return heappop(self._heap)[2]


Someone else has done something similar:
http://groups.google.com/group/comp.lang.python/browse_thread/thread/e61ba58c9722bf51/

http://sourceforge.net/tracker/index.php?func=detail&aid=1162363&group_id=5470&atid=305470

Bye,
bearophile
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,995
Messages
2,570,226
Members
46,815
Latest member
treekmostly22

Latest Threads

Top