list conversion question

K

Kent Tenney

Howdy,

I am using PIL, and would like to
make a list of the pixel values
in an image, sorted by quantity of
pixels with that value.

im.histogram() returns a list which
is a pixel count for each brightness value.

So, I want to sort the histogram, and have
the resulting list contain
the indexes instead of the counts.

This seems like it'd be a fairly common
task, but I don't know what to call
it to look for guidance.

Any help is appreciated.

Thanks,
Kent
 
P

Paul McGuire

Kent Tenney said:
Howdy,

I am using PIL, and would like to
make a list of the pixel values
in an image, sorted by quantity of
pixels with that value.

im.histogram() returns a list which
is a pixel count for each brightness value.

So, I want to sort the histogram, and have
the resulting list contain
the indexes instead of the counts.

This seems like it'd be a fairly common
task, but I don't know what to call
it to look for guidance.

Any help is appreciated.

Thanks,
Kent
Assuming im.histogram() returns a list like [ 0, 1, 0, 5, 43, etc. ] how
about:

hist = [ 0, 1, 0, 5, 43 ]
values = [ i for i in enumerate(hist)]
values.sort(lambda a,b: cmp(b[1],a[1]))
indexes = [ a for a,b in values ]

-- Paul
 
A

Andrew Dalke

Paul said:
Assuming im.histogram() returns a list like [ 0, 1, 0, 5, 43, etc. ] how
about:

hist = [ 0, 1, 0, 5, 43 ]
values = [ i for i in enumerate(hist)]
values.sort(lambda a,b: cmp(b[1],a[1]))
indexes = [ a for a,b in values ]

or tweaked a bit for speed (a sort with a lambda is expensive)
and for clarity, IMO,

pairs = [(value, offset) for (offset, value) in enumerate(hist)]
pairs.sort()
indexes = [offset for (value, offset) in pairs]

In Python2.4 this is allowed
>>> hist = [ 0, 1, 0, 5, 43 ]
>>> [pair[0] for pair in sorted(enumerate(hist),
.... key=lambda pair: pair[1])]
[0, 2, 1, 3, 4]
Not yet sure that that's a good thing.

Andrew
(e-mail address removed)
 
P

Paul McGuire

Andrew Dalke said:
Paul said:
Assuming im.histogram() returns a list like [ 0, 1, 0, 5, 43, etc. ] how
about:

hist = [ 0, 1, 0, 5, 43 ]
values = [ i for i in enumerate(hist)]
values.sort(lambda a,b: cmp(b[1],a[1]))
indexes = [ a for a,b in values ]

or tweaked a bit for speed (a sort with a lambda is expensive)
and for clarity, IMO,

pairs = [(value, offset) for (offset, value) in enumerate(hist)]
pairs.sort()
indexes = [offset for (value, offset) in pairs]

In Python2.4 this is allowed
hist = [ 0, 1, 0, 5, 43 ]
[pair[0] for pair in sorted(enumerate(hist),
... key=lambda pair: pair[1])]
[0, 2, 1, 3, 4]
Not yet sure that that's a good thing.

Andrew
(e-mail address removed)

I assumed that the resulting list should be sorted in descending frequency
order. Probably the fastest way to do this (since we are tweaking for
speed) would be to just change the pairs list comp to "pairs = [(-value,
offset) for (offset, value) in enumerate(hist)]", or your Py 2.4 key clause
from "key=lambda pair: pair[1]" to "key=lambda pair: -pair[1]".

-- Paul
 
A

Andrew Dalke

Paul said:
Probably the fastest way to do this (since we are tweaking for
speed) would be to just change the pairs list comp to "pairs = [(-value,
offset) for (offset, value) in enumerate(hist)]", or your Py 2.4 key clause
from "key=lambda pair: pair[1]" to "key=lambda pair: -pair[1]".

Ahh, right. The new Python2.4 'sorted' also has a
"reversed" flag, so
>>> hist = [ 0, 1, 0, 5, 43 ]
>>> [pair[0] for pair in sorted(enumerate(hist),
.... key=lambda pair: pair[1],
.... reverse=True)]
[4, 3, 1, 0, 2]
This isn't the same as as reverse() after the sort(),

pairs = [(value, offset) for (offset, value) in enumerate(hist)]
pairs.sort()
indexes = [offset for (value, offset) in pairs]
indexes.reverse()

or

list(reversed([pair[0] for pair in sorted(enumerate(hist),
key=lambda pair: pair[1])]))

Both of these give

[4, 3, 1, 2, 0]

The difference is that sort is now a guaranteed
stable sort. Using 'reverse=True' means the index
to the first 0 (at index 0) appears before the index
to the seond 0 (at index 2). Using reverse() or
reversed() flips that direction.

Andrew
(e-mail address removed)



which would give


[0, 2, 1, 3, 4]

Andrew
 
J

Jp Calderone

Andrew said:
Paul said:
Assuming im.histogram() returns a list like [ 0, 1, 0, 5, 43, etc. ] how
about:

hist = [ 0, 1, 0, 5, 43 ]
values = [ i for i in enumerate(hist)]
values.sort(lambda a,b: cmp(b[1],a[1]))
indexes = [ a for a,b in values ]


or tweaked a bit for speed (a sort with a lambda is expensive)
and for clarity, IMO,

pairs = [(value, offset) for (offset, value) in enumerate(hist)]
pairs.sort()
indexes = [offset for (value, offset) in pairs]

In Python2.4 this is allowed
hist = [ 0, 1, 0, 5, 43 ]
[pair[0] for pair in sorted(enumerate(hist),
... key=lambda pair: pair[1])]
[0, 2, 1, 3, 4]

Even faster, though perhaps not as clear:
>>> import operator
>>> hist = [ 0, 1, 0, 5, 43 ]
>>> map(operator.itemgetter(0), sorted(enumerate(hist),
... key=operator.itemgetter(1)))
[0, 2, 1, 3, 4]

Jp
 
P

Peter Otten

A

Andrew Dalke

Peter said:
Yet another option, not benchmarked, perhaps clearer:
hist = [0, 1, 0, 5, 43]
indices = range(len(hist))
indices.sort(key=hist.__getitem__)
indices

[0, 2, 1, 3, 4]

Looks the fastest to me. Here's my results, with
the benchmark below. In this list:
sort0 = Paul McGuire's solution with a lambda
sort1 = my solution with the pair flipped so the
default sort works
sort2 = my solution using Python 2.4's "keys"
sort3 = Jp Calderone's variant of that with itemgetter
sort4 = your (Petter Otten's) solution

list of size 0
sort0 0.0932559967041
sort1 0.0773079395294
sort2 0.176142930984
sort3 0.23094701767
sort4 0.089989900589
list of size 10
sort0 0.866734981537
sort1 0.524667978287
sort2 0.553129911423
sort3 0.529242992401
sort4 0.265748023987
list of size 100
sort0 18.9233298302
sort1 5.04968500137
sort2 5.11950612068
sort3 4.75884985924
sort4 3.0580329895
list of size 1000
sort0 262.217221022
sort1 70.5779349804
sort2 69.9501719475
sort3 54.6528921127
sort4 39.5281660557

Here's my benchmark code


import random, operator

def make_random_list(n):
# Ensure there will be duplicates
choose_from = range(n//3+1)
return [random.choice(choose_from) for i in range(n)]

def do_sort0(hist):
values = [ i for i in enumerate(hist)]
values.sort(lambda a,b: cmp(b[1],a[1]))
return [ a for a,b in values ]

def do_sort1(hist):
pairs = [(value, offset) for (offset, value) in enumerate(hist)]
pairs.sort()
return [offset for (value, offset) in pairs]

def do_sort2(hist):
return [pair[0] for pair in sorted(enumerate(hist),
key=lambda pair: pair[1])]

def do_sort3(hist):
return map(operator.itemgetter(0), sorted(enumerate(hist),
key=operator.itemgetter(1)))

def do_sort4(hist):
indices = range(len(hist))
indices.sort(key=hist.__getitem__)
return indices

data = make_random_list(100)
assert (do_sort0(data) == do_sort1(data) == do_sort2(data) ==
do_sort3(data) == do_sort4(data))

import timeit

for size in [0, 10, 100, 1000]:
print " list of size", size
for i in range(5):
name = "sort%d" % i
t = timeit.Timer(
setup= ("import __main__\nhist=__main__.make_random_list(%d)" %
size),
stmt = "__main__.do_%s(hist)" % name)
print name, t.timeit(10000)

Andrew
(e-mail address removed)
 
R

Raymond Hettinger

[Peter Otten]
Even faster, though perhaps not as clear:
import operator
hist = [ 0, 1, 0, 5, 43 ]
map(operator.itemgetter(0), sorted(enumerate(hist),
... key=operator.itemgetter(1)))
[0, 2, 1, 3, 4]

Yet another option, not benchmarked, perhaps clearer:
hist = [0, 1, 0, 5, 43]
indices = range(len(hist))
indices.sort(key=hist.__getitem__)
indices
[0, 2, 1, 3, 4]

These are both fast and clean. Consider posting them to the ASPN
cookbook so they won't get lost.


Raymond Hettinger
 
R

Raymond Hettinger

[Peter Otten]
Yet another option, not benchmarked, perhaps clearer:
hist = [0, 1, 0, 5, 43]
indices = range(len(hist))
indices.sort(key=hist.__getitem__)
indices

[0, 2, 1, 3, 4]

[Andrew Dalke]
Looks the fastest to me. Here's my results, with
the benchmark below.
. . .

These results are good. Consider posting them in the ASPN Cookbook so they
won't get lost.


Raymond Hettinger
 
A

Alex Martelli

Andrew Dalke said:
In Python2.4 this is allowed
hist = [ 0, 1, 0, 5, 43 ]
[pair[0] for pair in sorted(enumerate(hist),
... key=lambda pair: pair[1])]
[0, 2, 1, 3, 4]
Not yet sure that that's a good thing.

it makes the fast (DSU) way of sorting easier, and can be sped up
further as:

[ p[0] for p in sorted(enumerate(hist), operator.itemgetter(1)) ]

The new itemgetter and attrgetter HOFs in module operator are quite good
for this kind of use (which is exactly why they got introduced).


Alex
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,206
Messages
2,571,069
Members
47,674
Latest member
scazeho

Latest Threads

Top