Loopless syntax for 2d in NumPy (or Numarray)

2

2mc

I am finding out all kinds of ways to do things in NumPy through the
many suggestions I have received. It's exciting. Thanks to all who
have replied on my other threads.

I'm still having trouble with one thing. Let me set a scenario and
see if anyone has any ideas.

Assume a multidimensional array (2d). This would be like a
spreadsheet of rows and columns. Further, assume hundreds of 'rows'
and 3 columns. Suppose I want a running list of the highest value in a
single column for 20 'rows'. So, starting at 'row' 19, the answer
would be the highest value from 'row' 0 to 'row' 19. Then, at 'row'
20, the answer would be the highest value from 'row' 1 to 'row' 20.
And, so on. Further, suppose I want this value for each 'column'.
The result would be a 3 'column' array with 19 less rows than the
source array and would contain a running list of highest values of
each column for the last 20 rows.

How would this be done without loops? Or, at least without looping
through every row.

If it can't be done, is this something that numarray could do?

Thanks a million.

Matt
 
T

Terry Reedy

2mc said:
Assume a multidimensional array (2d). This would be like a
spreadsheet of rows and columns. Further, assume hundreds of 'rows'
and 3 columns. Suppose I want a running list of the highest value in a
single column for 20 'rows'. So, starting at 'row' 19, the answer
would be the highest value from 'row' 0 to 'row' 19. Then, at 'row'
20, the answer would be the highest value from 'row' 1 to 'row' 20.
And, so on. Further, suppose I want this value for each 'column'.
The result would be a 3 'column' array with 19 less rows than the
source array and would contain a running list of highest values of
each column for the last 20 rows.

How would this be done without loops? Or, at least without looping
through every row.

Just curious: is this a real problem, or one you made up to stump
NumPy? Keep in mind that NumPy was written to do typical array
operations, linear algebra/analysis, ffts, and even some things not so
typical. A moving maximum is highly nonlinear, unusual, and likely to
need explicit looping to be done efficiently.

I think the way to avoid redoing the max from scratch with each move
of the window is to use a heap augmented by a circular index that
enables easy replacement of the departing number with the incoming
number. After the replacement, re-establish the heap property and
record the max. (For more on heaps, see heapq.py in the lib or
various algorithm books.)

Terry J. Reedy
 
2

2mc

Terry Reedy said:
Just curious: is this a real problem, or one you made up to stump
NumPy?

Yes, this is a real problem. Of course, it involves much more than
this. But, yes, I would like to get a running list of highest values
within a range of values.
Keep in mind that NumPy was written to do typical array operations, linear
algebra/analysis, ffts, and even some things not so typical. A moving
maximum is highly nonlinear, unusual, and likely to need explicit looping to
be done efficiently.

I think the way to avoid redoing the max from scratch with each move
of the window is to use a heap augmented by a circular index that
enables easy replacement of the departing number with the incoming
number. After the replacement, re-establish the heap property and
record the max. (For more on heaps, see heapq.py in the lib or
various algorithm books.)

I didn't know there was a function called heapq.py. I have 3 books:
Learning Python, Python Essential Reference, and Practical Python. I
also have read the Numerical Python manual on the numerical Python
website. None of these mentioned it.

I have a program that was written in a programming language called SPL
(Smart Programming Language). It is a Pascal-like scripting language
that worked with an older suite of programs called "Smartware 2000."
It is like Visual Basic for Applications except that it isn't OO. In
this language I use multidimensional arrays of quite large sizes and
explicitly declare loops to accomplish a lot of what I'm doing. I
keep the arrays entirely in RAM for speed purposes. But, because it
was a program designed to work with a suite of applications (though it
does not have to use any of them - just like Visual Basic) it has a
lot of overhead that slows the program down.

So, I want to port this program to something else that will speed the
process up. In investigating Python one of the comments I read about
it was that it was slower than other languages but easier to program -
that is, it was truly a rapid application developer. Because it was
slow, an extension module was created to provide true multidimensional
arrays with a computational speed only slightly slower than C++. So,
I thought to myself that I would sacrifice some speed through the
normal parts of the program for the gain of computaional speed using
this multidimensional array extension in those parts needing it and
also for the ease of application design. My concern is that I don't
inadvertently write the program in such a way as to slow it down
rather than taking advantage of these modules that provide assistance
with regard to speed. The language is just enough different that I'm
having trouble thinking in NumPy.

Your suggestion to use heapq was interesting. Basically, in my
current program I use something similar. I sort, enter new data into
oldest data's spot, resort, find the max, and repeat.

I have a couple of questions about heapq. The website says that it is
an array but that it works on lists. So, heapq would not work on
NumPy arrays, right? And, it is then a normal Python array rather
than a NumPy array, right? Is there a comparable speed increase using
heapq over explicit Python loops as there is in NumPy over explicit
Python loops?

I guess I'm looking for the fastest way in Python or in any Python
module to accomplish what I want to do.

Thanks for your response. I appreciate it.

Matt
 
T

Terry Reedy

2mc said:
"Terry Reedy" <[email protected]> wrote in message

Yes, this is a real problem. Of course, it involves much more than
this. But, yes, I would like to get a running list of highest values
within a range of values.

I didn't know there was a function called heapq.py.

It is a module added to the standard library perhaps in 2.2. Not
surprised if not in books.
Your suggestion to use heapq was interesting. Basically, in my
current program I use something similar. I sort, enter new data into
oldest data's spot, resort, find the max, and repeat.

Same idea. I think best resort method should be binary search for
insertion spot followed by shift-1 movement of off-by-1 block toward
empty spot and then insertion of new item in proper place.

The same operation for heaps would be O(logN) instead of O(N), but
with a higher constant, so it might or might not be faster for N=19.
(But I suspect it would be.)
I have a couple of questions about heapq. The website says that it is
an array but that it works on lists. So, heapq would not work on
NumPy arrays, right? And, it is then a normal Python array rather
than a NumPy array, right?

The algorithms, with adjustment, can use any type of random access
array, but there would not be any obvious benefit of shifting. heapq,
as implied by the q for queue, is intended for heap use as priority
queue. It has a replace_max function but not a
replace-arbitrary-element operation. Regardless of what array type
you use, you would have to write your own replace function. But it
should be a straight forward adjustment of the current replace
function -- once you understand heaps.
Is there a comparable speed increase using
heapq over explicit Python loops as there is in NumPy over explicit
Python loops?

No, heapq is written in Python and uses normal looping. But do keep
in mind that pure Python code, especially when using ints, floats, and
loops, can ofter run several times faster when psyco-ized.
Thanks for your response. I appreciate it.

Welcome,

Terry J. Reedy
 
2

2mc

Terry Reedy said:
It is a module added to the standard library perhaps in 2.2. Not
surprised if not in books.


Same idea. I think best resort method should be binary search for
insertion spot followed by shift-1 movement of off-by-1 block toward
empty spot and then insertion of new item in proper place.

The same operation for heaps would be O(logN) instead of O(N), but
with a higher constant, so it might or might not be faster for N=19.
(But I suspect it would be.)


The algorithms, with adjustment, can use any type of random access
array, but there would not be any obvious benefit of shifting. heapq,
as implied by the q for queue, is intended for heap use as priority
queue. It has a replace_max function but not a
replace-arbitrary-element operation. Regardless of what array type
you use, you would have to write your own replace function. But it
should be a straight forward adjustment of the current replace
function -- once you understand heaps.


No, heapq is written in Python and uses normal looping. But do keep
in mind that pure Python code, especially when using ints, floats, and
loops, can ofter run several times faster when psyco-ized.

Thanks for your responses. They have been most helpful. Thank you
for your kindness.

I've looked a little bit at Psyco. I have a couple of questions -
what else is new? :)

1. Would a psyco-ized normal Python program that performs calculations
be slower, faster, or as fast as the equivalent in NumPy?

2. Can Numerical Python be psyco-ized? I'm supposing it cannot.

Thanks again. I appreciate it.

Matt
 
F

Fernando Perez

2mc said:
How would this be done without loops? Or, at least without looping
through every row.

For those cases when you have no option but looping, the easiest approach out
there is probably weave.inline(). Sorry I can't give you too many more
details right now, but if you go here:

http://windom.colorado.edu/~fperez/python/python-c/

you should be able to get started quickly. For weave itself, go to

http://scipy.org/

weave.inline() is one amazing piece of work.

Cheers,

f
 
T

Terry Reedy

2mc said:
1. Would a psyco-ized normal Python program that performs calculations
be slower, faster, or as fast as the equivalent in NumPy?

This is not an either/or choice. Psyco is best used on a few
bottlenect functions. (Currently, it greatly expands the size of what
is works on.) If you did a running max with heaps, that would be a
candidate. Large array calcs done in NumPy as it is meant to be used
should be fastest.

If speed is a problem, also look at Weave as Fernando suggested. That
might be alternative for all or part of running max.

For spicific answers for your problem on your system, you will have to
test time yourself. New time_it make simple stuff pretty easy. We
have given you about as much general answer as is possible.
2. Can Numerical Python be psyco-ized? I'm supposing it cannot.

Correct, only the Python code calling it, but there would be little
point to that probably.

Terry J. Reedy
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,995
Messages
2,570,230
Members
46,816
Latest member
SapanaCarpetStudio

Latest Threads

Top