Python CPU


Paul Rubin

Gregory Ewing said:
What might help more is having bytecodes that operate on
arrays of unboxed types -- numpy acceleration in hardware.

That is an interesting idea as an array or functools module patch.
Basically a way to map or fold arbitrary functions over arrays, with a
few obvious optimizations to avoid refcount churning. It could have
helped with a number of things I've done over the years.
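For reference, here is what such a fold looks like today with nothing beyond the stdlib (a minimal sketch): functools.reduce happily consumes an array.array, but every element is boxed into a Python float, with the refcount traffic that implies, before the reducing function ever sees it.

import array
import functools
import operator

# Fold (reduce) over an array.array today: correct, but each C double is
# boxed into a Python float object on its way to operator.add.
a = array.array('d', [1.0, 2.0, 3.0, 4.0])
total = functools.reduce(operator.add, a, 0.0)
print(total)   # 10.0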
 

geremy condra

Gregory Ewing said:
What might help more is having bytecodes that operate on
arrays of unboxed types -- numpy acceleration in hardware.

I'd be interested in seeing the performance impact of this, although I
wonder if it'd be feasible.

Geremy Condra
 

Terry Reedy

Paul Rubin said:
That is an interesting idea as an array or functools module patch.
Basically a way to map or fold arbitrary functions over arrays, with a
few obvious optimizations to avoid refcount churning. It could have
helped with a number of things I've done over the years.

For map, I presume you are thinking of an array.map(func) in system code
(C for CPython) equivalent to

def map(self, func):
    # in-place map: replace each element with func(element)
    for i, ob in enumerate(self):
        self[i] = func(ob)

The question is whether it would be enough faster. Of course, what would
really be needed for speed are wrapped system-coded funcs that map would
recognize, passing unboxed array units to them and receiving unboxed
results back. At that point, we have just about invented 1-D numpy ;-).

I have always thought the array module was underutilized, but I see now
that it only offers Python code a space saving, at the cost of
interconversion time. To be really useful, arrays of unboxed data need
system-coded functions that operate directly on the unboxed data, the
way strings and bytes have them. Array comes with a few, but very few,
generic sequence methods, like .count(x) (a special case of reduction).
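A minimal stdlib-only sketch of that tradeoff: the array stores raw C doubles compactly, and its few system-coded methods such as .count() walk the raw buffer, but every element access from Python code boxes the value into a fresh float object.

import array
import sys

a = array.array('d', range(1000))
lst = list(a)                      # boxes every element up front

# Compact unboxed buffer vs. a list plus 1000 separate float objects.
print(sys.getsizeof(a))
print(sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in lst))

# .count() is system-coded and needs no per-element boxing.
print(a.count(7.0))                # 1

# But each Python-level access creates a new boxed float.
x, y = a[0], a[0]
print(x is y)                      # False in CPython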
 

Terry Reedy

After posting this, I realized that ctypes makes it easy to find and
wrap functions in a shared library as a Python object (possibly with
parameter annotations) that could be passed to array.map, etc. No
swigging needed, which is good, since that is harder than writing the
simple C functions in the first place. So a small extension to array
with .map, .filter, .reduce, and a wrapper class would be more useful
than I thought.
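A rough sketch of the ctypes side of that idea, assuming only that the C math library can be located via ctypes.util.find_library('m'); the proposed .map/.filter/.reduce methods do not exist, so the plain loop below stands in for them.

import array
import ctypes
import ctypes.util

# Wrap a function from a shared library as a Python-callable object.
libm = ctypes.CDLL(ctypes.util.find_library('m'))
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double

a = array.array('d', [1.0, 4.0, 9.0])

# Stand-in for the proposed array.map: today each element is boxed into
# a Python float on the way into and out of the wrapped C function.
for i, x in enumerate(a):
    a[i] = libm.sqrt(x)
print(a)                      # array('d', [1.0, 2.0, 3.0])

# The unboxed buffer is reachable too, which is what a system-coded map
# could hand straight to such a wrapped function.
addr, length = a.buffer_info()
view = (ctypes.c_double * length).from_address(addr)
print(list(view))             # [1.0, 2.0, 3.0]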
 

John Nagle

Gregory Ewing said:
What might help more is having bytecodes that operate on
arrays of unboxed types -- numpy acceleration in hardware.

That sort of thing was popular in the era of the early
Cray machines. Once superscalar CPUs were developed,
the overhead on tight inner loops went down, and several
iterations of a loop could be in the pipeline at one time,
if they didn't conflict. Modern superscalar machines have
register renaming, so the same program-visible register on
two successive iterations can map to different registers within
the CPU, allowing two iterations of the same loop to execute
simultaneously. This eliminates the need for loop unrolling and
Duff's device.

John Nagle
 

Paul Rubin

John Nagle said:
That sort of thing was popular in the era of the early
Cray machines. Once superscalar CPUs were developed,
the overhead on tight inner loops went down, and several
iterations of a loop could be in the pipeline at one time,

Vector processors are back; they just call them GPGPUs now.
 

Gregory Ewing

Terry said:
So a
small extension to array with .map, .filter, .reduce, and a wrapper
class would be more useful than I thought.

Also useful would be some functions for doing elementwise
operations between arrays. Sometimes you'd like to just do
a bit of vector arithmetic, and pulling in the whole of
numpy as a dependency seems like overkill.
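A pure-Python sketch of what that might look like (elementwise is a hypothetical helper; a C-coded version in the array module would avoid boxing each element):

import array
import operator

def elementwise(op, a, b):
    # Apply op pairwise and return a new array of the same typecode.
    if len(a) != len(b):
        raise ValueError("arrays must be the same length")
    return array.array(a.typecode, map(op, a, b))

x = array.array('d', [1.0, 2.0, 3.0])
y = array.array('d', [10.0, 20.0, 30.0])

print(elementwise(operator.add, x, y))   # array('d', [11.0, 22.0, 33.0])
print(elementwise(operator.mul, x, y))   # array('d', [10.0, 40.0, 90.0])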
 

Gregory Ewing

geremy said:
I'd be interested in seeing the performance impact of this, although I
wonder if it'd be feasible.

A project I have in the back of my mind goes something
like this:

1) Design an instruction set for a Python machine and
a microcode architecture to support it

2) Write a simulator for it

3) Use the simulator to evaluate how effective it would
be if actually implemented, e.g. in an FPGA.

And if I get that far:

4) (optional) Get hold of a real FPGA and implement it
 
