Python CPU

Brad

Hi All,

I've heard of Java CPUs. Has anyone implemented a Python CPU in VHDL
or Verilog?

-Brad
 
Nobody

I've heard of Java CPUs. Has anyone implemented a Python CPU in VHDL
or Verilog?

Java is a statically-typed language which makes a distinction between
primitive types (bool, int, double, etc) and objects. Python is a
dynamically-typed language which makes no such distinction. Even something
as simple as "a + b" can be a primitive addition, a bigint addition, a
call to a.__add__(b) or a call to b.__radd__(a), depending upon the values
of a and b (which can differ for different invocations of the same code).
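That runtime dispatch is easy to see from Python itself. A minimal sketch, where the Meters class is invented purely for illustration:

```python
class Meters:
    """Toy class showing how Python resolves "a + b" at runtime."""
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if isinstance(other, Meters):
            return Meters(self.value + other.value)
        return NotImplemented

    def __radd__(self, other):
        # Called when the left operand's __add__ returns NotImplemented,
        # as happens for int.__add__(Meters(...)).
        if isinstance(other, (int, float)):
            return Meters(other + self.value)
        return NotImplemented

print(1 + 2)                           # machine-int addition
print(10**30 + 1)                      # arbitrary-precision (bigint) addition
print((Meters(2) + Meters(3)).value)   # dispatches to Meters.__add__
print((5 + Meters(3)).value)           # falls back to Meters.__radd__
```

Four spellings of "a + b", four different code paths, and nothing in the source text tells you which one runs.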

This is one of the main reasons that statically-typed languages exist, and
are used for most production software.
 
Stefan Behnel

Nobody, 01.04.2011 18:52:
Java is a statically-typed language which makes a distinction between
primitive types (bool, int, double, etc) and objects. Python is a
dynamically-typed language which makes no such distinction. Even something
as simple as "a + b" can be a primitive addition, a bigint addition, a
call to a.__add__(b) or a call to b.__radd__(a), depending upon the values
of a and b (which can differ for different invocations of the same code).

This is one of the main reasons that statically-typed languages exist, and
are used for most production software.

I doubt that the reason they are "used for most production software" is a
technical one.

Stefan
 
geremy condra

Nobody, 01.04.2011 18:52:

I doubt that the reason they are "used for most production software" is a
technical one.

I also suspect that there's some confusion between duck typing and
typelessness going on here.

Geremy Condra
 
John Nagle

On 4/1/2011 11:28 AM Emile van Sebille said...

Sorry - wrong url in the cut'n paste buffer -

http://tsheffler.com/software/python/

Neither of those is a hardware implementation of Python.
"Python on a chip" is a small Python-subset interpreter for
microcontrollers. That could be useful if it's not too slow.

Sheffler's software is a means for controlling and extending
Verilog simulations in Python. (Often, you're simulating some
hardware at the gate level which interfaces with other existing
hardware, say a disk, which can be simulated at a much coarser
level. Being able to write the simulator for the external devices
in Python is useful.)

John Nagle
 
BartC

Brad said:
Hi All,

I've heard of Java CPUs. Has anyone implemented a Python CPU in VHDL
or Verilog?

For what purpose, improved performance? In that case, there's still plenty
of scope for that on conventional CPUs.

The Java VM is fairly low level (I would guess, not being too familiar with
it), while the Python VM seems much higher level and awkward to implement
directly in hardware.

I don't think it's impossible, but the benefits probably would not match
those of improving, say, CPython on conventional hardware. And if a Python
CPU couldn't also run non-Python code efficiently, then on a typical
workload with mixed languages, it could be much slower!

However, wasn't there a Python version that used the JVM? Perhaps that might run
on a Java CPU, and it would be interesting to see how well it works.
 
Gregory Ewing

Brad said:
I've heard of Java CPUs. Has anyone implemented a Python CPU in VHDL
or Verilog?

Not that I know of.

I've had thoughts about designing one, just for the exercise.

It's doubtful whether such a thing would ever be of practical
use. Without as much money as Intel has to throw at CPU
development, it's likely that a Python chip would always be
slower and more expensive than an off-the-shelf CPU running
a tightly-coded interpreter.

It could be fun to speculate on what a Python CPU might
look like, though.
 
Steven D'Aprano

Not that I know of.

I've had thoughts about designing one, just for the exercise.

It's doubtful whether such a thing would ever be of practical use.
Without as much money as Intel has to throw at CPU development, it's
likely that a Python chip would always be slower and more expensive than
an off-the-shelf CPU running a tightly-coded interpreter.

I recall back in the late 80s or early 90s, Apple and Texas Instruments
collaborated to build a dual-CPU Lisp machine. I don't remember all the
details, but it was an Apple Macintosh II with a second CPU running (I
think) a TI Explorer (possibly on a Nubus card?), with an integration
layer that let the two hardware machines talk to each other. It was dual-
branded Apple and TI.

It was a major flop. It was released around the time that general purpose
CPUs started to get fast enough to run Lisp code faster than a custom-
made Lisp CPU could. I don't remember the actual pricing, so I'm going to
make it up... you got better performance from a standard Mac II with
software Lisp for (say) $12,000 than you got with a dedicated Lisp
machine for (say) $20,000.

(These are vaguely recalled 1980s prices. I'm assuming $10K for a Mac II
and $2K for the Lisp compiler. Of course these days a $400 entry level PC
is far more powerful than a Mac II.)

There were also Forth chips, which let you run Forth in hardware. I
believe they were much faster than Forth in software, but were killed by
the falling popularity of Forth.
 
Steven D'Aprano

However, wasn't there a Python version that used the JVM? Perhaps that might
run on a Java CPU, and it would be interesting to see how well it works.

Not only *was* there one, but there still is: Jython. Jython is one of
the "Big Three" Python implementations:

* CPython (the one you're probably using)
* Jython (Python on Java)
* IronPython (Python on .Net)

with PyPy (Python on Python) catching up.

http://www.jython.org/
 
John Nagle

There were also Forth chips, which let you run Forth in hardware. I
believe they were much faster than Forth in software, but were killed by
the falling popularity of Forth.

The Forth chips were cute, and got more done with fewer gates than
almost anything else. But that didn't matter for long.
Willow Garage has a custom Forth chip they use in their Ethernet
cameras, but it's really an FPGA.

A tagged machine might make Python faster. You could have
unboxed ints and floats, yet still allow values of other types,
with the hardware tagging helping with dispatch. But it probably
wouldn't help all that much. It didn't in the LISP machines.
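To make the idea concrete, here is a software sketch of tag-directed dispatch in Python. The tag values and the (tag, payload) word format are invented for illustration; a tagged machine would do this check in hardware on every operand:

```python
# Tags a hypothetical tagged machine might attach to each word.
TAG_INT, TAG_FLOAT, TAG_OBJECT = 0, 1, 2

def tag(value):
    """Box a Python value into a (tag, payload) word pair."""
    if isinstance(value, int):
        return (TAG_INT, value)
    if isinstance(value, float):
        return (TAG_FLOAT, value)
    return (TAG_OBJECT, value)

def tagged_add(a, b):
    """Add two tagged words; the fast paths never consult an object table."""
    ta, va = a
    tb, vb = b
    if ta == TAG_INT and tb == TAG_INT:
        return (TAG_INT, va + vb)                  # unboxed integer path
    if ta in (TAG_INT, TAG_FLOAT) and tb in (TAG_INT, TAG_FLOAT):
        return (TAG_FLOAT, float(va) + float(vb))  # unboxed float path
    return tag(va + vb)                            # slow path: generic dispatch

print(tagged_add(tag(2), tag(3)))     # (0, 5)   -- stayed on the int path
print(tagged_add(tag(2), tag(0.5)))   # (1, 2.5) -- promoted to float
```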

John Nagle
 
Paul Rubin

John Nagle said:
The Forth chips were cute, and got more done with fewer gates than
almost anything else. But that didn't matter for long.
Willow Garage has a custom Forth chip they use in their Ethernet
cameras, but it's really a FPGA.

You can order 144-core Forth chips right now,

http://greenarrays.com/home/products/index.html

They are asynchronous cores running at around 700 MHz, so you get an
astounding amount of raw compute power per watt and per dollar. But for
me at least, it's not that easy to figure out applications where their
weird architecture fits well.
 
Werner Thie

You probably heard of the famous FORTH chips like the Harris RTX2000,
or ShhBoom, which implemented a stack-oriented, very low power design
before there were FPGAs in silicon. To my knowledge the RTX2000 is still
used for space-hardened applications, and if I search long enough I might
find the one I had sitting in my cellar.

The chip was at the time so insanely fast that it could produce video
signals with FORTH programs driving the IO pins. Chuck Moore, the father
of FORTH, developed the chip's silicon design in FORTH itself.

Because the instruction set of a FORTH machine is that of a very general
stack-based von Neumann system, I believe that starting with an RTX2000
(which should be available in VHDL) one could quite quickly reach a point
where things make sense -- not going for the 'fastest' CPU ever, but for
the advantage of having a decent CPU programmable in Python sitting on a
chip with a lot of hardware available.

Another thing worth mentioning in this context is certainly the work
available at http://www.myhdl.org/doku.php.

Werner
 
John Nagle

You probably heard of the famous FORTH chips like the Harris RTX2000,
or ShhBoom, which implemented a stack-oriented, very low power design
before there were FPGAs in silicon. To my knowledge the RTX2000 is still
used for space-hardened applications, and if I search long enough I might
find the one I had sitting in my cellar.

The chip was at the time so insanely fast that it could produce video
signals with FORTH programs driving the IO pins. Chuck Moore, the father
of FORTH, developed the chip's silicon design in FORTH itself.

He did version 1, which had a broken integer divide operation.
(Divisors which were odd numbers produced wrong answers. Really.)
I came across one of those in a demo setup at a surplus store in
Silicon Valley, driving the CRT and with Moore's interface that
did everything with chords on three buttons.
Because the instruction set of a FORTH machine is that of a very general
stack-based von Neumann system, I believe that starting with an RTX2000
(which should be available in VHDL) one could quite quickly reach a point
where things make sense -- not going for the 'fastest' CPU ever, but for
the advantage of having a decent CPU programmable in Python sitting on a
chip with a lot of hardware available.

Willow Garage has VHDL available for a Forth CPU. It's only 200
lines.

The Forth CPUs have three separate memories - RAM, Forth stack,
and return stack. All three are accessed on each cycle. Back before
microprocessors had caches, this was a win over traditional CPUs,
where memory had to be accessed sequentially for those functions.
Once caches came in, it was a lose.
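A toy model of that organization, sketched in Python. The opcode names and program layout are invented for illustration; the point is that data stack, return stack, and program live in three separate memories:

```python
def run(program):
    """Minimal sketch of a two-stack Forth-style machine."""
    data, rstack = [], []   # data stack and return stack: separate memories
    pc = 0
    while pc < len(program):
        op = program[pc]
        pc += 1
        if isinstance(op, int):
            data.append(op)               # literal: push onto data stack
        elif op == 'dup':
            data.append(data[-1])
        elif op == '+':
            b, a = data.pop(), data.pop()
            data.append(a + b)
        elif op == 'call':
            rstack.append(pc + 1)         # return point on its own stack
            pc = program[pc]              # jump to the target address
        elif op == 'ret':
            pc = rstack.pop()
        elif op == 'halt':
            break
    return data

# main: push 3, call DOUBLE (address 4), halt; DOUBLE: dup + ret
prog = [3, 'call', 4, 'halt',    # addresses 0-3
        'dup', '+', 'ret']       # addresses 4-6
print(run(prog))                 # [6]
```

In hardware, all three memories can be read in the same cycle, which is exactly the property the post describes.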

It's interesting that if you wanted to design a CPU for Google's
"nativeclient" approach for executing native code in the browser,
a separate return point stack would be a big help. Google's
"nativeclient" system protects return points, so that you can tell,
from the source code, all the places control can go. This is
a protection against redirection via buffer overflows, something
that's possible on x86 because the return points and other data
share the same stack.

Note that if you run out of return point stack, or parameter
stack, you're stuck. So there's a hardware limit on call depth.
National Semiconductor once built a CPU with a separate return
point stack with a depth of 20. Big mistake.

(All of this is irrelevant to Python, though. Most of Python's
speed problems come from spending too much time looking up attributes
and functions in dictionaries.)
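That lookup cost is visible from pure Python: hoisting a bound method into a local variable, a well-known CPython micro-optimization, skips the repeated dictionary search. A small sketch, with the Counter class invented for illustration:

```python
class Counter:
    """Toy class: each p.bump() re-resolves 'bump' through dictionaries."""
    def __init__(self):
        self.n = 0
    def bump(self):
        self.n += 1

def slow(p, iterations):
    for _ in range(iterations):
        p.bump()              # attribute looked up on every pass

def fast(p, iterations):
    bump = p.bump             # resolve the bound method once
    for _ in range(iterations):
        bump()                # plain call, no attribute lookup
```

Both functions compute the same result; timing them with timeit typically shows the hoisted version measurably ahead, which is the dispatch overhead the post is talking about.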

John Nagle
 
Nobody

Note that if you run out of return point stack, or parameter
stack, you're stuck. So there's a hardware limit on call depth.
National Semiconductor once built a CPU with a separate return
point stack with a depth of 20. Big mistake.

The 8-bit PIC microcontrollers have a separate return stack. The PIC10 has
a 2-level stack, the PIC16 has 8 levels, and the PIC18 has 31 levels.

But these chips range from 16 bytes of RAM and 256 words of flash for a
PIC10, through 64-256 bytes of RAM and 1-4K words of flash for a PIC16, up
to 2KiB of RAM and 16K words of flash for a PIC18, so you usually run out
of something else long before the maximum stack depth becomes an issue.
 
Dennis Lee Bieber

The 8-bit PIC microcontrollers have a separate return stack. The PIC10 has
a 2-level stack, the PIC16 has 8 levels, and the PIC18 has 31 levels.

But these chips range from 16 bytes of RAM and 256 words of flash for a
PIC10, through 64-256 bytes of RAM and 1-4K words of flash for a PIC16, up
to 2KiB of RAM and 16K words of flash for a PIC18, so you usually run out
of something else long before the maximum stack depth becomes an issue.

Not an architecture on which to code a recursive Fibonacci sequence. <G>
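The joke has a precise edge: the deepest call chain of naive recursive fib(n) is n levels, so a 31-level return stack caps it around fib(31) no matter how much RAM is left. A quick instrumented sketch:

```python
def fib(n, depth=1, stats=None):
    """Naive recursive Fibonacci, instrumented to record peak call depth."""
    stats['max_depth'] = max(stats['max_depth'], depth)
    if n < 2:
        return n
    # The fib(n-1) chain is the deepest path: one level per decrement of n.
    return fib(n - 1, depth + 1, stats) + fib(n - 2, depth + 1, stats)

s = {'max_depth': 0}
print(fib(10, stats=s), s['max_depth'])   # 55 10
```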
 
Gregory Ewing

Paul said:
You can order 144-core Forth chips right now,

http://greenarrays.com/home/products/index.html

They are asynchronous cores running at around 700 MHz, so you get an
astounding amount of raw compute power per watt and per dollar. But for
me at least, it's not that easy to figure out applications where their
weird architecture fits well.

Hmmm... Maybe compiling Python to Forth would make sense?-)
 
Gregory Ewing

John said:
A tagged machine might make Python faster. You could have
unboxed ints and floats, yet still allow values of other types,
with the hardware tagging helping with dispatch. But it probably
wouldn't help all that much. It didn't in the LISP machines.

What might help more is having bytecodes that operate on
arrays of unboxed types -- numpy acceleration in hardware.
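As a software analogue: the stdlib array module already stores unboxed C doubles back to back, and numpy (assumed available for the second path) turns a whole-array add into one call into a C loop, which is roughly what such a vector bytecode would hardwire:

```python
from array import array

# 'd'-typed arrays hold unboxed C doubles, much like the flat arrays
# a hardware vector bytecode would operate on.
a = array('d', range(1000))
b = array('d', range(1000))

# Interpreted path: one boxed-float add (plus dynamic dispatch) per element.
c_loop = array('d', (x + y for x, y in zip(a, b)))

# Vectorized path (if numpy is installed): a single operation over the
# same unboxed buffers, dispatched once instead of 1000 times.
try:
    import numpy as np
    c_vec = np.frombuffer(a) + np.frombuffer(b)
    assert list(c_vec) == list(c_loop)
except ImportError:
    pass
```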
 
