R
Rouslan Korneychuk
I mentioned before that I had a proof of concept to convert Python
bytecode to native machine code.
It's available at https://github.com/Rouslan/nativecompile
Now that I have a substantial number of the bytecode instructions
implemented, I thought I would share some benchmark results.
The first test performs a quicksort on a list of 100 numbers, 5000
times. The second calculates all the prime numbers up to 10000000. Each
test is run three times in a row, first with the interpreter, then with
the compiled code.
#### SCRIPT ONE ####
import time
import random
import nativecompile
bcode = compile('''
def quicksort(array):
if len(array) <= 1:
return array
pindex = len(array)//2
pivot = array[pindex]
less = []
greater = []
for i,x in enumerate(array):
if i != pindex:
(less if x <= pivot else greater).append(x)
return quicksort(less) + [pivot] + quicksort(greater)
in_ = list(range(100))
random.seed(346097)
random.shuffle(in_)
t = time.clock()
for x in range(5000):
out = quicksort(in_)
t = time.clock()-t
assert out == sorted(in_)
print('execution time: {}'.format(round(t,10)))
''','<string>','exec')
mcode = nativecompile.compile(bcode)
print('byte code')
for x in range(3): eval(bcode)
print()
print('machine code')
for x in range(3): mcode()
print()
#### OUTPUT ####
byte code
execution time: 1.77
execution time: 1.76
execution time: 1.77
machine code
execution time: 1.42
execution time: 1.42
execution time: 1.42
#### SCRIPT TWO ####
import time
import math
import nativecompile
bcode = compile('''
def primes_list(upto):
nums = [True] * (upto//2-1)
for i in range(3,math.floor(math.sqrt(upto))+1,2):
if nums[i//2-1]:
for j in range(i*3,upto,i*2):
nums[j//2-1] = False
primes = []
for i,n in enumerate(nums):
if n: primes.append((i+1)*2+1)
return primes
t = time.clock()
primes = primes_list(10000000)
t = time.clock()-t
print(primes[-1])
print('execution time: {}'.format(round(t,10)))
''','<string>','exec')
mcode = nativecompile.compile(bcode)
print('byte code')
for x in range(3): eval(bcode)
print()
print('machine code')
for x in range(3): mcode()
print()
#### OUTPUT ####
byte code
9999991
execution time: 3.47
9999991
execution time: 3.38
9999991
execution time: 3.36
machine code
9999991
execution time: 2.95
9999991
execution time: 2.96
9999991
execution time: 2.95
The results are not terribly impressive, but it's something.
Also, although I wasn't intending on doing anything more complicated
than getting rid of the interpreter loop, I'm starting to notice little
ways the code can be optimized without needing run-time analysis. The
most obvious is looping over a range object. I could do away with the
iterator and just use a native integer (and have the program fall back
to the iterator interface if 'range' didn't refer to the built-in range
object after all).
bytecode to native machine code.
It's available at https://github.com/Rouslan/nativecompile
Now that I have a substantial number of the bytecode instructions
implemented, I thought I would share some benchmark results.
The first test performs a quicksort on a list of 100 numbers, 5000
times. The second calculates all the prime numbers up to 10000000. Each
test is run three times in a row, first with the interpreter, then with
the compiled code.
#### SCRIPT ONE ####
import time
import random
import nativecompile
bcode = compile('''
def quicksort(array):
if len(array) <= 1:
return array
pindex = len(array)//2
pivot = array[pindex]
less = []
greater = []
for i,x in enumerate(array):
if i != pindex:
(less if x <= pivot else greater).append(x)
return quicksort(less) + [pivot] + quicksort(greater)
in_ = list(range(100))
random.seed(346097)
random.shuffle(in_)
t = time.clock()
for x in range(5000):
out = quicksort(in_)
t = time.clock()-t
assert out == sorted(in_)
print('execution time: {}'.format(round(t,10)))
''','<string>','exec')
mcode = nativecompile.compile(bcode)
print('byte code')
for x in range(3): eval(bcode)
print()
print('machine code')
for x in range(3): mcode()
print()
#### OUTPUT ####
byte code
execution time: 1.77
execution time: 1.76
execution time: 1.77
machine code
execution time: 1.42
execution time: 1.42
execution time: 1.42
#### SCRIPT TWO ####
import time
import math
import nativecompile
bcode = compile('''
def primes_list(upto):
nums = [True] * (upto//2-1)
for i in range(3,math.floor(math.sqrt(upto))+1,2):
if nums[i//2-1]:
for j in range(i*3,upto,i*2):
nums[j//2-1] = False
primes = []
for i,n in enumerate(nums):
if n: primes.append((i+1)*2+1)
return primes
t = time.clock()
primes = primes_list(10000000)
t = time.clock()-t
print(primes[-1])
print('execution time: {}'.format(round(t,10)))
''','<string>','exec')
mcode = nativecompile.compile(bcode)
print('byte code')
for x in range(3): eval(bcode)
print()
print('machine code')
for x in range(3): mcode()
print()
#### OUTPUT ####
byte code
9999991
execution time: 3.47
9999991
execution time: 3.38
9999991
execution time: 3.36
machine code
9999991
execution time: 2.95
9999991
execution time: 2.96
9999991
execution time: 2.95
The results are not terribly impressive, but it's something.
Also, although I wasn't intending on doing anything more complicated
than getting rid of the interpreter loop, I'm starting to notice little
ways the code can be optimized without needing run-time analysis. The
most obvious is looping over a range object. I could do away with the
iterator and just use a native integer (and have the program fall back
to the iterator interface if 'range' didn't refer to the built-in range
object after all).