ctype performance benchmark

aurora00 · Jul 17, 2009

I have done some performance benchmarking for Python's ctypes library.
I think you may find it interesting. I am planning to use ctypes as an
alternative to writing a C extension module for performance enhancement.
Therefore my use case is slightly different from the typical use case
of accessing existing third party C libraries. In this case I am both
the user and the implementer of the C library.

In order to determine the right granularity for context switching
between Python and C, I have done some benchmarking. I mainly want to
measure the function call overhead, so the test functions are trivial,
such as returning the first character of a string. I compare a pure
Python function, a C extension module function and a ctypes function.
The tests were run under Python 2.6 on Windows XP with a 2.33 GHz
Intel Core Duo.

First of all I want to compare functions that get the first character
of a string. The most basic case is to reference it as the 0th element
of a sequence without calling any function. This produces the fastest
result at 0.0659 usec per loop.

$ timeit "'abc'[0]"

10000000 loops, best of 3: 0.0659 usec per loop

As soon as I build a function around it, the cost goes up
substantially. Both the pure Python function and the C extension
function show similar performance at around 0.5 usec. The ctypes
function takes about 2.5 times as long at 1.37 usec.

$ timeit -s "f=lambda s: s[0]" "f('abc')"

1000000 loops, best of 3: 0.506 usec per loop

$ timeit -s "import mylib" "mylib.py_first('abc')"

1000000 loops, best of 3: 0.545 usec per loop

$ timeit -s "import ctypes; dll = ctypes.CDLL('mylib.pyd')"
"dll.first('abc')"

1000000 loops, best of 3: 1.37 usec per loop
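
The `timeit` command above is presumably a thin wrapper around Python's
timeit module (an assumption; the post does not show it). A minimal
sketch of how the two pure Python measurements could be reproduced from
a script, written for Python 3 rather than the 2.6 used here. The helper
name usec_per_loop is made up for illustration, and the string is held
in a variable because Python 3 may fold a constant expression such as
'abc'[0] at compile time.

```python
import timeit

def usec_per_loop(stmt, setup="pass", number=200000):
    """Best of 3 runs, in microseconds per loop (same unit as above)."""
    best = min(timeit.repeat(stmt, setup=setup, repeat=3, number=number))
    return best / number * 1e6

# Direct indexing, no function call involved.
index_cost = usec_per_loop("s[0]", setup="s = 'abc'")

# The same operation wrapped in a pure Python function.
call_cost = usec_per_loop("f(s)", setup="s = 'abc'; f = lambda s: s[0]")

print("direct index: %.4f usec per loop" % index_cost)
print("lambda call:  %.4f usec per loop" % call_cost)
```

The absolute numbers will differ from the ones in this post, since they
depend on the interpreter version and the hardware.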

I repeated the test with a long string (1MB). There is not much
difference in performance, so I can be quite confident that the
parameter is passed by reference (to the internal buffer).

$ timeit -s "f=lambda s: s[0]; lstr='abcde'*200000"
"f(lstr)"

1000000 loops, best of 3: 0.465 usec per loop

$ timeit -s "import mylib; lstr='abcde'*200000"
"mylib.py_first(lstr)"

1000000 loops, best of 3: 0.539 usec per loop

$ timeit -s "import ctypes; dll = ctypes.CDLL('mylib.pyd')"
-s "lstr='abcde'*200000"
"dll.first(lstr)"

1000000 loops, best of 3: 1.4 usec per loop
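
For the pure Python case, the no-copy behaviour can also be checked
directly rather than inferred from timings. A small Python 3 sketch;
the function names are illustrative and it says nothing about how
ctypes marshals the argument.

```python
import timeit

def first(s):
    return s[0]

def identity(s):
    # Hand back exactly what was received.
    return s

lstr = "abcde" * 200000          # 1,000,000 characters, as in the test above

# The callee receives the caller's object itself, not a copy of it.
same_object = identity(lstr) is lstr
print("same object:", same_object)

# The call cost should therefore not depend on the string length.
t_short = timeit.timeit("first(s)", globals={"first": first, "s": "abc"},
                        number=100000)
t_long = timeit.timeit("first(s)", globals={"first": first, "s": lstr},
                       number=100000)
print("short: %.4f s, long: %.4f s" % (t_short, t_long))
```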

Next I made some attempts to speed up ctypes. A measurable improvement
can be attained by eliminating the attribute lookup for the function.
Curiously, the same change shows no improvement in the C extension
case.

$ timeit -s "import ctypes; dll = ctypes.CDLL('mylib.pyd')"
-s "f=dll.first"
"f('abcde')"

1000000 loops, best of 3: 1.18 usec per loop
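
The same idea can be tried without mylib. The sketch below uses the C
runtime's strlen as a stand-in for dll.first and is written for
Python 3, where C strings are passed as bytes. It assumes a POSIX-like
system on which ctypes.util.find_library("c") resolves (on Windows the
library would have to be loaded differently).

```python
import ctypes
import ctypes.util
import timeit

# Assumption: the C runtime can be loaded this way on the current platform.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

strlen = libc.strlen    # look the function up once, outside the loop

# With the attribute lookup on every call.
t_attr = timeit.timeit("libc.strlen(b'abcde')",
                       globals={"libc": libc}, number=200000)

# With the function already bound to a name.
t_bound = timeit.timeit("f(b'abcde')",
                        globals={"f": strlen}, number=200000)

print("libc.strlen(...): %.4f s" % t_attr)
print("f(...):           %.4f s" % t_bound)
```

Note that a CDLL object stores the function on itself after the first
attribute access, so later lookups return the same function object; what
is saved by binding is only the per-call attribute lookup.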

Secondly I tried specifying the ctypes function prototype. This
actually decreases performance significantly, from 1.18 to 1.57 usec
per loop.

$ timeit -s "import ctypes; dll = ctypes.CDLL('mylib.pyd')"
-s "f=dll.first"
-s "f.argtypes=[ctypes.c_char_p]"
-s "f.restype=ctypes.c_int"
"f('abcde')"

1000000 loops, best of 3: 1.57 usec per loop
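
What the prototype buys in exchange is argument checking and a correct
return type. A hedged Python 3 sketch, again using the C runtime's
strlen as a stand-in for dll.first and assuming a POSIX-like system.
Indexing the library with libc["strlen"] is used here because it
returns a fresh function object each time, so the prototype set on one
does not affect the other.

```python
import ctypes
import ctypes.util
import timeit

# Assumption: the C runtime can be loaded this way on the current platform.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

plain = libc["strlen"]              # no prototype declared
typed = libc["strlen"]              # separate function object
typed.argtypes = [ctypes.c_char_p]
typed.restype = ctypes.c_size_t

# With argtypes set, a wrong argument type is rejected before the C call.
try:
    typed(5)
    rejected = False
except ctypes.ArgumentError:
    rejected = True
print("wrong type rejected:", rejected)

t_plain = timeit.timeit("f(b'abcde')", globals={"f": plain}, number=200000)
t_typed = timeit.timeit("f(b'abcde')", globals={"f": typed}, number=200000)
print("no prototype:   %.4f s" % t_plain)
print("with prototype: %.4f s" % t_typed)
```

Whether the prototype is still slower on a current interpreter is not
something this sketch assumes; it only makes the comparison easy to
repeat.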

Finally I tested passing multiple parameters into the function. One
of the parameters is passed by reference in order to return a value.
Performance decreases as the number of parameters increases.

$ timeit -s "charAt = lambda s, size, pos: s[pos]"
-s "s='this is a test'"
"charAt(s, len(s), 1)"

1000000 loops, best of 3: 0.758 usec per loop

$ timeit -s "import mylib; s='this is a test'"
"mylib.py_charAt(s, len(s), 1)"

1000000 loops, best of 3: 0.929 usec per loop

$ timeit -s "import ctypes"
-s "dll = ctypes.CDLL('mylib.pyd')"
-s "s='this is a test'"
-s "ch = ctypes.c_char()"
"dll.charAt(s, len(s), 1, ctypes.byref(ch))"

100000 loops, best of 3: 2.5 usec per loop
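
The source of dll.charAt is not shown in this post, so as an
illustration of the same calling pattern (several arguments plus one
output parameter passed with ctypes.byref) here is a Python 3 sketch
using the C runtime's strtol, which returns the end of the parsed number
through a pointer argument. It assumes a POSIX-like system.

```python
import ctypes
import ctypes.util

# Assumption: the C runtime can be loaded this way on the current platform.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# long strtol(const char *str, char **endptr, int base);
strtol = libc.strtol
strtol.argtypes = [ctypes.c_char_p,
                   ctypes.POINTER(ctypes.c_char_p),
                   ctypes.c_int]
strtol.restype = ctypes.c_long

text = b"123abc"              # keep a reference so the buffer stays alive
end = ctypes.c_char_p()       # output parameter, filled in by the C function

value = strtol(text, ctypes.byref(end), 10)
rest = end.value              # the unparsed remainder of the input

print(value, rest)
```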

One style of coding that improves the performance somewhat is to build
a C struct to hold all the parameters.

$ timeit -s "from test_mylib import dll, charAt_param"
-s "s='this is a test'"
-s "obj = charAt_param(s=s, size=len(s), pos=3, ch='')"
"dll.charAt_struct(obj)"

1000000 loops, best of 3: 1.71 usec per loop

This may work because most of the fields in the charAt_param struct
are invariant in the loop. Keeping them in the same struct object
saves them from being rebuilt on each call.
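
The definition of charAt_param lives in test_mylib and is not shown in
this post. A plausible reconstruction in Python 3 might look like the
following; the field types are guesses based on the keyword arguments
used above, not the actual source.

```python
import ctypes

class charAt_param(ctypes.Structure):
    # Hypothetical layout; field types are assumed, not taken from mylib.
    _fields_ = [
        ("s", ctypes.c_char_p),    # the string to index into
        ("size", ctypes.c_int),    # its length
        ("pos", ctypes.c_int),     # index of the wanted character
        ("ch", ctypes.c_char),     # output: the character found
    ]

s = b"this is a test"

# Built once, outside the loop.
obj = charAt_param(s=s, size=len(s), pos=3, ch=b"\x00")

# Inside the loop only the field that varies needs to be touched.
obj.pos = 5

print(obj.size, obj.pos)
```

The struct object is converted once and reused, so each call only has
to hand over a single argument.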

My overall observation is that a ctypes function call has an overhead
2 to 3 times that of a similar C extension function. This may become a
limiting factor if the function calls are fine grained. Using ctypes
for performance enhancement is a lot more productive if the interface
can be made medium or coarse grained.
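
To make the granularity point concrete, here is a hedged Python 3
sketch that compares the same number of bytes either with many small
ctypes calls or with one large call. It uses the C runtime's memcmp as a
stand-in for a hand-written library function and assumes a POSIX-like
system; the chunk size and count are arbitrary.

```python
import ctypes
import ctypes.util
import timeit

# Assumption: the C runtime can be loaded this way on the current platform.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

memcmp = libc.memcmp
memcmp.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_size_t]
memcmp.restype = ctypes.c_int

CHUNK = 16        # bytes compared per fine grained call
N = 10000         # number of fine grained calls

small_a = b"x" * CHUNK
small_b = b"x" * CHUNK
big_a = b"x" * (CHUNK * N)
big_b = b"x" * (CHUNK * N)

def fine_grained():
    # N calls into C, each doing very little work.
    for _ in range(N):
        if memcmp(small_a, small_b, CHUNK) != 0:
            return False
    return True

def coarse_grained():
    # One call into C covering the same total number of bytes.
    return memcmp(big_a, big_b, CHUNK * N) == 0

t_fine = timeit.timeit(fine_grained, number=10)
t_coarse = timeit.timeit(coarse_grained, number=10)
print("fine grained:   %.4f s" % t_fine)
print("coarse grained: %.4f s" % t_coarse)
```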

A snapshot of the source code used for testing is available for
download at:

http://tungwaiyip.info/blog/2009/07/16/ctype_performance_benchmark

Wai Yip Tung

Stefan Behnel · Jul 17, 2009

> My overall observation is that ctypes function has an overhead that is
> 2 to 3 times to a similar C extension function. This may become a
> limiting factor if the function calls are fine grained. Using ctypes
> for performance enhancement is a lot more productive if the interface
> can be made to medium or coarse grained.

I think ctypes is ok for its niche: easy interfacing with C stuff from
straight Python code, without further dependencies. There will always (and
necessarily) be a call overhead involved. Anyone who needs to care about
per-call performance would use something else anyway (do I need to mention
Cython here?)

Stefan

Wai Yip · Jul 17, 2009

I started with ctypes because it is a battery included with the
Python standard library. My code is very fluid and I'm looking for
easy opportunities to optimize it. One example is finding the longest
common prefix of two strings. Right now I am comparing them character
by character in pure Python. It seems like ideal low hanging fruit.
With ctypes I can rewrite a few lines of Python into a few lines of C.
All the tools are available and no third party library is needed.
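
For reference, the character by character comparison described above
might look roughly like this in pure Python. This is a sketch, not the
actual code from my project, and the function name is made up. The
standard library's os.path.commonprefix does the same job on plain
strings, despite its name.

```python
import os.path

def common_prefix_len(a, b):
    """Length of the longest common prefix of strings a and b."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i

a = "interstellar"
b = "internet"

prefix = a[:common_prefix_len(a, b)]
print(prefix)

# The standard library equivalent, which compares character-wise too.
print(os.path.commonprefix([a, b]))
```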

It turned out the performance failed to match pure Python. This
function is called millions of times in an inner loop, and the call
overhead overwhelms any performance gain from C. Eventually I found
success by grouping the data into larger chunks for each API call.
This is what I originally planned to do anyway. I am only sharing my
experience here that fine grained ctypes function calls have their
limitations.

I have looked into Pyrex before and I was quite confused by the fusion
of the C and Python languages. Perhaps it is time for me to give it a
second look. I have only just heard of Cython and it also looks
interesting. I think it would be helpful for both projects to
articulate more clearly what they are, how they achieve the
performance gain, and to give more guidance on good use cases. On the
other hand, perhaps it is just me who is confused because I don't know
enough about Python internals.

Wai Yip
