Testing for performance regressions

  • Thread starter Steven D'Aprano

Steven D'Aprano

I'm writing some tests to check for performance regressions (i.e. you
change a function, and it becomes much slower) and I was hoping for some
guidelines or hints.

This is what I have come up with so far:


* The disclaimers about timing code snippets that can be found in the
timeit module apply. If possible, use timeit rather than roll-your-own
timers.

* Put performance tests in a separate test suite, because they're
logically independent of regression tests and functional tests, and
therefore you might not want to run them all the time.

* Never compare the speed of a function to some fixed amount of time,
since that will depend on the hardware you are running on, but compare it
relative to some other function's running time. E.g.:

# Don't do this:
time_taken = Timer(my_func).timeit() # or similar
assert time_taken <= 10
# This is bad, since the test is hardware dependent, and a change
# in environment may cause this to fail even if the function
# hasn't changed.

# Instead do this:
time_taken = Timer(my_func).timeit()
baseline = Timer(simple_func).timeit()
assert time_taken <= 2*baseline
# my_func shouldn't be more than twice as expensive as simple_func
# no matter how fast or slow they are in absolute terms.
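A minimal runnable sketch of the relative-baseline idea with unittest (my_func and simple_func here are placeholder workloads standing in for the real function under test and a reference function):

```python
import unittest
from timeit import Timer

def simple_func():
    # Placeholder baseline: a cheap, stable reference workload.
    return sum(range(100))

def my_func():
    # Placeholder for the function under test.
    return sum(i * i for i in range(100))

class TestPerformance(unittest.TestCase):
    def test_my_func_speed(self):
        # min() of several runs, per the timeit module's advice.
        time_taken = min(Timer(my_func).repeat(repeat=3, number=1000))
        baseline = min(Timer(simple_func).repeat(repeat=3, number=1000))
        # The factor of 10 is a tuning knob: loose enough to survive
        # machine-to-machine variation, tight enough to catch a real
        # regression.
        self.assertTrue(time_taken <= 10 * baseline)

if __name__ == "__main__":
    unittest.main()
```

Note that Timer accepts a zero-argument callable directly (since Python 2.6), so no setup string is needed here.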


Any other lessons or hints I should know?

If it helps, my code will be targeting Python 3.1, and I'm using a
combination of doctest and unittest for the tests.


Thanks in advance,
 

geremy condra

I'm writing some tests to check for performance regressions (i.e. you
change a function, and it becomes much slower) and I was hoping for some
guidelines or hints.

This is what I have come up with so far:


* The disclaimers about timing code snippets that can be found in the
timeit module apply. If possible, use timeit rather than roll-your-own
timers.

Huh. In looking into timing attacks, actually one of the biggest
lessons I learned was *not* to use timeit: the overhead and variance
involved in using it will wind up consuming small changes in
behavior in ways that are fairly opaque until you really take it
apart.

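For contrast, a sketch of what taking raw per-call timings looks like (the measured lambda is just an illustrative workload):

```python
from timeit import default_timer  # best available wall-clock timer

def measure(func, samples=1000):
    # One raw timing per call, rather than timeit's single total over
    # a for loop. Each sample still includes the timer-call overhead.
    timings = []
    for _ in range(samples):
        start = default_timer()
        func()
        timings.append(default_timer() - start)
    return timings

timings = measure(lambda: sum(range(1000)))
best = min(timings)  # best-case single call
```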
* Put performance tests in a separate test suite, because they're
logically independent of regression tests and functional tests, and
therefore you might not want to run them all the time.

* Never compare the speed of a function to some fixed amount of time,
since that will depend on the hardware you are running on, but compare it
relative to some other function's running time. E.g.:

# Don't do this:
time_taken = Timer(my_func).timeit()  # or similar
assert time_taken <= 10
   # This is bad, since the test is hardware dependent, and a change
   # in environment may cause this to fail even if the function
   # hasn't changed.

# Instead do this:
time_taken = Timer(my_func).timeit()
baseline = Timer(simple_func).timeit()
assert time_taken <= 2*baseline
   # my_func shouldn't be more than twice as expensive as simple_func
   # no matter how fast or slow they are in absolute terms.


Any other lessons or hints I should know?

If you can get on it, emulab is great for doing network performance
and correctness testing, and even if you can't it might be worth
running a small one at your company. I wish I'd found out about it
years ago.

Geremy Condra
 

Steven D'Aprano

On Mon, Apr 4, 2011 at 7:45 PM, Steven D'Aprano wrote:


Huh. In looking into timing attacks actually one of the biggest lessons
I learned was *not* to use timeit- that the overhead and variance
involved in using it will wind up consuming small changes in behavior in
ways that are fairly opaque until you really take it apart.

Do you have more details?

I follow the advice in the timeit module, and only ever look at the
minimum value, and never try to calculate a mean or variance. Given the
number of outside influences ("What do you mean starting up a browser
with 200 tabs at the same time will affect the timing?"), I wouldn't
trust a mean or variance to be meaningful.
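Concretely, that pattern looks something like this (the statement being timed is an arbitrary example):

```python
from timeit import Timer

# Three independent runs of the same measurement; keep only the
# fastest, on the theory that anything slower is noise from the OS,
# caches, or other processes, not the code being measured.
timings = Timer("sum(range(100))").repeat(repeat=3, number=10000)
best = min(timings)
```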
 

geremy condra

Do you have more details?

I follow the advice in the timeit module, and only ever look at the
minimum value, and never try to calculate a mean or variance. Given the
number of outside influences ("What do you mean starting up a browser
with 200 tabs at the same time will affect the timing?"), I wouldn't
trust a mean or variance to be meaningful.

I think it's safe to treat timeit as an opaque, medium-precision
benchmark with those caveats. If you need actual timing data, though
(answering the question 'how much faster?' rather than 'which is
faster?'), just taking actual timings seems to provide much, much
better answers. Part of that is because timeit adds the cost of the
for loop to every run; here's the actual code:

def inner(_it, _timer):
    %(setup)s
    _t0 = _timer()
    for _i in _it:
        %(stmt)s
    _t1 = _timer()
    return _t1 - _t0

(taken from Lib/timeit.py line 81)

where %(setup)s and %(stmt)s are what you passed in. Obviously, if the
magnitude of the change you're looking for is smaller than the
variance in the for loop's overhead this makes things a lot harder
than they need to be, and the whole proposition gets pretty dodgy for
measuring in the sub-millisecond range, which is where many timing
attacks are going to lie. It also has some problems at the opposite
end of the spectrum- timing large, long-running, or memory-intensive
chunks of code can be deceptive because timeit runs with the GC
disabled. This bit me a while back working on Graphine, actually, and
it confused the hell out of me at the time.
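For what it's worth, the timeit documentation's own workaround for the GC issue is to re-enable collection in the setup string, e.g.:

```python
from timeit import Timer

stmt = "[[] for _ in range(1000)]"  # an allocation-heavy example statement

# By default timeit disables garbage collection around the timed loop.
# Re-enabling it in setup makes the measurement include GC cost.
with_gc = Timer(stmt, "import gc; gc.enable()").timeit(number=100)
without_gc = Timer(stmt).timeit(number=100)
```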

I'm also not so sure about the 'take the minimum' advice. There's a
reasonable amount of empirical evidence suggesting that timings taken
at the 30-40% mark are less noisy than those taken at either end of
the spectrum, especially if there's a network performance component.
YMMV, of course.
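If you want to try that, picking a value out of the sorted samples is simple enough. A sketch, where percentile is a hypothetical helper and the 0.35 fraction is just the middle of the suggested 30-40% range, not an established standard:

```python
from timeit import Timer

def percentile(samples, fraction):
    # Value a given fraction of the way through the sorted samples,
    # using simple truncating index arithmetic (no interpolation).
    ordered = sorted(samples)
    return ordered[int(fraction * (len(ordered) - 1))]

samples = Timer("sum(range(100))").repeat(repeat=20, number=1000)
mid_low = percentile(samples, 0.35)  # middle of the 30-40% range
```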

Geremy Condra
 
