In terms of generated machine code, how does hundreds of cases in a switch
differ from hundreds of if-elses? Do compilers or processors do any
optimization on such structured code? Do I need to worry about the
performance, usually?
I got curious about this so I decided to see what one compiler did on one
architecture. I chose the simplest kind of test that would be the most
likely to optimised -- consecutive integers from 0 to N. I wrote a script
that made programs of this form:
#include <stdlib.h>
extern int f(int x, int nc);
void use_if_stmt(int n)
{
int x = 0;
while (n--) {
int v = f(n, <nc>);
if (v == 0) x++;
else if (v == 1) x++;
}
}
void use_switch_stmt(int n)
{
int x = 0;
while (n--) {
int v = f(n, <nc>);
switch (v) {
case 0: x++; break;
case 1: x++; break;
}
}
}
int main(int argc, char **argv)
{
int n = argc > 1 ? strtol(argv[1], NULL, 0) : 10000000;
use_if_stmt(n);
use_switch_stmt(n);
return 0;
}
where the number of cases (and "if" choices) could be varied (<nc> gets
replaced by this value as well). The function f could be one of "low",
"spread", or "high" and the resulting code was linked with the following:
int spread(int n, int nc) { return n % nc; }
int low (int n, int nc) { return 0; }
int high (int n, int nc) { return n + nc; }
to simulate the situation where the value to be matched was always the
first ("low"); evenly spread over the range of choices ("spread"); or
higher than the highest case ("high"). The resulting programs where
compiled both with (gcc -O2) and without optimisation. gprof was used to
compare the running times of the two functions "use_if_stmt" and
"use_switch_stmt". I simply calculated the ration of the "if" time
to the "switch" time, so a result of 1 means that they both took
the same time. A value of 5 means the if version took five times
as long as the switch version.
Without optimisation the results were:
N cases all low spread all high
2 0.8 1.1 0.8
20 0.8 1.7 5.5
200 0.9 8.1 66.4
2000 0.9 112.6 3042.0
almost exactly as I would have guessed. When the value being matched is
the first case, "if" is always faster. With 200 cases and all of the
value off the top of the case range, "if" takes 66 times longer.
What surprised me was the optimised numbers:
N cases all low spread all high
2 1.1 0.9 1.0
20 1.3 0.7 0.9
200 1.2 0.9 1.1
2000 0.8 1.1 1.0
I think that is quite impressive!
gcc (GCC) 3.4.3 on Pentium(R) M processor 1.70GHz