P
Peter Hickman
Whenever the question of performance comes up with scripting languages
such as Ruby, Perl or Python there will be people whose response can be
summarised as "Write it in C". I am one such person. Some people take
offence at this and label us trolls or heretics of the true programming
language (take your pick).
I am assuming here that when people talk about performance they really
mean speed. Some will disagree but this is what I am talking about.
In this post I want to clear some things up and provide benchmarks as to
why you should take "Write it in C" seriously. Personally I like to
write my projects in Ruby or Perl or Python and then convert them to C
to get a performance boost. How much of a boost? Well here I will give
you some hard data and source code so that you can see just how much of
a boost C can give you.
The mini project in question is to generate all the valid Latin squares.
A Latin square is a grid of numbers (lets say a 9 x 9 grid) that has the
numbers 1 to 9 appear only once in each row and column. Sudoku grids are
a subset of Latin squares.
The approach taken is to create a list of all the permutations and then
build up a grid row by row checking that the newly added row does not
conflict with any of the previous rows. If the final row can be added
without problem the solution is printed and the search for the next one
starts. It is in essence depth first search. The first version of the
program that I wrote in Perl took 473 minutes to generate all the valid
5 x 5 Latin squares, version four of the program took 12 minutes and 51
seconds. The C version of the program took 5.5 seconds to produce
identical results. All run on the same hardware.
[Latin]$ time ./Latin1.pl 5 > x5
real 473m45.370s
user 248m59.752s
sys 2m54.598s
[Latin]$ time ./Latin4.pl 5 > x5
real 12m51.178s
user 12m14.066s
sys 0m7.101s
[Latin]$ time ./c_version.sh 5
real 0m5.519s
user 0m4.585s
sys 0m0.691s
This is what I mean when I say that coding in C will improve the
performance of your program. The improvement goes beyond percentages, it
is in orders of magnitude. I think that the effort is worth it. If a 5 x
5 grid with 120 permutations took 12 minutes in Perl, how long would a 6
* 6 grid with 720 permutations take? What unit of measure would you be
using for a 9 x 9 grid?
Size Permutations
==== ============
1 1
2 2
3 6
4 24
5 120
6 720
7 5040
8 40320
9 362880
Now lets look at first version of the code:
1 #!/usr/bin/perl -w
2 use strict;
3 use warnings;
4 use Algorithm:ermute;
5 my $width_of_board = shift;
6 my @permutations;
7 my $p = new Algorithm:ermute( [ 1 .. $width_of_board ] );
8 while ( my @res = $p->next ) {
9 push @permutations, [@res];
10 }
11 my $number_of_permutations = scalar(@permutations);
12 for ( my $x = 0 ; $x < $number_of_permutations ; $x++ ) {
13 add_a_line($x);
14 }
Lines 1 to 11 just build up a list of all the permutations using the
handy Algorithm:ermute module from CPAN. Lines 12 to 14 starts on the
first row of the solution by trying out all possible permutations for
the first row.
15 sub add_a_line {
16 my @lines = @_;
17 my $size = scalar(@lines);
18 my $ok = 1;
19 for ( my $x = 0 ; $x < $size ; $x++ ) {
20 for ( my $y = 0 ; $y < $size ; $y++ ) {
21 if ( $x != $y ) {
22 $ok = 0 unless compare( $lines[$x], $lines[$y] );
23 }
24 }
25 }
26 if ($ok) {
27 if ( $size == $width_of_board ) {
28 print join(':', map { p($_) } @lines) . "\n";
29 }
30 else {
31 for ( my $x = 0 ; $x < $number_of_permutations ;
$x++ ) {
32 add_a_line( @lines, $x );
33 }
34 }
35 }
36 }
The add_a_line() function first checks that none of the lines so far
conflict (have the same digit in the same column), if it passes and the
number of lines equals the size of the board then the result is printed
and another solution is looked for. Failing that another line is added
and add_a_line() is called.
Here is the function that tells if two lines conflict.
37 sub compare {
38 my ( $a, $b ) = @_;
39 my $ok = 1;
40 my @aa = @{ $permutations[$a] };
41 my @bb = @{ $permutations[$b] };
42 for ( my $x = 0 ; $x < $width_of_board ; $x++ ) {
43 $ok = 0 if $aa[$x] == $bb[$x];
44 }
45 return $ok == 1;
46 }
The p() function is a little utility to convert a list into a string for
display.
47 sub p {
48 my ($x) = @_;
49 my @a = @{ $permutations[$x] };
50 my $y = join( '', @a );
51 return $y;
52 }
Well I have just exposed some pretty crap code to eternal ridicule on
the internet, but there you have it. The code is crap, even non Perl
programmers will be able to point out the deficenties with this code. It
works, even though a 5 x 5 grid took 473 minutes to run. Lets try and
salvage some pride and show version four and see how we managed to speed
things up.
1 #!/usr/bin/perl -w
2 use strict;
3 use warnings;
4 use Algorithm:ermute;
5 my $width_of_board = shift;
6 my @permutations;
7 my @output;
8 my %compared;
9 my $p = new Algorithm:ermute( [ 1 .. $width_of_board ] );
10 while ( my @res = $p->next ) {
11 push @permutations, [@res];
12 push @output, join( '', @res );
13 }
14 my $number_of_permutations = scalar(@permutations);
Lines 1 to 14 are doing pretty much what version one was doing except
that a new list, @output, is being built up to precalculate the output
strings and remove the need for the p() function. A minor speed up but
useful.
15 for ( my $x = 0 ; $x < $number_of_permutations ; $x++ ) {
16 for ( my $y = 0 ; $y < $number_of_permutations ; $y++ ) {
17 my $ok = 1;
18 my @aa = @{ $permutations[$x] };
19 my @bb = @{ $permutations[$y] };
20 for ( my $z = 0 ; $z < $width_of_board ; $z++ ) {
21 if ( $aa[$z] == $bb[$z] ) {
22 $ok = 0;
23 last;
24 }
25 }
26 if ( $ok == 1 ) {
27 $compared{"$x:$y"} = 1;
28 }
29 }
30 }
Lines 15 to 30 introduces new code to precalculate the comparisons and
feed the results into a hash. Lines 31 to 33 start the work in the same
way as version one.
31 for ( my $x = 0 ; $x < $number_of_permutations ; $x++ ) {
32 add_a_line($x);
33 }
And now to the improved add_a_line() function. The code has been
improved to only check that the newly added line does not conflict with
any of the existsing lines rather than repeatedly comparing the existing
(valid) lines.
34 sub add_a_line {
35 my @lines = @_;
36 my $size = scalar(@lines);
37 my $ok = 1;
38 if ( $size > 1 ) {
39 for ( my $x = 0 ; $x < ( $size - 1 ) ; $x++ ) {
40 unless ( defined $compared{ $lines[$x] .':'.
$lines[-1] } ) {
41 $ok = 0;
42 last;
43 }
44 }
45 }
46 if ($ok) {
47 if ( $size == $width_of_board ) {
48 print join( ':', map { $output[$_] } @lines ) . "\n";
49 }
50 else {
51 for ( my $x = 0 ; $x < $number_of_permutations ;
$x++ ) {
52 add_a_line( @lines, $x );
53 }
54 }
55 }
56 }
These changes took us down from 473 minutes to just 12. The elimination
of unnessessary comparisons in add_a_line() helped as did the
precalculation of those comparisons. There are lessons to be learnt
here, write decent code and cache repetetive comparisons. There are no
great tricks, just that bad code can cost you dearly and simple things
can bring big improvements. So with such a massive improvement how could
we make our code any faster?
Write it in C.
Having learnt the lessons developing the code in Perl I am not going to
start the whole thing over in C. Using version four I used the
precalculation phase of the Perl scripts to write out a C header file
with data structures that would be useful for the C program.
1 #define WIDTH_OF_BOARD 5
2 #define NUMBER_OF_PERMUTATIONS 120
3 char *output_strings[] = {
4 "54321",
123 "12345",
124 };
125 bool compared[NUMBER_OF_PERMUTATIONS][NUMBER_OF_PERMUTATIONS] = {
126 {false, false, ...
245 {false, false, ...
246 };
247 int work[WIDTH_OF_BOARD];
This then leaves the C code itself. Lines 1 to 7 includes a load of
useful stuff, infact it is also probably including some quite
unneccessary stuff too, I just cut and paste it from another project.
1 #include <stdio.h>
2 #include <stdlib.h>
3 #include <stdbool.h>
4 #include <err.h>
5 #include <string.h>
6 #include <unistd.h>
7 #include <sys/types.h>
Line 8 is the header file that Perl precalculated.
8 #include "Latin.h"
Now the meat. The code is pretty much the same as version four just
adapted to C. No special C tricks, no weird pointer stuff, an almost
line for line translation of the Perl code.
9 void
10 add_a_row(int row)
11 {
12 bool is_ok;
13 int x,y;
14 if (row == WIDTH_OF_BOARD) {
15 for (x = 0; x < WIDTH_OF_BOARD; x++) {
16 if (x == 0) {
17 printf("%s", output_strings[work[x]]);
18 } else {
19 printf(":%s", output_strings[work[x]]);
20 }
21 }
22 puts("");
23 } else {
24 for (x = 0; x < NUMBER_OF_PERMUTATIONS; x++) {
25 work[row] = x;
26 is_ok = true;
27 if (row != 0) {
28 for( y = 0; y < row; y++ ) {
29 if(compared[work[row]][work[y]] == false) {
30 is_ok = false;
31 break;
32 }
33 }
34 }
35 if (is_ok == true) {
36 add_a_row(row + 1);
37 }
38 }
39 }
40 }
41 int
42 main(int argc, char *argv[])
43 {
44 add_a_row(0);
45 }
And the C version ran in 5.5 seconds. In fact the 5.5 seconds includes
the Perl program that does all the precalculation to write the Latin.h
header file, the compiling of the C source and finally the running of
the program itself. So we have not cheated by doing some work outside
the timings.
Just think of it, 12 minutes down to 5.5 seconds without having to write
any incomprehensible C code. Because we all know that C code is
completely incomprehensible with it doing all that weird pointer stuff
all the time.
Now the Perl code could be improved, there are tricks that could be
pulled out of the bag to trim something off the 12 minutes. Perhaps
another language would be faster? But how far could you close the gap
from 12 minutes to 5.5 seconds?
Just to up the ante I added -fast -mcpu=7450 to the compiler (gcc
optimized for speed on an G4 Macintosh) and ran it again.
[Latin]$ time ./c_version.sh 5 > x5
real 0m3.986s
user 0m2.810s
sys 0m0.540s
Another 30% performance improvement without changing any code.
Lets review the languages we have used and their advantages. C is very
fast without any stupid tricks. C will give you better control over the
amount of memory you use (the Perl code eats up massive amounts of
memory in comparison, the 9 x 9 grid threw an out of memory error on my
1Gb machine).
It is much easier to develop in Perl. Any error message you get is
likely to at least give you a clue as to what the problem might be. C
programmers have to put up with the likes of 'Bus error' or
'Segmentation fault', which is why C programmers are grouches. Perl also
allows you to significantly improve your code without major rewrites.
There is a module called Memoize that can wrap a function and cache the
calls all by adding two extra lines to your code, the same is true for
most scripting languages.
So what am I recommending here, write all your programs in C? No. Write
all your programs in Perl? No. Write them in your favourite scripting
language to refine the code and then translate it into C if the
performance falls short of your requirements. Even if you intend to
write it in C all along hacking the code in Perl first allows you to
play with the algorithm without having to worry about memory allocation
and other such C style house keeping. Good code is good code in any
language.
If you really really want that performance boost then take the following
advice very seriously - "Write it in C".
such as Ruby, Perl or Python there will be people whose response can be
summarised as "Write it in C". I am one such person. Some people take
offence at this and label us trolls or heretics of the true programming
language (take your pick).
I am assuming here that when people talk about performance they really
mean speed. Some will disagree but this is what I am talking about.
In this post I want to clear some things up and provide benchmarks as to
why you should take "Write it in C" seriously. Personally I like to
write my projects in Ruby or Perl or Python and then convert them to C
to get a performance boost. How much of a boost? Well here I will give
you some hard data and source code so that you can see just how much of
a boost C can give you.
The mini project in question is to generate all the valid Latin squares.
A Latin square is a grid of numbers (lets say a 9 x 9 grid) that has the
numbers 1 to 9 appear only once in each row and column. Sudoku grids are
a subset of Latin squares.
The approach taken is to create a list of all the permutations and then
build up a grid row by row checking that the newly added row does not
conflict with any of the previous rows. If the final row can be added
without problem the solution is printed and the search for the next one
starts. It is in essence depth first search. The first version of the
program that I wrote in Perl took 473 minutes to generate all the valid
5 x 5 Latin squares, version four of the program took 12 minutes and 51
seconds. The C version of the program took 5.5 seconds to produce
identical results. All run on the same hardware.
[Latin]$ time ./Latin1.pl 5 > x5
real 473m45.370s
user 248m59.752s
sys 2m54.598s
[Latin]$ time ./Latin4.pl 5 > x5
real 12m51.178s
user 12m14.066s
sys 0m7.101s
[Latin]$ time ./c_version.sh 5
real 0m5.519s
user 0m4.585s
sys 0m0.691s
This is what I mean when I say that coding in C will improve the
performance of your program. The improvement goes beyond percentages, it
is in orders of magnitude. I think that the effort is worth it. If a 5 x
5 grid with 120 permutations took 12 minutes in Perl, how long would a 6
* 6 grid with 720 permutations take? What unit of measure would you be
using for a 9 x 9 grid?
Size Permutations
==== ============
1 1
2 2
3 6
4 24
5 120
6 720
7 5040
8 40320
9 362880
Now lets look at first version of the code:
1 #!/usr/bin/perl -w
2 use strict;
3 use warnings;
4 use Algorithm:ermute;
5 my $width_of_board = shift;
6 my @permutations;
7 my $p = new Algorithm:ermute( [ 1 .. $width_of_board ] );
8 while ( my @res = $p->next ) {
9 push @permutations, [@res];
10 }
11 my $number_of_permutations = scalar(@permutations);
12 for ( my $x = 0 ; $x < $number_of_permutations ; $x++ ) {
13 add_a_line($x);
14 }
Lines 1 to 11 just build up a list of all the permutations using the
handy Algorithm:ermute module from CPAN. Lines 12 to 14 starts on the
first row of the solution by trying out all possible permutations for
the first row.
15 sub add_a_line {
16 my @lines = @_;
17 my $size = scalar(@lines);
18 my $ok = 1;
19 for ( my $x = 0 ; $x < $size ; $x++ ) {
20 for ( my $y = 0 ; $y < $size ; $y++ ) {
21 if ( $x != $y ) {
22 $ok = 0 unless compare( $lines[$x], $lines[$y] );
23 }
24 }
25 }
26 if ($ok) {
27 if ( $size == $width_of_board ) {
28 print join(':', map { p($_) } @lines) . "\n";
29 }
30 else {
31 for ( my $x = 0 ; $x < $number_of_permutations ;
$x++ ) {
32 add_a_line( @lines, $x );
33 }
34 }
35 }
36 }
The add_a_line() function first checks that none of the lines so far
conflict (have the same digit in the same column), if it passes and the
number of lines equals the size of the board then the result is printed
and another solution is looked for. Failing that another line is added
and add_a_line() is called.
Here is the function that tells if two lines conflict.
37 sub compare {
38 my ( $a, $b ) = @_;
39 my $ok = 1;
40 my @aa = @{ $permutations[$a] };
41 my @bb = @{ $permutations[$b] };
42 for ( my $x = 0 ; $x < $width_of_board ; $x++ ) {
43 $ok = 0 if $aa[$x] == $bb[$x];
44 }
45 return $ok == 1;
46 }
The p() function is a little utility to convert a list into a string for
display.
47 sub p {
48 my ($x) = @_;
49 my @a = @{ $permutations[$x] };
50 my $y = join( '', @a );
51 return $y;
52 }
Well I have just exposed some pretty crap code to eternal ridicule on
the internet, but there you have it. The code is crap, even non Perl
programmers will be able to point out the deficenties with this code. It
works, even though a 5 x 5 grid took 473 minutes to run. Lets try and
salvage some pride and show version four and see how we managed to speed
things up.
1 #!/usr/bin/perl -w
2 use strict;
3 use warnings;
4 use Algorithm:ermute;
5 my $width_of_board = shift;
6 my @permutations;
7 my @output;
8 my %compared;
9 my $p = new Algorithm:ermute( [ 1 .. $width_of_board ] );
10 while ( my @res = $p->next ) {
11 push @permutations, [@res];
12 push @output, join( '', @res );
13 }
14 my $number_of_permutations = scalar(@permutations);
Lines 1 to 14 are doing pretty much what version one was doing except
that a new list, @output, is being built up to precalculate the output
strings and remove the need for the p() function. A minor speed up but
useful.
15 for ( my $x = 0 ; $x < $number_of_permutations ; $x++ ) {
16 for ( my $y = 0 ; $y < $number_of_permutations ; $y++ ) {
17 my $ok = 1;
18 my @aa = @{ $permutations[$x] };
19 my @bb = @{ $permutations[$y] };
20 for ( my $z = 0 ; $z < $width_of_board ; $z++ ) {
21 if ( $aa[$z] == $bb[$z] ) {
22 $ok = 0;
23 last;
24 }
25 }
26 if ( $ok == 1 ) {
27 $compared{"$x:$y"} = 1;
28 }
29 }
30 }
Lines 15 to 30 introduces new code to precalculate the comparisons and
feed the results into a hash. Lines 31 to 33 start the work in the same
way as version one.
31 for ( my $x = 0 ; $x < $number_of_permutations ; $x++ ) {
32 add_a_line($x);
33 }
And now to the improved add_a_line() function. The code has been
improved to only check that the newly added line does not conflict with
any of the existsing lines rather than repeatedly comparing the existing
(valid) lines.
34 sub add_a_line {
35 my @lines = @_;
36 my $size = scalar(@lines);
37 my $ok = 1;
38 if ( $size > 1 ) {
39 for ( my $x = 0 ; $x < ( $size - 1 ) ; $x++ ) {
40 unless ( defined $compared{ $lines[$x] .':'.
$lines[-1] } ) {
41 $ok = 0;
42 last;
43 }
44 }
45 }
46 if ($ok) {
47 if ( $size == $width_of_board ) {
48 print join( ':', map { $output[$_] } @lines ) . "\n";
49 }
50 else {
51 for ( my $x = 0 ; $x < $number_of_permutations ;
$x++ ) {
52 add_a_line( @lines, $x );
53 }
54 }
55 }
56 }
These changes took us down from 473 minutes to just 12. The elimination
of unnessessary comparisons in add_a_line() helped as did the
precalculation of those comparisons. There are lessons to be learnt
here, write decent code and cache repetetive comparisons. There are no
great tricks, just that bad code can cost you dearly and simple things
can bring big improvements. So with such a massive improvement how could
we make our code any faster?
Write it in C.
Having learnt the lessons developing the code in Perl I am not going to
start the whole thing over in C. Using version four I used the
precalculation phase of the Perl scripts to write out a C header file
with data structures that would be useful for the C program.
1 #define WIDTH_OF_BOARD 5
2 #define NUMBER_OF_PERMUTATIONS 120
3 char *output_strings[] = {
4 "54321",
123 "12345",
124 };
125 bool compared[NUMBER_OF_PERMUTATIONS][NUMBER_OF_PERMUTATIONS] = {
126 {false, false, ...
245 {false, false, ...
246 };
247 int work[WIDTH_OF_BOARD];
This then leaves the C code itself. Lines 1 to 7 includes a load of
useful stuff, infact it is also probably including some quite
unneccessary stuff too, I just cut and paste it from another project.
1 #include <stdio.h>
2 #include <stdlib.h>
3 #include <stdbool.h>
4 #include <err.h>
5 #include <string.h>
6 #include <unistd.h>
7 #include <sys/types.h>
Line 8 is the header file that Perl precalculated.
8 #include "Latin.h"
Now the meat. The code is pretty much the same as version four just
adapted to C. No special C tricks, no weird pointer stuff, an almost
line for line translation of the Perl code.
9 void
10 add_a_row(int row)
11 {
12 bool is_ok;
13 int x,y;
14 if (row == WIDTH_OF_BOARD) {
15 for (x = 0; x < WIDTH_OF_BOARD; x++) {
16 if (x == 0) {
17 printf("%s", output_strings[work[x]]);
18 } else {
19 printf(":%s", output_strings[work[x]]);
20 }
21 }
22 puts("");
23 } else {
24 for (x = 0; x < NUMBER_OF_PERMUTATIONS; x++) {
25 work[row] = x;
26 is_ok = true;
27 if (row != 0) {
28 for( y = 0; y < row; y++ ) {
29 if(compared[work[row]][work[y]] == false) {
30 is_ok = false;
31 break;
32 }
33 }
34 }
35 if (is_ok == true) {
36 add_a_row(row + 1);
37 }
38 }
39 }
40 }
41 int
42 main(int argc, char *argv[])
43 {
44 add_a_row(0);
45 }
And the C version ran in 5.5 seconds. In fact the 5.5 seconds includes
the Perl program that does all the precalculation to write the Latin.h
header file, the compiling of the C source and finally the running of
the program itself. So we have not cheated by doing some work outside
the timings.
Just think of it, 12 minutes down to 5.5 seconds without having to write
any incomprehensible C code. Because we all know that C code is
completely incomprehensible with it doing all that weird pointer stuff
all the time.
Now the Perl code could be improved, there are tricks that could be
pulled out of the bag to trim something off the 12 minutes. Perhaps
another language would be faster? But how far could you close the gap
from 12 minutes to 5.5 seconds?
Just to up the ante I added -fast -mcpu=7450 to the compiler (gcc
optimized for speed on an G4 Macintosh) and ran it again.
[Latin]$ time ./c_version.sh 5 > x5
real 0m3.986s
user 0m2.810s
sys 0m0.540s
Another 30% performance improvement without changing any code.
Lets review the languages we have used and their advantages. C is very
fast without any stupid tricks. C will give you better control over the
amount of memory you use (the Perl code eats up massive amounts of
memory in comparison, the 9 x 9 grid threw an out of memory error on my
1Gb machine).
It is much easier to develop in Perl. Any error message you get is
likely to at least give you a clue as to what the problem might be. C
programmers have to put up with the likes of 'Bus error' or
'Segmentation fault', which is why C programmers are grouches. Perl also
allows you to significantly improve your code without major rewrites.
There is a module called Memoize that can wrap a function and cache the
calls all by adding two extra lines to your code, the same is true for
most scripting languages.
So what am I recommending here, write all your programs in C? No. Write
all your programs in Perl? No. Write them in your favourite scripting
language to refine the code and then translate it into C if the
performance falls short of your requirements. Even if you intend to
write it in C all along hacking the code in Perl first allows you to
play with the algorithm without having to worry about memory allocation
and other such C style house keeping. Good code is good code in any
language.
If you really really want that performance boost then take the following
advice very seriously - "Write it in C".