C extension makes things slower

P

Phil Tomson

In general I've always seen things speed up when I've writtten C
extensions. I usually don't rewrite all of the methods in a class, just
certain speed critical ones.

In this case I've got a Point class:

#point.rb
module ACO
class Point
attr_accessor :x, :y
def initialize(x,y)
@x = x
@y = y
end

#calculate the distance between this point and other point
def -(other)
Math.sqrt((@x-other.x)**2+(@y-other.y)**2)
end

def ==(other)
return (@x == other.x) && (@y == other.y)
end

def to_s
"(#{@x},#{@y})"
end
end
end #ACO

if $0 == __FILE__
start_time = Time.now
100000.times do |i|
p1 = ACO::point.new(rand(50),rand(50))
p2 = ACO::point.new(rand(50),rand(50))
p1 == p2
p2 == p1
p1-p2
p2-p1
end
end_time = Time.now
puts "Total time for pure ruby point test: #{end_time - start_time}"
require 'ext/ACO_Ext' #load up the extension bundle
start_time = Time.now
100000.times do |i|
p1 = ACO::point.new(rand(50),rand(50))
p2 = ACO::point.new(rand(50),rand(50))
p1 == p2
p2 == p1
p1-p2
p2-p1
end
end_time = Time.now
puts "Total time for extension point test: #{end_time - start_time}"
end #point.rb


Here's the C code for the extension:
//aco_ext.c
#include "ruby.h"
#include <math.h>
#include <stdio.h>

static int id_x;
static int id_y;
static int id_equal;

static VALUE aco_point_diff(VALUE self, VALUE other){
double self_x = NUM2DBL(rb_iv_get(self, "@x"));
double self_y = NUM2DBL(rb_iv_get(self, "@y"));
double other_x= NUM2DBL(rb_iv_get(other, "@x"));
double other_y= NUM2DBL(rb_iv_get(other, "@y"));
double xdiffsq= pow( (self_x - other_x),2 );
double ydiffsq= pow( (self_y - other_y),2 );
return rb_float_new(sqrt( xdiffsq + ydiffsq ) );

}

static VALUE aco_point_init(VALUE self, VALUE x, VALUE y){
rb_iv_set(self,"@x",x);
rb_iv_set(self,"@y",y);
return self;
}


static VALUE aco_point_equal(VALUE self, VALUE other){
double self_x = NUM2DBL(rb_iv_get(self, "@x"));
double self_y = NUM2DBL(rb_iv_get(self, "@y"));
double other_x= NUM2DBL(rb_iv_get(other, "@x"));
double other_y= NUM2DBL(rb_iv_get(other, "@y"));
return ((self_x == other_x) && (self_y == other_y));
}


VALUE cACOMod;
VALUE cACOPoint;
VALUE cACOGraph;


void Init_ACO_Ext() {
printf("ACO_Ext initializing...\n");
cACOMod = rb_define_module("ACO");
cACOPoint = rb_define_class_under(cACOMod,"Point",rb_cObject);
cACOGraph = rb_define_class_under(cACOMod,"Graph",rb_cObject);
rb_define_method(cACOPoint,"initialize",aco_point_init,2);
rb_define_method(cACOPoint,"-",aco_point_diff,1);
rb_define_method(cACOPoint,"==",aco_point_equal,1);


id_x = rb_intern("x");
id_y = rb_intern("y");
id_equal = rb_intern("==");
}
//end of aco_ext.c


And the results:
l% ruby point.rb
Total time for pure ruby point test: 8.600752
ACO_Ext initializing...
Total time for extension point test: 6.086567

Using the extension it's 2.6 second _slower_ than the pure Ruby
implementation!?

Phil
 
P

Pit Capitain

Phil said:
And the results:
l% ruby point.rb
Total time for pure ruby point test: 8.600752
ACO_Ext initializing...
Total time for extension point test: 6.086567

Using the extension it's 2.6 second _slower_ than the pure Ruby
implementation!?

Phil, 6 seconds is _faster_ than 8.6 seconds...

Regards,
Pit
 
C

Charles Mills

Phil said:
In general I've always seen things speed up when I've writtten C
extensions. I usually don't rewrite all of the methods in a class, just
certain speed critical ones.

In this case I've got a Point class:
...

And the results:
l% ruby point.rb
Total time for pure ruby point test: 8.600752
ACO_Ext initializing...
Total time for extension point test: 6.086567

Using the extension it's 2.6 second _slower_ than the pure Ruby
implementation!?

Phil

Ruby defines an ID type, it may be a long (not a int) which could cause
breakage on 64bit systems.
If your interested in making the code faster you could use a C struct
and a struct RDATA to represent a point instead of a ruby Object. This
will save the hash lookups.

-Charlie
 
P

Phil Tomson

Phil, 6 seconds is _faster_ than 8.6 seconds...

<kidding> Yeah, but it's not as much faster as I wanted it to be
</kidding>

Sheepish grin... Ah, right you are. Things look rather backwards at 1AM.

However, here's the result of another run:

Total time for pure ruby point test: 8.051388
ACO_Ext initializing...
Total time for extension point test: 9.212984

There, I knew I wasn't hallucinating...

Anyway, I'm starting to think that the variance might have more to do
with loading of the machine I'm running on. Should probably use
benchmark instead. Also, it was quite consistently slower (the C
extension version) on my Powerbook, but then again I was running low on
battery so that might have had something to do with it (still odd,
though).

In reality, the code above is just a small part of a much bigger program
which includes more methods defined in C. When I ran that it was
consistently slower by several seconds (actually almost twice as
slow) when the extension was 'required'. That's what led me to start to
time different sections of the C extenstion seperately in order to find
the culprit.

Phil
 
J

Joe Van Dyk

=20
You're doing things like
=20
static VALUE aco_point_diff(VALUE self, VALUE other){
double self_x =3D NUM2DBL(rb_iv_get(self, "@x"));
double self_y =3D NUM2DBL(rb_iv_get(self, "@y"));
double other_x=3D NUM2DBL(rb_iv_get(other, "@x"));
double other_y=3D NUM2DBL(rb_iv_get(other, "@y"));
double xdiffsq=3D pow( (self_x - other_x),2 );
double ydiffsq=3D pow( (self_y - other_y),2 );
return rb_float_new(sqrt( xdiffsq + ydiffsq ) );
=20
}
=20
when you probably want something resembling
=20
typedef struct {
double x;
double y;
} ACO_Point_Struct;
=20
static VALUE
aco_point_diff(VALUE self, VALUE other)
{
ACO_Point_Struct *_self, *_other;
double dx, dy;
=20
Data_Get_Struct(self, ACO_Point_Struct, _self);
Data_Get_Struct(other, ACO_Point_Struct, _other);
=20
dx =3D _self->x - _other->x;
dy =3D _self->y - _other->y;
return rb_float_new(sqrt(dx * dx + dy * dy));
}
=20

Wow, neat. Does Data_Get_Struct extract out members of a Ruby object
and assign them to corresponding members of a C struct?
 
J

Joel VanderWerf

Joe said:
Wow, neat. Does Data_Get_Struct extract out members of a Ruby object
and assign them to corresponding members of a C struct?

No. Ruby instance vars are stored separately from the C struct belonging
to a T_DATA object. In fact, in the case of T_DATA (and also T_ARRAY,
T_PROC, etc.), instance variables are stored in a big global hash,
indexed first by object id then by ivar name, IIRC. In the case of
T_OBJECT (an instance of a normal user-defined ruby class) the ivars are
stored ina hash in the object itself.

When you allocate a T_DATA with the Data_Make_Struct (or
Data_Wrap_Struct) macro, you also allocate (or pass in) a pointer to
your own struct. The Data_Get_Struct macro, given a T_DATA, merely gives
you back this pointer.

The README.EXT in the ruby source covers this stuff if you want more detail.
 
P

Phil Tomson

You're doing things like

static VALUE aco_point_diff(VALUE self, VALUE other){
double self_x = NUM2DBL(rb_iv_get(self, "@x"));
double self_y = NUM2DBL(rb_iv_get(self, "@y"));
double other_x= NUM2DBL(rb_iv_get(other, "@x"));
double other_y= NUM2DBL(rb_iv_get(other, "@y"));
double xdiffsq= pow( (self_x - other_x),2 );
double ydiffsq= pow( (self_y - other_y),2 );
return rb_float_new(sqrt( xdiffsq + ydiffsq ) );

}

when you probably want something resembling

typedef struct {
double x;
double y;
} ACO_Point_Struct;

static VALUE
aco_point_diff(VALUE self, VALUE other)
{
ACO_Point_Struct *_self, *_other;
double dx, dy;

Data_Get_Struct(self, ACO_Point_Struct, _self);
Data_Get_Struct(other, ACO_Point_Struct, _other);

dx = _self->x - _other->x;
dy = _self->y - _other->y;
return rb_float_new(sqrt(dx * dx + dy * dy));
}

Thanks, that's _MUCH_ (and consistently) faster:

$ ruby point.rb
Total time for pure ruby point test: 6.977688
ACO_Ext initializing...
Total time for extension point test: 2.777282


And now my real application that uses point is much faster too:
$ruby TSP.rb
-->best tour in gen 0: 8667.97295111485
-->global best tour is: 8128.7749762166
Total time: 3.635439

....used to be something like 22 seconds.

Oh, for the sake of complete documentation in case someone has similar
questions in the future:

I had to add an allocate function so here's the new version of aco_ext.c:

//aco_ext.c
#include "ruby.h"
#include <math.h>
#include <stdio.h>

static int id_x;
static int id_y;
static int id_equal;

typedef struct {
double x;
double y;
} ACO_Point_Struct;

static void aco_point_free(void* p){
free(p);
}

static VALUE aco_point_alloc(VALUE klass) {
VALUE object;
ACO_Point_Struct* point = (ACO_Point_Struct*)malloc(sizeof(
ACO_Point_Struct));
object = Data_Wrap_Struct(klass,0,aco_point_free,point);
return object;
}

static VALUE aco_point_init(VALUE self, VALUE x, VALUE y){
ACO_Point_Struct* _self ;
//double x, y;
Data_Get_Struct(self, ACO_Point_Struct, _self);
_self->x = NUM2DBL(x);
_self->y = NUM2DBL(y);
return self;
}

static VALUE
aco_point_diff(VALUE self, VALUE other)
{
ACO_Point_Struct *_self, *_other;
double dx, dy;

Data_Get_Struct(self, ACO_Point_Struct, _self);
Data_Get_Struct(other, ACO_Point_Struct, _other);

dx = _self->x - _other->x;
dy = _self->y - _other->y;
return rb_float_new(sqrt(dx * dx + dy * dy));
}

static VALUE aco_point_equal(VALUE self, VALUE other){
ACO_Point_Struct *_self;
ACO_Point_Struct *_other;
double x, y;
Data_Get_Struct(self, ACO_Point_Struct, _self);
Data_Get_Struct(other, ACO_Point_Struct, _other);
return((_self->x == _other->x) && (_self->y == _other->y));
}

VALUE cACOMod;
VALUE cACOPoint;
VALUE cACOGraph;

void Init_ACO_Ext() {
printf("ACO_Ext initializing...\n");
cACOMod = rb_define_module("ACO");
cACOPoint = rb_define_class_under(cACOMod,"Point",rb_cObject);
cACOGraph = rb_define_class_under(cACOMod,"Graph",rb_cObject);
rb_define_alloc_func(cACOPoint, aco_point_alloc);
rb_define_method(cACOPoint,"initialize",aco_point_init,2);
rb_define_method(cACOPoint,"-",aco_point_diff,1);
rb_define_method(cACOPoint,"==",aco_point_equal,1);

id_x = rb_intern("x");
id_y = rb_intern("y");
id_equal = rb_intern("==");
}
//end aco_ext.c


Phil
 
C

Charles Mills

Phil said:
Thanks, that's _MUCH_ (and consistently) faster:

$ ruby point.rb
Total time for pure ruby point test: 6.977688
ACO_Ext initializing...
Total time for extension point test: 2.777282


And now my real application that uses point is much faster too:
$ruby TSP.rb
-->best tour in gen 0: 8667.97295111485
-->global best tour is: 8128.7749762166
Total time: 3.635439

...used to be something like 22 seconds.

Oh, for the sake of complete documentation in case someone has similar
questions in the future:

I had to add an allocate function so here's the new version of aco_ext.c:

//aco_ext.c
#include "ruby.h"
#include <math.h>
#include <stdio.h>

static int id_x;
static int id_y;
static int id_equal;

typedef struct {
double x;
double y;
} ACO_Point_Struct;

static void aco_point_free(void* p){
free(p);
}

static VALUE aco_point_alloc(VALUE klass) {
VALUE object;
ACO_Point_Struct* point = (ACO_Point_Struct*)malloc(sizeof(
ACO_Point_Struct));
object = Data_Wrap_Struct(klass,0,aco_point_free,point);
return object;
}

static VALUE aco_point_init(VALUE self, VALUE x, VALUE y){
ACO_Point_Struct* _self ;
//double x, y;
Data_Get_Struct(self, ACO_Point_Struct, _self);
_self->x = NUM2DBL(x);
_self->y = NUM2DBL(y);
return self;
}

static VALUE
aco_point_diff(VALUE self, VALUE other)
{
ACO_Point_Struct *_self, *_other;
double dx, dy;

Data_Get_Struct(self, ACO_Point_Struct, _self);
Data_Get_Struct(other, ACO_Point_Struct, _other);

dx = _self->x - _other->x;
dy = _self->y - _other->y;
return rb_float_new(sqrt(dx * dx + dy * dy));
}

static VALUE aco_point_equal(VALUE self, VALUE other){
ACO_Point_Struct *_self;
ACO_Point_Struct *_other;
double x, y;
Data_Get_Struct(self, ACO_Point_Struct, _self);
Data_Get_Struct(other, ACO_Point_Struct, _other);
return((_self->x == _other->x) && (_self->y == _other->y));
}

VALUE cACOMod;
VALUE cACOPoint;
VALUE cACOGraph;

void Init_ACO_Ext() {
printf("ACO_Ext initializing...\n");
cACOMod = rb_define_module("ACO");
cACOPoint = rb_define_class_under(cACOMod,"Point",rb_cObject);
cACOGraph = rb_define_class_under(cACOMod,"Graph",rb_cObject);
rb_define_alloc_func(cACOPoint, aco_point_alloc);
rb_define_method(cACOPoint,"initialize",aco_point_init,2);
rb_define_method(cACOPoint,"-",aco_point_diff,1);
rb_define_method(cACOPoint,"==",aco_point_equal,1);

id_x = rb_intern("x");
id_y = rb_intern("y");
id_equal = rb_intern("==");
}
//end aco_ext.c


Phil

As it stands right now if you do something like:
point - "hey"
you will probably crash Ruby or worse.

Another option would be to do (untested/uncheck code):

#define IsACO_Point(v) (TYPE(v) == T_DATA && RDATA(v)->dfree ==
aco_point_free)

static ACO_Point_Struct
aco_to_aco(VALUE other)
{
ACO_Point_Struct p;
if (IsACO_Point(other)) {
ACO_Point_Struct *p_ptr;
Data_Get_Struct(self, ACO_Point_Struct, p_ptr);
p = *p_ptr;
} else {
VALUE x, y;
x = rb_convert_type(T_FLOAT, rb_funcall2(other, id_x, 0, 0),
"to_f", "Float");
y = rb_convert_type(T_FLOAT, rb_funcall2(other, id_y, 0, 0),
"to_f", "Float");
p.x = RFLOAT(x)->value;
p.y = RFLOAT(y)->value;
}
return p;
}

then use aco_to_aco() whenever your not sure of the type/class of an
argument (ie when it is not self).
doing this also leaves duck typing options open :)

-Charlie
 
M

Mauricio Fernández

Oh, for the sake of complete documentation in case someone has similar
questions in the future:
[...]

Just one minor simplification (also, for the record):
static void aco_point_free(void* p){
free(p);
}
static VALUE aco_point_alloc(VALUE klass) {
VALUE object;
ACO_Point_Struct* point = (ACO_Point_Struct*)malloc(sizeof(
ACO_Point_Struct));
object = Data_Wrap_Struct(klass,0,aco_point_free,point);
==============
-1
This way, free() will be used (and you hence need no aco_point_free
function).
 
C

Charles Mills

Mauricio said:
Oh, for the sake of complete documentation in case someone has similar
questions in the future:
[...]

Just one minor simplification (also, for the record):
static void aco_point_free(void* p){
free(p);
}
static VALUE aco_point_alloc(VALUE klass) {
VALUE object;
ACO_Point_Struct* point = (ACO_Point_Struct*)malloc(sizeof(
ACO_Point_Struct));
object = Data_Wrap_Struct(klass,0,aco_point_free,point);
==============
-1
This way, free() will be used (and you hence need no aco_point_free
function).

Unless you want to make the optimization I did and use the free
function as a way of determining what kind of struct is in your T_DATA
wrapper.

-Charlie
 
M

Mauricio Fernández

As it stands right now if you do something like:
point - "hey"
you will probably crash Ruby or worse.

Another option would be to do (untested/uncheck code):

#define IsACO_Point(v) (TYPE(v) == T_DATA && RDATA(v)->dfree ==
aco_point_free)

point - "hey" would actually be OK since Data_Get_Struct first does a
Check_Type(obj, T_DATA), but point - SomeOtherRDataType.new would indeed
bomb. Good catch.
 
P

Phil Tomson

Oh, for the sake of complete documentation in case someone has similar
questions in the future:
[...]

Just one minor simplification (also, for the record):
static void aco_point_free(void* p){
free(p);
}
static VALUE aco_point_alloc(VALUE klass) {
VALUE object;
ACO_Point_Struct* point = (ACO_Point_Struct*)malloc(sizeof(
ACO_Point_Struct));
object = Data_Wrap_Struct(klass,0,aco_point_free,point);
==============
-1
This way, free() will be used (and you hence need no aco_point_free
function).

So you're saying it should be:
object = Data_Wrap_Struct(klass,0,-1,point);

or:
object = Data_Wrap_Struct(klass,0,0,point);

?

Phil
 
C

Charles Mills

Phil said:
Oh, for the sake of complete documentation in case someone has similar
questions in the future:
[...]

Just one minor simplification (also, for the record):
static void aco_point_free(void* p){
free(p);
}
static VALUE aco_point_alloc(VALUE klass) {
VALUE object;
ACO_Point_Struct* point = (ACO_Point_Struct*)malloc(sizeof(
ACO_Point_Struct));
object = Data_Wrap_Struct(klass,0,aco_point_free,point);
==============
-1
This way, free() will be used (and you hence need no aco_point_free
function).

So you're saying it should be:
object = Data_Wrap_Struct(klass,0,-1,point);

or:
object = Data_Wrap_Struct(klass,0,0,point);

?

Phil

I think he is saying:
object = Data_Wrap_Struct(klass,0,free,point);

Using -1 instead of free is not supported in 1.9.

-Charlie
 
N

nobuyoshi nakada

Hi,

At Fri, 22 Jul 2005 13:10:57 +0900,
Charles Mills wrote in [ruby-talk:149150]:
I think he is saying:
object = Data_Wrap_Struct(klass,0,free,point);

Using -1 instead of free is not supported in 1.9.

Still supported, of course.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,992
Messages
2,570,220
Members
46,807
Latest member
ryef

Latest Threads

Top