convert fd to IO object in extension?

Toby DiPasquale · Oct 16, 2005

Hi all,

I'm writing a C extension for Ruby to bind to the epoll(4) facility in
later Linux kernels (one for *BSD's kqueue will be coming after that) and
I've hit a snag.

epoll_wait(2) returns a list of epoll_event records, each of which
contains (among other things) the fd that the particular event fired on.
However, unlike select(2), the whole point of epoll(4) is that you
don't need to maintain and pass in big arrays of fds each time you do a
call. Unfortunately for me, those input arrays are exactly the method by
which Kernel#select can map the results back into the output arrays.
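
For readers less familiar with the select side of this, here is a minimal sketch of the behaviour described above. It assumes nothing beyond core Ruby; the pipe is used only to have something readable.

```ruby
r, w = IO.pipe
w.write("x")
w.flush

# IO.select hands back the very objects it was given in the input array,
# so the caller never has to translate an fd back into an IO.
readable, = IO.select([r], nil, nil, 1)
same_object = readable.first.equal?(r)
```

With epoll there is no such input array on each call, which is where the fd-to-IO question comes from.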

So, is there a way in the Ruby source to map a fd to its associated IO
object? I do have the option of maintaining a list/hash when the IO
objects are inserted, but this would negate the main advantage of epoll(4)
so I'd understandably not like to have to do that. Thanks in advance.

ts · Oct 16, 2005

T> So, is there a way in the Ruby source to map a fd to its associated IO
T> object?

Perhaps you can try with ObjectSpace::each_object, something like this

moulon% ruby -e 'ObjectSpace::each_object(IO) {|io| puts "#{io} #{io.to_i}"}'
#<IO:0xb7d69fec> 2
#<IO:0xb7d6a000> 1
#<IO:0xb7d6a014> 0
moulon%
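
Wrapped up as a method, that idea might look roughly like the sketch below. The name io_for_fd is hypothetical (it is not part of Ruby), and the scan walks every live IO object, so it costs time proportional to the number of IO objects on each lookup.

```ruby
# Hypothetical helper, not a Ruby API: linear scan of the heap for an
# open IO object whose descriptor is +fd+. Returns nil if none is found.
def io_for_fd(fd)
  ObjectSpace.each_object(IO) do |io|
    begin
      return io if !io.closed? && io.fileno == fd
    rescue IOError
      # Uninitialized or half-closed streams raise here; skip them.
      next
    end
  end
  nil
end

r, w = IO.pipe
found = io_for_fd(r.fileno)
```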

Guy Decoux

ES · Oct 16, 2005

Toby said:
Hi all,

I'm writing a C extension for Ruby to bind to the epoll(4) facility in
later Linux kernels (one for *BSD's kqueue will be coming after that) and
I've hit a snag.

You may want to peek at Zed A. Shaw's wrapping of Provos' libevent (which
abstracts the architectures you mention) [1]. That said...

epoll_wait(2) returns a list of epoll_event records, each of which
contains (among other things) the fd that the particular event fired on.
However, unlike select(2), the whole point of epoll(4) is that you
don't need to maintain and pass in big arrays of fds each time you do a
call. Unfortunately for me, those input arrays are exactly the method by
which Kernel#select can map the results back into the output arrays.

So, is there a way in the Ruby source to map a fd to its associated IO
object? I do have the option of maintaining a list/hash when the IO
objects are inserted, but this would negate the main advantage of epoll(4)
so I'd understandably not like to have to do that. Thanks in advance.

You can use IO.for_fd to create an IO object for a given file descriptor;
scour the source for the C function or just use rb_funcall. This, of
course, should only be done once after which you refer to the socket
using the IO object (you can get the FD again with IO#to_i).
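
A small sketch of that round trip, assuming a modern Ruby (autoclose= is not available in the 1.8 series) and using a pipe purely for illustration:

```ruby
r, w = IO.pipe

# Wrap the already-open write descriptor in a second IO object.
alias_io = IO.for_fd(w.fileno, "w")
alias_io.autoclose = false # assumption: Ruby 1.9+; stops this wrapper from closing the shared fd
alias_io.sync = true       # write straight through, no userspace buffering

alias_io.write("ping")
data = r.read(4)

# IO#to_i (alias of IO#fileno) gives the descriptor back.
same_fd = alias_io.to_i == w.fileno
```

Note that this creates a new wrapper around the descriptor; it does not return the IO object that originally opened it.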

I am not sure what you mean by maintaining a Hash; while an explicit
data structure is by no means necessary, you (presumably) have to have
some means of referring back to the sockets. This is what the IO object
would provide, however you store it. Please clarify if I am understanding
incorrectly.

If you find you need to do some special processing on the FD for each
read/write/update, you might do well to just subclass IO and augment
it with that functionality.

Toby DiPasquale

E

[1] http://www.zedshaw.com/projects/ruby_event/index.html

Toby DiPasquale · Oct 16, 2005

You can use IO.for_fd to create an IO object for a given file descriptor;
scour the source for the C function or just use rb_funcall. This, of
course, should only be done once after which you refer to the socket
using the IO object (you can get the FD again with IO#to_i).

In a C extension, this is suboptimal. You can instead use:

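<code>
#include "ruby.h"
#include "rubyio.h"  /* Ruby 1.8: declares OpenFile and GetOpenFile */
[...]
int fd;
OpenFile *fptr;

/* Duck-type check: reject anything that does not respond to sysread. */
if (!rb_respond_to( io_obj, rb_intern( "sysread")))
    rb_raise( rb_eTypeError, "instance of IO needed");
GetOpenFile( rb_io_get_io( io_obj), fptr);
fd = fileno( fptr->f);  /* fptr->f is the underlying FILE* */
</code>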

Since both Socket and File are subclasses of IO, this will work for both
network sockets and disk file I/O. The above is correct because sysread is
currently only implemented by instances of IO. Caching the return value of
the rb_intern( "sysread") call is also preferable (see ruby-1.8.3/marshal.c).

I am not sure what you mean by maintaining a Hash; while an explicit
data structure is by no means necessary, you (presumably) have to have
some means of referring back to the sockets. This is what the IO object
would provide, however you store it. Please clarify if I am understanding
incorrectly.

What I meant was that I'd have to maintain a Hash mapping fd's to IO
instances. This Hash would be inserted into when the caller wanted to insert
an IO object into the epoll device. Sorry for the confusion.

At any rate, it turns out that

a) Ruby/Event is dead, and
b) Zed was maintaining just such a Hash for the same reason in
Ruby/Event.

I'll just build a Hash that maps fd's to IO instances and take the memory
hit.

If you find you need to do some special processing on the FD for each
read/write/update, you might do well to just subclass IO and augment
it with that functionality.

Speaking of that, that brings me to another question I had. I was trying
to extend IO as follows:

module EpollExt
  attr_accessor :epoller
  alias_method :old_close, :close
  def close
    @epoller.delete( self)
    old_close
  end
end

So, then the user of the library would have to extend their IO instances
with that module before they could be used with the Epoll object, like so:

my_epoll = Epoll.new
[...]
x = File.open( "/tmp/my-tmp", "w")
x.extend( EpollExt)
x.epoller = my_epoll
[...]

However, this fails because at the time of definition, the module doesn't
have a close method. I want to shadow the close method of IO in order to
automatically call the cleanup method in the associated Epoll object.
Anybody know a way of doing that? I feel like I've seen this kind of thing
before, but I can't remember where.
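
One possible way to get this effect, offered as a sketch rather than a definitive answer: skip the alias and call super. After x.extend( EpollExt) the module sits ahead of IO in the object's method lookup, so super inside the module's close reaches the original IO#close. FakeEpoller below is a hypothetical stand-in for the Epoll object, and File::NULL assumes Ruby 1.9.3 or later.

```ruby
# Hypothetical stand-in for the Epoll object; only #delete matters here.
class FakeEpoller
  attr_reader :deleted

  def initialize
    @deleted = []
  end

  def delete(io)
    @deleted << io
  end
end

module EpollExt
  attr_accessor :epoller

  # No alias needed: once the module is mixed in via extend, this method
  # is found before IO#close, and super forwards to the original.
  def close
    @epoller.delete(self) if @epoller
    super
  end
end

x = File.open(File::NULL, "w")
x.extend(EpollExt)
x.epoller = FakeEpoller.new
x.close
```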

ES · Oct 16, 2005

Toby said:
You can use IO.for_fd to create an IO object for a given file descriptor;
scour the source for the C function or just use rb_funcall. This, of
course, should only be done once after which you refer to the socket
using the IO object (you can get the FD again with IO#to_i).

In a C extension, this is suboptimal. You can instead use:

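<code>
#include "ruby.h"
#include "rubyio.h"  /* Ruby 1.8: declares OpenFile and GetOpenFile */
[...]
int fd;
OpenFile *fptr;

/* Duck-type check: reject anything that does not respond to sysread. */
if (!rb_respond_to( io_obj, rb_intern( "sysread")))
    rb_raise( rb_eTypeError, "instance of IO needed");
GetOpenFile( rb_io_get_io( io_obj), fptr);
fd = fileno( fptr->f);  /* fptr->f is the underlying FILE* */
</code>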

Since both Socket and File are subclasses of IO, this will work for both
network sockets and disk file I/O. The above is correct because sysread is
currently only implemented by instances of IO. Caching the return value of
the rb_intern( "sysread") call is also preferable (see ruby-1.8.3/marshal.c).
Works.

I am not sure what you mean by maintaining a Hash; while an explicit
data structure is by no means necessary, you (presumably) have to have
some means of referring back to the sockets. This is what the IO object
would provide, however you store it. Please clarify if I am understanding
incorrectly.

What I meant was that I'd have to maintain a Hash mapping fd's to IO
instances. This Hash would be inserted into when the caller wanted to insert
an IO object into the epoll device. Sorry for the confusion.

At any rate, it turns out that

a) Ruby/Event is dead, and
b) Zed was maintaining just such a Hash for the same reason in
Ruby/Event.

I'll just build a Hash that maps fd's to IO instances and take the memory
hit.

Ah, I see.. I am still somewhat curious. I am assuming that you are already
storing those IO instances somewhere you can reach them. That being the case
and the IO instance *already referencing the FD*, why do you need explicit
mapping?

If for some reason you need to work with the sockets separately from the
IO instances (though I can not imagine why), you should still be able to
use the IO to read and write data normally. Any other socket manipulation
you can do independently of the IO, just with the sockets themselves and
then just have the IO reference its FD. Perhaps I am missing some crucial
insight to your architecture, though?

If you find you need to do some special processing on the FD for each
read/write/update, you might do well to just subclass IO and augment
it with that functionality.

Speaking of that, that brings me to another question I had. I was trying
to extend IO as follows:

module EpollExt
  attr_accessor :epoller
  alias_method :old_close, :close
  def close
    @epoller.delete( self)
    old_close
  end
end

So, then the user of the library would have to extend their IO instances
with that module before they could be used with the Epoll object, like so:

my_epoll = Epoll.new
[...]
x = File.open( "/tmp/my-tmp", "w")
x.extend( EpollExt)
x.epoller = my_epoll
[...]

However, this fails because at the time of definition, the module doesn't
have a close method. I want to shadow the close method of IO in order to
automatically call the cleanup method in the associated Epoll object.
Anybody know a way of doing that? I feel like I've seen this kind of thing
before, but I can't remember where.

You can use the hook method Module#included.

E

Toby DiPasquale · Oct 16, 2005

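<code>
#include "ruby.h"
#include "rubyio.h"  /* Ruby 1.8: declares OpenFile and GetOpenFile */
[...]
int fd;
OpenFile *fptr;

/* Duck-type check: reject anything that does not respond to sysread. */
if (!rb_respond_to( io_obj, rb_intern( "sysread")))
    rb_raise( rb_eTypeError, "instance of IO needed");
GetOpenFile( rb_io_get_io( io_obj), fptr);
fd = fileno( fptr->f);  /* fptr->f is the underlying FILE* */
</code>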

Since both Socket and File are subclasses of IO, this will work for both
network sockets and disk file I/O. The above is correct because sysread is
currently only implemented by instances of IO. Caching the return value of
the rb_intern( "sysread") call is also preferable (see
ruby-1.8.3/marshal.c).

Works.

rb_io_get_io() is not available to extensions, so just get rid of that call
(leaving the io_obj part) since you've already verified that it's an IO instance
by then (and thus GetOpenFile() and the rest will succeed). I discovered this
after writing the email ;-)

Ah, I see.. I am still somewhat curious. I am assuming that you are already
storing those IO instances somewhere you can reach them. That being the case
and the IO instance *already referencing the FD*, why do you need explicit
mapping?

Yeah, I wasn't ;-)

I was asking the original question in order to _avoid_ having to build up
such a store of IO instances. Here's what happens now:

0. Caller creates an instance of Epoll

1. Caller opens IO object of some sort

2. Caller calls Epoll#update on the new IO object, which will internally
add the IO object to a Hash and also call epoll_ctl( EPOLL_CTL_ADD) to add
it to the epoll device.

3. Caller calls Epoll#poll which internally calls epoll_wait(2) and when
that returns, builds an array of 2-cell arrays, the first cell containing
the IO object that matched the fd returned from epoll_wait(2) (resolved
with the help of the Hash above) and the second cell containing the event
bitmask returned for that fd. For every fd returned by epoll_wait(2),
there will be a 2-cell array in the returned array from Epoll#poll.
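
The bookkeeping in steps 0 to 3 can be sketched in plain Ruby as follows. This is only an illustration under loud assumptions: IO.select stands in for epoll_wait(2) because core Ruby has no epoll binding, the class name FdRegistry is hypothetical, and only readability is reported. The real extension does this in C and also calls epoll_ctl.

```ruby
# Sketch only: FdRegistry is a hypothetical name, and IO.select is a
# stand-in for epoll_wait(2).
class FdRegistry
  READABLE = 0x001 # same numeric value as EPOLLIN, used here only as a label

  def initialize
    @by_fd = {} # Integer fd => IO object
  end

  # Step 2: remember the IO under its descriptor.
  def update(io)
    @by_fd[io.fileno] = io
    self
  end

  # Remove by identity, since a closed IO can no longer report its fileno.
  def delete(io)
    @by_fd.delete_if { |_fd, known| known.equal?(io) }
    self
  end

  # Step 3: return an array of [io, mask] pairs.
  def poll(timeout = 0)
    ready, = IO.select(@by_fd.values, nil, nil, timeout)
    fds = (ready || []).map(&:fileno) # pretend the kernel gave us bare fds
    fds.map { |fd| [@by_fd.fetch(fd), READABLE] } # reverse-map via the Hash
  end
end

r, w = IO.pipe
registry = FdRegistry.new.update(r)
before = registry.poll(0)
w.write("x")
after = registry.poll(1)
```

The Hash lookup in poll is the reverse mapping that the interpreter itself does not offer.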

I thought I could avoid having to store all active IO objects if two things
were true:

a) epoll removes the fd from its device when it's close(2)'d
b) Ruby had a way to reverse-map a fd to an active IO instance

I knew a) was true, but b) is not. Thus, I am now using a Hash, just like
Zed in Ruby/Event. Consequently, this is why I want to hook into IO#close,
so I can call Epoll#delete in order to remove the IO instance from the
Hash (which is now necessary, since I'm keeping track of them all).

If for some reason you need to work with the sockets separately from the
IO instances (though I can not imagine why), you should still be able to
use the IO to read and write data normally. Any other socket manipulation
you can do independently of the IO, just with the sockets themselves and
then just have the IO reference its FD. Perhaps I am missing some crucial
insight to your architecture, though?

No, I don't need to do this. My only concern was to take the fd that
epoll_wait(2) returns and somehow map it back to the IO instance so that
it would be usable in the calling Ruby script.

You can use the hook method Module#included.

This isn't working for me when I use Object#extend instead of
Module#include.

adidas~> irb
irb(main):001:0> module A
irb(main):002:1> def A.included( mod)
irb(main):003:2> puts "#{self} included in #{mod}"
irb(main):004:2> end
irb(main):005:1> end
=> nil
irb(main):006:0> x = "String"
=> "String"
irb(main):007:0> x.extend A
=> "String"
irb(main):008:0> module Enumerable
irb(main):009:1> include A
irb(main):010:1> end
A included in Enumerable
=> Enumerable
irb(main):011:0> quit
adidas~>

Any thoughts on making it work with Object#extend?
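
For what it's worth, Ruby has a separate hook for this case: Module#extended is called when an object is extended with the module, while Module#included only fires for include. A minimal sketch, with the EVENTS constant and the Holder module being hypothetical names added so the calls are easy to inspect:

```ruby
module A
  EVENTS = [] # filled in by the hooks below

  def self.included(mod)
    EVENTS << [:included, mod]
  end

  def self.extended(obj)
    EVENTS << [:extended, obj]
  end
end

x = String.new("String")
x.extend(A)    # triggers A.extended(x), not A.included

module Holder  # hypothetical module, used only to show the include case
  include A    # triggers A.included(Holder)
end
```

An extended hook could, for example, be the place to register the IO with its Epoll object.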
