Drb communication problem and crash

Laurent Francioli

Hi,

So first of all a little context of what I'm trying to do.
I have a Rails app that needs quite a bit of computation and I want to run
the different queries in a number of different processes. To do so, I'm
trying to implement the following system:

Rails --> DRb Query Dispatcher --> DRb Query Runner

Rails sends a job to the query dispatcher, which load balances the jobs
over several query runners.
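
A minimal sketch of the dispatcher idea described above, assuming plain round-robin balancing. The class names, the `run` method, and the stand-in runner are hypothetical and not taken from the original code:

```ruby
# Hypothetical dispatcher: hands each job to the next runner in turn.
class QueryDispatcher
  def initialize(runners)
    @runners = runners
    @next = 0
    # DRb serves calls in separate threads, so guard the shared counter.
    @lock = Mutex.new
  end

  # Pick the next runner under the lock, then run the query outside it
  # so that a slow query does not block the other callers.
  def dispatch(query)
    runner = @lock.synchronize do
      chosen = @runners[@next]
      @next = (@next + 1) % @runners.size
      chosen
    end
    runner.run(query)
  end
end

# Stand-in for a runner process, only here to make the sketch runnable.
class FakeRunner
  def initialize(name)
    @name = name
  end

  def run(query)
    "#{@name}: #{query}"
  end
end

dispatcher = QueryDispatcher.new([FakeRunner.new('r1'), FakeRunner.new('r2')])
puts dispatcher.dispatch('SELECT 1')
```

In a real setup the runners would presumably be remote proxies created with `DRbObject.new_with_uri`, and the dispatcher itself would be exported with `DRb.start_service`; the URIs and ports are left out here because they depend on the deployment.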

The whole system works and then suddenly hangs. When it hangs I get the
following messages on the DRb Query Dispatcher:

message type 0x54 arrived from server while idle
message type 0x44 arrived from server while idle
message type 0x43 arrived from server while idle
message type 0x5a arrived from server while idle

Then I can see that there is still some action for some of the queries
until it freezes completely.

Did anyone encounter similar problems? Or does anyone know where I could
at least find the meaning of these messages?

Thanks a lot!
I'll be glad to give more information if needed.

PS: I have tried to implement this with the Slave library but ran into
even more trouble, with the logs turning into nonsense (it looked like
memory corruption somewhere).
 
Eric Hodel

Laurent said:
The whole system works and then suddenly hangs. When it hangs I get the
following messages on the DRb Query Dispatcher:
message type 0x54 arrived from server while idle
message type 0x44 arrived from server while idle
message type 0x43 arrived from server while idle
message type 0x5a arrived from server while idle

I don't see where this message is coming from in DRb or Ruby.

Laurent said:
Did anyone encounter similar problems? Or does anyone know where I could
at least find the meaning of these messages?

grep your code for 'while idle', that will help.
 
Laurent Francioli

Hi,

Thanks for your quick answer! Well, the message definitely doesn't come
from my code. If it really doesn't come from DRb or Ruby either, maybe it
is a system message?

Also, I've read your Seattle.rb presentation slides, and on one of them
you seem to say that ACLs shouldn't be used and could cause
deadlocks; is this right? I'm asking because we're using one in our system
to restrict accepted calls to localhost only.
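
For reference, a localhost-only restriction like the one mentioned above can be sketched with the `ACL` class from `drb/acl`. The constant and helper names are hypothetical, and the real configuration in the original system is not shown in this thread:

```ruby
require 'drb/acl'

# Deny everything, then allow the loopback address only.
LOCAL_ONLY = ACL.new(%w[deny all allow 127.0.0.1])

# Hypothetical helper: ACL#allow_addr? expects an address array in the
# same layout that Socket#peeraddr returns (family, port, host, ip).
def allowed?(acl, ip)
  acl.allow_addr?(['AF_INET', 0, ip, ip])
end

puts allowed?(LOCAL_ONLY, '127.0.0.1')
puts allowed?(LOCAL_ONLY, '192.0.2.10')
```

To apply such a list to a server, it would be passed to `DRb.install_acl` before `DRb.start_service` is called.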

Another thing I noticed is that my version that doesn't use the Slave lib
actually produces the same behavior (variable mix-ups, etc.). It
looks like the communication between the server and clients is having
trouble. I also noticed that the problems occur more often as the
number of running servers increases.
Hope that helps a bit...

I'll keep you posted if I get new clues or even better...a fix!

Thanks!
Laurent
 
Laurent Francioli

So I finally found the problem! The message I reported actually came
from Postgres.

The problem was that I had a connection to the DB open at the moment of the
fork (both when using the Slave lib and with my own forking code). It seems
that this connection was passed on to the child processes and interfered
with their access to the DB. I'm not 100% sure why, since the child processes
were creating their own connections anyway.
But I'm sure that was the cause, since it is completely stable now!

Thanks a lot for your quick reply!
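
For anyone puzzled by the same log lines: the hex values read as the ASCII letters T, D, C and Z, which are backend message types in the PostgreSQL frontend/backend protocol (together, the pieces of a query result). A small sketch to decode them; the constant and method names are made up for illustration:

```ruby
# Message-type bytes exactly as they appeared in the dispatcher log.
LOGGED_TYPES = [0x54, 0x44, 0x43, 0x5a].freeze

# Backend message names from the PostgreSQL frontend/backend protocol.
BACKEND_MESSAGES = {
  'T' => 'RowDescription',
  'D' => 'DataRow',
  'C' => 'CommandComplete',
  'Z' => 'ReadyForQuery'
}.freeze

# Turn each byte into its ASCII letter and look up the message name.
def decode_types(bytes)
  bytes.map { |b| [b.chr, BACKEND_MESSAGES.fetch(b.chr, 'unknown')] }
end

decode_types(LOGGED_TYPES).each do |char, name|
  puts format('0x%02x %s %s', char.ord, char, name)
end
```

Seeing a full result arrive "while idle" would be consistent with some other process having sent a query over the same connection.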
Laurent
 
Eric Hodel

Laurent said:
Also, I've read your Seattle.rb presentation slides, and on one of them
you seem to say that ACLs shouldn't be used and could cause deadlocks; is
this right? I'm asking because we're using one in our system to restrict
accepted calls to localhost only.

I don't recall, which URL?
 
Eric Hodel

Laurent said:
The problem was that I had a connection to the DB open at the moment of the
fork (both when using the Slave lib and with my own forking code). It seems
that this connection was passed on to the child processes and interfered
with their access to the DB. I'm not 100% sure why, since the child processes
were creating their own connections anyway.

If it still had the file descriptor open, it would be copied.
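
The descriptor copying can be seen with nothing but the standard library. This is only an illustration, with a pipe standing in for the database socket, and it assumes a platform where `fork` is available:

```ruby
# Open a pipe before forking, the way the DB connection was open
# before the fork in the setup described above.
reader, writer = IO.pipe

pid = fork do
  # The child inherited copies of both descriptors.
  reader.close
  writer.write("child\n")
  writer.close
end

writer.write("parent\n")
writer.close
Process.wait(pid)

# Both processes wrote into the same underlying stream.
lines = reader.read.lines.map(&:chomp).sort
reader.close
puts lines.inspect
```

With a database socket in place of the pipe, parent and child would be sharing one protocol stream in the same way, which is why closing the connection before forking and opening a fresh one in each child is the usual way to avoid the problem.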
 
Eric Hodel

Laurent said:
Ok, it's really old... http://blog.segment7.net/articles/2006/04/22/drb-an-introduction-and-overview
and as I said earlier, since I only had the slides and not the comments
on them, I couldn't be sure :)
Btw, really nice presentation! I find it pretty difficult to get good
docs on DRb and that's a great piece!

Ah, even with an ACL it is still possible for people to do bad stuff
to your DRb processes. ACLs by themselves won't cause deadlocks, but
they can't prevent malice.
 
L

Laurent Francioli

Eric said:
Ah, even with an ACL it is still possible for people to do bad stuff
to your DRb processes. ACLs by themselves won't cause deadlocks, but
they can't prevent malice.

Ok, thanks for the explanation! :)
 
