A memcached-like server in Ruby - feasible?

Marcin Raczkowski

why don't you try Gemstone or other object-oriented databases?

besides, memcache isn't THAT much faster than a database, it's faster
because it can store objects in memory, but if you need queries it
loses all its advantages.

greets
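
For illustration of that trade-off, here is a minimal cache-aside sketch in Ruby. It assumes the memcache-client gem and a hypothetical Foo ActiveRecord model: keyed look-ups can come from memcached, but anything query-shaped still has to hit the database.

require 'memcache'   # memcache-client gem (assumed installed)

CACHE = MemCache.new('localhost:11211')

# Fast path: memcached helps when you can name what you want by key.
def find_foo(id)
  CACHE.get("foo:#{id}") || begin
    foo = Foo.find(id)                 # hypothetical ActiveRecord model
    CACHE.set("foo:#{id}", foo, 300)   # cache for 5 minutes
    foo
  end
end

# Anything query-shaped still goes to the database; memcached cannot
# answer "all foos where bar > 10" on its own.
def foos_above(threshold)
  Foo.find(:all, :conditions => ['bar > ?', threshold])
end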
 
Yohanes Santoso

Tom Machinski said:
Thanks, but if I'm already caching at the local process level, I might
as well cache to in-memory Ruby objects; the entire data-set isn't
that huge for a high-end server RAM capacity: about 500 MB all in all.

Caching to in-memory Ruby objects does not automatically confer the
smartness you were describing. The SQLite is for the smartness.

I think Ara T Howard in the other thread was quite spot-on in
summarising your need.

YS.
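
As a concrete (if hypothetical) sketch of that suggestion: the sqlite3 gem can hold the working set in an in-process, in-memory database, so the cache stays locally queryable rather than being a bag of Ruby objects. The table and column names below are made up.

require 'sqlite3'   # sqlite3 gem (assumed installed)

# An in-memory SQLite database as the local, queryable cache.
db = SQLite3::Database.new(':memory:')
db.execute <<-SQL
  CREATE TABLE foos (
    id    INTEGER PRIMARY KEY,
    name  TEXT,
    score INTEGER
  )
SQL

# Load (or periodically refresh) rows from the real database here...
db.execute('INSERT INTO foos (id, name, score) VALUES (?, ?, ?)', [1, 'a', 42])

# ...and then the "smartness": ad-hoc local queries without touching MySQL.
top = db.execute('SELECT id, name FROM foos WHERE score > ? ORDER BY score DESC', [10])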
 
M. Edward (Ed) Borasky

Tom said:
I might have impressed you with a somewhat inflated view of how large
our data-set is :)

We have about 100K objects, occupying ~500KB per object. So all in
all, the total weight of our dataset is no more than 500 MB. We might
grow to maybe twice that in the next 2 years. But that's it.

So it's very feasible to keep the entire data-set in *good* RAM for a
reasonable cost.

I was just thinking ... Erlang has an in-RAM database capability called
"Mnesia". Perhaps it could be ported to Ruby or one could write an
ActiveRecord connector to a Mnesia database.
Good point. Unfortunately, MySQL 5 doesn't appear to be able to take
hints. We've analyzed our queries, and there are some strategies we
could definitely improve with manual hinting, but alas we'd need to
switch to an RDBMS that supports them.

I wonder if you could trick PostgreSQL into putting its database in a
RAM disk. :) Seriously, though, if you're on Linux, you could probably
tweak PostgreSQL and the Linux page cache to get the whole database in
RAM while still having it safely stored on hard drives. I suppose you
could also do that for MySQL, but PostgreSQL is simply a better RDBMS.
 
M. Edward (Ed) Borasky

Tom said:
The problem is that for a perfectly normalized database, those queries
are *heavy*.

We're using straight, direct SQL (no ActiveRecord calls) there, and
several DBAs have already looked into our query strategy. Bottom line
is that each query on the normalized database is non-trivial, and they
can't reduce it to less than 0.2 secs / query. As we have 5+ of these
queries per page, we'd need one MySQL server for every
request-per-second we want to serve. As we need at least 50 reqs/sec,
we'd need 50 MySQL servers (and probably something similar in terms of
web servers). We can't afford that.

We can only improve the queries' TTC by replicating data inside the
database, i.e. de-normalizing it with internal caching at the table
level (basically, that amounts to replicating certain columns from
table `bars` in table `foos`, thus saving some very heavy JOINs).

But if we're already de-normalizing, caching and replicating data, we
might as well create another layer of de-normalized, processed data
between the database and the Rails servers. That way, we will need
fewer MySQL servers, serve requests faster (as the layer would hold
the data in an already processed state), and save much of the
replication / clustering overhead.

-Tom
MapReduce and Starfish?
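
A hedged sketch of the layer Tom describes, in plain Ruby: the heavy normalized queries run in a background refresher, and web requests only do a keyed lookup on the already-processed result. The class name and the loader block are made up for illustration.

require 'thread'   # Mutex (standard library)

class ProcessedStore
  def initialize(loader)
    @loader = loader   # callable that runs the heavy query + post-processing
    @data   = {}
    @mutex  = Mutex.new
  end

  # Background refresher: pays the ~0.2 s query cost here, not per request.
  def rebuild(id)
    value = @loader.call(id)
    @mutex.synchronize { @data[id] = value }
  end

  # Web request path: a hash lookup instead of five heavy JOINs.
  def fetch(id)
    @mutex.synchronize { @data[id] }
  end
end

store = ProcessedStore.new(lambda { |id| "processed fragment for #{id}" })
store.rebuild(42)
store.fetch(42)   # => "processed fragment for 42"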
 
Marcin Raczkowski

Marcin said:
why don't you try Gemstone or other object-oriented databases?

besides, memcache isn't THAT much faster than a database, it's faster
because it can store objects in memory, but if you need queries it
loses all its advantages.

greets
Here's a simple object-oriented DB I wrote ages ago; just throw out the
state machine stuff and have phun. Searching is done in Ruby and is really
easy. It took me one day, if I remember correctly, so that should be an
indication of the time you'd need to make a simple ODB.

# Module responsible for handling request information
#
# Information status:
# - :waiting  (waiting for server response)
# - :progress (server reported progress)
# - :results  (server returned results)
# - :error    (server returned an error)
# - :timeout  (request timed out)
# - :collect  (to be garbage collected; kept around for debugging purposes)
module Informations
  # default time to live of a message (used when expire is set to :ttl)
  attr_accessor :default_ttl
  # default timeout - maximum time between ANY actions sent by the server
  attr_accessor :default_timeout
  # use garbage collecting?
  attr_accessor :gc

  def init(ttl=30, timeout=30, gc=true)
    @gc = gc
    @default_ttl = ttl
    @default_timeout = timeout
    @informations = {}
  end

  # Creates a new information record about a request; id is the request id,
  # hash should contain additional information (it will be merged in).
  def new_info(id, hash)
    # hash = hash.dup
    # hash.delete(:data)
    info = {}
    info[:id] = id
    info[:status] = :waiting
    info[:timeout] = @default_timeout
    info[:last_action] = info[:start] = Time.now
    info[:expire] = :new
    info[:ttl] = @default_ttl
    info.merge! hash

    @informations[id] = info
  end

  # Information state machine.
  # Checks the message status and takes care of checking state transitions;
  # if a transition is invalid it is ignored (no exception is raised!!).
  #
  # The (possibly updated) info is returned.
  def change_status(info, state)
    case info[:status]
    when :waiting, :progress
      if [:progress, :results, :error, :timeout].include? state
        info[:status] = state
        info[:stop] = Time.now unless state == :progress
        info[:last_action] = Time.now
      end
    when :results, :error, :timeout
      if state == :collect
        info[:status] = state
        info[:last_action] = Time.now
      end
    end
    info
  end

  # Checks if a message timed out and, if so, marks it as such.
  def timeout?(info)
    if [:waiting, :progress].include?(info[:status]) &&
       (Time.now > info[:last_action] + info[:timeout])
      change_status(info, :timeout)
    end
  end

  # Finds the information record with the given id.
  #
  # Takes care of marking the record as timed out / to be collected.
  # Returns a copy of the record, or nil for an unknown id.
  def find(id)
    begin
      self.timeout?(@informations[id])
      info = @informations[id].dup

      # return nil if info[:state] == :collect  # don't return expired infos
      if info[:expire] == :first
        @gc ? change_status(@informations[id], :collect) : @informations.delete(id)
      end
      if (info[:expire] == :ttl) && (Time.now > info[:last_action] + info[:ttl])
        @gc ? change_status(@informations[id], :collect) : @informations.delete(id)
      end
    rescue Exception   # unknown id: the nil lookup above raises, so return nil
      info = nil
    end

    # info[:last_action] = Time.now  # would that prevent expiry?
    info
  end

  # Finds all messages matching the criteria block,
  # or checks whether the :server_id and :name provided in hash match.
  #
  # The block should return true if the information should be returned.
  #
  # Examples:
  #   find_all(:name => "actions", :server_id => "121516136171356151")
  #   find_all { |i| i[:last_action] > Time.now - 60 }
  #     returns all records whose state changed a minute or less ago
  #   find_all { |i| i[:status] == :error }
  #     returns all messages that returned errors
  #   gc! if find_all { |i| i[:status] == :collect }.size > 1000
  #     clears old messages when there are more than 1000 of them
  def find_all(hash={})
    res = []
    @informations.each_pair { |k, v|
      if block_given?
        res << self.find(k) if yield(v.dup)
      else
        catch(:no_match) {
          # add more matchable fields here!!
          [:server_id, :name].each { |x|
            throw(:no_match) if hash[x] && hash[x] != v[x]
          }
          res << self.find(k)
        }
      end
    }
    res.empty? ? nil : res
  end

  # Clears all messages marked for collection.
  def gc!
    @informations.delete_if { |k, v| v[:status] == :collect }
  end
end
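
A usage sketch for the module above (the class name and the ids are made up):

class RequestTracker
  include Informations
end

tracker = RequestTracker.new
tracker.init(60, 30, true)   # ttl = 60 s, timeout = 30 s, gc enabled

# new_info returns the stored record, so keep it around to update its status
info = tracker.new_info('req-1', :name => 'actions', :server_id => 'srv-1')
tracker.change_status(info, :progress)
tracker.change_status(info, :results)

tracker.find('req-1')                             # a copy of the record
tracker.find_all(:name => 'actions')              # match by fields
tracker.find_all { |i| i[:status] == :results }   # or by a block
tracker.gc!                                       # drop records marked :collect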
 
