multi-threaded access to shared memory space

Greg Willits · Jun 30, 2008

I have a pure Ruby project (no Rails) where I would like multiple
"tasks" (ruby processes more or less) to run in parallel (collectively
taking advantage of multiple CPU cores) while accessing a shared memory
space of data structures.

OK, that's a mouthful.

- single machine, multiple cores (4 or 8)

- step one: pre-load a number of arrays and hashes (could be a couple GB
worth in total) into memory

- step two: launch several independent Ruby scripts to search and read
from the data pool in order to aggregate data in new sets to be written
to text files.

Ruby 1.8's threading would seem poorly suited to this. Can 1.9 run
multiple threads each accessing the same RAM-space while using all cores
of the machine?

I've looked at memcache, but it seems like it could store and retrieve
one of my pool's arrays, but it cannot look inside that array and
retrieve just a single row of it? It would want to return the whole
array, yes? (not good if that array is 100MB).

-- gw

Eric Hodel · Jun 30, 2008

I have a pure Ruby project (no Rails) where I would like multiple
"tasks" (ruby processes more or less) to run in parallel (collectively
taking advantage of multiple CPU cores) while accessing a shared memory
space of data structures.

OK, that's a mouthful.

- single machine, multiple cores (4 or 8)

- step one: pre-load a number of arrays and hashes (could be a couple GB
worth in total) into memory

- step two: launch several independent Ruby scripts to search and read
from the data pool in order to aggregate data in new sets to be written
to text files.

Ruby 1.8's threading would seem poorly suited to this. Can 1.9 run
multiple threads each accessing the same RAM-space while using all cores
of the machine?

At present, 1.9 has a global VM lock, so only one C thread can be
running ruby code at a time.

I've looked at memcache, but it seems like it could store and retrieve
one of my pool's arrays, but it cannot look inside that array and
retrieve just a single row of it? It would want to return the whole
array, yes? (not good if that array is 100MB).

memcache is just a cache and not designed to be used as a persistent
store. It may lose your data if you are not careful.

You're probably looking for something like mmap and several forked
cooperative processes.

Tim Pease · Jun 30, 2008

Take a look at mmap

<http://raa.ruby-lang.org/project/mmap/>

Blessings,
TwP

Eleanor McHugh · Jun 30, 2008

If you want to stay in pure Ruby, take a look at DRb and Rinda. Even
if not directly applicable they should give you some inspiration.
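As a rough illustration of the DRb idea, here is a minimal sketch in which one process holds the pool and workers ask it for a single row at a time, so only that row travels over the wire instead of the whole array. The `DataPool` class and the `serve_pool` / `connect_pool` helpers are hypothetical names made up for this example, and the table layout is an assumption.

```ruby
require 'drb/drb'

# Hypothetical front object: holds the preloaded pool and hands out rows.
class DataPool
  def initialize(tables)
    @tables = tables # e.g. { customers: [[1, 'ann'], [2, 'bob']] }
  end

  # Only the requested row is marshalled back to the caller.
  def row(table, index)
    @tables.fetch(table)[index]
  end

  def size(table)
    @tables.fetch(table).size
  end
end

# Server side: start serving the pool and return the URI clients should use.
# Port 0 asks the OS for any free port.
def serve_pool(tables, uri = 'druby://127.0.0.1:0')
  DRb.start_service(uri, DataPool.new(tables))
  DRb.uri
end

# Client side: obtain a proxy for the remote pool.
def connect_pool(uri)
  DRbObject.new_with_uri(uri)
end
```

In a real setup the server script would call `serve_pool` and then `DRb.thread.join`, while each worker script calls `connect_pool(uri).row(:customers, 42)`. The pool still lives in a single Ruby process, so lookups are funnelled through that one process; this removes the "return the whole array" problem, not the single-core one.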

Ellie

Eleanor McHugh
Games With Brains
http://slides.games-with-brains.net

Charles Oliver Nutter · Jul 1, 2008

Greg said:
Ruby 1.8's threading would seem poorly suited to this. Can 1.9 run
multiple threads each accessing the same RAM-space while using all cores
of the machine?

No, but JRuby's threads can.
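For illustration, a minimal sketch of threads scanning one shared array in slices. The same code runs on any Ruby; on 1.9 the global VM lock mentioned earlier in the thread keeps the scans from running in parallel, while on JRuby the threads can be scheduled across cores. `parallel_count` is a hypothetical helper name, and a read-only pool is assumed (no locking is done).

```ruby
# Count the rows matching a predicate, one thread per slice.
# All threads read the same array object; nothing is copied up front.
def parallel_count(rows, workers, &predicate)
  chunk_size = [(rows.size.to_f / workers).ceil, 1].max
  threads = (0...rows.size).step(chunk_size).map do |start|
    Thread.new do
      # Each thread scans only its own slice of the shared array.
      rows[start, chunk_size].count(&predicate)
    end
  end
  # Thread#value joins the thread and returns the block's result.
  threads.map(&:value).inject(0, :+)
end
```

Usage would look like `parallel_count(pool, 8) { |row| row[:state] == 'CA' }`, where `pool` and the row layout are placeholders.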

- Charlie

ara.t.howard · Jul 1, 2008

tim is right i think, mmap is a great approach. i've used the
following paradigm many times for processing large datasets:

mmap in the file
decide the chunk size
fork n processes working on each chunk

because mmap is carried across the fork you don't do any data
copying. actually the memory won't even be paged in until the
children read it.

this is really ideal if the children can write the output - in
other words if the children don't have to return data to the parent,
since returning a huge chunk of data can be expensive.
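a minimal sketch of the chunk-and-fork part of that paradigm, assuming a POSIX system. it does not use the mmap extension: as a stand-in, the data is a plain Ruby array loaded before the fork, which the children inherit copy-on-write (how much memory actually stays shared depends on the Ruby version and its GC). `process_in_children` and the `part-N.txt` file names are made up for the example.

```ruby
# Split rows into chunks, fork one child per chunk, and let each child
# write its own output file so nothing is sent back to the parent.
# Assumption: Process.fork is available (not the case on Windows).
def process_in_children(rows, workers, out_dir)
  chunk_size = [(rows.size.to_f / workers).ceil, 1].max
  offsets = (0...rows.size).step(chunk_size).to_a

  pids = offsets.each_with_index.map do |start, i|
    fork do
      # The child sees the parent's rows without an explicit copy.
      File.open(File.join(out_dir, "part-#{i}.txt"), 'w') do |f|
        rows[start, chunk_size].each { |row| f.puts(row.join("\t")) }
      end
      exit!(0) # leave immediately, skipping the parent's at_exit handlers
    end
  end

  pids.each { |pid| Process.wait(pid) } # wait for every child to finish
  pids.size
end
```

the return value is the number of children actually forked, which can be lower than `workers` when the chunks do not divide evenly.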

you might easily end up being IO bound and not CPU bound - in the
similar processing i've done i've often found that the work scales
best with the number of disk controllers, not the number of cpus -
something worth considering

another approach to consider is to put all the input (or pathnames to
it) into an sqlite database and then launch processes to work on it.
this may not seem sexy but it has some huge advantages: namely that
you'll be able to maintain state across runs which will allow you to
make programming errors but still be making forward progress. this
isn't glamorous but it's very powerful as it allows incremental
development and even coordination of ruby with other languages - like c.

one last suggestion if you have a stack of linux machines available

. install rq
. submit a bunch of jobs that process a chunk of data

go home for the day ;-)

with rq you should be able to set up a linux cluster in a few minutes
and just submit a slow ruby script to 10 machines running 4 jobs each,
no problem. you could also use rq on an 8 core machine to manage the
jobs for you

food for thought.

ref:

http://www.linuxjournal.com/article/7922
http://codeforpeople.com/lib/ruby/rq/rq-3.1.0/README
(rq 3.4.0 has a bug in it so use 3.1 if you decide to try that route)

a @ http://codeforpeople.com/
