using narray and mmap with HUGE data sets

Ara.T.Howard

scientific rubyists-

i have played with using mmap with narray before with some success, but thought
this was so neat i'd post it for posterity:


- i have a huge satellite mosaic (actually the ones i'm using are 4 times as
big as this one - ie just under a gig each):

jib:~/fa/thailand > ls -ltar etm_mosaics/L72001195_19720020208_b70.mos
-rw-rw-r-- 1 ahoward ahoward 184816141 Oct 1 13:32 etm_mosaics/L72001195_19720020208_b70.mos


- here's a little program that takes a list of scanlines and shows how many
elements were non zero and zero:

jib:~/fa/thailand > cat mmap_narray_test.rb
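require 'mmap'
require 'narray'

# usage: mmap_narray_test.rb path samples lines scanline [scanline ...]
path, samples, lines = ARGV.shift, Integer(ARGV.shift), Integer(ARGV.shift)

# map the file read-only and view its bytes as a 2-d byte narray
# (narray's first dimension is the fastest-varying one in memory)
mmap = Mmap::new path, 'r', Mmap::MAP_SHARED
narray = NArray::to_na mmap.to_str, NArray::BYTE, lines, samples

while((line = ARGV.shift))
  line = Integer line
  scanline = narray[line, true]
  # where2 returns the indices where the mask is true, then where it is false
  non_zero, zero = scanline.ne(0).where2

  puts <<-yaml
-
scanline: #{ line }
elements : #{ samples }
non_zero : #{ non_zero.size }
zero : #{ zero.size }
  yaml
end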

- running it is plenty fast

jib:~/fa/thailand > time ruby mmap_narray_test.rb \
etm_mosaics/L72001195_19720020208_b70.mos 10441 17701 183 184 185

-
scanline: 183
elements : 10441
non_zero : 6100
zero : 4341
-
scanline: 184
elements : 10441
non_zero : 6100
zero : 4341
-
scanline: 185
elements : 10441
non_zero : 6102
zero : 4339

real 0m0.809s
user 0m0.200s
sys 0m0.610s
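
the samples and lines arguments have to agree with the file size, otherwise the narray view will not line up with the data. a minimal sketch of that check, assuming a headerless file with one byte per sample; expected_bytes and dimensions_match? are hypothetical helper names, not part of mmap or narray:

```ruby
# hypothetical helpers, not part of mmap or narray.
# assumes a headerless raster with a fixed number of bytes per sample.
def expected_bytes(samples, lines, bytes_per_sample = 1)
  samples * lines * bytes_per_sample
end

# true when the file on disk is exactly as large as the dimensions imply
def dimensions_match?(path, samples, lines, bytes_per_sample = 1)
  File.size(path) == expected_bytes(samples, lines, bytes_per_sample)
end

# the mosaic above: 10441 samples x 17701 lines of single bytes
puts expected_bytes(10441, 17701)   # 184816141, the size ls reports
```

calling dimensions_match?(path, samples, lines) before mapping would catch swapped or mistyped dimensions early.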


obviously you could do this specific task using some intelligent seeking and
some expensive unpacking - i just thought it was really cool that both mmap
and narray could work together so well. using them together gives you nice
logical access to huge datasets without performing any un-needed i/o while
offering all of narray's capabilities. for example i'm using this to maintain
a set of about 10 files that total 10GB for some code that needs to loop over
sections of these files in relatively small chunks (10000 x 10000 tiles)
gathering some stats along the way. using mmap and narray enabled me to write
the code in about 15 minutes while completely ignoring the fact that the machine
i'm running on only has 4GB of ram. could be faster in c but that wouldn't
take me 15 minutes to write.
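
for comparison, a minimal sketch of the seek-and-read alternative mentioned above, using only core ruby. it assumes a headerless file of single-byte samples stored scanline by scanline; scanline_counts is a hypothetical helper name, and this is not the code from the post:

```ruby
# hypothetical helper: count zero / non-zero bytes in one scanline using
# plain seek + read, with no mmap or narray involved.
# assumes single-byte samples stored scanline by scanline, no header.
def scanline_counts(path, samples, line)
  File.open(path, 'rb') do |f|
    f.seek(line * samples)      # jump straight to the start of the scanline
    bytes = f.read(samples)     # read exactly one scanline worth of bytes
    if bytes.nil? || bytes.bytesize < samples
      raise ArgumentError, "scanline #{line} is out of range"
    end
    zero = bytes.each_byte.count(0)   # number of bytes equal to zero
    { non_zero: samples - zero, zero: zero }
  end
end
```

this only touches the bytes of the requested scanline, but every new access pattern (tiles, columns, strides) needs its own offset arithmetic, which is the bookkeeping that indexing an narray over the mapped file avoids.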

cheers.

-a
--
===============================================================================
| EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE :: 303.497.6469
| A flower falls, even though we love it;
| and a weed grows, even though we do not love it.
| --Dogen
===============================================================================
 
