SparseFile

Erik Veenstra · Jan 6, 2007

I had to send huge files over a network to another machine.
Most of these files were image files for QEMU: typically 4 GB,
of which only a small portion (~ 400 MB) was used. Both client
and server were Ruby programs on Linux boxes, communicating via
FTP.

I thought it was a good idea to use sparse files [1], so I
searched for a SparseFile class, couldn't find one and wrote
one myself:

http://www.erikveen.dds.nl/rubycodesnippets/index.html#4.0.0

It seems to work pretty well... ;]

Any thoughts? Ideas? Comments? Do you want it to be available
as a library/gem? Should it become part of Ruby Facets? Trans?

gegroet,
Erik V. - http://www.erikveen.dds.nl/

[1] http://en.wikipedia.org/wiki/Sparse_file

Erik Veenstra · Jan 6, 2007

I had to send huge files over a network to another machine.

(That's a bit misleading. SparseFile is about saving disk space
on the server; not about saving bandwidth....)

A quick test on Windows 2000/NTFS: under Cygwin it works as
expected, including the space savings. On plain Windows, the
full size of the file is allocated, although the checksums of
the files are correct.

For now, it looks safe to use SparseFile in your
platform-agnostic application. What about OS X? BSD?
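
One way to repeat this kind of check from Ruby is to compare a file's apparent size with the space actually allocated for it. The sketch below is only an illustration: the helper names are made up, it assumes File::Stat#blocks counts 512-byte units (as it does on Linux), and that method returns nil on platforms that do not report block counts.

```ruby
require "tmpdir"

# Hypothetical helper: returns [apparent_size, allocated_bytes].
# Assumes st_blocks is counted in 512-byte units (true on Linux);
# allocated_bytes is nil where the platform reports no block count.
def apparent_and_allocated(path)
  stat = File.stat(path)
  allocated = stat.blocks ? stat.blocks * 512 : nil
  [stat.size, allocated]
end

# Hypothetical helper: creates a file of `size` bytes by seeking past
# the end and writing one byte, so everything before it is a hole
# (on filesystems that support sparse files).
def make_holey_file(path, size)
  File.open(path, "wb") do |f|
    f.seek(size - 1)
    f.write("\0")
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "holey.img")
  make_holey_file(path, 10 * 1024 * 1024)
  apparent, allocated = apparent_and_allocated(path)
  puts "apparent size:   #{apparent} bytes"
  puts "allocated space: #{allocated.inspect} bytes"
end
```

On a filesystem with sparse-file support the second number should come out much smaller than the first; where the full size is allocated, the two should be roughly equal.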

gegroet,
Erik V. - http://www.erikveen.dds.nl/

ara.t.howard · Jan 6, 2007

why not simply zip them? of course one can also read/write directly to zipped
files. the reason i ask is that we do this a lot. what would you consider
trade-offs to be with SparseFile?

cheers.

-a

Eric Hodel · Jan 6, 2007

You can have a gem hosted on Rubyforge in about 20 minutes with Hoe.
(Provided you have a project to store it in.)

Erik Veenstra · Jan 6, 2007

why not simply zip them? of course one can also read/write
directly to zipped files. the reason i ask is that we do this
a lot.

* Availability: A zipped file is not directly usable by QEMU;
it has to be unzipped first.

* Speed: Zipping and unzipping takes an awful lot of time
(typically 4 GB per file!).

* Size: The sparse file is smaller than the zipped file (using
gzip), although I didn't expect that... ;]

* Code: More readable code:

    require "socket"

    # SparseFile is the class from the code snippets page linked above.
    io = TCPServer.new("0.0.0.0", 4444).accept
    SparseFile.open("file") do |f|
      while (not io.closed? and data = io.gets)
        f.write(data)
      end
    end

what would you consider trade-offs to be with SparseFile?

* It might be just a little bit slower than writing to ordinary
files, because of the extra check and the housekeeping. (But
it is much, much faster than using ZIP files...)

* Over time, after thousands or millions of random-access
writes to the same file, the holes are gone. ZIP files might
stay small. (But ZIP files are very, very slow...) Note that
this is not a disadvantage compared to ordinary files.

gegroet,
Erik V. - http://www.erikveen.dds.nl/

----------------------------------------------------------------

$ ls -lh winxp.img
-rw-r--r-- 1 erik erik 4.0G 2007-01-05 17:29 winxp.img

$ du -h winxp.img
997M winxp.img

$ gzip < winxp.img | wc -c
1467534047 # That's more than the sparse file!

$ time gzip < winxp.img > /dev/null
real 5m55.910s # 5 Minutes!
user 4m51.922s
sys 0m6.816s

----------------------------------------------------------------

Robert Klemme · Jan 6, 2007

Looks good! Do you think it's feasible and reasonable to include it
in the std lib's File class? It could be a flag in the open mode, i.e.

File.open("foo", "wsb") do |io|
  ....

Maybe the functionality could go into a Module that is included if
the flag is present?
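
A rough sketch of how the module idea could look without touching file.c. Nothing here is an existing Ruby API: the "wsb" mode does not exist, SparseWrites and open_sparse are hypothetical names, and the sketch assumes that truncate can extend a file, which holds on POSIX systems.

```ruby
require "tmpdir"

# Hypothetical module: mixed into a single File object, it turns
# writes of all-NUL chunks into seeks, leaving holes in the file.
module SparseWrites
  def write(data)
    bytes = data.b
    if !bytes.empty? && bytes.count("\0") == bytes.bytesize
      seek(bytes.bytesize, IO::SEEK_CUR)   # skip instead of writing
      bytes.bytesize
    else
      super(bytes)
    end
  end

  def close
    unless closed?
      end_pos = pos
      # If the last chunks were skipped, extend the file to its full length.
      truncate(end_pos) if size < end_pos
    end
    super
  end
end

# Hypothetical stand-in for the proposed open-mode flag.
def open_sparse(path)
  file = File.open(path, "wb").extend(SparseWrites)
  return file unless block_given?
  begin
    yield file
  ensure
    file.close
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "foo")
  open_sparse(path) do |io|
    io.write("header")
    io.write("\0" * 65_536)
    io.write("trailer")
  end
  puts File.size(path)
end
```

Extending the individual File object keeps ordinary files untouched, which is roughly what "included if the flag is present" would mean.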

Kind regards

robert

Erik Veenstra · Jan 7, 2007

Looks good! Do you think it's feasible and reasonable to
include it into the std lib's File class? Could be a flag in
open mode, i.e.

Reasonable? Well, I think so. It's just a special kind of file
handling. However, it's not too special.

Feasible? I don't know. Somebody who understands the code in
file.c could answer that. That's not me... ;]

Is somebody able and willing to patch file.c? I would do it
myself, if I were a C programmer...

File.open("foo", "wsb") do |io|

That sounds reasonable.

gegroet,
Erik V. - http://www.erikveen.dds.nl/

Olivier · Jan 7, 2007

Hi,

I didn't know about this kind of file, so I googled around for some
explanations. Then I looked at your code, and I have a question:

What's the purpose of this strange String in the following line?

SPARSE_BLOCKS[length] ||= "%SCRIPT_SPARSEFILE%00" * length

Is it just a fake string, or does it have a particular meaning?

Thanks for any answer!

Erik Veenstra · Jan 7, 2007

What's the purpose of this strange String in the following
line?

SPARSE_BLOCKS[length] ||= "%SCRIPT_SPARSEFILE%00" * length

Oops... Should have been:

SPARSE_BLOCKS[length] ||= "\000" * length

The Ruby code which generates my site does a gsub with a
string as the replacement. In a replacement string, "\0" has a
special meaning: it is a back-reference to the whole match... ;]

Using gsub in the block form seems to work.
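
The effect is easy to reproduce. In this small sketch the template and the placeholder name are made up for illustration; only the gsub behaviour is Ruby's own.

```ruby
# Hypothetical page template and code snippet, for illustration only.
template = "<pre>%CODE%</pre>"
snippet  = 'block = "\000" * length'   # single quotes: a literal backslash plus "000"

# String replacement: "\0" in the replacement is a back-reference to
# the whole match, so the placeholder text leaks into the output.
broken = template.gsub("%CODE%", snippet)

# Block form: the block's return value is inserted verbatim.
fixed = template.gsub("%CODE%") { snippet }

puts broken   # <pre>block = "%CODE%00" * length</pre>
puts fixed    # <pre>block = "\000" * length</pre>
```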

Thanks.

gegroet,
Erik V. - http://www.erikveen.dds.nl/

Olivier · Jan 7, 2007

Oh yes, I really thought I was missing something

Thanks !

--
Olivier Renaud

Erik Veenstra · Jan 8, 2007

I updated the implementation [1] of SparseFile [2] and added
some unit tests [3].

Any thoughts? Ideas? Comments? I welcome any feedback.

gegroet,
Erik V. - http://www.erikveen.dds.nl/

[1] http://www.erikveen.dds.nl/rubycodesnippets/index.html#4.1.0
[2] http://www.erikveen.dds.nl/rubycodesnippets/index.html#4.0.0
[3] http://www.erikveen.dds.nl/rubycodesnippets/index.html#4.3.0

Mauricio Fernandez · Jan 8, 2007

Instead of

    SPARSE_BLOCKS[length] ||= "\000" * length

    unless data == SPARSE_BLOCKS[length]
      @file.pos = @pos
      @file.write(data)
    end

what about the following?

    unless data.count("\0") == data.size
      @file.pos = @pos
      @file.write(data)
    end

It saves some memory if you have lots of "null blocks" of different sizes (or use
huge blocks), and the loop shouldn't be much slower than a strcmp-ish one:

    while (s < send) {
        if (table[*s++ & 0xff]) {
            i++;
        }
    }

But a C implementation that detects leading \0s and skips them would be best.
In Ruby it'd be

    idx = data.index(/[^\0]/)
    if idx
      @file.pos += idx
      @file.write(data[idx..-1])
    else
      @file.write(data)
    end

but the regexp matching and the aref would make it way too slow.
I'll post it tomorrow if I don't forget.

Erik Veenstra · Jan 8, 2007

unless data.count("\0") == data.size

OK.

I experimented with this as well:

    i = data.index(/[^\000]/)
    if i
      @file.pos = @pos + i
      @file.write(data.gsub(/^\000*|\000*$/, ""))
    end

In theory, it could give better results. But, of course, it is
definitely slower: 10.6 seconds instead of 9.5 seconds for one
particular test.
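
A comparison along these lines can be sketched with Ruby's Benchmark module. The helper names and test data are made up, the timings depend on the Ruby version and the machine, and this only compares the two "is it all NULs?" checks; it does not include the gsub trimming or reproduce the 9.5 and 10.6 second figures.

```ruby
require "benchmark"

# Both helpers answer the same question: is this chunk nothing but NUL bytes?
def all_nul_by_count?(data)
  data.count("\0") == data.bytesize
end

def all_nul_by_regexp?(data)
  data.index(/[^\0]/).nil?
end

zeros = ("\0" * (1 << 20)).b   # 1 MiB of NUL bytes
mixed = zeros.dup
mixed[-1] = "x"                # same size, one non-NUL byte at the end

Benchmark.bm(8) do |bm|
  bm.report("count")  { 50.times { all_nul_by_count?(zeros);  all_nul_by_count?(mixed) } }
  bm.report("regexp") { 50.times { all_nul_by_regexp?(zeros); all_nul_by_regexp?(mixed) } }
end
```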

I'll stick to "unless data.count("\0") == data.size".
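
For readers who want to see that check in context, here is a minimal sketch of a writer built around it. It is not the SparseFile implementation from the snippets page: SparseWriter is a hypothetical name, it handles sequential writes only, and it assumes truncate can extend a file (true on POSIX systems).

```ruby
require "tmpdir"

# Hypothetical minimal writer, not the SparseFile implementation from
# the thread: sequential writes only, all-NUL chunks become holes.
class SparseWriter
  def initialize(path)
    @file = File.open(path, "wb")
    @pos  = 0
  end

  def write(data)
    bytes = data.b
    unless bytes.count("\0") == bytes.bytesize
      @file.pos = @pos        # jump over any chunks skipped so far
      @file.write(bytes)
    end
    @pos += bytes.bytesize
    bytes.bytesize
  end

  def close
    # If the data ended with skipped chunks, extend the file to its full length.
    @file.truncate(@pos) if @file.size < @pos
    @file.close
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "disk.img")
  writer = SparseWriter.new(path)
  ["boot", "\0" * 65_536, "data", "\0" * 4_096].each { |chunk| writer.write(chunk) }
  writer.close
  puts File.size(path)   # apparent size: 4 + 65536 + 4 + 4096 bytes
end
```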

gegroet,
Erik V. - http://www.erikveen.dds.nl/
