Getting the actual size of a sparse file


Daniel Berger

Hi,

How do you get the true size of a sparse file? Using /var/log/lastlog
on Ubuntu as an example, I see this with "ls -lh":

287K lastlog

With "ls -sh" I see this:

40K lastlog

A File.stat call reveals this:

#<File::Stat
dev=0x801,
ino=5249695,
mode=0100664 (file rw-rw-r--),
nlink=1,
uid=0 (root),
gid=43 (utmp),
rdev=0x0 (0, 0),
size=292876,
blksize=4096,
blocks=80,
atime=Mon Jan 03 16:03:24 -0700 2011 (1294095804),
mtime=Thu Oct 21 11:34:51 -0600 2010 (1287682491),
ctime=Thu Oct 21 11:34:51 -0600 2010 (1287682491)>

Multiplying blocks * blksize doesn't seem to match up, either.

How do I arrive at 40k?

Also, how would one go about detecting a sparse file?

Regards,

Dan
 

Perry Smith

I would go find the source for Ubuntu's ls and see what it does for
the -s option.

Note that -s is output in blocks.
 

Matthew Bloch

True, but I'd like to use pure Ruby (not system calls) if possible, at
least for *nix systems. Or will this require an extension?

dd if=/dev/zero bs=1 seek=100M count=0 of=out2

irb(main):011:0> stat=File.stat("out2")
=> #<File::Stat dev=0xfc0c, ino=160, mode=0100644, nlink=1, uid=1006,
gid=1006, rdev=0x0, size=104857600, blksize=4096, blocks=0, atime=Wed
Jan 05 18:02:08 +0000 2011, mtime=Wed Jan 05 18:02:08 +0000 2011,
ctime=Wed Jan 05 18:02:08 +0000 2011>

irb(main):013:0> [stat.blocks*stat.blksize, stat.size]
=> [0, 104857600]

Gives you allocated size & filesystem size.
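
For anyone without dd handy, roughly the same test file can be made from Ruby. This is only a sketch: make_sparse_file is a made-up helper name, the file goes into a temporary directory rather than the current one, and whether the result is really sparse depends on the filesystem.

```ruby
require "tmpdir"

# Hypothetical helper: grow an empty file to `size` bytes without
# writing any data, then return its File::Stat.
def make_sparse_file(path, size)
  File.open(path, "wb") { |f| f.truncate(size) }
  File.stat(path)
end

Dir.mktmpdir do |dir|
  # Same apparent size as the dd example: 100 MiB.
  stat = make_sparse_file(File.join(dir, "out2"), 100 * 1024 * 1024)
  puts "size=#{stat.size} blocks=#{stat.blocks.inspect}"
end
```

File#truncate is documented as not available on every platform, so treat this as a *nix-only illustration.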
 

Daniel Berger

> I would go find the source for Ubuntu's ls and see what it does for
> the -s option.
>
> Note that -s is output in blocks.

Yeah, looks like ls -s defaults to a block size of 1K.

Hm, how does this look?

class File
  def self.sparse?(file)
    stats = File.stat(file)
    stats.size > stats.blocks * stats.blksize
  end
end
 

Johan Holmberg

> Hi,
>
> How do you get the true size of a sparse file? Using /var/log/lastlog
> on Ubuntu as an example, I see this with "ls -lh":
>
> 287K lastlog
>
> With "ls -sh" I see this:
>
> 40K lastlog
>
> A File.stat call reveals this:
>
> #<File::Stat
> dev=0x801,
> ino=5249695,
> mode=0100664 (file rw-rw-r--),
> nlink=1,
> uid=0 (root),
> gid=43 (utmp),
> rdev=0x0 (0, 0),
> size=292876,
> blksize=4096,
> blocks=80,
> atime=Mon Jan 03 16:03:24 -0700 2011 (1294095804),
> mtime=Thu Oct 21 11:34:51 -0600 2010 (1287682491),
> ctime=Thu Oct 21 11:34:51 -0600 2010 (1287682491)>
>
> Multiplying blocks * blksize doesn't seem to match up, either.

See stat(2):

    The st_blocks field indicates the number of blocks allocated to the
    file, 512-byte units. (This may be smaller than st_size/512 when the
    file has holes.)

    The st_blksize field gives the "preferred" blocksize for efficient file
    system I/O. (Writing to a file in smaller chunks may cause an
    inefficient read-modify-rewrite.)

So "blksize" has nothing to do with the size of the "blocks". They are
always counted in 512-byte units.

/Johan Holmberg
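
Applied to the numbers in the original post, the 512-byte unit gives 80 * 512 = 40960 bytes, which is the 40K that "ls -sh" shows. Here is a minimal Ruby sketch of that arithmetic. The constant and method names are invented for illustration, and the 512-byte unit comes from the Linux stat(2) text quoted above, so it is an assumption anywhere else.

```ruby
# Unit of File::Stat#blocks, per the Linux stat(2) excerpt above.
# Assumption: other platforms may count st_blocks differently.
STAT_BLOCK_UNIT = 512

# Bytes actually allocated on disk, or nil where the platform
# does not report a block count.
def allocated_bytes(stat)
  stat.blocks && stat.blocks * STAT_BLOCK_UNIT
end

# True when the apparent size exceeds the allocated space,
# which suggests the file has holes.
def sparse_stat?(stat)
  allocated = allocated_bytes(stat)
  !allocated.nil? && stat.size > allocated
end

# Stand-in carrying the size and blocks values from the lastlog example;
# a real File.stat(path) result can be passed in the same way.
LastlogStat = Struct.new(:size, :blocks)
lastlog = LastlogStat.new(292_876, 80)

puts allocated_bytes(lastlog)  # prints 40960
puts sparse_stat?(lastlog)     # prints true
```

The size comparison is only a heuristic. A file with small holes can still round up to as many allocated bytes as its size, and a filesystem that compresses data may report fewer blocks for a file with no holes at all.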
 

Daniel Berger

> See stat(2):
>
>     The st_blocks field indicates the number of blocks allocated to the
>     file, 512-byte units. (This may be smaller than st_size/512 when the
>     file has holes.)
>
>     The st_blksize field gives the "preferred" blocksize for efficient file
>     system I/O. (Writing to a file in smaller chunks may cause an
>     inefficient read-modify-rewrite.)
>
> So "blksize" has nothing to do with the size of the "blocks". They are
> always counted in 512-byte units.

Oh, wow, I don't think I knew that. It strikes me as particularly
bizarre that they would return some notion of a "preferred block size"
instead of the actual block size. Seriously, what's the use of that?

Now I need to check other platforms (Solaris, HP-UX) to see if they
use the 512-byte convention.

Is this something that's universal? Or is it something I can get via a
C call somewhere?

Regards,

Dan
 

Gary Wright

> Oh, wow, I don't think I knew that. It strikes me as particularly
> bizarre that they would return some notion of a "preferred block size"
> instead of the actual block size. Seriously, what's the use of that?
>
> Now I need to check other platforms (Solaris, HP-UX) to see if they
> use the 512-byte convention.
>
> Is this something that's universal? Or is it something I can get via a
> C call somewhere?

I think you'll want to read up on the stat() system call. The POSIX
standard leaves a bit of wiggle room, though: while it does specify
that st_blocks must be returned, it doesn't specify the size of the blocks.

I'm not sure I understand your concern about 'actual' vs. 'preferred'.
I'm guessing they would be the same in almost any rational implementation
but the main reason for having the information is to perform I/O in
efficiently sized chunks. In that case, the 'preferred' block size would
seem to be what you want even if the 'actual' block size was different.

Gary Wright
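
To make the "efficiently sized chunks" point concrete, here is a rough Ruby sketch that reads a file in pieces of File::Stat#blksize bytes. The method name each_preferred_chunk and the 4096-byte fallback are assumptions made for this example, not anything from the stat documentation.

```ruby
# Fallback used when the platform reports no preferred block size.
# Assumption: 4096 is a reasonable default, nothing more.
FALLBACK_CHUNK_SIZE = 4096

# Hypothetical helper: yield the file's contents in chunks of the
# filesystem's preferred I/O size (st_blksize).
def each_preferred_chunk(path)
  return enum_for(:each_preferred_chunk, path) unless block_given?

  chunk_size = File.stat(path).blksize
  chunk_size = FALLBACK_CHUNK_SIZE if chunk_size.nil? || chunk_size <= 0

  File.open(path, "rb") do |io|
    # IO#read with a positive length returns nil at end of file.
    while (chunk = io.read(chunk_size))
      yield chunk
    end
  end
end

# Usage, with a path of your own:
#   each_preferred_chunk("/var/log/lastlog") { |chunk| puts chunk.bytesize }
```

Ruby's IO layer does its own buffering, so this is more an illustration of what the field is for than a guaranteed speedup.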
 

Johan Holmberg

> Oh, wow, I don't think I knew that. It strikes me as particularly
> bizarre that they would return some notion of a "preferred block size"
> instead of the actual block size. Seriously, what's the use of that?

I think the two fields "st_blocks" and "st_blksize" just happen to
use the same word ("block") with two slightly different meanings.
Counting "st_blocks" in 512-byte units seems to be an arbitrary
convention, unrelated to the "physical block size" used for files.

> Now I need to check other platforms (Solaris, HP-UX) to see if they
> use the 512-byte convention.
>
> Is this something that's universal? Or is it something I can get via a
> C call somewhere?

I looked in "Advanced UNIX Programming, 2nd ed" by Rochkind, and there
the "st_blocks" field is described as the number of 512-byte blocks
allocated for a file. So I guess this is a universal thing for UN*X
(Linux, Mac OS X, Solaris, etc.).

The Rochkind book also mentions that "st_blksize is in the stat
structure so that an implementation can vary it by file if it chooses
to do so".

Regards,
/Johan Holmberg
 
