-s vs du - different results

Zebee Johnstone

As people recommended using stat, I've tried.

But I seem to get different results to du, and different
to what my CD burning prog says.

#!/usr/bin/perl -w
use strict;
use File::Find;
my $total;
my $dir = shift;
find(\&wanted, $dir);

print "total = $total \n";

sub wanted {
$total += -s $File::Find::name;
}

produces:
total = 695543582

Running du -sb on the directory given to that program gets me:
750284800

So what am I missing about -s? That's a huge discrepancy, so
there's something that's not being counted.

I am running it as root, so it's not a permissions problem.

am I overflowing some buffer somewhere?

Zebee
 
Uri Guttman

ZJ> print "total = $total \n";

ZJ> sub wanted {
ZJ> $total += -s $File::Find::name;
ZJ> }

ZJ> produces:
ZJ> total = 695543582

ZJ> Running du -sb on the directory given to that program gets me:
ZJ> 750284800

ZJ> So what am I missing about -s? That's a huge discrepancy, so
ZJ> there's something that's not being counted.

du is not the same as -s.

du measures real blocks in use. unix files (notably dbm types as well as
others) can have gaps so the maximum offset (what -s sees) can be much
greater than the actual storage used. du gets into the inode itself and
finds all the allocated blocks and counts them.
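The sparse-file effect is easy to demonstrate in Perl (a sketch; the filename is made up, and it assumes a Unix filesystem that supports holes, which most do):

```perl
#!/usr/bin/perl -w
use strict;

# Create a file with a ~1MB hole: seek past the end and write
# one byte. -s reports the maximum offset; the stat blocks
# field reports what is actually allocated.
my $file = "sparse.demo";
open my $fh, '>', $file or die "open: $!";
seek $fh, 1_000_000, 0;    # leave a hole
print $fh "x";             # one real byte at offset 1,000,000
close $fh;

my $apparent = -s $file;           # 1000001
my $blocks   = (stat $file)[12];   # allocated 512-byte blocks
printf "apparent: %d bytes, allocated: %d bytes\n",
    $apparent, $blocks * 512;
unlink $file;
```

On a filesystem with holes the allocated figure comes out far smaller than the apparent size.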

uri
 
Jürgen Exner

Zebee said:
But I seem to get different results to du, and different
to what my CD burning prog says. [...]
$total += -s $File::Find::name;
produces:
total = 695543582

Which apparently is the sum of the sizes of all files.
Running du -sb on the directory given to that program gets me:
750284800

Which is how much space all files together occupy on the disk.
So what am I missing about -s?

Nothing. You are just calculating two different things.
That's a huge discrepancy, so
there's something that's not being counted.

Not at all. It's just the trivial fact that usually the size of a file and
the amount of disk space it occupies are not identical and in some cases can
be very different, e.g. for sparse files.

jue
 
Sam Holden

As people recommended using stat, I've tried.

But I seem to get different results to du, and different
to what my CD burning prog says.

#!/usr/bin/perl -w
use strict;
use File::Find;
my $total;
my $dir = shift;
find(\&wanted, $dir);

print "total = $total \n";

sub wanted {
$total += -s $File::Find::name;
}

produces:
total = 695543582

Running du -sb on the directory given to that program gets me:
750284800

So what am I missing about -s? That's a huge discrepancy, so
there's something that's not being counted.

I am running it as root, so it's not a permissions problem.

am I overflowing some buffer somewhere?

-s is doing a stat(), which will give different answers than
"du -sb" in the presence of symbolic links.

What happens with:

sub wanted {
lstat $_;
$total += -s _;
}

?
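A complete version of that variant might look like this (a sketch; note that find() chdirs into each directory, so the bare filename in $_ is the right thing to lstat):

```perl
#!/usr/bin/perl -w
use strict;
use File::Find;

my $total = 0;
my $dir = shift or die "usage: $0 directory\n";

find(sub {
    # lstat doesn't follow symlinks, so a link is counted as
    # the size of the link itself, not of its target.
    my @st = lstat $_;
    $total += $st[7] if @st;    # field 7 is st_size
}, $dir);

print "total = $total\n";
```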
 
Sam Holden

ZJ> print "total = $total \n";

ZJ> sub wanted {
ZJ> $total += -s $File::Find::name;
ZJ> }

ZJ> produces:
ZJ> total = 695543582

ZJ> Running du -sb on the directory given to that program gets me:
ZJ> 750284800

ZJ> So what am I missing about -s? That's a huge discrepancy, so
ZJ> there's something that's not being counted.

du is not the same as -s.

du measures real blocks in use. unix files (notably dbm types as well as
others) can have gaps so the maximum offset (what -s sees) can be much
greater than the actual storage used. du gets into the inode itself and
finds all the allocated blocks and counts them.

The '-b' option to GNU du changes that behaviour to calculate the
"apparent size" and not the disk usage (which is silly for a program named
"du", but that's another issue). I don't know the various flavours of
du, but the non-GNU ones I have access to don't have a '-b' option at all.
So it's likely (given my small sample) that the OP is using GNU du.

Also, wouldn't that result in "du" giving a smaller total, not a larger
total?
 
Zebee Johnstone

In comp.lang.perl.misc on Wed, 25 Aug 2004 04:30:10 GMT
Uri Guttman said:
du is not the same as -s.

du measures real blocks in use. unix files (notably dbm types as well as
others) can have gaps so the maximum offset (what -s sees) can be much
greater than the actual storage used. du gets into the inode itself and
finds all the allocated blocks and counts them.

Much greater? So shouldn't -s therefore come up with a bigger size?

But it came up with a much smaller one.

Or am I misunderstanding what you mean by offset?

Is there a perl method that does the right thing? If -s is
undercounting then it's not very helpful to find sizes...

So it might well be back to du!

Zebee
 
Zebee Johnstone

In comp.lang.perl.misc on Wed, 25 Aug 2004 04:31:43 GMT
Jürgen Exner said:
Not at all. It's just the trivial fact that usually the size of a file and
the amount of disk space it occupies are not identical and in some cases can
be very different, e.g. for sparse files.

Which is something not being counted :) If only unused blocks...

I want to take enough files to fit on a CD and put those files in
a directory and then make a CD from the directory.

If -s won't do it, what will? Or do I just use du in backticks?

Zebee
 
Zebee Johnstone

In comp.lang.perl.misc on 25 Aug 2004 04:57:09 GMT
Sam Holden said:
-s is doing a stat(), which will give different answers than
"du -sb" in the presence of symbolic links.

There aren't many of those in the given dir
What happens with:

sub wanted {
lstat $_;
$total += -s _;
}

total = 695472130
compared to the simple -s which was total = 695543582
and du which was 750284800

Zebee
 
Zebee Johnstone

In comp.lang.perl.misc on 25 Aug 2004 05:05:23 GMT
Sam Holden said:
The '-b' option to GNU du changes that behaviour to calculate the
"apparent size" and not the disk usage (which is silly for a program named
"du", but that's another issue). I don't know the various flavours of
du, but the non-GNU ones I have access to don't have a '-b' option at all.
So it's likely (given my small sample) that the OP is using GNU du.

It's a linux box, so it's GNU du, although the info page has nothing about
"apparent size" but says "`du' reports the amount of disk space used by
the specified files and for each subdirectory (of directory arguments)."
and: -b, --bytes print size in bytes


I used -b because otherwise it gives it in 1024 byte chunks:
[root@clone backups]# du -s burn
732700 burn

Zebee
 
Uri Guttman

ZJ> sub wanted {
ZJ> $total += -s $File::Find::name;
ZJ> }

ZJ> produces:
ZJ> total = 695543582

ZJ> Running du -sb on the directory given to that program gets me:
ZJ> 750284800

ZJ> So what am I missing about -s? That's a huge discrepancy, so
ZJ> there's something that's not being counted.
SH> The '-b' option to GNU du changes that behaviour to calculate the
SH> "apparent size" and not the disk usage (which is silly for a program named
SH> "du", but that's another issue). I don't know the various flavours of
SH> du, but the non-GNU ones I have access to don't have a '-b' option at all.
SH> So it's likely (given my small sample) that the OP is using GNU du.

SH> Also, wouldn't that result in "du" giving a smaller total, not a larger
SH> total?

good point but it still is a real discrepancy. gnu du says:

-b, --bytes
print size in bytes

so it prints the size used in bytes. it still isn't -s.

ls -l pad_bench.pl
-rw-r--r-- 1 uri staff 523 May 3 03:25 pad_bench.pl
perl -le 'print -s "pad_bench.pl"'
523
du -sb pad_bench.pl
1024 pad_bench.pl

so du will add up the unused bytes in the trailing blocks. and it will
still skip missing blocks (maybe he has none of those in that dir tree).

uri
 
Sam Holden

[snip du and perl's '-s' giving different results]
good point but it still is a real discrepancy. gnu du says:

-b, --bytes
print size in bytes

so it prints the size used in bytes. it still isn't -s.

ls -l pad_bench.pl
-rw-r--r-- 1 uri staff 523 May 3 03:25 pad_bench.pl
perl -le 'print -s "pad_bench.pl"'
523
du -sb pad_bench.pl
1024 pad_bench.pl

so du will add up the unused bytes in the trailing blocks. and it will
still skip missing blocks (maybe he has none of those in that dir tree).

My du doesn't seem to do that:

; ls -l resolver.pl
-rw-r--r-- 1 sholden pgrad 436 Jul 18 15:58 resolver.pl
; perl -le 'print -s "resolver.pl"'
436
; du -sb resolver.pl
436 resolver.pl
;

But I see that it's a version thing...

; du --version
du (coreutils) 5.2.1
; ./du --version
du (fileutils) 4.1
; du -bs resolver.pl
436 resolver.pl
; ./du -bs resolver.pl
4096 resolver.pl

So the --apparent-size was added sometime between those two versions
and -b changed to be:

-b, --bytes equivalent to `--apparent-size --block-size=1'

The joys of incompatible unix tools - someone should write a portable
scripting language to avoid these issues...

For the OP: You could use the blocks count in the stat result, but I
don't know how to determine the block size for the filesystem. Plus,
if you are creating a CD image, you are creating a new filesystem
whose block size may well be different, so the count may be useless
anyway?
 
Martien Verbruggen

In comp.lang.perl.misc on Wed, 25 Aug 2004 04:31:43 GMT


Which is something not being counted :) If only unused blocks...

I want to take enough files to fit on a CD and put those files in
a directory and then make a CD from the directory.

If -s won't do it, what will? Or do I just use du in backticks?

Have you tried

my ($block_size, $blocks) = (stat $_)[11, 12];
my $du_size = $block_size * $blocks;

If your find doesn't span multiple file systems, $block_size probably
is constant, and you can optimise that out. Even if you span multiple
file systems, the chance is still high that $block_size is constant.

Martien
 
Sam Holden

In comp.lang.perl.misc on Wed, 25 Aug 2004 04:31:43 GMT


Which is something not being counted :) If only unused blocks...

I want to take enough files to fit on a CD and put those files in
a directory and then make a CD from the directory.

If -s won't do it, what will? Or do I just use du in backticks?

Have you tried

my ($block_size, $blocks) = (stat $_)[11, 12];
my $du_size = $block_size * $blocks;

If your find doesn't span multiple file systems, $block_size probably
is constant, and you can optimise that out. Even if you span multiple
file systems, the chance is still high that $block_size is constant.

On my system multiplying those two numbers doesn't work.

; echo a >test.file
; perl -le 'print join "\t", (stat "test.file")[11, 12]'
4096 8
;

Clearly the file uses 1 block (it is only 1 byte) but it is
reported as using 8. It seems to be reporting the block
count in terms of 512 byte blocks even though 4096 byte
blocks are actually used.

Of course (as shown in my other posts) my system seems
a little strange (with a strangely behaving du...)

I'd still argue that taking the file size and rounding up to
a multiple of the blocksize of the CD file system you are
going to create is the only correct approach. But I
know next to nothing about the specifics of those file
systems so there could be tail packing or something to
ruin that approach...
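That approach can be sketched in a few lines (an illustration, not a definitive sizing tool: it rounds each file up to ISO9660's 2048-byte logical block but ignores directory records, path tables, and any Rock Ridge/Joliet extensions, so treat the total as a lower bound):

```perl
#!/usr/bin/perl -w
use strict;
use File::Find;

my $cd_block = 2048;    # ISO9660 logical block size
my $total    = 0;

find(sub {
    return unless -f $_;    # regular files only
    my $size = -s _;        # reuse the stat buffer from -f
    $total  += $cd_block * int(($size + $cd_block - 1) / $cd_block);
}, shift || '.');

print "estimated CD usage: $total bytes\n";
```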
 
Uri Guttman

SH> My du doesn't seem to do that:

SH> ; ls -l resolver.pl
SH> -rw-r--r-- 1 sholden pgrad 436 Jul 18 15:58 resolver.pl
SH> ; perl -le 'print -s "resolver.pl"'
SH> 436
SH> ; du -sb resolver.pl
SH> 436 resolver.pl
SH> ;

SH> But I see that it's a version thing...

SH> ; du --version
SH> du (coreutils) 5.2.1
SH> ; ./du --version
SH> du (fileutils) 4.1
SH> ; du -bs resolver.pl
SH> 436 resolver.pl
SH> ; ./du -bs resolver.pl
SH> 4096 resolver.pl

interesting. i have du (GNU fileutils) 4.0 on my sparc/solaris.

SH> So the --apparent-size was added sometime between those two versions
SH> and -b changed to be:

SH> -b, --bytes equivalent to `--apparent-size --block-size=1'

bah!

i have always used du with block counts as i wanted 'disk usage'. i
never cared about byte usage. in fact i always use the -k option for du
since i want to know storage that way.

SH> The joys of incompatible unix tools - someone should write a portable
SH> scripting language to avoid these issues...

hmmmm.

uri
 
Zebee Johnstone

In comp.lang.perl.misc on Wed, 25 Aug 2004 14:13:04 GMT
Uri Guttman said:
i have always used du with block counts as i wanted 'disk usage'. i
never cared about byte usage. in fact i always use the -k option for du
since i want to know storage that way.

du (fileutils) 4.1
Written by Torbjorn Granlund, David MacKenzie, Larry McVoy, and Paul
Eggert.
Copyright (C) 2001 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is
NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
PURPOSE.

[zebee@clone zebee]$ ls -l netgear.cfg
-rw-r--r-- 1 zebee www 30452 Aug 28 2003 netgear.cfg

[zebee@clone zebee]$ du -b netgear.cfg
32768 netgear.cfg

[zebee@clone zebee]$ du netgear.cfg
32 netgear.cfg

[zebee@clone zebee]$ perl -le 'print -s "netgear.cfg"'
30452

is there a perl way to get block usage?

Zebee
 
Uri Guttman

ZJ> du (fileutils) 4.1
ZJ> Written by Torbjorn Granlund, David MacKenzie, Larry McVoy, and Paul
ZJ> Eggert.
ZJ> Copyright (C) 2001 Free Software Foundation, Inc.
ZJ> This is free software; see the source for copying conditions. There is
ZJ> NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
ZJ> PURPOSE.

ZJ> [zebee@clone zebee]$ ls -l netgear.cfg
ZJ> -rw-r--r-- 1 zebee www 30452 Aug 28 2003 netgear.cfg

ZJ> [zebee@clone zebee]$ du -b netgear.cfg
ZJ> 32768 netgear.cfg

ZJ> [zebee@clone zebee]$ du netgear.cfg
ZJ> 32 netgear.cfg

ZJ> [zebee@clone zebee]$ perl -le 'print -s "netgear.cfg"'
ZJ> 30452

ZJ> is there a perl way to get block usage?

see the other posts by sam. he is using a more recent du which makes -b
act more like -s. but that still won't handle gaps correctly. sam
recommended rounding up the -s value to the next block size (or you could
just count blocks with a mod (%) operation on the block size). you really
need block counts IMO as that is what the cdrom will need. fractional
trailing blocks still take up whole blocks on most file systems (reiser
is one that doesn't do that).

uri
 
Paul Gaborit

At Wed, 25 Aug 2004 14:36:59 GMT,
Zebee Johnstone said:
[zebee@clone zebee]$ perl -le 'print -s "netgear.cfg"'
30452

is there a perl way to get block usage?

Yes:

$ perl -le 'print +(stat "netgear.cfg")[12]'

But is there a perl way to get the block size? ;-)
 
Martien Verbruggen

In comp.lang.perl.misc on Wed, 25 Aug 2004 04:31:43 GMT

That's a huge discrepancy, so
there's something that's not being counted.

Not at all. It's just the trivial fact that usually the size of a file and
the amount of disk space it occupies are not identical and in some cases can
be very different, e.g. for sparse files.

Which is something not being counted :) If only unused blocks...

I want to take enough files to fit on a CD and put those files in
a directory and then make a CD from the directory.

If -s won't do it, what will? Or do I just use du in backticks?

Have you tried

my ($block_size, $blocks) = (stat $_)[11, 12];
my $du_size = $block_size * $blocks;

If your find doesn't span multiple file systems, $block_size probably
is constant, and you can optimise that out. Even if you span multiple
file systems, the chance is still high that $block_size is constant.

On my system multiplying those two numbers doesn't work.

; echo a >test.file
; perl -le 'print join "\t", (stat "test.file")[11, 12]'
4096 8

That's correct. I was wrong. The block size in stat is not the size of
the blocks on disk, but the file system's preferred I/O block size.

The blocks field gives the count in 512-byte blocks, the block size of
the original UFS.
I'd still argue that taking the file size and rounding up to
a multiple of the blocksize of the CD file system you are
going to create is the only correct approach. But I
know next to nothing about the specifics of those file
systems so there could be tail packing or something to
ruin that approach...

(stat $_)[12] * 512 should, I believe, work on all Unix systems. It
seems to work on both the Linux and the Solaris system I've got here.
However, it does not work the same way under Cygwin, where the
multiplication rule that I came up with earlier holds up.

It looks like as long as you know what the "fundamental" block size of
your file system is, you can translate the number of blocks to space
taken up on the file system.
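A du-style total along those lines might look like this (a sketch; it assumes the stat blocks field is in 512-byte units, which holds on the Linux and Solaris systems discussed above but, as noted, not under Cygwin):

```perl
#!/usr/bin/perl -w
use strict;
use File::Find;

my $total = 0;
find(sub {
    my @st = lstat $_;
    $total += $st[12] * 512 if @st;   # st_blocks, in 512-byte units
}, shift || '.');

print "disk usage: $total bytes\n";
```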

Since, indeed, the idea seems to be that these files are going to be
packed onto a CD, using the block size of the ISO9660 file system in
the way you describe should be ok. I don't know whether it's easy to
predict what directory sizes and such will be.

Martien
 
Joe Smith

Zebee said:
$total += -s $File::Find::name;
produces: total = 695543582
Running du -sb on the directory given to that program gets me:
750284800

For file systems where you know for sure that disk allocation
is based on 1K blocks, I have used something like this:

$bytes = -s $name;
$blocks = ($bytes + 1023) >> 10;
$total += $blocks << 10;

But that would not take into consideration the overhead for
large files (indirect blocks and double indirect blocks)
the way that (stat $name)[12]*512 does.

The overhead required for large files on the CD is something you
ought to include in your calculations.
-Joe
 
