deleting files

Roedy Green

I wrote a program to tidy up my hard disk. I run it as administrator.
It tells me there are a fair number of junk files I cannot delete. I
have a utility, presumably written in C, that much more rapidly scans
my drive for junk and manages to wipe out much of the junk I could
not.

I am curious if anyone has experimented and could tell me:

1. Why is the C utility so much faster than my utility? My code is
basically just a bunch of File.list() calls with filters. What is Java
doing to dither?

2. What is the utility doing to let it kill more files?
--
Roedy Green Canadian Mind Products
http://mindprod.com
To err is human, but to really foul things up requires a computer.
~ Farmer's Almanac
It is breathtaking how a misplaced comma in a computer program can
shred megabytes of data in seconds.
 
Luuk

Roedy Green said:
I wrote a program to tidy up my hard disk. I run it as administrator.
It tells me there are a fair number of junk files I cannot delete. I
have a utility, presumably written in C, that much more rapidly scans
my drive for junk and manages to wipe out much of the junk I could
not.

I am curious if anyone has experimented and could tell me:

1. Why is the C utility so much faster than my utility? My code is
basically just a bunch of File.list() calls with filters. What is Java
doing to dither?

2. What is the utility doing to let it kill more files?

What's the name of that utility, and what does your Java source look like?

C is compiled code, and runs faster/more-optimized than Java which is
byte-compiled (as i remember correctly) to be platform independent, so
there is a penalty at the last step....
 
Lew

Luuk said:
C is compiled code, and runs faster/more-optimized than Java which is

Baloney.

byte-compiled (as i [sic] remember [in]correctly) to be platform independent, so
there is a penalty at the last step....

Java code gets compiled to machine code under certain circumstances,
so that statement is not universally true, even if we assume you meant
"compiled to bytecode" rather than the meaningless "byte-compiled".

We don't know from the OP if the JVM startup time is part of the
perception of slowness for the Java code. For all we know it might
run just as fast as the C version once it gets started.

We also do not know what, if any, algorithmic differences there are
between the Java program and the C utility. The C program might make
platform-specific low-level calls. The speed difference might have
absolutely nothing at all whatsoever in the least to do with byte code
vs. machine code and instead stem from the difference between high-level and
low-level calls. This could also account for the observed difference
in behavior.

There's no way to answer the OP's question as it stands. He wants to
know the difference in speed between two programs, written in
different languages, with markedly different behaviors, with no source
code available for either. It's a bloody meaningless question -
apples and oranges.
 
Arne Vajhøj

Roedy Green said:
I wrote a program to tidy up my hard disk. I run it as administrator.
It tells me there are a fair number of junk files I cannot delete. I
have a utility, presumably written in C, that much more rapidly scans
my drive for junk and manages to wipe out much of the junk I could
not.

I am curious if anyone has experimented and could tell me:

1. Why is the C utility so much faster than my utility? My code is
basically just a bunch of File.list() calls with filters. What is Java
doing to dither?

It is very difficult to say why some C source code we don't
know compiled with an unknown compiler is faster than some
Java code that we don't know running in an unknown Java
implementation.

Best guess must be:
2/3 probability that your Java code is not good
1/3 probability that your Java's File implementation is not good (*)

*) this type of utility is not a typical usage of Java so it may
not have gotten much attention.
2. What is the utility doing to let it kill more files?

Without knowing which files they are, it is impossible to guess.

Arne
 
Arne Vajhøj

Luuk said:
What's the name of that utility, and what does your Java source look like?

C is compiled code, and runs faster/more-optimized than Java which is
byte-compiled (as i remember correctly) to be platform independent, so
there is a penalty at the last step....

The Java byte code gets JIT compiled to native code, so that
should not be a problem.

But this program should not be CPU bound anyway, so it does not
matter.

Arne
 
Luuk

The Java byte code gets JIT compiled to native code, so that
should not be a problem.

But this program should not be CPU bound anyway, so it does not
matter.

Arne

Yes, I got the terms mixed up.....

But, like Lew said, things in C get optimized quicker with
platform-specific calls..

Back to the original question, 'junk files' should only reside in
temporary folders, so you would not have to search them.. ;)
 
Joshua Cranmer

Roedy Green said:
I wrote a program to tidy up my hard disk. I run it as administrator.
It tells me there are a fair number of junk files I cannot delete. I
have a utility, presumably written in C, that much more rapidly scans
my drive for junk and manages to wipe out much of the junk I could
not.

I am curious if anyone has experimented and could tell me:

1. Why is the C utility so much faster than my utility? My code is
basically just a bunch of File.list() calls with filters. What is Java
doing to dither?

It is possible that the utility could be reading disk blocks directly
and reading the entire directory listing as it exists on the disk instead
of using the OS APIs.

Another point of interest is that Java constructs the array by building
a 16-element array, growing that à la ArrayList by doubling, and then
copying the result into the output array, which could be slow for
directories with a few thousand directory entries.
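
If the array building turns out to matter, one way to sidestep it is to iterate over a directory lazily instead of asking for a full array of names. What follows is only a minimal sketch, assuming Java 7 or later (the java.nio.file package); the class name LazyListing and the method countMatching are made up for illustration and are not part of anyone's actual program.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class LazyListing {
    /**
     * Counts the entries of dir whose names match the glob (e.g. "*.bak").
     * Entries are visited one at a time; no array of names is built.
     */
    public static int countMatching(Path dir, String glob) throws IOException {
        int count = 0;
        // try-with-resources closes the underlying directory handle
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir, glob)) {
            for (Path entry : entries) {
                count++;
            }
        }
        return count;
    }
}
```

Whether this is measurably faster than File.list() with a filter would have to be tested on the directories in question; it mainly avoids the intermediate array.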

A final thing to note is that the utility may be moving faster because
the files it is checking are already cached. You should try observing
the speed difference if you run the unknown utility first and then the
Java utility later.
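
A rough way to see the caching effect from the Java side is to time the same scan twice inside one JVM run, which also keeps JVM startup out of the measurement. This is a sketch under assumptions: ScanTimer and countFiles are hypothetical names, the scan only counts files rather than applying any junk filter, and it does not guard against symbolic link cycles.

```java
import java.io.File;

public class ScanTimer {
    /** Recursively counts regular files under dir using the classic java.io.File API. */
    public static long countFiles(File dir) {
        File[] children = dir.listFiles();
        if (children == null) {
            return 0; // not a directory, or it could not be read
        }
        long count = 0;
        for (File child : children) {
            count += child.isDirectory() ? countFiles(child) : 1;
        }
        return count;
    }

    public static void main(String[] args) {
        File root = new File(args.length > 0 ? args[0] : ".");
        // The first pass may hit the disk; the second is likely served from the OS cache.
        for (int pass = 1; pass <= 2; pass++) {
            long start = System.nanoTime();
            long files = countFiles(root);
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println("pass " + pass + ": " + files + " files in " + millis + " ms");
        }
    }
}
```

If the second pass is much faster than the first, the order in which the two utilities were run probably accounts for part of the difference.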
2. What is the utility doing to let it kill more files?

Of course, the real answer may be that the utility is doing something
completely different which is faster than your approach. Listing
directories can actually be very computationally expensive on FAT (and I
think NTFS as well?) simply because of how the filesystem works...
 
Arved Sandstrom

Roedy Green said:
I wrote a program to tidy up my hard disk. I run it as administrator.
It tells me there are a fair number of junk files I cannot delete. I
have a utility, presumably written in C, that much more rapidly scans
my drive for junk and manages to wipe out much of the junk I could
not.

I am curious if anyone has experimented and could tell me:

1. Why is the C utility so much faster than my utility? My code is
basically just a bunch of File.list() calls with filters. What is Java
doing to dither?

2. What is the utility doing to let it kill more files?

What this really comes down to is (as others have suggested), what's
your definition of a "junk" file? It obviously differs from the common
approaches used by industrial junk file removal programs, because I
doubt any of those are going to be using too much (if any) of a
"String[] list(FilenameFilter filter)" mechanism...in whatever language
they are written in.

AHS
 
Dagon

Roedy Green said:
I wrote a program to tidy up my hard disk. I run it as administrator.
It tells me there are a fair number of junk files I cannot delete.

What does this mean? A File for which delete() returns false? There are a lot
of reasons this can happen.
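
Since File.delete() only returns a boolean, one way to find out which reason applies is Files.delete(), which throws an exception describing the failure. The following is a minimal sketch, assuming Java 7 or later; DeleteDiagnostics and tryDelete are hypothetical names, and the reason strings are just labels chosen for this example.

```java
import java.io.IOException;
import java.nio.file.AccessDeniedException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;

public class DeleteDiagnostics {
    /** Tries to delete path; returns "deleted" or a short reason for the failure. */
    public static String tryDelete(Path path) {
        try {
            Files.delete(path);
            return "deleted";
        } catch (NoSuchFileException e) {
            return "no such file";
        } catch (DirectoryNotEmptyException e) {
            return "directory not empty";
        } catch (AccessDeniedException e) {
            return "access denied";
        } catch (IOException e) {
            // Anything else, e.g. a file that another process is holding open
            return "I/O error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        for (String arg : args) {
            System.out.println(arg + ": " + tryDelete(Paths.get(arg)));
        }
    }
}
```

Running this over the files that refuse to go away should at least show whether they share one cause or several.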
I have a utility, presumably written in C, that much more rapidly scans
my drive for junk and manages to wipe out much of the junk I could
not.

Yup.
I am curious if anyone has experimented and could tell me:
1. Why is the C utility so much faster than my utility? My code is
basically just a bunch of File.list() calls with filters. What is Java
doing to dither?

You don't give us enough information about the utility or your code to expect
that anyone's experiments are directly relevant.

My WAG is that the utility is using OS-specific calls and knowledge that are
faster than the more general abstractions in the JDK.
2. What is the utility doing to let it kill more files?

It depends on why you can't delete the files to start with. Most likely it's
the same answer: OS-specific knowledge and calls in the utility.
 
Lew

Luuk said:
Back to the original question, 'junk files' should only reside in
temporary folders, so you would not have to search them.. ;)

Huh?

"Junk" files will reside all over the place, whatever you believe should
happen. Besides, one person's junk is another person's treasure.

Tell me, is /home/lew/somefile.txt~ a junk file? (Hint: It's a backup of an
old version that the editor creates when I modify the file.) At some point,
it probably is, but it certainly isn't in a temporary directory!
 
Mike Schilling

Luuk said:
Yes, I got the terms mixed up.....

But, like Lew said, things in C get optimized quicker with
platform-specific calls..

I believe what Lew really said is that a Windows-specific program might be
using low-level, Windows-specific calls that the Windows JVM does not use.
Or it might be as simple as the C program being multi-threaded and the Java
one being single-threaded.
 
Mike Schilling

Lew said:
Huh?

"Junk" files will reside all over the place, whatever you believe should
happen. Besides, one person's junk is another person's treasure.

Tell me, is /home/lew/somefile.txt~ a junk file? (Hint: It's a backup of
an old version that the editor creates when I modify the file.) At some
point, it probably is, but it certainly isn't in a temporary directory!

I wish it were -- there are few things as irritating as starting an Ant or
maven build, going to lunch, and coming back to find that it died in two
seconds, unable to clean a directory because VIM had its temp file open.
 
Mike Schilling

Dagon said:
My WAG is that the utility is using OS-specific calls and knowledge that
are faster than the more general abstractions in the JDK.

Mine too. (And given how long it takes to delete a big directory tree in
the Explorer, I'd guess that the utility is faster than that too.)
 
Luuk

Huh?

"Junk" files will reside all over the place, whatever you believe should
happen. Besides, one person's junk is another person's treasure.

I know someone who used to keep her 'treasure' documents in the TEMP
folder.... ;)
Tell me, is /home/lew/somefile.txt~ a junk file? (Hint: It's a backup
of an old version that the editor creates when I modify the file.) At
some point, it probably is, but it certainly isn't in a temporary
directory!

This file is not a junk file, because you configured your editor to keep
this backup-file. If you do not want it there, then reconfigure your
editor, or choose another editor.
It all comes down to the definition of 'junk'-files
 
Lew

This file is not a junk file, because you configured your editor to keep

What an arrogant statement! I never did any such thing!
this backup-file. If you do not want it there, then reconfigure your
editor, or choose another editor.

Oh, so that will make it not a "junk" file?

You are funny.
It all comes down to the definition of 'junk'-files

I guess you imposed your definition of "junk" on me, huh? Snap!

You have said nothing to prove or even indicate by evidence that you can
reliably find "junk" files (howsoever defined) only in temporary directories
(howsoever defined). In part that's because you can't.

It is true that "junk" files (howsoever defined) can reside anywhere. There
is no guarantee or even hint that they will reside only in temporary
directories (howsoever defined). You made that up.
 
Luuk

What an arrogant statement! I never did any such thing!


Oh, so that will make it not a "junk" file?

You are funny.


I guess you imposed your definition of "junk" on me, huh? Snap!

You have said nothing to prove or even indicate by evidence that you can
reliably find "junk" files (howsoever defined) only in temporary
directories (howsoever defined). In part that's because you can't.

It is true that "junk" files (howsoever defined) can reside anywhere.
There is no guarantee or even hint that they will reside only in
temporary directories (howsoever defined). You made that up.

Now you are imposing your definition of "junk" on me...

lol
 
Lew

Luuk said:
Now you are imposing your definition of "junk" on me...

lol

Indeed. :)

I'm a universal acceptor when it comes to that definition. The fact that
definitions vary accentuates the difficulty of making any assertion about
where junk can reside.
 
Mike Schilling

Lew said:
Indeed. :)

I'm a universal acceptor when it comes to that definition. The fact that
definitions vary accentuates the difficulty of making any assertion about
where junk can reside.

I know where I keep my junk, and I don’t want anything deleting it.
 
Roedy Green

Luuk said:
C is compiled code, and runs faster/more-optimized than Java which is
byte-compiled (as i remember correctly) to be platform independent, so
there is a penalty at the last step....

I usually use Jet to compile my Java to native code. I have examined
code and discovered that it is generally better than hand-coded
assembler since it takes into account pipelines, not just cycle counts
on individual instructions.

I would have expected finding files to delete would be an i/o bound
operation.


 
Roedy Green

Back to the original question, 'junk files' should only reside in
temporary folders, so you would not have to search them.. ;)

Batik program has a custom list of temp directories it empties. It
also scans for files with extension *.bak for example in all
directories.
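
For anyone wanting to try the same kind of scan, here is a minimal sketch of an extension search over a whole tree, assuming Java 7 or later. It is not the program described above: BakFinder and find are hypothetical names, and the sketch only collects matching paths without deleting anything, so the list can be reviewed first.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

public class BakFinder {
    /** Returns all files under root whose file name matches the glob, e.g. "*.bak". */
    public static List<Path> find(Path root, String glob) throws IOException {
        final PathMatcher matcher = root.getFileSystem().getPathMatcher("glob:" + glob);
        final List<Path> hits = new ArrayList<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                // Match on the last name element only, not the whole path
                if (matcher.matches(file.getFileName())) {
                    hits.add(file);
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) {
                // Skip entries that cannot be read instead of aborting the walk
                return FileVisitResult.CONTINUE;
            }
        });
        return hits;
    }
}
```

A call such as BakFinder.find(Paths.get("C:\\"), "*.bak") would then give the candidate list; by default walkFileTree does not follow symbolic links.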
 
