Need Help with Program in Perl on a NetWare Server

fhadzocos

I have a huge folder with 44 thousand files. I need to keep only the
last file for each type in that folder.

Example:

workstation1.001
workstation1.002
workstation2.49
workstation2.56
workstation20.560
workstation20.561
workstation20.562

In that example I need to keep workstation1.002, workstation2.56, and
workstation20.562, and this would continue until the end of the files.
There is only one folder to look in.
I have no idea how to go about doing this in Perl. I'm very new to Perl
and do not wish to make an inefficient program.

Thanks
Mobius1982
 
fhadzocos

Here are some of the real file names.

00A0D1B9233C_1153332333000_1.STR
00AD1B44961_1102616538000_556.STR

Thanks
Mobius1982
 
Paul Lalli

I strongly recommend you create a program that *works* first,
and make it efficient second.

My recommendation is that you loop through the directory, grabbing each
file name. For each file, split it to get the basename and suffix
(see: perldoc File::Basename). Check if the basename exists as a key in
a hash. If not, add it, with a value of the suffix. If it does exist,
compare the current suffix numerically with the one that is already
stored. If the current one is less, remove the current file
(perldoc -f unlink). If the current one is greater, remove the file
represented by the basename and stored suffix, and replace the value
with the current suffix.

That English description may be a bit confusing (and we're always
telling people to speak Perl rather than English), so here's a short
script to get you started.

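#!/usr/bin/perl
use strict;
use warnings;
use File::Basename;

# The directory to clean up is the first command-line argument.
my $dir = shift or die "Usage: $0 directory\n";
opendir my $dh, $dir or die "Cannot open directory $dir: $!\n";

# Maps each base name to the largest numeric suffix seen so far.
my %suffix_of;
while (my $file = readdir($dh)) {
    # Skip . and .. and anything that is not a plain file.
    next if $file =~ /^\.\.?$/ or ! -f "$dir/$file";
    my ($base, undef, $suffix) = fileparse($file, qr/\.[^.]*/);
    $suffix =~ s/^\.//;
    # Ignore files whose suffix is not a number.
    next unless $suffix =~ /^\d+$/;
    if (! exists $suffix_of{$base}) {
        $suffix_of{$base} = $suffix;
    }
    elsif ($suffix_of{$base} > $suffix) {
        # The stored file is newer: remove the current one.
        unlink "$dir/$file" or
            warn "Could not remove $dir/$file: $!\n";
    }
    else {
        # The current file is newer: remove the stored one.
        unlink "$dir/$base.$suffix_of{$base}" or
            warn "Could not remove $dir/$base.$suffix_of{$base}: $!\n";
        $suffix_of{$base} = $suffix;
    }
}
closedir $dh;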


Paul Lalli
 
Paul Lalli

fhadzocos said:
Here are some of the real file names.

*WHY* didn't you include real data in your original post?

fhadzocos said:
00A0D1B9233C_1153332333000_1.STR
00AD1B44961_1102616538000_556.STR

In your original, you made it appear that by "last file" you meant
"file with the greatest suffix, numerically". How, exactly, are you
determining "last file" now? And how are you grouping these files?

My program, which I spent a good 10 minutes writing, is not going to
work with your data. You have wasted my time by posting incorrect
requirements and data. That was very rude of you.

Paul Lalli
 
fhadzocos

I am still looking for the greatest numerically.

00A0D1B9233C_1153332333000_1.STR
00AD1B44961_1102616538000_556.STR

I just gave you two file names as examples. For instance:
00A0D1B9233C_1153332333000_2.STR
00A0D1B9233C_1153332333000_5.STR
00A0D1B9233C_1153332333000_7.STR
00A0D1B9233C_1153332333000_8.STR
00A0D1B9233C_1153332333000_9.STR

Out of all of those I would want to keep only
00A0D1B9233C_1153332333000_9.STR and delete the rest. Then I would look
at the next group, 00AD1B44961_1102616538000_556.STR, find only the
numerically last file in it, and delete the rest.
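
As a rough sketch of how those names could be grouped: this assumes, based only on the sample names above, that everything before the last underscore identifies the group and that the digits between the last underscore and .STR are the number to compare. The helper name split_name is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split a name such as
# 00A0D1B9233C_1153332333000_9.STR into (group key, sequence number).
# Returns an empty list for names that do not fit the assumed
# PREFIX_NUMBER.STR pattern.
sub split_name {
    my ($name) = @_;
    return $name =~ /^(.+)_(\d+)\.STR$/i ? ($1, $2) : ();
}

my ($group, $seq) = split_name('00A0D1B9233C_1153332333000_9.STR');
print "$group => $seq\n";    # prints "00A0D1B9233C_1153332333000 => 9"
```

The group key and number returned here could then be used wherever a basename and numeric suffix are compared.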

I'm not looking for a freebie and I appreciate all your help. I'm a
co-op student at a company and have done very little in Perl.
I thought publishing my poor attempt, which crashes heavily, would be a
waste of forum space. I will use your advice and post again when I'm
closer to something and ask for advice on how to fix those problems.

Once again thanks for your time.
Mobius1982
 
Mumia W.

I would create a hash keyed on the filenames (without extensions).
For each file found, I'd use the base name as the hash key and
store the full file name as the value if either there were no
value yet for that key or the new file's extension were
numerically larger than the extension of the stored file name.
This creates a list of files to keep (as hash values).
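
A minimal sketch of that first pass, assuming names of the form base.number as in the original example. The sub name files_to_keep and the second hash %best are illustrative choices, not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: given a list of file names, return a hash that
# maps each base name to the full name of the file with the largest
# numeric extension. Names without a numeric extension are ignored.
sub files_to_keep {
    my @names = @_;
    my (%keep, %best);    # %best remembers the largest extension per base
    for my $name (@names) {
        my ($base, $ext) = $name =~ /^(.+)\.(\d+)$/ or next;
        if (!exists $best{$base} or $ext > $best{$base}) {
            $best{$base} = $ext;
            $keep{$base} = $name;
        }
    }
    return %keep;
}

my %keep = files_to_keep(qw(
    workstation1.001  workstation1.002
    workstation2.49   workstation2.56
    workstation20.560 workstation20.561 workstation20.562
));
print "$_\n" for sort values %keep;
```

Run on the example names, this should list workstation1.002, workstation2.56 and workstation20.562 as the files to keep.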

Then I would make those files (the hash values) read-only and
delete every other file in the folder. Afterward, I'd remove
the read-only flags from the remaining files.
 
Paul Lalli

Mumia said:
Then I would make those files (the hash values) read-only and
delete every other file in the folder. Afterward, I'd remove
the read-only flags from the remaining files.

Why the read-only bit of this? What does that gain for you?

Paul Lalli
 
Mumia W.

Paul Lalli said:
Why the read-only bit of this? What does that gain for you?

Come to think of it, not much.

My algorithm only identifies files to keep--not files to
delete. For deleting files, I planned on using 'rm *', and
that means that the files to keep would have to be read-only.

But now that I think about it, 'rm *' won't like a parameter
list that's 44 thousand items big, so I'd have to use
opendir/closedir and unlink. And since that's the case, I
might as well check each file against the hash to test if it
should be deleted.
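
The opendir/unlink pass could look roughly like the sketch below. It assumes a hash reference whose keys are the full names of the files to keep. The sub name delete_others and the dry-run flag are illustrative additions, not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: remove every plain file in $dir whose name is not
# a key of %$keep_names. With $dry_run true, nothing is deleted and the
# names are only collected. Returns the names removed (or that would be).
sub delete_others {
    my ($dir, $keep_names, $dry_run) = @_;
    opendir my $dh, $dir or die "Cannot open directory $dir: $!\n";
    # Read all names first so nothing is unlinked mid-scan.
    my @names = grep { -f "$dir/$_" } readdir $dh;
    closedir $dh;

    my @removed;
    for my $name (@names) {
        next if $keep_names->{$name};
        if ($dry_run or unlink "$dir/$name") {
            push @removed, $name;
        }
        else {
            warn "Could not remove $dir/$name: $!\n";
        }
    }
    return @removed;
}
```

Calling it first with a true third argument, for example delete_others($dir, { 'workstation1.002' => 1 }, 1), shows what would be removed before any file is actually touched.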

Oh well, TMTOWTDI.
 
