A script to separate out file names from the path?

Rich Grise

I have a collection of about 6000 files that need to be reorganized.
These have been strewn all over the place, from CDs to various partitions
and subdirectories on different workstations, to a pile of various
subdirectories from our Samba server, and what-not.

They're all on different depths of subdir, and I'm almost certain that
there's a lot of redundancy - I've got a list that looks something like
this example:

/Collection/a/b/c/d/file1
/Collection/a/b/c/d/file2
/Collection/a/b/c/d/file3
/Collection/a/b/c/d/file4
/Collection/a/b/c/e/file4
/Collection/a/b/c/e/file5
/Collection/e/f/g/file4
/Collection/e/f/g/file5
/Collection/e/f/g/file6
/Collection/e/f/g/file7

and so on; as you can see, they're at different subdir depths.
What I want to do, if possible, is to take this list, split out
only the last component (whatever follows the last '/' in the
string, however many there are), put it at the front of a new
string, then append the original line.

The ultimate goal is to sort these by filename - I could kill
a lot of redundancy pretty easily that way.

But it turns out, what I've been trying to do is use
for (<>) {
    my @line = split(/\//,$_);
    my $count = @line;
    print (@line[$count-1], " : ", $_);
}

doesn't seem to accomplish what I think it should. Here's the
script I've got so far:

#!/usr/bin/perl

while (<>) {
    $input = chop($_);
    @line = split(/\//,$input);
    $count = @line;
    print ("count = ", $count, "\n");

    # foreach $item(@line) {
    #     print (" item = ", $item);
    # }
    # print ("count = ", $count, " ");

    # for ($i = 0; $i < $count; $i++) {
    #     print (" item ", $i, " = ", @line[$i], " ");
    # }

    # $myitem = @line[$count-1];

    # print (@line[$count-1]);

    # print ": ";
    # print $input;
    # print "\n";
}


As you can see, I've tried variations on this, and nothing I've
tried yet has done what I want.

Here's the input (example):

/Collection/a/b/c/d/file1
/Collection/a/b/c/d/file2
/Collection/a/b/c/d/file3
/Collection/a/b/c/d/file4
/Collection/a/b/c/e/file4
/Collection/a/b/c/e/file5
/Collection/e/f/g/file4
/Collection/e/f/g/file5
/Collection/e/f/g/file6
/Collection/e/f/g/file7

And here's what I want the output to look like:

file1 : /Collection/a/b/c/d/file1
file2 : /Collection/a/b/c/d/file2
file3 : /Collection/a/b/c/d/file3
file4 : /Collection/a/b/c/d/file4
file4 : /Collection/a/b/c/e/file4
file5 : /Collection/a/b/c/e/file5
file4 : /Collection/e/f/g/file4
file5 : /Collection/e/f/g/file5
file6 : /Collection/e/f/g/file6
file7 : /Collection/e/f/g/file7

Which I could sort, and track down the duplicates.
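A minimal sketch of that duplicate hunt, assuming the list of paths is already in memory. The helper name group_by_name and the sample data are illustrative only; File::Basename's basename() does the splitting. Two files that share a name are only candidates: comparing sizes or checksums would still be needed to show they really are the same file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(basename);

# group_by_name is a hypothetical helper: it maps each file name
# (last path component) to the list of full paths ending in it.
sub group_by_name {
    my @paths = @_;
    my %by_name;
    for my $path (@paths) {
        push @{ $by_name{ basename($path) } }, $path;
    }
    return \%by_name;
}

# Sample paths from the list above; for real input you might use
#   chomp(my @paths = <>);
my @paths = qw(
    /Collection/a/b/c/d/file1
    /Collection/a/b/c/d/file4
    /Collection/a/b/c/e/file4
    /Collection/a/b/c/e/file5
    /Collection/e/f/g/file4
    /Collection/e/f/g/file5
);

my $groups = group_by_name(@paths);
for my $name (sort keys %$groups) {
    my @where = @{ $groups->{$name} };
    next if @where < 2;    # report only names seen more than once
    print "$name : $_\n" for @where;
}
```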

But I'm stuck on rearranging the strings. )-;

Would anyone wish to be so kind as to volunteer to do my homework for me?

Thanks,
Rich
 
J. Gleixner

Rich Grise wrote:
[...]
The ultimate goal is to sort these by filename - I could kill
a lot of redundancy pretty easily that way.

But it turns out, what I've been trying to do is use
for (<>) {
my @line = split(/\//,$_);
my $count = @line;
print (@line[$count-1], " : ", $_);
}

You can use a negative index.

my @arr = qw(a b c d e);
print $arr[-1];

Will print: e

Note: It's $line[] not @line[].

And since split returns a list, you could get the last item:

my $last_item = ( split /\// ) [-1];
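Putting the negative index and the list slice together, here is a minimal sketch of the loop Rich was after. tag_with_name is a made-up helper name, and the input is assumed to have been chomped already. Note that split discards trailing empty fields, so a path ending in '/' would yield the directory name instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build "name : path" by taking the last
# '/'-separated field with a negative index on split's result.
sub tag_with_name {
    my ($path) = @_;
    my $name = ( split m{/}, $path )[-1];
    return "$name : $path";
}

for my $path ('/Collection/a/b/c/d/file1', '/Collection/e/f/g/file7') {
    print tag_with_name($path), "\n";
}
```

In a real script you would feed it lines from <>, e.g. while (my $path = <>) { chomp $path; print tag_with_name($path), "\n" }.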

Would anyone wish to be so kind as to volunteer to do my homework for me?
No, however most people will help you learn the language so you can do
it yourself.
 
Lew Pitcher

Rich said:
[...]
and so on; as you can see, they're at different subdir depths;
what I want to do, if possible, is to take this array, split out
only the last component (after some unknown number of '/', but
the last one in the string), put it in the front of a new
string, then concatenate the original line;
[snip]

I say why use complex tools when simple tools will suffice

Have you looked at the basename(1) and dirname(1) utilities?

lpitcher@merlin:~$ basename /Collection/a/b/c/d/file1.a
file1.a
lpitcher@merlin:~$ basename /Collection/a/b/c/d/file1
file1

lpitcher@merlin:~$ dirname /Collection/a/b/c/d/file1.a
/Collection/a/b/c/d
lpitcher@merlin:~$ dirname /Collection/a/b/c/d/file1
/Collection/a/b/c/d

Something as simple as

#!/bin/bash
echo `basename $1`: $1

might do the trick

HTH
 
John W. Krahn

Rich said:
[...]
But it turns out, what I've been trying to do is use
for (<>) {
my @line = split(/\//,$_);
my $count = @line;
print (@line[$count-1], " : ", $_);

You are using an array slice when you should be using a scalar:

Found in /usr/lib/perl5/5.8.6/pod/perlfaq4.pod
What is the difference between $array[1] and @array[1]?
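To see why that FAQ entry matters, here is a small sketch (not taken from the FAQ itself) using a hypothetical sub, parts(), that checks wantarray, much as backticks or <> change behaviour with context. Assigning to $good[0] calls it in scalar context; assigning to the one-element slice @bad[0] calls it in list context. With use warnings, Perl should also flag the @bad[0] line with a "Scalar value ... better written as ..." warning.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# parts() is hypothetical: it returns different things depending on
# whether it is called in list or scalar context.
sub parts {
    return wantarray ? ('file4', 'g', 'f') : 'scalar';
}

my (@good, @bad);
$good[0] = parts();    # scalar assignment: parts() sees scalar context
@bad[0]  = parts();    # slice assignment: parts() sees list context,
                       # and only the first value is kept
print "$good[0] / $bad[0]\n";
```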

And you can use negative numbers to index from the end of the array:

print "$line[-1] : $_";

}

doesn't seem to accomplish what I think it should. Here's the
script I've got so far:

#!/usr/bin/perl

use warnings;
use strict;
while (<>) {
$input = chop($_);

You should use chomp instead of chop.
@line = split(/\//,$input);
$count = @line;
print ("count = ", $count, "\n");

# foreach $item(@line) {
# print (" item = ", $item);
# }
# print ("count = ", $count, " ");

# for ($i = 0; $i < $count; $i++) {
# print (" item ", $i, " = ", @line[$i], " ");
# }

# $myitem = @line[$count-1];

# print (@line[$count-1]);

# print ": ";
# print $input;
# print "\n";
}


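#!/usr/bin/perl
use warnings;
use strict;

use File::Basename;

# Read bottom-up: slurp all lines, prefix each with its basename
# and a "\0" separator, sort those strings (so by file name first),
# then keep only the text after the "\0", i.e. the original line.
print map /\0(.+)/s,
      sort
      map basename( $_ ) . "\0$_",
      <>;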

__END__




John
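For comparison, a minimal sketch of the same sort-by-filename idea written as a Schwartzian transform, which also prints the "name : path" layout from the original question. sort_by_name is a hypothetical helper, and the paths are assumed to be chomped; ties on the file name fall back to comparing the full path. John's version packs key and line into one string so the default sort works; this one keeps them apart in an array ref and needs a sort block, which some find easier to read.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper: compute [name, path] once per path, sort on
# name then path, and format each pair as "name : path".
sub sort_by_name {
    my @paths = @_;
    return map  { $_->[0] . ' : ' . $_->[1] }
           sort { $a->[0] cmp $b->[0] || $a->[1] cmp $b->[1] }
           map  { [ basename($_), $_ ] } @paths;
}

my @paths = qw(
    /Collection/e/f/g/file4
    /Collection/a/b/c/d/file2
    /Collection/a/b/c/d/file4
);
print "$_\n" for sort_by_name(@paths);
```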
 
Paul Lalli

Rich said:
for (<>) {
my @line = split(/\//,$_);
my $count = @line;
print (@line[$count-1], " : ", $_);
}

doesn't seem to accomplish what I think it should.

No, that would have worked perfectly well. It's just not at all what
you did.
Here's the
script I've got so far:

#!/usr/bin/perl

while (<>) {
$input = chop($_);

perldoc -f chop
chop VARIABLE
chop( LIST )
chop Chops off the last character of a string and returns
the character chopped.

Did you bother printing $input to see what it was? It's not the line
minus the trailing newline. It's the trailing newline.

You should be using chomp anyway.
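A small sketch of the difference, with made-up sample strings: chop removes and returns the last character whatever it is, while chomp only removes a trailing newline (strictly, whatever $/ holds) and returns how many characters it removed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# chop removes the last character, whatever it is, and returns it.
my $line1   = "/Collection/a/b/c/d/file1\n";
my $chopped = chop $line1;      # "\n"; $line1 is now the bare path

# chomp removes a trailing newline only if present and returns
# the number of characters removed.
my $line2   = "/Collection/a/b/c/d/file1\n";
my $removed = chomp $line2;     # 1
my $again   = chomp $line2;     # 0: nothing left to remove

# On a last line with no newline, chop eats a real character.
my $last = "/Collection/e/f/g/file7";
chop $last;                     # now ".../file"

printf "%s|%s|%d|%d\n", $line2, $last, $removed, $again;
```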

while (my $input = <>) {
    chomp $input;
    #etc
}

Regardless, use File::Basename, as another responder suggested. This
wheel has already been written.

Paul Lalli
 
Uri Guttman

LP> I say why use complex tools when simple tools will suffice

LP> Have you looked at the basename(1) and dirname(1) utilities?

i say why use external shell commands when File::Basename is a core
module?

uri
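A minimal sketch of the File::Basename equivalents of Lew's basename(1) and dirname(1) examples, reusing his sample path. The suffix pattern passed to fileparse is just one possible choice (strip the last dot and whatever follows it).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(basename dirname fileparse);

my $path = '/Collection/a/b/c/d/file1.a';

my $name = basename($path);    # "file1.a"
my $dir  = dirname($path);     # "/Collection/a/b/c/d"

# fileparse returns the name, the directory (with trailing slash)
# and any suffix matching the given pattern.
my ($base, $dirs, $suffix) = fileparse($path, qr/\.[^.]*/);

print "$name : $path\n";
```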
 
Rich Grise

The module File::Basename is part of your standard Perl distribution.

Sorry for the bother - I just did it the old way in C, which I know is
heresy for the perl group. =:-O

/* relist.c */
/* reformats strings. */

#include <stdio.h>
#include <string.h>   /* for strrchr() */

char buffer[512];
char * bufp;

int main() {
    while (bufp = gets(buffer)) {
        bufp = strrchr(buffer, '/');
        printf ("item ID = %s, data = %s\n", bufp + 1, buffer);
    }
}

Thanks!
Rich
 
Dr.Ruud

Rich Grise wrote:
#include <stdio.h>

char buffer[512];
char * bufp;

int main() {
while (bufp = gets(buffer)) {
bufp = strrchr(buffer, '/');
printf ("item ID = %s, data = %s\n", bufp + 1, buffer);
}
}

Perl version:

while ( <> =~ m~(.+/(.+))~ ) {
    printf "item ID = %s, data = %s\n", $2, $1 ;
}
 
Tad McClellan

["Followup-To:" header set to comp.lang.perl.misc.]

Here's the input (example):

/Collection/a/b/c/d/file1
/Collection/a/b/c/d/file2
/Collection/a/b/c/d/file3
/Collection/a/b/c/d/file4
/Collection/a/b/c/e/file4
/Collection/a/b/c/e/file5
/Collection/e/f/g/file4
/Collection/e/f/g/file5
/Collection/e/f/g/file6
/Collection/e/f/g/file7

And here's what I want the output to look like:

file1 : /Collection/a/b/c/d/file1
file2 : /Collection/a/b/c/d/file2
file3 : /Collection/a/b/c/d/file3
file4 : /Collection/a/b/c/d/file4
file4 : /Collection/a/b/c/e/file4
file5 : /Collection/a/b/c/e/file5
file4 : /Collection/e/f/g/file4
file5 : /Collection/e/f/g/file5
file6 : /Collection/e/f/g/file6
file7 : /Collection/e/f/g/file7


perl -pe 's/(.*\/(.*))/$2 : $1/' input.file
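A sketch of what that one-liner's substitution does to each line, wrapped in a hypothetical name_first() sub so it can be tried on single strings. The greedy .* before the slash backtracks only as far as the last '/', so $2 is the final component. Because '.' does not match a newline, $1 stops before the line's newline, which stays at the end; that is why -p needs no chomp here. Lines with no '/' are left unchanged.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical wrapper around the same s/// as the one-liner:
# $1 is the whole path, $2 everything after the last '/'.
sub name_first {
    my ($line) = @_;
    $line =~ s/(.*\/(.*))/$2 : $1/;
    return $line;
}

print name_first("/Collection/a/b/c/d/file1\n");
```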
 
Ted Zlatanov

The module File::Basename is part of your standard Perl distribution.

Sorry for the bother - I just did it the old way in C, which I know is
heresy for the perl group. =:-O

/* relist.c */
/* reformats strings. */

#include <stdio.h>

char buffer[512];
char * bufp;

int main() {
while (bufp = gets(buffer)) {
bufp = strrchr(buffer, '/');
printf ("item ID = %s, data = %s\n", bufp + 1, buffer);
}
}

It's not heresy, just not interesting--most of us have written C and
much prefer Perl. Also, you shouldn't use gets(). Ever. Henry
Spencer explains it better than I could:

http://isthe.com/chongo/tech/comp/c/10com.html

Ted
 
Ted Zlatanov

I say why use complex tools when simple tools will suffice

Excellent point. But also, you have to know the complex ways in which
simple tools can fail.
Something as simple as

#!/bin/bash
echo `basename $1`: $1

# touch 'a b'

# cat b.sh
#!/bin/bash
echo `basename $1`: $1

# ./b.sh 'a b'
a: a b

You need the second line to be

echo `basename "$1"`: $1

and even that may have trouble on systems like Windows that don't have
a `basename' program available by default.

Ted
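The same idea in Perl, as a minimal sketch: File::Basename ships with Perl, as Uri pointed out, so no external basename program is needed, and the path reaches basename() as one string, so spaces are never split into separate words. label() is a hypothetical helper; with no arguments the script falls back to two sample names.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper producing "name: path" like the bash script.
sub label {
    my ($path) = @_;
    return basename($path) . ": $path";
}

# Use the command-line arguments if given, else two sample names.
my @args = @ARGV ? @ARGV : ('a b', '/tmp/dir with space/x y');
print label($_), "\n" for @args;
```

Saved under some name such as label.pl and run as perl label.pl 'a b', it should print "a b: a b"; the quotes on the command line are only for the shell's benefit.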
 
Rich Grise

I have a collection of about 6000 files that need to be reorganized.

After reading the wealth of responses, I decided to go ahead and break
form and reply to myself, because I want to say thank you to each and
every one of you, but I'm too lazy to type this six times. ;-)

Thanks!
Rich
 
