To extract file name only from a file

Rider

Hi experts,

I have this file, inut.txt (listed below). Each line in the file has
more than 10 fields, but I am just listing a sample format here.

I need to print only the file names ending in .txt to the output.

The output should be:
===============
unixFile1.txt
unixFile2.txt
unixFile3.txt
....
...
===============

I am looking for a short regular expression that extracts only the
file names into the output. I only write basic Perl occasionally and
don't know the right regex for this.

Thanks in advance,
J


Here is the input file.
===================
1, bob, usr/tst/unixFile1.txt, boston, text1, text2, text3
2, bob, usr/tst/unixFile2.txt, boston, text1, text2, text3
3, bob, usr/tst/unixFile3.txt, boston, text1, text2, text3
......
.....
===================
 
Josef Moellers

$f1 = (split(/,\s+/, $line))[2];
print "$f1\n" if $f1 =~ /\.txt$/;
 
Josef Moellers

$f1 = (split(/\//, (split(/,\s+/, $line))[2]))[-1];
print "$f1\n" if $f1 =~ /\.txt$/;
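A self-contained sketch of the nested split above, in case it helps to try it out. The helper name third_field_basename and the @sample array are made up for illustration; the sketch assumes the path is always the third field and that fields are separated by a comma plus whitespace, as in the sample data.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the nested split: take the third
# comma-separated field, then keep only what follows the last slash.
sub third_field_basename {
    my ($line) = @_;
    chomp $line;
    my $path = (split(/,\s+/, $line))[2];
    return undef unless defined $path;
    my $base = (split(m{/}, $path))[-1];
    return $base;
}

# Sample lines copied from the question.
my @sample = (
    '1, bob, usr/tst/unixFile1.txt, boston, text1, text2, text3',
    '2, bob, usr/tst/unixFile2.txt, boston, text1, text2, text3',
    '3, bob, usr/tst/unixFile3.txt, boston, text1, text2, text3',
);

for my $line (@sample) {
    my $f1 = third_field_basename($line);
    print "$f1\n" if defined $f1 && $f1 =~ /\.txt$/;
}
```

To read a real file instead, the for loop could be replaced by a while loop over the file handle.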
 
Jürgen Exner

This looks like a standard CSV format, and you want the third column. So
I would use Text::CSV and grab the third element from each row.

If you insist on reinventing the wheel then at least for the sample data
you have shown you can grab the third element after split()ing each line
at the comma.

jue
 
Rider

Josef Moellers said:
$f1 = (split(/\//, (split(/,\s+/, $line))[2]))[-1];
print "$f1\n" if $f1 =~ /\.txt$/;

--
These are my personal views and not those of Fujitsu Technology Solutions!
Josef Möllers (Pinguinpfleger bei FTS)
        If failure had no penalty success would not be a prize (T..  Pratchett)
Company Details:http://de.ts.fujitsu.com/imprint.html

Thanks Josef,

But I am looking for a one-liner that grabs just the file name ending
in .txt from each line, without using the split function. I am sure
that I have seen that kind of regular expression before, but I cannot
recall it now.
 
Jürgen Exner

Rider said:
But I am looking for a one-liner that grabs just the file name ending
in .txt from each line, without using the split function. I am sure
that I have seen that kind of regular expression before, but I cannot
recall it now.

Unless this is some academic exercise, why do you want to do it the
hard way?
The easiest and most robust way is to use Text::CSV, grab the third
item, and then use File::Basename to extract the file name.

Or actually in your case you could also use
substr($line, 16, 13) #might be off by one somewhere
because the filename starts at character 16 and is 13 characters long.
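For the first sample line the offsets can be checked directly. The sketch below only confirms the numbers for that one line (with Perl's 0-based offsets); the variable names are made up for illustration.

```perl
use strict;
use warnings;

# First sample line from the question.
my $line = '1, bob, usr/tst/unixFile1.txt, boston, text1, text2, text3';

# index() reports the 0-based offset at which the file name starts.
my $start = index($line, 'unixFile1.txt');
print "$start\n";                      # prints 16

# substr() with that offset and a length of 13 yields the file name.
my $name = substr($line, 16, 13);
print "$name\n";                       # prints unixFile1.txt
```

The fixed offsets stop working as soon as any earlier field changes length, for example when the leading number reaches 10.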

Oh, you mean that's just your sample data and the actual data might vary
in length? Well, too bad, because your actual data may also vary in such
a way as to make a regexp fail. That is exactly why using Text::CSV and
File::Basename is more robust and spares you from patching your
hand-rolled code over and over again whenever you encounter some
unforeseen data.
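A rough sketch of the File::Basename half of that suggestion. To keep it runnable with core modules only, a plain split stands in for Text::CSV here, which is an assumption that only holds while no field is quoted or contains a comma. The helper name txt_name_from_line and the @sample array are made up for illustration.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper: return the base name of the third field if it
# ends in .txt, otherwise undef.
# NOTE: split is a stand-in for Text::CSV (assumes no quoted fields and
# no commas inside a field).
sub txt_name_from_line {
    my ($line) = @_;
    chomp $line;
    my @fields = split /,\s*/, $line;
    return undef unless defined $fields[2];
    my $name = basename($fields[2]);   # strips the directory part
    return $name =~ /\.txt\z/ ? $name : undef;
}

# Sample lines copied from the question.
my @sample = (
    '1, bob, usr/tst/unixFile1.txt, boston, text1, text2, text3',
    '2, bob, usr/tst/unixFile2.txt, boston, text1, text2, text3',
);

for my $line (@sample) {
    my $name = txt_name_from_line($line);
    print "$name\n" if defined $name;
}
```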

jue
 
Rider

It is not a CSV file. It is a PHP file with a lot of comments in the
middle of the file as well.
So I am looking for a regex that gets me just the file name ending
in .txt (the name might contain a space, for example:
user input.txt instead of userinput.txt).
 
Rider

Len said:

perl -nle 'print $1 if (/.+\/(.+\.txt)/)' rider.txt

where rider.txt is your input file.

D:\Perl\source\1>perl -nle "print $1 if (/.+\/(.+\.txt)/)" rider.txt
unixFile1.txt
unixFile2.txt
unixFile3.txt

Awesome Len..

Thanks a bunch, this serves my purpose. Although I have not run it, I
can see that it would work.
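One caveat on the one-liner: it needs at least one slash before the name, and its greedy .+ runs to the last .txt on the line. A slightly tighter pattern is sketched below, assuming the name is the last path component, may contain spaces but no commas, and is followed by a comma or the end of the line. The helper name first_txt_name is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first .txt file name on a line, or undef.
sub first_txt_name {
    my ($line) = @_;
    # [^/,\s]      first character: not a slash, comma or whitespace
    # [^/,]*       rest of the name: no slash or comma, spaces allowed
    # \.txt        literal extension
    # \s*(?:,|\z)  the field must end here: comma or end of the string
    return $line =~ m{([^/,\s][^/,]*\.txt)\s*(?:,|\z)} ? $1 : undef;
}

# Sample lines: one from the question, one with a space in the name.
my @sample = (
    '1, bob, usr/tst/unixFile1.txt, boston, text1, text2, text3',
    '2, bob, usr/tst/user input.txt, boston, text1, text2, text3',
);

for my $line (@sample) {
    my $name = first_txt_name($line);
    print "$name\n" if defined $name;
}
```

The same pattern should drop into the one-liner form in place of the original regex.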
 
sln

This might help. It's a construct-your-own recipe; how you use it
is up to you. It is certainly not a one-liner (or short), but neither
is real file name parsing. There might be a module you could invoke.
Or you could use something like:

/(?:(\/\s*[.-]+.*?)|([a-z0-9_][a-z0-9_ .-]*\.txt))[\s,]+/i and defined $2

-sln

----------------------------
## parse_fname_unix.pl
## (some rudimentary regex construction)
##
use strict;
use warnings;

use constant debug => 1;

my $start_char = "a-z0-9_";
my $body_chars = "$start_char .-";
my $field_seps = "\\s,";
my $fname = "[$start_char][$body_chars]*";
my $ext = "txt";
my $bad_fname = "\/\\s*[.-]+.*?";

my $qualified_name = qr/(?:($bad_fname)|($fname\.$ext))[$field_seps]+/i;

print "\n$qualified_name\n";

while (<DATA>)
{
    next if (/^\s*$/);

    if (debug) {
        print "\n$_";
        while (/$qualified_name/g)
        {
            print "\tBAD: $1\n" if defined $1;
            print "\tOK: $2\n" if defined $2;
        }
    } else {
        while (/$qualified_name/g and defined $2) {
            print "$2\n";
        }
    }
}

__DATA__

-4, bob, unix/ .txt/File_-4.txt, boston, text1, unix/tst.txt/File_-4a.txt
-3, bob, unix .txt/File_-3.txt, boston, text1, text2, text3
-2, bob, unix .txt/.-File_-2.txt, boston, text1, text2, text3
-1, bob, unix .txt.-File_-1.txt, boston, text1, text2, text3
0, bob, unixFile0.txt, boston, text1, text2, text3
1, bob, usr/tst/unixFile1.txt, boston, text1, text2, text3
2, bob, usr/tst/unixFile2.txt, boston, text1, text2, text3
3, bob, usr/tst/unix.some.txt.File3.txt, boston, text1, text2, text3
4, bob, usr/tst.txt/unixFile4.Txt, boston, text1, text2, text3

--------------------
output:

(?i-xsm:(?:(/\s*[.-]+.*?)|([a-z0-9_][a-z0-9_ .-]*\.txt))[\s,]+)

-4, bob, unix/ .txt/File_-4.txt, boston, text1, unix/tst.txt/File_-4a.txt
    BAD: / .txt/File_-4.txt
    OK: File_-4a.txt

-3, bob, unix .txt/File_-3.txt, boston, text1, text2, text3
    OK: File_-3.txt

-2, bob, unix .txt/.-File_-2.txt, boston, text1, text2, text3
    BAD: /.-File_-2.txt

-1, bob, unix .txt.-File_-1.txt, boston, text1, text2, text3
    OK: unix .txt.-File_-1.txt

0, bob, unixFile0.txt, boston, text1, text2, text3
    OK: unixFile0.txt

1, bob, usr/tst/unixFile1.txt, boston, text1, text2, text3
    OK: unixFile1.txt

2, bob, usr/tst/unixFile2.txt, boston, text1, text2, text3
    OK: unixFile2.txt

3, bob, usr/tst/unix.some.txt.File3.txt, boston, text1, text2, text3
    OK: unix.some.txt.File3.txt

4, bob, usr/tst.txt/unixFile4.Txt, boston, text1, text2, text3
    OK: unixFile4.Txt
 
