How to use a text file for filenames?

Jake

I have a text file; in it are a large number of URLs (ex:
Http://www.yahoo.com/index.html)

I would like to take each line of the file (one URL per line) and
remove the Http://

Then I would like to use those names to navigate folders on my computer
that have names that correspond (ex: cd /www.yahoo.com)

Does anybody know how I could do this using Perl?

thanks!
-jake-
 
Jeff Stampes

Jake said:
I have a text file; in it are a large number of URLs (ex:
Http://www.yahoo.com/index.html)

I would like to take each line of the file (one URL per line) and
remove the Http://

Then I would like to use those names to navigate folders on my computer
that have names that correspond (ex: cd /www.yahoo.com)

Does anybody know how I could do this using Perl?

Yes.

You'll get more responses if you show us what you tried yourself, and where you
experienced problems.

A quick Google search didn't turn this up for me, but it has been said before: this is a
discussion group, not a help desk.

~Jeff
 
Jake

Unfortunately I'm REALLY REALLY new at Perl, but I have to finish this
thing up soon. So I was hoping for some suggestions to explore, even
if they seem trivial. I haven't really tried much yet because I'm totally
unfamiliar with the possibilities.

I hope this isn't too vague and needy :/

-jake-
 
A. Sinan Unur

Unfortunately I'm REALLY REALLY new at Perl, but I have to finish this
thing up soon. So I was hoping for some suggestions to explore, even
if they seem trivial. I haven't really tried much yet because I'm totally
unfamiliar with the possibilities.

What is "this" that you need to finish soon?
I hope this isn't too vague and needy :/

It is, indeed.

Please read the posting guidelines for this group.

Sinan
 
Tad McClellan

Jake said:
I have a text file; in it are a large number of URLs (ex:
Http://www.yahoo.com/index.html)

I would like to take each line of the file (one URL per line) and
remove the Http://


Use the substitution operator ( s/// ) for that. The s/// operator
is documented in:

perldoc perlop

Then I would like to use those names to navigate folders on my computer
that have names that correspond (ex: cd /www.yahoo.com)


perldoc -f chdir

Does anybody know how I could do this using Perl?


Yes, I do.
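
For illustration, a minimal sketch putting those two pointers together (the file name
urls.txt is an assumption, as is dropping everything after the host name so each line
matches a folder such as www.yahoo.com under the starting directory):

#!/usr/bin/perl
use strict;
use warnings;
use Cwd qw(getcwd);

my $start = getcwd();                        # remember where we began

open my $fh, '<', 'urls.txt' or die "Cannot open urls.txt: $!";

while (my $line = <$fh>) {
    chomp $line;
    $line =~ s{^http://}{}i;                 # strip the scheme, case-insensitively
    $line =~ s{/.*}{};                       # drop any path such as /index.html (an assumption)
    next unless length $line;

    if (chdir "$start/$line") {
        print "now in $start/$line\n";
        # ... work inside the folder here ...
        chdir $start or die "Cannot return to $start: $!";
    }
    else {
        warn "Cannot chdir to $start/$line: $!";
    }
}

close $fh;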
 
Dan

I would like to take each line of the file (one URL per line) and

Use the substitution operator ( s/// ) for that. The s/// operator
is documented in:

Would using substr() be an even more efficient method of doing this?

Dan
 
Paul Lalli

Dan said:
Would using substr() be an even more efficient method of doing this?

According to the following benchmark, it depends on which substr() form
you use:

#!/usr/bin/perl
use strict;
use warnings;

use Benchmark qw/cmpthese/;

sub gen_lines {
    qw{
        Http://www.yahoo.com
        Http://gmail.google.com
        Http://disney.go.com
        Http://www.perldoc.com
        Http://search.cpan.org
        Http://www.oreilly.com
        Http://www.ge.com
        Http://www.stratus.com
        Http://www.es11.com
        Http://www.bankofamerica.com
    };
}

cmpthese(-10, {
    'substr_3arg' => sub {
        my @lines = gen_lines();
        substr($_, 0, 7) = q{} for @lines;
    },
    'substr_4arg' => sub {
        my @lines = gen_lines();
        substr($_, 0, 7, q{}) for @lines;
    },
    's///' => sub {
        my @lines = gen_lines();
        s{^Http://}{} for @lines;
    },
});

__END__
Benchmark: running s///, substr_3arg, substr_4arg, each for at least 10 CPU seconds...
       s///: 10 wallclock secs (10.52 usr + 0.01 sys = 10.53 CPU) @ 10956.03/s (n=115367)
substr_3arg: 11 wallclock secs (10.22 usr + 0.01 sys = 10.23 CPU) @ 10105.96/s (n=103384)
substr_4arg: 10 wallclock secs (10.54 usr + 0.00 sys = 10.54 CPU) @ 12353.89/s (n=130210)

              Rate substr_3arg  s/// substr_4arg
substr_3arg 10106/s         --   -8%        -18%
s///        10956/s         8%    --        -11%
substr_4arg 12354/s        22%   13%          --


(If I've done something that renders this test invalid, I'd appreciate
hearing about it... I'm rather a novice when it comes to benchmarking.)

Paul Lalli
 
A. Sinan Unur

According to the following benchmark, it depends on which substr()
form you use:
....

'substr_3arg' => sub {
    my @lines = gen_lines();
    substr($_, 0, 7) = q{} for @lines;

That looks contrived to me. If I were to use substr here, I would use:

'substr_3arg' => sub {
    my @lines = gen_lines();
    $_ = substr($_, 7) for @lines;
},

              Rate substr_3arg  s/// substr_4arg
substr_3arg 10106/s         --   -8%        -18%
s///        10956/s         8%    --        -11%
substr_4arg 12354/s        22%   13%          --

The two runs below compare before and after the change above:

D:\Home\asu1\UseNet\clpmisc> s.pl
              Rate substr_3arg  s/// substr_4arg
substr_3arg 39983/s         --   -6%        -11%
s///        42386/s         6%    --         -6%
substr_4arg 44908/s        12%    6%          --

D:\Home\asu1\UseNet\clpmisc> s.pl
              Rate  s/// substr_4arg substr_3arg
s///        41749/s    --        -4%        -17%
substr_4arg 43480/s    4%         --        -14%
substr_3arg 50534/s   21%        16%          --

Of course, s/// may be easier to maintain (especially if there is any
chance the lines contain padding, or the URLs are embedded in longer text, etc.).
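
For instance, a slightly more defensive pattern (a sketch; whether the file really
contains leading whitespace, https, or mixed-case schemes is an assumption):

s{^\s*https?://}{}i for @lines;   # tolerates padding, https, and odd capitalisation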

In the case of the OP, the substitution is done only once per line, and
hence speeding up the substitution would be wasted time.

Sinan
 
Anno Siegel

Paul Lalli said:
According to the following benchmark, it depends on which substr() form
you use:

#!/usr/bin/perl
use strict;
use warnings;

use Benchmark qw/cmpthese/;

sub gen_lines {
    qw{
        Http://www.yahoo.com
        Http://gmail.google.com
        Http://disney.go.com
        Http://www.perldoc.com
        Http://search.cpan.org
        Http://www.oreilly.com
        Http://www.ge.com
        Http://www.stratus.com
        Http://www.es11.com
        Http://www.bankofamerica.com
    };
}

cmpthese(-10, {
    'substr_3arg' => sub {
        my @lines = gen_lines();
        substr($_, 0, 7) = q{} for @lines;
    },
    'substr_4arg' => sub {
        my @lines = gen_lines();
        substr($_, 0, 7, q{}) for @lines;
    },
    's///' => sub {
        my @lines = gen_lines();
        s{^Http://}{} for @lines;
    },
});

__END__
Benchmark: running s///, substr_3arg, substr_4arg, each for at least 10 CPU seconds...
       s///: 10 wallclock secs (10.52 usr + 0.01 sys = 10.53 CPU) @ 10956.03/s (n=115367)
substr_3arg: 11 wallclock secs (10.22 usr + 0.01 sys = 10.23 CPU) @ 10105.96/s (n=103384)
substr_4arg: 10 wallclock secs (10.54 usr + 0.00 sys = 10.54 CPU) @ 12353.89/s (n=130210)

              Rate substr_3arg  s/// substr_4arg
substr_3arg 10106/s         --   -8%        -18%
s///        10956/s         8%    --        -11%
substr_4arg 12354/s        22%   13%          --


(If I've done something that renders this test invalid, I'd appreciate
hearing about it... I'm rather a novice when it comes to benchmarking.)

Looks good to me. You avoided the easy-to-make error of changing the
same string(s) repeatedly. I don't quite see why you used a sub
(gen_lines()) to refresh the lines. Not that anything's wrong with
that, but the more natural choice would have been an array:

my @lines_orig = qw{
    Http://www.yahoo.com
    ...
};

and later

"..." => sub {
    my @lines = @lines_orig;
    # etc
},

It would take another set of benchmarks to decide which is faster
(the relevant criterion here).
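
Such a comparison might look like this (a sketch; the URL list is shortened just to
keep the example small):

#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

sub gen_lines {
    qw{
        Http://www.yahoo.com
        Http://gmail.google.com
        Http://disney.go.com
    };
}

my @lines_orig = gen_lines();

cmpthese(-5, {
    'sub_call'   => sub { my @lines = gen_lines();  },   # refresh via a sub call
    'array_copy' => sub { my @lines = @lines_orig;  },   # refresh via an array copy
});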

Anno
 
Paul Lalli

A. Sinan Unur said:
That looks contrived to me. If I were to use substr here, I would use:

'substr_3arg' => sub {
    my @lines = gen_lines();
    $_ = substr($_, 7) for @lines;
},


Nope, not contrived... I just honestly didn't think about doing it that
way. For some reason, I seem to have a bizarre fondness for using
substr() as an l-value. It didn't occur to me to go backwards from the
way I was originally thinking.

Thanks for pointing that out. For the sake of completeness, I offer a
benchmark with all four methods....

#!/usr/bin/perl
use strict;
use warnings;

use Benchmark qw/cmpthese/;

sub gen_lines {
    qw{
        Http://www.yahoo.com
        Http://gmail.google.com
        Http://disney.go.com
        Http://www.perldoc.com
        Http://search.cpan.org
        Http://www.oreilly.com
        Http://www.ge.com
        Http://www.stratus.com
        Http://www.es11.com
        Http://www.bankofamerica.com
    };
}

cmpthese(-10, {
    'substr_2arg' => sub {
        my @lines = gen_lines();
        $_ = substr($_, 7) for @lines;
    },
    'substr_3arg' => sub {
        my @lines = gen_lines();
        substr($_, 0, 7) = q{} for @lines;
    },
    'substr_4arg' => sub {
        my @lines = gen_lines();
        substr($_, 0, 7, q{}) for @lines;
    },
    's///' => sub {
        my @lines = gen_lines();
        s{^Http://}{} for @lines;
    },
});
__END__

Benchmark: running s///, substr_2arg, substr_3arg, substr_4arg, each for at least 10 CPU seconds...
       s///: 12 wallclock secs (10.55 usr + 0.01 sys = 10.56 CPU) @ 10819.89/s (n=114258)
substr_2arg: 11 wallclock secs (10.02 usr + 0.00 sys = 10.02 CPU) @ 13797.90/s (n=138255)
substr_3arg: 12 wallclock secs (10.78 usr + 0.00 sys = 10.78 CPU) @ 10266.42/s (n=110672)
substr_4arg: 12 wallclock secs (10.70 usr + 0.00 sys = 10.70 CPU) @ 12289.63/s (n=131499)

              Rate substr_3arg  s/// substr_4arg substr_2arg
substr_3arg 10266/s         --   -5%        -16%        -26%
s///        10820/s         5%    --        -12%        -22%
substr_4arg 12290/s        20%   14%          --        -11%
substr_2arg 13798/s        34%   28%         12%          --

Paul Lalli
 
Paul Lalli

Anno said:
Looks good to me. You avoided the easy-to-make error of changing the
same string(s) repeatedly. I don't quite see why you used a sub
(gen_lines()) to refresh the lines.

Uhm. Huh. You know, after you pointed that out... I don't see why I
did either....
Not that anything's wrong with
that, but the more natural choice would have been an array:

my @lines_orig = qw{
    Http://www.yahoo.com
    ...
};

and later

"..." => sub {
    my @lines = @lines_orig;
    # etc
},

Agreed. Thanks for the comments.

Paul Lalli
 
Todd

Jake said:
I have a text file; in it are a large number of URLs (ex:
Http://www.yahoo.com/index.html)

I would like to take each line of the file (one URL per line) and
remove the Http://

Then I would like to use those names to navigate folders on my computer
that have names that correspond (ex: cd /www.yahoo.com)

Does anybody know how I could do this using Perl?

thanks!
-jake-

perl -e 'while(<>){chdir($cwd);chomp;s/^.*\/\///;chdir($_);print `pwd`;}' my.file


Todd
 
Paul Lalli

perl -e 'while(<>){chdir($cwd);chomp;s/^.*\/\///;chdir($_);print `pwd`;}' my.file

Nothing wrong with the above, but if you're going to make a one-liner,
it's more standard to let Perl do as much for you as possible...

1) eliminate the while(<>) loop by using -n
2) eliminate the chomp by using -l
3) No reason to use an undef variable to get the default behavior of
chdir.

perl -lne'chdir; s{^.*//}{}; chdir $_; print `pwd`' my.file

(Both of these assume, of course, that all of the directories you want
to change to are based in $HOME...)
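
For anyone new to those switches, this is roughly what the one-liner expands to as a
plain script (a hand-written sketch; perl -MO=Deparse would show the exact form):

$\ = "\n";                       # -l: append "\n" to every print
while (defined($_ = <>)) {       # -n: loop over the input lines
    chomp;                       # -l: strip the trailing newline on input
    chdir;                       # no argument: chdir to $ENV{HOME}
    s{^.*//}{};                  # strip everything up to and including "//"
    chdir $_;
    print `pwd`;
}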

Paul Lalli
 
Todd

Paul said:
Nothing wrong with the above, but if you're going to make a one-liner,
it's more standard to let Perl do as much for you as possible...

1) eliminate the while(<>) loop by using -n
2) eliminate the chomp by using -l
3) No reason to use an undef variable to get the default behavior of
chdir.

perl -lne'chdir; s{^.*//}{}; chdir $_; print `pwd`' my.file

(Both of these assume, of course, that all of the directories you want
to change to are based in $HOME...)

Paul Lalli

Paul, thanks for the tip.

Todd
 
Uri Guttman

PL> Nope, not contrived... I just honestly didn't think about doing it that
PL> way. For some reason, I seem to have a bizarre fondness for using
PL> substr() as an l-value. It didn't occur to me to go backwards from the
PL> way I was originally thinking.

lvalue substr is slow because of the extra work needed to make an lvalue
(and assign to it) from a substring. 4 arg substr was created to make it
run faster (the 4th arg would be what you used to assign to the lvalue
substr) and more consistent with splice (ever notice how they are really
the same function but one works on arrays and the other on strings?).
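
for example, the parallel is easy to see side by side in a quick sketch:

my @array  = (0 .. 9);
my $string = '0123456789';

splice(@array,  0, 3, 'x');    # @array  is now ('x', 3, 4, 5, 6, 7, 8, 9)
substr($string, 0, 3, 'x');    # $string is now 'x3456789'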

PL> sub gen_lines {
PL> qw{
PL> Http://www.yahoo.com
PL> Http://gmail.google.com
PL> Http://disney.go.com
PL> Http://www.perldoc.com
PL> Http://search.cpan.org
PL> Http://www.oreilly.com
PL> Http://www.ge.com
PL> Http://www.stratus.com
PL> Http://www.es11.com
PL> Http://www.bankofamerica.com
PL> };
PL> }

why the sub? just assign a my array with that list of strings and use
that in the benchmark subs. no need for the extra overhead of a sub
call. one key to benchmarking is to lower common overhead so it will
highlight the code under test. another related idea is to set up a dummy
or control entry which only does the overhead work (in this case, the
copy of the array). you would be surprised at how much work is being
done in that assignment.
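
for example, a control entry for the benchmark above might look like this (just a
sketch):

cmpthese(-10, {
    'control' => sub {                        # overhead only: just the array copy
        my @lines = gen_lines();
    },
    'substr_4arg' => sub {
        my @lines = gen_lines();
        substr($_, 0, 7, q{}) for @lines;
    },
    # ... the other entries as before ...
});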

PL>               Rate substr_3arg  s/// substr_4arg substr_2arg
PL> substr_3arg 10266/s         --   -5%        -16%        -26%
PL> s///        10820/s         5%    --        -12%        -22%
PL> substr_4arg 12290/s        20%   14%          --        -11%
PL> substr_2arg 13798/s        34%   28%         12%          --

i think the 2 arg form is faster because it isn't munging in place but
assigning to a new string. and i would actually use the two arg as it is
clearer here. but this is such a simple example that it doesn't matter
too much.

uri
 
Dan

I have a text file; in it are a large number of URLs (ex:
Http://www.yahoo.com/index.html)

I would like to take each line of the file (one URL per line) and
remove the Http://

Then I would like to use those names to navigate folders on my computer
that have names that correspond (ex: cd /www.yahoo.com)

Hang on... that's contradictory: do you want to strip off the 'Http://'
and the '/index.html', or just the 'Http://'?

Does anybody know how I could do this using Perl?

Yup, lots and lots of people.

Dan
 
