How to use a text file for filenames?

Jake

I have a text file; in it are a large number of URLs (ex:
Http://www.yahoo.com/index.html)

I would like to take each line of the file (one URL per line) and
remove the Http://

Then I would like to use those names to navigate folders on my computer
that have names that correspond (ex: cd /www.yahoo.com)

Does anybody know how I could do this using Perl?

thanks!
-jake-
 
Jeff Stampes

Jake said:
I have a text file; in it are a large number of URLs (ex:
Http://www.yahoo.com/index.html)

I would like to take each line of the file (one URL per line) and
remove the Http://

Then I would like to use those names to navigate folders on my computer
that have names that correspond (ex: cd /www.yahoo.com)

Does anybody know how I could do this using Perl?

Yes.

You'll get more responses if you show us what you tried yourself, and where you
experienced problems.

A quick Google search didn't turn this up for me, but it has been said before: this is a
discussion group, not a help desk.

~Jeff
 
Jake

Unfortunately I'm REALLY REALLY new at Perl, but I have to finish this
thing up soon. So I was hoping for some suggestions to explore, even
if they seem trivial. I haven't really tried much yet because I'm totally
unfamiliar with the possibilities.

I hope this isn't too vague and needy :/

-jake-
 
A. Sinan Unur

Unfortunately I'm REALLY REALLY new at Perl, but I have to finish this
thing up soon. So I was hoping for some suggestions to explore, even
if they seem trivial. I haven't really tried much yet because I'm totally
unfamiliar with the possibilities.

What is "this" that you need to finish soon?
I hope this isn't too vague and needy :/

It is, indeed.

Please read the posting guidelines for this group.

Sinan
 
Tad McClellan

Jake said:
I have a text file; in it are a large number of URLs (ex:
Http://www.yahoo.com/index.html)

I would like to take each line of the file (one URL per line) and
remove the Http://


Use the substitution operator ( s/// ) for that. The s/// operator
is documented in:

perldoc perlop

Then I would like to use those names to navigate folders on my computer
that have names that correspond (ex: cd /www.yahoo.com)


perldoc -f chdir

Does anybody know how I could do this using Perl?


Yes, I do.
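
For illustration, a minimal sketch putting those two pointers together (the file name
urls.txt is an assumption, as is dropping everything after the host name so each line
matches a folder such as www.yahoo.com under the starting directory):

#!/usr/bin/perl
use strict;
use warnings;
use Cwd qw(getcwd);

my $start = getcwd();                        # remember where we began

open my $fh, '<', 'urls.txt' or die "Cannot open urls.txt: $!";

while (my $line = <$fh>) {
    chomp $line;
    $line =~ s{^http://}{}i;                 # strip the scheme, case-insensitively
    $line =~ s{/.*}{};                       # drop any path such as /index.html (an assumption)
    next unless length $line;

    if (chdir "$start/$line") {
        print "now in $start/$line\n";
        # ... work inside the folder here ...
        chdir $start or die "Cannot return to $start: $!";
    }
    else {
        warn "Cannot chdir to $start/$line: $!";
    }
}

close $fh;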
 
Dan

I would like to take each line of the file (one URL per line) and

Use the substitution operator ( s/// ) for that. The s/// operator
is documented in:

Would using substr() be an even more efficient method of doing this?

Dan
 
Paul Lalli

Dan said:
Would using substr() be an even more efficient method of doing this?

According to the following benchmark, it depends on which substr() form
you use:

#!/usr/bin/perl
use strict;
use warnings;

use Benchmark qw/cmpthese/;

sub gen_lines {
    qw{
        Http://www.yahoo.com
        Http://gmail.google.com
        Http://disney.go.com
        Http://www.perldoc.com
        Http://search.cpan.org
        Http://www.oreilly.com
        Http://www.ge.com
        Http://www.stratus.com
        Http://www.es11.com
        Http://www.bankofamerica.com
    };
}

cmpthese(-10, {
    'substr_3arg' => sub {
        my @lines = gen_lines();
        substr($_, 0, 7) = q{} for @lines;
    },
    'substr_4arg' => sub {
        my @lines = gen_lines();
        substr($_, 0, 7, q{}) for @lines;
    },
    's///' => sub {
        my @lines = gen_lines();
        s{^Http://}{} for @lines;
    },
});

__END__
Benchmark: running s///, substr_3arg, substr_4arg, each for at least 10 CPU seconds...
       s///: 10 wallclock secs (10.52 usr + 0.01 sys = 10.53 CPU) @ 10956.03/s (n=115367)
substr_3arg: 11 wallclock secs (10.22 usr + 0.01 sys = 10.23 CPU) @ 10105.96/s (n=103384)
substr_4arg: 10 wallclock secs (10.54 usr + 0.00 sys = 10.54 CPU) @ 12353.89/s (n=130210)

              Rate substr_3arg  s/// substr_4arg
substr_3arg 10106/s         --   -8%        -18%
s///        10956/s         8%    --        -11%
substr_4arg 12354/s        22%   13%          --


(If I've done something that renders this test invalid, I'd appreciate
hearing about it... I'm rather a novice when it comes to benchmarking.)

Paul Lalli
 
A. Sinan Unur

According to the following benchmark, it depends on which substr()
form you use:
....

'substr_3arg' => sub {
    my @lines = gen_lines();
    substr($_, 0, 7) = q{} for @lines;

That looks contrived to me. If I were to use substr here, I would use:

'substr_3arg' => sub {
    my @lines = gen_lines();
    $_ = substr($_, 7) for @lines;
},

              Rate substr_3arg  s/// substr_4arg
substr_3arg 10106/s         --   -8%        -18%
s///        10956/s         8%    --        -11%
substr_4arg 12354/s        22%   13%          --

The two runs below compare before and after the change above:

D:\Home\asu1\UseNet\clpmisc> s.pl
              Rate substr_3arg  s/// substr_4arg
substr_3arg 39983/s         --   -6%        -11%
s///        42386/s         6%    --         -6%
substr_4arg 44908/s        12%    6%          --

D:\Home\asu1\UseNet\clpmisc> s.pl
              Rate  s/// substr_4arg substr_3arg
s///        41749/s    --        -4%        -17%
substr_4arg 43480/s    4%         --        -14%
substr_3arg 50534/s   21%        16%          --

Of course, s/// may be easier to maintain (especially if there is any
chance the lines contain padding, or the URLs are embedded in longer text, etc.).
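
For instance, a slightly more defensive pattern (a sketch; whether the file really
contains leading whitespace, https, or mixed-case schemes is an assumption):

s{^\s*https?://}{}i for @lines;   # tolerates padding, https, and odd capitalisation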

In the case of the OP, the substitution is done only once per line, and
hence speeding up the substitution would be wasted time.

Sinan
 
Anno Siegel

Paul Lalli said:
According to the following benchmark, it depends on which substr() form
you use:

#!/usr/bin/perl
use strict;
use warnings;

use Benchmark qw/cmpthese/;

sub gen_lines {
    qw{
        Http://www.yahoo.com
        Http://gmail.google.com
        Http://disney.go.com
        Http://www.perldoc.com
        Http://search.cpan.org
        Http://www.oreilly.com
        Http://www.ge.com
        Http://www.stratus.com
        Http://www.es11.com
        Http://www.bankofamerica.com
    };
}

cmpthese(-10, {
    'substr_3arg' => sub {
        my @lines = gen_lines();
        substr($_, 0, 7) = q{} for @lines;
    },
    'substr_4arg' => sub {
        my @lines = gen_lines();
        substr($_, 0, 7, q{}) for @lines;
    },
    's///' => sub {
        my @lines = gen_lines();
        s{^Http://}{} for @lines;
    },
});

__END__
Benchmark: running s///, substr_3arg, substr_4arg, each for at least 10 CPU seconds...
       s///: 10 wallclock secs (10.52 usr + 0.01 sys = 10.53 CPU) @ 10956.03/s (n=115367)
substr_3arg: 11 wallclock secs (10.22 usr + 0.01 sys = 10.23 CPU) @ 10105.96/s (n=103384)
substr_4arg: 10 wallclock secs (10.54 usr + 0.00 sys = 10.54 CPU) @ 12353.89/s (n=130210)

              Rate substr_3arg  s/// substr_4arg
substr_3arg 10106/s         --   -8%        -18%
s///        10956/s         8%    --        -11%
substr_4arg 12354/s        22%   13%          --


(If I've done something that renders this test invalid, I'd appreciate
hearing about it... I'm rather a novice when it comes to benchmarking.)

Looks good to me. You avoided the easy-to-make error of changing the
same string(s) repeatedly. I don't quite see why you used a sub
(gen_lines()) to refresh the lines. Not that anything's wrong with
that, but the more natural choice would have been an array:

my @lines_orig = qw{
    Http://www.yahoo.com
    ...
};

and later

"..." => sub {
    my @lines = @lines_orig;
    # etc
},

It would take another set of benchmarks to decide which is faster
(the relevant criterion here).
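
Such a comparison might look like this (a sketch; the URL list is shortened just to
keep the example small):

#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

sub gen_lines {
    qw{
        Http://www.yahoo.com
        Http://gmail.google.com
        Http://disney.go.com
    };
}

my @lines_orig = gen_lines();

cmpthese(-5, {
    'sub_call'   => sub { my @lines = gen_lines();  },   # refresh via a sub call
    'array_copy' => sub { my @lines = @lines_orig;  },   # refresh via an array copy
});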

Anno
 
Paul Lalli

A. Sinan Unur said:
That looks contrived to me. If I were to use substr here, I would use:

'substr_3arg' => sub {
    my @lines = gen_lines();
    $_ = substr($_, 7) for @lines;
},


Nope, not contrived... I just honestly didn't think about doing it that
way. For some reason, I seem to have a bizarre fondness for using
substr() as an l-value. It didn't occur to me to go backwards from the
way I was originally thinking.

Thanks for pointing that out. For the sake of completeness, I offer a
benchmark with all four methods....

#!/usr/bin/perl
use strict;
use warnings;

use Benchmark qw/cmpthese/;

sub gen_lines {
    qw{
        Http://www.yahoo.com
        Http://gmail.google.com
        Http://disney.go.com
        Http://www.perldoc.com
        Http://search.cpan.org
        Http://www.oreilly.com
        Http://www.ge.com
        Http://www.stratus.com
        Http://www.es11.com
        Http://www.bankofamerica.com
    };
}

cmpthese(-10, {
    'substr_2arg' => sub {
        my @lines = gen_lines();
        $_ = substr($_, 7) for @lines;
    },
    'substr_3arg' => sub {
        my @lines = gen_lines();
        substr($_, 0, 7) = q{} for @lines;
    },
    'substr_4arg' => sub {
        my @lines = gen_lines();
        substr($_, 0, 7, q{}) for @lines;
    },
    's///' => sub {
        my @lines = gen_lines();
        s{^Http://}{} for @lines;
    },
});
__END__

Benchmark: running s///, substr_2arg, substr_3arg, substr_4arg, each for at least 10 CPU seconds...
       s///: 12 wallclock secs (10.55 usr + 0.01 sys = 10.56 CPU) @ 10819.89/s (n=114258)
substr_2arg: 11 wallclock secs (10.02 usr + 0.00 sys = 10.02 CPU) @ 13797.90/s (n=138255)
substr_3arg: 12 wallclock secs (10.78 usr + 0.00 sys = 10.78 CPU) @ 10266.42/s (n=110672)
substr_4arg: 12 wallclock secs (10.70 usr + 0.00 sys = 10.70 CPU) @ 12289.63/s (n=131499)

              Rate substr_3arg  s/// substr_4arg substr_2arg
substr_3arg 10266/s         --   -5%        -16%        -26%
s///        10820/s         5%    --        -12%        -22%
substr_4arg 12290/s        20%   14%          --        -11%
substr_2arg 13798/s        34%   28%         12%          --

Paul Lalli
 
Paul Lalli

Anno said:
Looks good to me. You avoided the easy-to-make error of changing the
same string(s) repeatedly. I don't quite see why you used a sub
(gen_lines()) to refresh the lines.

Uhm. Huh. You know, after you pointed that out... I don't see why I
did either....
Not that anything's wrong with
that, but the more natural choice would have been an array:

my @lines_orig = qw{
    Http://www.yahoo.com
    ...
};

and later

"..." => sub {
    my @lines = @lines_orig;
    # etc
},

Agreed. Thanks for the comments.

Paul Lalli
 
Todd

Jake said:
I have a text file; in it are a large number of URLs (ex:
Http://www.yahoo.com/index.html)

I would like to take each line of the file (one URL per line) and
remove the Http://

Then I would like to use those names to navigate folders on my computer
that have names that correspond (ex: cd /www.yahoo.com)

Does anybody know how I could do this using Perl?

thanks!
-jake-

perl -e 'while(<>){chdir($cwd);chomp;s/^.*\/\///;chdir($_);print `pwd`;}' my.file


Todd
 
Paul Lalli

perl -e 'while(<>){chdir($cwd);chomp;s/^.*\/\///;chdir($_);print `pwd`;}' my.file

Nothing wrong with the above, but if you're going to make a one-liner,
it's more standard to let Perl do as much for you as possible...

1) eliminate the while(<>) loop by using -n
2) eliminate the chomp by using -l
3) No reason to use an undef variable to get the default behavior of
chdir.

perl -lne'chdir; s{^.*//}{}; chdir $_; print `pwd`' my.file

(Both of these assume, of course, that all of the directories you want
to change to are based in $HOME...)
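
For anyone new to those switches, this is roughly what the one-liner expands to as a
plain script (a hand-written sketch; perl -MO=Deparse would show the exact form):

$\ = "\n";                       # -l: append "\n" to every print
while (defined($_ = <>)) {       # -n: loop over the input lines
    chomp;                       # -l: strip the trailing newline on input
    chdir;                       # no argument: chdir to $ENV{HOME}
    s{^.*//}{};                  # strip everything up to and including "//"
    chdir $_;
    print `pwd`;
}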

Paul Lalli
 
Todd

Paul said:
Nothing wrong with the above, but if you're going to make a one-liner,
it's more standard to let Perl do as much for you as possible...

1) eliminate the while(<>) loop by using -n
2) eliminate the chomp by using -l
3) No reason to use an undef variable to get the default behavior of
chdir.

perl -lne'chdir; s{^.*//}{}; chdir $_; print `pwd`' my.file

(Both of these assume, of course, that all of the directories you want
to change to are based in $HOME...)

Paul Lalli

Paul, thanks for the tip.

Todd
 
Uri Guttman

PL> Nope, not contrived... I just honestly didn't think about doing it that
PL> way. For some reason, I seem to have a bizarre fondness for using
PL> substr() as an l-value. It didn't occur to me to go backwards from the
PL> way I was originally thinking.

lvalue substr is slow because of the extra work needed to make an lvalue
(and assign to it) from a substring. 4 arg substr was created to make it
run faster (the 4th arg would be what you used to assign to the lvalue
substr) and more consistent with splice (ever notice how they are really
the same function but one works on arrays and the other on strings?).
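
for example, the parallel is easy to see side by side in a quick sketch:

my @array  = (0 .. 9);
my $string = '0123456789';

splice(@array,  0, 3, 'x');    # @array  is now ('x', 3, 4, 5, 6, 7, 8, 9)
substr($string, 0, 3, 'x');    # $string is now 'x3456789'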

PL> sub gen_lines {
PL> qw{
PL> Http://www.yahoo.com
PL> Http://gmail.google.com
PL> Http://disney.go.com
PL> Http://www.perldoc.com
PL> Http://search.cpan.org
PL> Http://www.oreilly.com
PL> Http://www.ge.com
PL> Http://www.stratus.com
PL> Http://www.es11.com
PL> Http://www.bankofamerica.com
PL> };
PL> }

why the sub? just assign a my array with that list of strings and use
that in the benchmark subs. no need for the extra overhead of a sub
call. one key to benchmarking is to lower common overhead so it will
highlight the code under test. another related idea is to set up a dummy
or control entry which only does the overhead work (in this case, the
copy of the array). you would be surprised at how much work is being
done in that assignment.
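
for example, a control entry for the benchmark above might look like this (just a
sketch):

cmpthese(-10, {
    'control' => sub {                        # overhead only: just the array copy
        my @lines = gen_lines();
    },
    'substr_4arg' => sub {
        my @lines = gen_lines();
        substr($_, 0, 7, q{}) for @lines;
    },
    # ... the other entries as before ...
});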

PL>               Rate substr_3arg  s/// substr_4arg substr_2arg
PL> substr_3arg 10266/s         --   -5%        -16%        -26%
PL> s///        10820/s         5%    --        -12%        -22%
PL> substr_4arg 12290/s        20%   14%          --        -11%
PL> substr_2arg 13798/s        34%   28%         12%          --

i think the 2 arg form is faster because it isn't munging in place but
assigning to a new string. and i would actually use the two arg as it is
clearer here. but this is such a simple example that it doesn't matter
too much.

uri
 
Dan

I have a text file; in it are a large number of URLs (ex:
Http://www.yahoo.com/index.html)

I would like to take each line of the file (one URL per line) and
remove the Http://

Then I would like to use those names to navigate folders on my computer
that have names that correspond (ex: cd /www.yahoo.com)

Hang on... that's contradictory: do you want to strip off the 'Http://'
and the '/index.html', or just the 'Http://'?

Does anybody know how I could do this using Perl?

Yup, lots and lots of people.

Dan
 
