My code only works for 1st line in a file..need your help

Shan

My code works, but only for the first line in a file. The file being
read contains URLs, for example:

http://21stcmb.typepad.com
http://3banklawyers.typepad.com


I would appreciate your help getting my code to be fully functional.
Thank you

My code is below
---------------------------------------------------------------------------------------------------------------------------
#!/usr/bin/perl -w


use strict;


use LWP::Simple;
my $record;
my $entry;

open (OUT,">results.txt") or die "err";
open (URLS, "urls1.txt");
while ($record = <URLS>) {
$entry=$record;


my $html = get("$entry")
or die "Could not get the information you wanted";
while ($html =~ m{<link rel="alternate"(.*?) />}g){
my $site_feed = $1;

my $string = $site_feed;
my $ATTRIBUTE = qr/type|title|href/;
my $INSIDE_QUOTES = qr/.*?/;
my @files = $string =~ m{(?:$ATTRIBUTE)="($INSIDE_QUOTES)"}g;
print "Found @files\n";


print OUT "$entry \, $files[0] \, $files[1] \, $files[2]\n";


}
}
close (URLS);

--------------------------------------------------------------------------------------------------------------------
 
Paul Lalli

Shan said:
My code works but only for the first line in a file.

So how does it *not* work for the remainder? Can't get the URL? Can't
find any matches? Program freezes? Incorrect output? No output?
Crash? Infinite loop? Computer bursts into flames?
The file that is being read contains urls.
for example

http://21stcmb.typepad.com
http://3banklawyers.typepad.com


I would appreciate your help getting my code to be fully functional.
Thank you

My code is below
---------------------------------------------------------------------------------------------------------------------------
#!/usr/bin/perl -w

use strict;
use LWP::Simple;
my $record;
my $entry;

Declare your variables in the shortest scope possible.
open (OUT,">results.txt") or die "err";

1) Use the three argument form of open
2) Use lexical filehandles, not global barewords
3) State *why* the open failed:

open my $OUT, '>', 'results.txt' or die "Error: $!";
open (URLS, "urls1.txt");
Ditto.
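
As a rough sketch, here are the three points above applied to both opens. The file names come from the original script; the scratch directory and the sample contents are assumptions added only so the example runs anywhere.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Work in a scratch directory so the sketch is self-contained (assumption:
# the real script would simply use urls1.txt and results.txt in place).
my $dir      = tempdir(CLEANUP => 1);
my $url_file = "$dir/urls1.txt";
my $out_file = "$dir/results.txt";

# Create a small sample input file.
open my $seed, '>', $url_file or die "Can't write '$url_file': $!";
print {$seed} "http://21stcmb.typepad.com\nhttp://3banklawyers.typepad.com\n";
close $seed or die "Can't close '$url_file': $!";

# Three-argument open, lexical filehandles, and $! in the error message.
open my $URLS, '<', $url_file or die "Can't read '$url_file': $!";
open my $OUT,  '>', $out_file or die "Can't write '$out_file': $!";

my $count = 0;
while (my $entry = <$URLS>) {    # $entry is scoped to the loop
    chomp $entry;
    print {$OUT} "$entry\n";
    $count++;
}

close $URLS;
close $OUT or die "Can't close '$out_file': $!";
print "Copied $count URLs\n";
```

Because the handles are lexical, they are closed automatically when they go out of scope, although closing the output handle explicitly lets you catch write errors.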

while ($record = <URLS>) {
$entry=$record;

Uh. Why? You never use $record again. So why didn't you just do:

while (my $entry = <URLS>) {
my $html = get("$entry")

I'm somewhat surprised this works, as I'd expect get() to be looking for
a URL that contains a newline. You should be chomping this variable
before using it.

Also, see: perldoc -q quoting
What's wrong with always quoting "$vars"?
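
To see the newline problem in isolation, here is a small sketch that reads lines from an in-memory filehandle and shows each value before and after chomp. The in-memory string is an assumed stand-in for urls1.txt, and no network access is involved.

```perl
use strict;
use warnings;

# In-memory stand-in for urls1.txt (assumption: one URL per line).
my $data = "http://21stcmb.typepad.com\nhttp://3banklawyers.typepad.com\n";
open my $fh, '<', \$data or die "Can't open in-memory file: $!";

my @urls;
while (my $entry = <$fh>) {
    my $raw_length = length $entry;    # still includes the trailing "\n"
    chomp $entry;                      # remove the line ending in place
    printf "%d -> %d characters: '%s'\n", $raw_length, length $entry, $entry;
    push @urls, $entry;                # plain $entry, no need for "$entry"
}
close $fh;
```

Each line should come out one character shorter after chomp, which is the newline that would otherwise be passed to get().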
or die "Could not get the information you wanted";
while ($html =~ m{<link rel="alternate"(.*?) />}g){
my $site_feed = $1;

What do you have against indentation?
my $string = $site_feed;

And again, you never use $site_feed again, so why the duplicate
variables?
my $ATTRIBUTE = qr/type|title|href/;
my $INSIDE_QUOTES = qr/.*?/;
my @files = $string =~ m{(?:$ATTRIBUTE)="($INSIDE_QUOTES)"}g;
print "Found @files\n";

Are you confident that each of the URLs you're fetching actually has
these attributes?
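
One way to guard against a missing attribute, sketched here under the assumption that the regex approach is kept, is to capture attribute names along with their values into a hash, so that an absent title cannot shift href into the wrong column. The sample string and the empty-string fallback are illustrative choices, not part of the original script.

```perl
use strict;
use warnings;

# Sample attribute string as captured by $1 in the original loop
# (assumption: this particular tag has no title attribute).
my $string = ' type="application/atom+xml" href="http://example.com/atom.xml"';

# Capture attribute names together with their values, so a missing
# attribute cannot shift the others into the wrong position.
my %attr = $string =~ m{(type|title|href)="(.*?)"}g;

# Fall back to an empty string for anything that was not present.
my @fields = map { defined $attr{$_} ? $attr{$_} : '' } qw(type title href);

print join(' , ', @fields), "\n";
```

With the positional @files array from the original code, the same input would have put the href value in $files[1] and left $files[2] undefined.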
print OUT "$entry \, $files[0] \, $files[1] \, $files[2]\n";


}
}
close (URLS);

You still haven't told us what's going wrong, so I have no idea how to
help you.

Paul Lalli
 
Shan

The following is the message I get when the file being read contains more
than one URL:

%shttp://21stcmb.typepad.com
Could not get the information you wanted at getfeeds2.pl line 14, <INFILE> line 1.


here is the new code (works with one url in the file)
-----------------------------------------------------------------------------------------------------
#!/usr/bin/perl -w
# Gets feeds

use strict;


use LWP::Simple;

open (OUT,">results.txt") or die "err";
open(INFILE,"< urls1.txt");
while (<INFILE>)
{
print("%s",$_);
my $html = get("$_")
or die "Could not get the information you wanted";
while ($html =~ m{<link rel="alternate"(.*?) />}g)
{
my $string = $1;
my $ATTRIBUTE = qr/type|title|href/;
my $INSIDE_QUOTES = qr/.*?/;
my @files = $string =~ m{(?:$ATTRIBUTE)="($INSIDE_QUOTES)"}g;
print "Found @files\n";
print OUT "$_ \, $files[0] \, $files[1] \, $files[2]\n";


}
}
 
Paul Lalli

Shan said:
The following is the message I get when the file being read contains more
than one URL:

%shttp://21stcmb.typepad.com
Could not get the information you wanted at getfeeds2.pl line 14, <INFILE> line 1.


here is the new code

....that completely ignores the suggestion I made. You are not chomping
the URL. My guess is that in your file that has only one URL, it does
not have a newline on the end.
(works with one url in the file)
-----------------------------------------------------------------------------------------------------
#!/usr/bin/perl -w
# Gets feeds

use strict;


use LWP::Simple;

open (OUT,">results.txt") or die "err";
open(INFILE,"< urls1.txt");

Why do you ask for people's advice if you're going to ignore it?
while (<INFILE>)
{
print("%s",$_);

What do you think that %s is doing? print() is not printf()
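
The difference can be shown without printing anything ambiguous. In this sketch, join with an empty separator mimics what print does with its argument list when $, is unset, and sprintf stands in for printf; the variable names are made up for the illustration.

```perl
use strict;
use warnings;

my $url = "http://21stcmb.typepad.com";

# print outputs its arguments literally, so "%s" is emitted as plain text.
my $with_print = join '', "%s", $url;

# printf/sprintf treat the first argument as a format string.
my $with_sprintf = sprintf "%s", $url;

print "print  would output: $with_print\n";
print "printf would output: $with_sprintf\n";
```

That literal "%s" is where the leading %s in the reported output comes from.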
my $html = get("$_")

Make the adjustment I told you to make, twice now. See if it helps.
And follow the rest of the suggestions in my previous post.

Paul Lalli
 
Ben Morrow

Quoth "Shan":
The following is the message I get when the file being read contains more
than one URL:

%shttp://21stcmb.typepad.com
Could not get the information you wanted at getfeeds2.pl line 14, <INFILE> line 1.


here is the new code (works with one url in the file)
-----------------------------------------------------

[lines this long are not really welcome on Usenet; they really don't add
to the clarity of your message]
#!/usr/bin/perl -w

-w is for very old perls. Nowadays you should use

use warnings;

instead.
# Gets feeds

use strict;
Good.

use LWP::Simple;

open (OUT,">results.txt") or die "err";

Ignoring what people say is not considered polite. Paul has already told
you (for good reason) to write this as

open(my $OUT, '>', 'results.txt')
or die "can't write to 'results.txt': $!";

I've kept your parens, even though I (and most people here) wouldn't use
them, because they are often helpful to people who are unsure of Perl's
precedence table (which is complicated, so that's OK :) ).
open(INFILE,"< urls1.txt");

As above.

FWIW, it's usually better not to hardcode filenames. Either use
something like Getopt::Std to let your program accept sensible
arguments, or in this case I'd probably just use <> and print to STDOUT.
Then you invoke your program like

C:\whatever\> perl checkfeeds.pl urls1.txt > results.txt

(I'm assuming since you're using .txt extensions that you're on Win32.
If not adjust appropriately.)
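
A minimal sketch of the <> and STDOUT approach follows. The block that fills @ARGV with a temporary sample file is an assumption added purely so the example runs without arguments; a real script would leave @ARGV alone and be invoked as shown above.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# For demonstration only: fake a command-line argument by writing a sample
# file and putting its name in @ARGV. A real script would skip this part.
unless (@ARGV) {
    my ($tmp, $name) = tempfile(UNLINK => 1);
    print {$tmp} "http://21stcmb.typepad.com\nhttp://3banklawyers.typepad.com\n";
    close $tmp or die "Can't close '$name': $!";
    @ARGV = ($name);
}

my $seen = 0;
while (my $url = <>) {          # reads every file named in @ARGV, or STDIN
    chomp $url;
    next unless length $url;    # skip blank lines
    $seen++;
    warn "Trying '$url'\n";     # diagnostics go to STDERR
    print "$url\n";             # results go to STDOUT; redirect as needed
}
```

Keeping diagnostics on STDERR means that redirecting STDOUT to results.txt captures only the results.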
while (<INFILE>)
{
print("%s",$_);

print ne printf. Read the documentation for the functions you use:
perldoc -f print.

You haven't set $\, so this will print all the urls jammed up together
(or would, if you didn't have a useless "%s" in there).

Diagnostic output should generally go to STDERR, or if (as I suspect)
this is just for debugging, should use warn.

print STDERR $_;

or

warn "Trying to get url '$_'";
my $html = get("$_")

Don't quote variables when you don't need to. Did you read Paul's reply?
You are still trying to GET a url with a newline on the end: I strongly
suspect that since "\n" is not a valid character in URLs, LWP is doing
what you mean (DWYM), but you should still say what you mean. Read
perldoc -f chomp or read about the -l flag in perldoc perlrun. I would
also use the -n flag here, but you may find it easier to be explicit.
or die "Could not get the information you wanted";

A more informative error message would be useful. As would be not
abandoning the whole program when one URL fails; something like

URL: while (<INFILE>) {

...

my $html = get $_;
unless (defined $html) {
warn "Can't GET '$_'";
next URL;
}

...
}

Unfortunately the LWP::Simple::get function doesn't allow you to get at
the error code... it may be worth rewriting to use the proper
LWP::UserAgent interface if you have persistent problems with some URLs.
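
For reference, a sketch of what that could look like with LWP::UserAgent, which exposes the status line of a failed request. The helper name fetch_page and the 10-second timeout are assumptions made for the example, and URLs are read from the command line so that nothing is fetched unless you pass some.

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new(timeout => 10);

# Hypothetical helper: returns the page body, or undef after warning with
# the status line of the failed response (for example "404 Not Found").
sub fetch_page {
    my ($url) = @_;
    my $response = $ua->get($url);
    return $response->decoded_content if $response->is_success;
    warn "Can't GET '$url': ", $response->status_line, "\n";
    return;
}

# URLs are taken from the command line; nothing is fetched without them.
for my $url (@ARGV) {
    my $html = fetch_page($url);
    next unless defined $html;    # skip failures instead of dying
    print "$url: ", length($html), " characters\n";
}
```

A failed URL then produces a warning with the reason and the loop moves on, rather than the whole program dying on the first failure.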
while ($html =~ m{<link rel="alternate"(.*?) />}g)

You really *never* want to parse X?HTML with a regexp. Use a proper
module; I'm not sure what I'd use in this case: any recommendations
anyone?
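
The thread does not settle on a module. As one possibility (an assumption, not a recommendation made by anyone above), HTML::TokeParser from the HTML-Parser distribution can walk the link tags and hand back their attributes as a hash. The sample document below stands in for a fetched page.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Sample document standing in for a fetched page.
my $html = <<'HTML';
<html><head>
<link rel="stylesheet" href="/styles.css" />
<link rel="alternate" type="application/atom+xml" title="Atom" href="http://example.com/atom.xml" />
</head><body></body></html>
HTML

my $parser = HTML::TokeParser->new(\$html)
    or die "Can't parse sample document";

my @feeds;
while (my $tag = $parser->get_tag('link')) {
    my $attr = $tag->[1];    # hash ref of the tag's attributes
    next unless lc($attr->{rel} || '') eq 'alternate';
    # Missing attributes become empty strings rather than undef.
    push @feeds, join ' , ',
        map { defined $attr->{$_} ? $attr->{$_} : '' } qw(type title href);
}
print "$_\n" for @feeds;
```

Unlike the regex, this does not depend on attribute order, on rel="alternate" coming first, or on the tag ending in " />".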

Ben
 
