Error downloading page: some pages work great, but I can't seem to get this one


Jack Schafer

I am trying to download the source code for an array of different
websites. Usually I get something like this from Dilbert.com:

HTTP/1.1 200 OK
Date: Fri, 23 Apr 2004 00:04:54 GMT
Server: Apache/1.3.27 (Unix) Resin/2.1.s030505 mod_ssl/2.8.14 OpenSSL/0.9.7b
Last-Modified: Thu, 22 Apr 2004 07:05:10 GMT
ETag: "182ba6-9d7b-40876ea6"
Accept-Ranges: bytes
Content-Length: 40315
Connection: close
Content-Type: text/html


Then the whole HTML page prints:
...


The problem occurs when I try the same thing on www.kingsofchaos.com. I
get the following header:

HTTP/1.1 200 OK
Date: Fri, 23 Apr 2004 00:16:49 GMT
Server: Apache/1.3.29 (Unix) (Gentoo/Linux)
Connection: close
Content-Type: text/html

That is the whole response, with no page attached.
I was wondering if you had any ideas on why I can't access the page,
and any suggestions as to how I should do it. Right now I am using the
following code:


use strict;
use warnings;
use IO::Socket::INET;

# $_[0] and $_[1] are the arguments of the enclosing sub: host name and path
my $host = $_[0];
my $get = $_[1];
my $port = 80;
my $protocol = "tcp";
my $socket;
my @page;
$socket = IO::Socket::INET->new(PeerAddr => $host, PeerPort => $port,
                                Proto => $protocol) or die "Could not connect\n";
# send the request
$socket->send("GET $get HTTP/1.0\nHOST: $host\n\n");
# receive the desired file
@page = <$socket>;
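For comparison, here is a minimal sketch of the same raw-socket request written to the letter of HTTP: request lines end in CRLF ("\r\n"), not a bare "\n", and the reply is split at the first blank line so a headers-only answer like the one above is easy to spot. Many servers accept a bare "\n", so this alone may not explain the empty reply. The sub names (`build_request`, `split_response`, `fetch_raw`) are hypothetical, chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;

# Build an HTTP/1.0 GET request. HTTP specifies CRLF ("\r\n") line endings,
# and the header section is terminated by an empty line.
sub build_request {
    my ($host, $path) = @_;
    return "GET $path HTTP/1.0\r\n"
         . "Host: $host\r\n"
         . "\r\n";
}

# Split a raw response into its header block and its body at the first
# blank line. An empty body means the server sent headers only.
sub split_response {
    my ($raw) = @_;
    my ($head, $body) = split /\r?\n\r?\n/, $raw, 2;
    $head = '' unless defined $head;
    $body = '' unless defined $body;
    return ($head, $body);
}

# Send the request over a plain TCP socket and return the whole response.
sub fetch_raw {
    my ($host, $path) = @_;
    my $socket = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => 80,
        Proto    => 'tcp',
    ) or die "Could not connect to $host: $@\n";
    binmode $socket;
    print {$socket} build_request($host, $path);
    local $/;                      # slurp until the server closes the connection
    my $raw = <$socket>;
    close $socket;
    return defined $raw ? $raw : '';
}

# Only touch the network when run as a script with host and path arguments.
if (!caller && @ARGV == 2) {
    my ($head, $body) = split_response(fetch_raw(@ARGV));
    print "$head\n\n";
    print length($body) ? $body : "(server sent no body)\n";
}
```

Run it as, for example, `perl rawget.pl www.kingsofchaos.com /` (the script name is arbitrary).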
 

Joe Smith

Jack said:
> the problem occurs when i try the same thing on www.kingsofchaos.com
> $socket->send("GET $get HTTP/1.0\nHOST: $host\n\n");
> @page=<$socket>;

1) You're doing it the hard way. Use the LWP modules instead.
2) Because of 1, you're not sending all of the HTTP headers the
web server wants to see.
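Points 1 and 2 can be sketched with LWP::UserAgent. This is a minimal example under stated assumptions, not a tested fix for this site: it keeps cookies in an in-memory jar between requests (`cookie_jar => {}` makes LWP create an HTTP::Cookies object), sends a browser-like User-Agent (the string is an arbitrary placeholder), and dies with the status line on failure. The sub names `make_agent` and `fetch_page` are hypothetical. LWP does not execute JavaScript, so if the site really depends on it, this alone may not be enough.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# A user agent that stores cookies from each response and sends them back
# on later requests to the same site.
sub make_agent {
    return LWP::UserAgent->new(
        agent      => 'Mozilla/5.0 (compatible; page-fetcher)',  # placeholder UA string
        cookie_jar => {},                                        # in-memory HTTP::Cookies
        timeout    => 30,
    );
}

# Fetch a URL and return its decoded body, or die with the status line.
sub fetch_page {
    my ($ua, $url) = @_;
    my $response = $ua->get($url);
    die "GET $url failed: ", $response->status_line, "\n"
        unless $response->is_success;
    return $response->decoded_content;
}

# Only touch the network when run as a script with a URL argument.
if (!caller && @ARGV) {
    binmode STDOUT, ':encoding(UTF-8)';
    my $ua = make_agent();
    print fetch_page($ua, $ARGV[0]);
}
```

Reusing the same `$ua` for every request is what lets the jar carry session cookies from one page to the next.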

According to the Web Scraping Proxy (http://www.research.att.com/~hpk/wsp/)
you'll need to store and send cookies, and execute JavaScript.

# Request: http://www.kingsofchaos.com/
$request = new HTTP::Request('GET' => "http://www.kingsofchaos.com/");
# Set-Cookie: koc_session=ea30aa58e36; path=/; domain=www.kingsofchaos.com
# Set-Cookie: security_hash=323466; expires=Sun, 23-May-2004 08:17:26 GMT; path=/; domain=.kingsofchaos.com
# Set-Cookie: cookie_hash=801f782dce8147; path=/
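The Set-Cookie lines in the trace are what a cookie jar has to remember and send back. The sketch below shows that step offline with HTTP::Cookies, so no connection is made: it fakes a first response carrying the `koc_session` cookie copied from the trace, lets the jar extract it, then adds the matching Cookie header to the next request. The fake response is an illustration, not captured traffic.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Cookies;
use HTTP::Request;
use HTTP::Response;

# Fake the first exchange: a 200 response that sets a session cookie.
my $first_req = HTTP::Request->new(GET => 'http://www.kingsofchaos.com/');
my $first_res = HTTP::Response->new(200, 'OK');
$first_res->header('Set-Cookie' =>
    'koc_session=ea30aa58e36; path=/; domain=www.kingsofchaos.com');
$first_res->request($first_req);   # extract_cookies needs the originating request

my $jar = HTTP::Cookies->new;      # in-memory jar, nothing written to disk
$jar->extract_cookies($first_res); # store the cookie from Set-Cookie

# The next request to the same site gets a Cookie header added by the jar.
my $next_req = HTTP::Request->new(GET => 'http://www.kingsofchaos.com/');
$jar->add_cookie_header($next_req);
print $next_req->header('Cookie'), "\n";
```

LWP::UserAgent does these two calls for you when it has a cookie jar, which is why point 1 is the easier route.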

3) Post to comp.lang.perl.misc (instead of comp.lang.perl) next time.
-Joe