Naveen said:
Yes, I do have the URI info in the log files, and I too guessed that
it would be required in addition to the client IP and the time of access of
each resource. But sorry for sounding naive (I'm less than 4
months into Perl, in fact into programming); my biggest problem here is not
knowing what I require but knowing how to use that information. To be
precise, I'm aware that hashes need to be built and so on, but I'm at my
wit's end regarding what logic to apply to build such
hashes or HoHs. Top entry and exit pages can only be extracted from a
list of all the entry and exit pages, each of which will definitely be
a fraction of all the access log entries. How do I identify the
entry pages and exit pages? Do I use the time at which an IP
first accessed any resource and last accessed a resource?
I hope I made myself clear enough.
I recently wrote something similar, although it is for a rarely
visited site and would need to be altered for your log size.
You would need to pick out the first and last entries for each client
and probably rearrange what is printed.
#!/usr/bin/perl
# Check a current log by retrieving it from onsite and processing it
$|++;
use strict;
use warnings;
use Socket;
use Text::ParseWords;
use LWP::UserAgent;
# Hardwired
my $curr_log = 'XXX';
my $username = 'XXX';
my $password = 'XXX';
# Global
my %paths; # Path hits
my %ips; # IP hits
my %ips_hosts; # IP hostnames
my %ips_to_hosts; # IP to hostname lookup
# Header
print "Content-type: text/plain\n\n";
# Get logfile
$curr_log .= $ARGV[0] if $ARGV[0];
my $r_log = get_log();
print "Stats\n\n";
print scalar @$r_log; # number of records ($#$r_log is the last index, one less)
print " records\n\n";
print "Raw Log\n\n";
foreach (@$r_log) {
# Throw out spurious data immediately - must investigate why seg fault caused
next if length > 255;
# Split up the data
chomp;
my ($ip, $junk1, $junk2, $date, $offset, $request, $result, $size, $ref, $browser)
= quotewords('\s', 0, $_);
my ($method, $path, $http) = split / /, $request;
# Only valid 200 requests for now
next if $result ne '200';
# Ignore graphics, css, js
next if $path =~ /\.(?:jpg|gif|png|ico|js|css)$/;
# Some spurious http:// requests - to investigate
next if $path =~ /^http/;
# Stuff some data into hashes
unless (exists $ips_to_hosts{$ip}) {
my $hostname = get_hostname($ip) || $ip;
$ips_to_hosts{$ip} = $hostname;
$ips_hosts{$hostname} = \$ips{$ip};
}
$ips{$ip}++;
$paths{$path}++;
$date =~ s/\[//;
$date =~ s/:/ /;
print "$ips_to_hosts{$ip}\t$path\t$date\n";
}
print "\n\n";
# Print some totals
print "Pages requested\n\n";
for my $k (sort keys %paths) {
print $k,
' ' x (30 - length $k), "\t",
$paths{$k}, "\n";
}
print "\n\n";
print "Calling Hosts\n\n";
for my $host (sort dom_sort values %ips_to_hosts) {
print $host,
' ' x (40 - length $host), "\t",
${$ips_hosts{$host}}, "\n";
}
sub dom_sort {
# Sort a domain
no warnings;
my @dom_a = reverse split /\./, $a;
my @dom_b = reverse split /\./, $b;
$dom_a[0] = '~' . $dom_a[0] if $dom_a[0] =~ /^\d/;
$dom_b[0] = '~' . $dom_b[0] if $dom_b[0] =~ /^\d/;
return $dom_a[0] cmp $dom_b[0]
|| $dom_a[1] cmp $dom_b[1]
|| $dom_a[2] cmp $dom_b[2]
|| $dom_a[3] cmp $dom_b[3];
}
sub get_hostname {
my $ip = shift;
my $ipaddr = inet_aton($ip);
return undef unless $ipaddr;
my $hostname = gethostbyaddr($ipaddr, AF_INET);
return $hostname;
}
sub get_log {
my $ua = LWP::UserAgent->new;
$ua->agent("NOM Qget/0.1 ");
# Create a request
my $req = HTTP::Request->new(GET => $curr_log);
$req->authorization_basic($username, $password);
# Pass request to the user agent and get a response back
my $res = $ua->request($req);
# Check the outcome of the response
unless ($res->is_success) {
# Method calls are not interpolated inside strings, so concatenate instead
die "Unable to collect log from server: " . $res->status_line;
}
my @inarr = split /\n/, $res->content;
return \@inarr;
}
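
On the actual question of entry and exit pages: the script above only counts
hits, so here is a rough sketch of the "first and last entry per client" idea.
It is not part of the script I run. The sub name entry_exit_pages, the
[ip, time, path] record layout and the sample data are all made up for
illustration, and it assumes you have already turned each timestamp into a
number you can compare (epoch seconds, say). It also treats every IP as one
single visit, which is a simplification; splitting visits on an idle timeout
would be a further step.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes a reference to a list of [ip, time, path]
# records and returns two hash refs: entry page counts and exit page counts.
sub entry_exit_pages {
    my ($records) = @_;
    my (%first, %last);    # per-IP earliest and latest hit: ip => [time, path]

    for my $rec (@$records) {
        my ($ip, $time, $path) = @$rec;
        # Earliest hit seen so far for this IP is its entry page
        $first{$ip} = [$time, $path]
            if !exists $first{$ip} || $time < $first{$ip}[0];
        # Latest hit seen so far for this IP is its exit page
        $last{$ip} = [$time, $path]
            if !exists $last{$ip} || $time >= $last{$ip}[0];
    }

    my (%entry_count, %exit_count);
    $entry_count{ $first{$_}[1] }++ for keys %first;
    $exit_count{ $last{$_}[1] }++   for keys %last;
    return (\%entry_count, \%exit_count);
}

# Made-up sample data, for illustration only
my @records = (
    ['10.0.0.1', 100, '/index.html'],
    ['10.0.0.1', 160, '/about.html'],
    ['10.0.0.2', 120, '/index.html'],
    ['10.0.0.2', 200, '/contact.html'],
    ['10.0.0.1', 300, '/contact.html'],
);

my ($entry, $exit) = entry_exit_pages(\@records);

# Print each table sorted by count (highest first), then by path
for my $table (['Top entry pages', $entry], ['Top exit pages', $exit]) {
    my ($title, $counts) = @$table;
    print "$title\n";
    for my $page (sort { $counts->{$b} <=> $counts->{$a} || $a cmp $b }
                  keys %$counts) {
        print "$page\t$counts->{$page}\n";
    }
    print "\n";
}
```

In the main loop of the script above you would push [$ip, $time, $path] onto
a list once the filters have passed, instead of (or as well as) printing the
line. If the log is already in time order you can skip the time comparison
altogether and just keep the first and last path seen for each IP.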