Thank you sln.
First, I had a hard time making cpan work with Windows 7 / Cygwin /
Perl. The problem turned out to be the CPAN.pm module and the mirror site.
Once I got that working, I installed the required modules.
The next step was to run your code. When I copied and pasted
your code from Google Groups, I had issues with some "..." where
the HTML was not formatted correctly, so I had to use my best
judgment and fix the m_prefix and m_sn numbers.
Another sample:
m_prefix=081&m_sn=75133844
And it works just as needed. Thank you so much.
I have no idea why my initial direct "curl" request did not work.
Can you please explain why a direct GET on the URL doesn't work, and
why your code has to do it differently? What does the web site developer
do to prevent a direct GET from returning the result? Is it mainly to do
with the cookie, the user agent, some form of Ajax, etc.?
Also, can you please explain a bit about your code and what it does?
Just some comments would be great.
Thank you, much appreciated.
Not a problem. I'm learning as I go.
What's going on here is that the site uses JavaScript and Ajax.
The first GET loads a minimal HTML page with embedded JS that calls
the Ajax layer. At this time it also establishes a session id that is
only good as long as the page is loaded.
The HTML is sparse and contains a table "container". One of its elements,
a single <td> with an id of 'output', is used as a placeholder into
which more HTML/JS is added dynamically by the next GET call.
This is called an HTML/code fragment: it's not a new page, just the
dynamic loading of table data. Each new WBN sent in subsequent GETs
returns data (an HTML fragment) for that <td> element (id="output").
So rendering the full page is at least a two-step process: loading the
main HTML frame, then loading the HTML code fragment (the table data
for the Air Bill). Subsequent GETs (without leaving the page) just update
the table data with the information for a new Air Bill.
That's the way it works in the browser, where JavaScript is run.
It takes the URL input and "constructs" a new URL. The new URL is used
to build a new request with an XMLHttpRequest() object (similar to an LWP request).
This Ajax request object goes out and does a normal GET. What's returned is a
fragment of HTML, in this case table data containing info about the cargo
for the particular Way Bill.
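The "construct a new url" step amounts to putting the Way Bill prefix and serial number into a query string. A minimal sketch, assuming the parameter names from the searchajax2() call quoted further down (m_prefix, m_sn, h_prefix, h_sn, ch); build_fragment_url() and uri_escape_simple() are hypothetical names, and the escaping is a small hand-rolled stand-in for URI::Escape's uri_escape.

```perl
use strict;
use warnings;

# Percent-encode everything except the RFC 3986 "unreserved" characters.
sub uri_escape_simple {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Hypothetical helper: build the Ajax fragment URL for one Way Bill, using
# the parameter names seen in the page's searchajax2() call. Whether the
# server needs every parameter (or the single space in ch) is an
# assumption copied from that call, not something verified.
sub build_fragment_url {
    my ( $base, $prefix, $sn ) = @_;
    my @pairs = (
        [ m_prefix => $prefix ],
        [ m_sn     => $sn ],
        [ h_prefix => 'HWB' ],
        [ h_sn     => '' ],
        [ ch       => ' ' ],
    );
    my $query = join '&',
        map { uri_escape_simple( $_->[0] ) . '=' . uri_escape_simple( $_->[1] ) } @pairs;
    return "$base?$query";
}

print build_fragment_url(
    'http://www.bangkokflightservices.com/TrackTrace/search_awb.php',
    '081', '75133844' ), "\n";
```

Building the query from pairs keeps the parameter order fixed and makes each value escaped exactly once.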
So that's the reason it didn't work in LWP: the main page is just a shell for
the dynamic data loaded later.
WSP, however, sees two requests: one for the main page, the other for the data
fragment. WSP doesn't need to execute JS/Ajax; it just records the result of the
interaction between the client and the server.
At the bottom of the main HTML page, we see this:
<script>
searchajax2('./search_awb.php?m_prefix=176&m_sn=75064953&h_prefix=HWB&h_sn=&ch= ');
reloadpage();
</script>
This is the first thing that runs.
The function searchajax2() first creates an Ajax request object
(using that URL). Then it assigns the Ajax response to the innerHTML
of the <td id="output"> element. The Ajax request is opened, then sent:
ajaxRequest.open("GET", url , true);
ajaxRequest.send(null);
Finally, searchajax2() returns, and reloadpage() is called to render
the DOM.
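Since the main page itself carries the Ajax URL, a script could also read it from the first response instead of hard-coding it. A rough sketch, assuming the searchajax2('...') call looks like the snippet above; fragment_url_from_page() is a hypothetical helper, and its relative-URL handling only covers the "./file.php" form. For general resolution, URI->new_abs() from the URI module is the proper tool.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the URL out of the searchajax2('...') call in
# the main page and resolve it against the main page's own URL. Only the
# simple "./file.php?..." relative form seen on this page is handled.
sub fragment_url_from_page {
    my ( $html, $page_url ) = @_;
    my ($rel) = $html =~ /searchajax2\(\s*'([^']*)'\s*\)/
        or return;
    $rel =~ s/\s+$//;                        # drop trailing whitespace inside the quotes
    ( my $dir = $page_url ) =~ s{[?#].*$}{}; # strip the query string / fragment
    $dir =~ s{[^/]*$}{};                     # strip the file name, keep the last '/'
    $rel =~ s{^\./}{};                       # "./search_awb.php" -> "search_awb.php"
    return $dir . $rel;
}

my $page = q{<script>
searchajax2('./search_awb.php?m_prefix=176&m_sn=75064953&h_prefix=HWB&h_sn=&ch= ');
reloadpage();
</script>};

print fragment_url_from_page( $page,
    'http://www.bangkokflightservices.com/TrackTrace/showc_track.php?m_prefix=176&m_sn=75064953' ),
    "\n";
```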
So, as far as LWP is concerned, it's a two-step process: first load the
main page skeleton and establish a cookie, then make successive calls to load
each fragment with new WBN info. The HTML fragments returned each contain
specific information (mostly table data HTML) related to the WBN.
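The cookie part is handled in the script by HTTP::Cookies attached to the one user agent. As a rough illustration of what that buys us, this sketch keeps the name=value part of each Set-Cookie header and sends them back as a single Cookie header. cookie_header_from() is a hypothetical helper, and it ignores Domain, Path and Expires, so it is no substitute for a real cookie jar.

```perl
use strict;
use warnings;

# Simplified cookie round trip: Set-Cookie lines from the first response
# become one Cookie header for later requests. Attributes are ignored.
sub cookie_header_from {
    my (@set_cookie) = @_;
    my %jar;
    for my $line (@set_cookie) {
        my ($pair) = split /;/, $line, 2;    # "NAME=VALUE; path=/" -> "NAME=VALUE"
        next unless defined $pair;
        my ( $name, $value ) = split /=/, $pair, 2;
        next unless defined $value;
        s/^\s+|\s+$//g for $name, $value;     # trim surrounding whitespace
        $jar{$name} = $value;                 # a later cookie with the same name wins
    }
    return join '; ', map { "$_=$jar{$_}" } sort keys %jar;
}

# The cookie names and values are made up; the real site may use
# different ones (PHPSESSID is only a guess based on the .php pages).
print cookie_header_from( 'PHPSESSID=abc123; path=/', 'lang=en' ), "\n";
```

If the second request is sent from a fresh agent without that header, the server has no way to tie it to the session opened by the first page.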
I hope I'm clear; I'm trying not to add too much noise to the group.
I'm new to this too, but it doesn't look like rocket science.
-sln
P.S. Here is a fleshed-out example with some comments and an added construct
to fetch multiple Way Bills' data.
To see the content, set $show_content = 1;
and maybe redirect the output to a file:
perl lwp.pl > mycapture.txt
-----------------------------------------------
use strict;
use warnings;
use HTML::TableExtract;
use HTTP::Cookies;
use HTTP::Request::Common qw(POST GET);
use LWP::UserAgent;

my $show_content = 0;    # 1 = shows response content (html)
my ( $content1, $content2 );

# Create a cookie jar
my $jar = HTTP::Cookies->new();

# Create user agent (one agent, reused for every request, so the
# session cookie from the first response is sent with the later ones)
my $ua = LWP::UserAgent->new();
$ua->timeout( 10 );
$ua->cookie_jar( $jar );
$ua->agent( "Microsoft Internet Explorer/6.0" );

# Create a first request: "get track table framework"
# Note - this will establish a session with the server.
# The long URL is split across lines; qw{} breaks it into pieces at the
# whitespace and join '' glues the pieces back together.
# ---------
my $request = HTTP::Request->new('GET' =>
    join '', qw{
    http://www.bangkokflightservices.com/TrackTrace/showc_track.php?m_prefix=176&m_s
    n=75064953&h_prefix=HWB&h_sn=&ecy=e076438db64c6190f7b9689a379b7f7093368f1652d14d
    b65fee1ab916713f3f5f4030f53369cb1f669614312c4748899c272f4d976a2b299274a21ad80fc0
    72b1bab2ab1c181d08c670188722e51ec162f9ae337e3f2f132c88d249133815558d241ce8a4e9b3
    fa75c144268b9e901037c2c7257142ee42ff9b2bf2767f57ed62b94fd938ea4dd2b28c53fea6af74
    be&ch=%A0%A0%A0%A0 &id=1.2405164500620218} );

# Pass request to agent
# Note - the response is just a JavaScript/Ajax laced html document
# with a skeleton table. One of the table's <td> elements has
# id="output"; it receives the real table data from the next request.
# Apparently this establishes a cookie.
my $res = $ua->request( $request );
if ( $res->is_success ) {
    print "\nHtml main Content .. OK\n\n";
    if ($show_content) {
        print $res->content, "\n\n";
    }
    $content1 = $res->content;
}
else {
    print "Request (Html main Content) Failed\n";
    print $res->status_line, "\n\n";
    die;
}
print '='x20, "\n\n";

# Create a second request: "get track table body"
# Note - When running as an html document, JS/Ajax are used
# to dynamically load table data (html) to put in <td id="output" ..>
# already loaded with the first request (the main html).
# The html that is returned is a dynamic html fragment. It contains
# the table data for a single prefix/serial no.
# ---------
# Loop, get the data for a couple of Way Bill Numbers.
# Note - the hash is keyed by prefix, so each prefix can appear only
# once; use a list of [prefix, serial] pairs to fetch several bills
# that share a prefix.
my %wbhash = ( '176'=>'75064953', '081'=>'75133844' );
while (my ( $WBNprefix, $WBN ) = each %wbhash)
{
    $request = HTTP::Request->new('GET' =>
        join '', (
            "http://www.bangkokflightservices.com/TrackTrace/search_awb.php?",
            "m_prefix=$WBNprefix",
            "&m_sn=$WBN",
            "&h_prefix=HWB",
            "&h_sn=&ch= ")
    );
    # Pass request to agent (same $ua, so the session cookie goes along)
    $res = $ua->request( $request );
    if ( $res->is_success ) {
        print "\nWay Bill fragment .. OK\n";
        if ($show_content) {
            print $res->content, "\n\n";
        }
        $content2 = $res->content;
    }
    else {
        print "Request (Way Bill html fragment Content) Failed\n";
        print $res->status_line, "\n\n";
        die;
    }
    print "Way Bill ($WBNprefix - $WBN) Content tables:\n", '-'x20, "\n\n";
    print_tables( $content2 );
    print "\n";
}
print '='x20, "\n\n";
print "Done!\n\n\n";
exit;

## Table extract utility (from WSP)
## Prints every table found in the html passed as $_[0], row by row,
## with cells separated by '|'.
sub print_tables {
    my ( $table, $row, $cell );
    my $tc = 0;
    my $table_extractor = HTML::TableExtract->new();
    $table_extractor->parse( $_[0] );
    foreach $table ( $table_extractor->table_states ) {
        print "TABLE $tc:\n"; $tc++;
        my $rc = 0;
        foreach $row ( $table->rows ) {
            print "ROW $rc:\n"; $rc++;
            foreach $cell ( @$row ) {
                $cell = '' unless defined $cell;    # cells can come back undef
                $cell =~ s/\n/ /g;                  # newlines -> spaces
                $cell =~ s/[ \t]+/ /g;              # collapse runs of blanks
                $cell =~ s/^[ \t]//;                # trim leading blank
                $cell =~ s/[ \t]$//;                # trim trailing blank
                $cell =~ s/ *<\/td *//g;            # drop any literal "</td" text
                print "$cell|";
            }
            print "\n";
        }
    }
}
__END__
Html main Content .. OK
====================
Way Bill fragment .. OK
Way Bill (081 - 75133844) Content tables:
--------------------
TABLE 0:
ROW 0:
á||||
ROW 1:
á|Enter Master Air Waybill (MAWB)|
ROW 2:
Optional (For Import MAWB Only)|
ROW 3:
á||||
ROW 4:
||* Master Air Waybill number example 123 - 12345678||
TABLE 1:
ROW 0:
||||||||||
ROW 1:
Item|AWB No|Flight No|Flight Date|Origin|Dest|Status|Pieces|Weight|Time|
ROW 2:
1|081-75133844|JQ 029|Oct 19 2010|MEL|BKK|Delivered|2|1,480.00|Oct 20 2010 - 1255|
Way Bill fragment .. OK
Way Bill (176 - 75064953) Content tables:
--------------------
TABLE 0:
ROW 0:
á||||
ROW 1:
á|Enter Master Air Waybill (MAWB)|
ROW 2:
Optional (For Import MAWB Only)|
ROW 3:
á||||
ROW 4:
||* Master Air Waybill number example 123 - 12345678||
TABLE 1:
ROW 0:
|||||||||||
ROW 1:
Item|AWB No|Flight No|Flight Date|Origin|Dest|ULD No|Status|Pieces|Weight|Time|
ROW 2:
1|176-75064953|EK 419|Oct 15 2010|BKK|DXB|Flight Changeá|Export Transshipment|3|743.00|Oct 14 2010 5:37PM|
ROW 3:
2|176-75064953|EK 419|Oct 15 2010|BKK|DXB|á|Accepted|3|743.00|Oct 14 2010 5:37PM|
ROW 4:
3|176-75064953|EK 373|Oct 15 2010|BKK|DXB|Flight Changeá|Export Transshipment|3|743.00|Oct 14 2010 6:12PM|
ROW 5:
4|176-75064953|EK 373|Oct 15 2010|BKK|DXB|SHCá|Export Transshipment|3|743.00|Oct 14 2010 6:12PM|
ROW 6:
5|176-75064953|EK 373|Oct 14 2010|BKK|DXB|Flight Changeá|Export Transshipment|3|743.00|Oct 14 2010 6:42PM|
ROW 7:
6|176-75064953|EK 373|Oct 14 2010|BKK|DXB|PMC31131EKá|Manifested|3|743.00|Oct 14 2010 6:57PM|
ROW 8:
7|176-75064953|EK 373|Oct 14 2010|BKK|DXB|á|Departed|3|743.00|Oct 14 2010 9:54PM|
====================
Done!