Using Perl to get data from website

 
 
fiazidris
Guest
Posts: n/a
 
      03-07-2008
Previously, I have written a perl script to access data from this URL:

http://www.bangkokflightservices.com...argo_track.php

Some sample MAWBs (Master Air Waybill numbers):

724-26332482
724-61480672
724-61441122

and this was the final URL:

http://203.151.118.123:8090/showc_tr...efix=HWB&h_sn=

But the website has since changed, and the same script can no longer
extract the data. One change I noticed is that the URL has changed
to:

<iframe src="http://203.151.118.123:8090/showc_track.php?m_prefix=724&m_sn=26332482&h_prefix=HWB&h_sn=&ecy=e076438db64c6190f7b9689a379b7f7093368f1652d14db65fee1ab916713f3f5f4030f53369cb1f669614312c4748899c272f4d976a2b299274a21ad80fc072b1bab2ab1c181d08c670188722e51ec162f9ae337e3f2f132c88d249133815558d241ce8a4e9b3fa75c144268b9e901037c2c7257142ee42ff9b2bf2767f57ed62b94fd938ea4dd2b28c53fea6af74be&ch=" frameborder="0" scrolling="yes" height="700" width="100%"> </iframe>

How can I programmatically obtain data for a list of MAWBs?

Here is a sample script that I wrote which previously worked:

#!/usr/bin/perl
use strict;
use warnings;

# ecy token copied from the iframe URL; split into chunks for readability only
my $ecy = join '', qw(
    e076438db64c6190f7b9689a3
    79b7f7093368f1652d14db65fee1ab916713f3f5f4030f5336
    9cb1f669614312c4748899c272f4d976a2b299274a21ad80fc
    072b1bab2ab1c181d08c670188722e51ec162f9ae337e3f2f1
    32c88d249133815558d241ce8a4e9b3fa75c144268b9e90103
    7c2c7257142ee42ff9b2bf2767f57ed62b94fd938ea4dd2b28
    c53fea6af74be
);

while (<>) {
    chomp;

    # A MAWB looks like 724-26332482: prefix, hyphen, serial number
    my $mprefix = substr($_, 0, 3);
    next if length($mprefix) != 3;
    my $msn = substr($_, 4, 8);

    my $currurl = 'http://203.151.118.123:8090/showc_track.php?'
        . 'm_prefix=' . $mprefix . '&m_sn=' . $msn
        . '&h_prefix=HWB&h_sn=&ecy=' . $ecy . '&ch=';

    my $currresult = qx{curl -s '$currurl'};

    my $result = '';
    for my $currline (split /\n/, $currresult) {
        # Only lines mentioning "style12" are treated as data lines
        next unless $currline =~ m#style12#i;

        # Capture the text between a '>' and the '<' that follows it
        if ($currline =~ m#.*>(.*?)<.*#i) {
            $result .= " / " . $1;
        }
    }
    print "***$result\n";
}
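
A minimal sketch of stricter input handling for the loop above. It assumes, based only on the three samples in the post, that a MAWB is always three digits, a hyphen, and eight digits; `parse_mawb` is a hypothetical helper name, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a MAWB such as "724-26332482" into prefix and serial number.
# Assumption: 3 digits, a hyphen, 8 digits (taken from the samples only).
# Returns an empty list when the line does not match.
sub parse_mawb {
    my ($line) = @_;
    return $line =~ /^\s*(\d{3})-(\d{8})\s*$/ ? ($1, $2) : ();
}

for my $line ('724-26332482', '724-61480672', 'not a mawb') {
    if (my ($prefix, $serial) = parse_mawb($line)) {
        print "prefix=$prefix serial=$serial\n";
    }
    else {
        warn "invalid MAWB: '$line'\n";
    }
}
```

Anchoring the pattern with ^ and $ rejects malformed lines up front, instead of letting substr silently build a bad URL from them.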

 
 
Ben Morrow
Guest
Posts: n/a
 
      03-07-2008

Quoth fiazidris <(E-Mail Removed)>:
> Previously, I have written a perl script to access data from this URL:
>
> http://www.bangkokflightservices.com...argo_track.php
>
> Some sample: MAWB - Master Airwaybill Number
>
> 724-26332482
> 724-61480672
> 724-61441122
>
> and this was the final URL:
>
> http://203.151.118.123:8090/showc_tr...efix=724&m_sn=26332482&h_prefix=HWB&h_sn=
>
> But, now there is a change on the website and I couldn't extract
> through the same script. One change I noticed is the URL has changed
> to:
>

[url trimmed]
> <iframe src="http://203.151.118.123:8090/showc_track.php?
> m_prefix=724&m_sn=26332482&h_prefix=HWB&h_sn=&ecy= e076438db64c61..."
> frameborder="0" scrolling="yes" height="700" width="100%"> </iframe>
>
> How can I programmatically obtain data for a list of MAWBs.


Yuck, what a horrible page. <input> without <form>... I would use
something like

#!/usr/bin/perl

use strict;
use warnings;
use WWW::Mechanize;

my $baseurl =
    'http://www.bangkokflightservices.com/our_cargo_track&trace.php';
my $hawb = 'h_prefix=HAWB&h_sn=';

# autocheck makes Mechanize die on any failed request
my $M = WWW::Mechanize->new(autocheck => 1);

while (<>) {
    chomp;

    # e.g. 724-26332482: 3-character prefix, hyphen, 8-character serial
    my ($mprefix, $msn) = /(...)-(........)/ or do {
        warn "invalid MAWB: '$_'";
        next;
    };

    $M->get("$baseurl?m_prefix=$mprefix&m_sn=$msn&$hawb");
    # follow the first link whose URL matches showc_track
    $M->follow_link(url_regex => qr/showc_track/);
    my $content = $M->content;

    # process $content as before
}

You may need to adjust the follow_link call if there are several links on
the same page that match that regex; see perldoc WWW::Mechanize for the
arguments. If the server checks the Referer, you may also need to ->get
/our_cargo_track.php first.

Ben
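
To illustrate the idea behind the follow_link step without any modules, here is a rough sketch that pulls the tracking URL out of the outer page's iframe instead of hard-coding it. The point is that the ecy value is read from the page each time; whether the server really issues a fresh value per page load is an assumption. `tracking_iframe_src` is a hypothetical helper, and `abc123` is a placeholder token, not a real one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the src of the first <iframe> whose URL mentions showc_track,
# with whitespace inside the attribute removed, or undef if none is found.
sub tracking_iframe_src {
    my ($html) = @_;
    while ($html =~ /<iframe\b[^>]*?\bsrc\s*=\s*"([^"]*)"/gis) {
        (my $src = $1) =~ s/\s+//g;    # the attribute may be wrapped
        return $src if $src =~ /showc_track/;
    }
    return;
}

# Sample markup shaped like the iframe quoted in the thread;
# the ecy value is a placeholder.
my $sample = <<'HTML';
<iframe src="http://203.151.118.123:8090/showc_track.php?
m_prefix=724&m_sn=26332482&h_prefix=HWB&h_sn=&ecy=abc123&ch=
" frameborder="0" scrolling="yes" height="700" width="100%"> </iframe>
HTML

print tracking_iframe_src($sample), "\n";
```

A regex is enough for a sketch like this; for anything sturdier, WWW::Mechanize's own link handling or a real HTML parser is the safer choice.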

 
 
ifiaz
Guest
Posts: n/a
 
      03-07-2008
> You may need to adjust the follow_link call if there are several links on
> the same page that match that regex; see perldoc WWW::Mechanize for the
> arguments. If the server checks the Referer, you may also need to ->get
> /our_cargo_track.php first.
>
> Ben

Thank you for your prompt response.

When I used the code with minor modifications, I still could not
access the data: the process sends me to another page, as shown
below.

This is what $content contains:

<script> window.open ('http://www.bangkokflightservices.com/our_cargo_track.php') ;
setTimeout("window.close();", 10);
</script>

How do I get to the actual data page? Please guide me, as I am a
newbie.

I don't know how to implement the Referer and all that.


### This is the complete code I used.
#!/usr/bin/perl

use strict;
use warnings;
use WWW::Mechanize;

my $baseurl =
    'http://www.bangkokflightservices.com/our_cargo_track&trace.php';
my $hawb = 'h_prefix=HAWB&h_sn=';

my $M = WWW::Mechanize->new(autocheck => 1);

## Added code for testing only
my $F = WWW::Mechanize->new(autocheck => 1);
$F->get("http://www.bangkokflightservices.com/our_cargo_track.php");
my $contentF = $F->content;
#print "$contentF\n";
#$M->add_header(Referer => 'http://www.bangkokflightservices.com/our_cargo_track.php');

while (<>) {
    chomp;

    my ($mprefix, $msn) = /(...)-(........)/ or do {
        warn "invalid MAWB: '$_'";
        next;
    };

    print "$mprefix $msn\n";

    $M->get("$baseurl?m_prefix=$mprefix&m_sn=$msn&$hawb");
    $M->follow_link(url_regex => qr/showc_track/);
    my $content = $M->content;

    print "$content\n"; # for debugging

    # process $content as before
    my $result = '';
    for my $currline (split /\n/, $content) {
        # Only lines mentioning "style12" are treated as data lines
        next unless $currline =~ m#style12#i;

        if ($currline =~ m#.*>(.*?)<.*#i) {
            $result .= " / " . $1;
        }
    }
    print "***$result\n";
}
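
WWW::Mechanize does not execute JavaScript, so the window.open stub shown earlier in this post is simply returned as the page content. A small sketch for detecting that case before trying to parse the page follows. `js_redirect_target` is a hypothetical helper, and treating the stub as a sign that the server rejected the request is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# If $content contains a window.open('...') call, return the URL it
# opens (with wrapped whitespace removed); otherwise return undef.
sub js_redirect_target {
    my ($content) = @_;
    if ($content =~ /window\.open\s*\(\s*'([^']*)'/s) {
        (my $url = $1) =~ s/\s+//g;
        return $url;
    }
    return;
}

# The stub exactly as it was posted in the thread
my $content = <<'HTML';
<script> window.open ('http://www.bangkokflightservices.com/
our_cargo_track.php') ;
setTimeout("window.close();", 10);
</script>
HTML

if (defined(my $target = js_redirect_target($content))) {
    warn "got a redirect stub pointing at $target, not the data page\n";
}
```

In the main loop, a check like this could skip the style12 parsing and report the MAWB as not fetched, rather than printing an empty result.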
 
 
ifiaz
Guest
Posts: n/a
 
      03-08-2008
Also, just so you know:

my $baseurl =
'http://www.bangkokflightservices.com/our_cargo_track&trace.php';
my $hawb = 'h_prefix=HAWB&h_sn=';

h_prefix should be HWB and not HAWB.

I have fixed that in my code, but the problem remains: it still sends
me to a different page.





 
 
fiazidris
Guest
Posts: n/a
 
      03-10-2008
On Mar 8, 10:34 pm, ifiaz <(E-Mail Removed)> wrote:
> Also, please so you know,
>
> my $baseurl =
> 'http://www.bangkokflightservices.com/our_cargo_track&trace.php';
> my $hawb = 'h_prefix=HAWB&h_sn=';
>
> h_prefix should be HWB and not HAWB.
>
> I have fixed that in my code and still the same problem that it throws
> me to a different page.
>


I have got to the point where the following URL works in a browser
(the prefix and serial number can be changed):

http://203.151.118.123:8090/showc_tr...h=%A0%A0%A0%A0

But this URL doesn't return any results when fetched with perl or curl.

Ben Morrow, please help.
 