How can I follow links in my website

 
 
Danny
Guest
Posts: n/a
 
      04-12-2004
I would like to browse a page in one of my websites and get info to populate
a database. But each page will have a NEXT and PREVIOUS link that takes you
to another page.

I need something to look at one page and save it to a file on the HD, then
follow the NEXT link and go to the next page, and do the same thing, and so
on.

Can this be done?


 
Eric Bohlman
Guest
Posts: n/a
 
      04-12-2004
"Danny" wrote:

> I would like to browse a page in one of my websites and get info to
> populate a database. But each page will have a NEXT and PREVIOUS link
> that takes you to another page.
>
> I need something to look at one page and save it to a file on the HD,
> then follow the NEXT link and go to the next page, and do the same
> thing, and so on.
>
> Can this be done?


Yep: LWP::Simple and HTML::LinkExtor together ought to do the trick.
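
As a rough illustration of how HTML::LinkExtor could be used here, the following sketch collects the `<a href>` links from a page and makes them absolute. The helper name `extract_links` and the sample HTML string are made up for this example; in real use the HTML would come from LWP::Simple's `get($url)`.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Return the absolute URLs of all <a href="..."> links in $html.
# $base_url is used to resolve relative links such as "page2.html".
sub extract_links {
    my ($html, $base_url) = @_;
    my @urls;
    my $extor = HTML::LinkExtor->new(
        sub {
            my ($tag, %attr) = @_;
            # With a base URL, the values are URI objects; stringify them.
            push @urls, "$attr{href}" if $tag eq 'a' && defined $attr{href};
        },
        $base_url,
    );
    $extor->parse($html);
    $extor->eof;
    return @urls;
}

# With LWP::Simple the page would come from get($url); a literal string
# (hypothetical content) stands in here so the sketch runs offline.
my $html = '<a href="/list?nextpage=2">NEXT</a> <img src="logo.png">';
print "$_\n" for extract_links($html, 'http://www.website.com/list');
```

Run as is, this should print the absolute form of the single `<a>` link and skip the `<img>` tag. Passing the base URL to `HTML::LinkExtor->new` is what lets relative NEXT links be fetched later.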
 
John Bokma
Guest
Posts: n/a
 
      04-12-2004
Danny wrote:

> I would like to browse a page in one of my websites and get info to populate
> a database. But each page will have a NEXT and PREVIOUS link that takes you
> to another page.
>
> I need something to look at one page and save it to a file on the HD, then
> follow the NEXT link and go to the next page, and do the same thing, and so
> on.
>
> Can this be done?


Yes.

check the lwpcookbook, and HTML::Parser, for example. It's possible to
not use the parser, but just a regexp if you know what you are doing.
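
A minimal sketch of the regexp-only route, assuming the NEXT link can be recognised by a substring of its URL. The helper name `find_next_href` and the marker string are hypothetical. It only handles quoted `href` values and does not decode entities such as `&amp;`, which is part of why a parser is usually the safer choice.

```perl
use strict;
use warnings;

# Return the href of the first <a> tag whose URL contains $marker,
# or undef if no such link is found.
sub find_next_href {
    my ($html, $marker) = @_;
    while ($html =~ /<a\b[^>]*?\bhref\s*=\s*(["'])(.*?)\1/gis) {
        return $2 if index($2, $marker) >= 0;
    }
    return;
}

# Hypothetical page fragment with a PREVIOUS and a NEXT link.
my $html = '<a href="prev.html">PREVIOUS</a> <a href="list.cgi?nextpage=3">NEXT</a>';
my $next = find_next_href($html, 'nextpage');
print defined $next ? "$next\n" : "no next link\n";
```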

--
John personal page: http://johnbokma.com/

Experienced Perl / Java developer available - http://castleamber.com/
 
Danny
Guest
Posts: n/a
 
      04-12-2004
"John Bokma" wrote:
> Danny wrote:
>
> > I would like to browse a page in one of my websites and get info to
> > populate a database. But each page will have a NEXT and PREVIOUS link
> > that takes you to another page.
> >
> > I need something to look at one page and save it to a file on the HD,
> > then follow the NEXT link and go to the next page, and do the same
> > thing, and so on.
> >
> > Can this be done?
>
> Yes.
>
> check the lwpcookbook, and HTML::Parser, for example. It's possible to
> not use the parser, but just a regexp if you know what you are doing.
>
> --
> John personal page: http://johnbokma.com/
>
> Experienced Perl / Java developer available - http://castleamber.com/



Thanks for your responses.
I have a sample that works: it gets a web page, prints the contents of the
page to a text file, and then prints all the links on the page. Now I just
want to follow the links on that page that have "nextpage" in the URL (this
goes to the next category page), and so on. I also want to save each page to
a text file: page1.txt, page2.txt, etc.

The script works, but I am not sure where to put the loops. I am still
learning.

How can I do this?
I would appreciate your help.
Thanks again
Danny

-------
use strict;
use warnings;
use CGI;
use LWP::UserAgent;
use HTML::LinkExtor;

my $url = 'http://www.website.com';
my $co  = CGI->new;
print $co->header;

# Fetch the page once; the same content is saved and scanned for links.
my $user_agent = LWP::UserAgent->new;
my $response   = $user_agent->get($url);
die "Cannot fetch $url: ", $response->status_line, "\n"
    unless $response->is_success;
my $html = $response->content;    # use the accessor, not $response->{_content}

open my $fh, '>', 'file.txt' or die "Cannot write file.txt: $!";
print {$fh} $html;
close $fh;

my $link_extor = HTML::LinkExtor->new(\&handle_links);
$link_extor->parse($html);
$link_extor->eof;

sub handle_links {
    my ($tag, %links) = @_;
    return unless $tag eq 'a' && defined $links{href};
    # I assume I put a test here for the NEXT link, and then that URL gets
    # loaded the same way as in the request above?
    print "This is a link: $links{href}.\n";
}
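
One possible place for the loop, sketched under a few assumptions: the NEXT link is the first `<a href>` whose URL contains "nextpage", pages are saved as page1.txt, page2.txt, and so on, and a page cap plus a seen-URL check guard against endless loops. The names `fetch_page`, `next_page_url`, `crawl` and the cap of 100 are invented for this sketch, not taken from the thread.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::LinkExtor;

# Fetch $url with LWP::UserAgent; returns the body, or undef on failure.
sub fetch_page {
    my ($url) = @_;
    my $response = LWP::UserAgent->new->get($url);
    return $response->is_success ? $response->content : undef;
}

# Return the first <a href> in $html containing "nextpage", made absolute
# against $base_url, or undef when the page has no such link.
sub next_page_url {
    my ($html, $base_url) = @_;
    my $next;
    my $extor = HTML::LinkExtor->new(
        sub {
            my ($tag, %attr) = @_;
            return if defined $next || $tag ne 'a' || !defined $attr{href};
            $next = "$attr{href}" if $attr{href} =~ /nextpage/;
        },
        $base_url,
    );
    $extor->parse($html);
    $extor->eof;
    return $next;
}

# Save each page as page1.txt, page2.txt, ... in $dir and follow the
# "nextpage" link until there is none, a URL repeats, or $max_pages is hit.
# $fetch is an optional code ref, so the loop can be tried without a network.
sub crawl {
    my ($url, $dir, $max_pages, $fetch) = @_;
    $fetch ||= \&fetch_page;
    my %seen;
    my $count = 0;
    while (defined $url && !$seen{$url}++ && $count < $max_pages) {
        my $html = $fetch->($url);
        last unless defined $html;
        $count++;
        open my $fh, '>', "$dir/page$count.txt"
            or die "Cannot write page $count: $!";
        print {$fh} $html;
        close $fh;
        $url = next_page_url($html, $url);
    }
    return $count;
}

# Only runs when a start URL is given on the command line.
print crawl($ARGV[0], '.', 100), " page(s) saved\n" if @ARGV;
```

The loop lives outside the link callback: the callback only records the NEXT URL, and the `while` loop does the fetching and saving. Invoked with a start URL as its first argument, the script writes the pages into the current directory.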


 