Velocity Reviews - Computer Hardware Reviews


How to save a webpage contents to a file ( with LWP )

 
 
Jack
 
      02-20-2008
Hi there, does anyone skilled in the art of LWP (or another Perl module)
and screen scraping know how to do the equivalent of a browser's "File >
Save As" on HTML content? Some web pages aren't scrapeable, but when you
save their content to a local file, it's available. Any ideas would be
great.

Also, if there is a drop-down plus a button to select content, BUT the
HTML source has no "submit" entry at all, how does one remotely control a
user selection without this post handle?

Thanks in advance,

Jack
 
A. Sinan Unur
 
      02-20-2008
Jack <(E-Mail Removed)> wrote in news:(E-Mail Removed):

> Hi there, does anyone skilled in the art of LWP (or other perl module)
> and screen scraping know how to do the equivalent of a "file", "save
> as" html content ?


http://search.cpan.org/~gaas/libwww-.../LWP/Simple.pm

getstore($url, $file)

http://search.cpan.org/~gaas/libwww-...esponse_Object

http://search.cpan.org/~gaas/libwww-...TP/Response.pm

$r->content( $content )

This is used to get/set the raw content

$r->decoded_content( %options )

This will return the content after any Content-Encoding and charsets
has been decoded.
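A minimal sketch of a "Save As" built from the calls above, using LWP::UserAgent directly so the decoded text is available. The helper name `save_page` and the example URL are hypothetical. It writes the decoded page as UTF-8, so a page whose META tag declares some other charset will no longer match that declaration; for a byte-for-byte copy, `getstore($url, $file)` from LWP::Simple is the simpler choice.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: fetch $url and save the page text to $file,
# roughly what a browser's "File > Save As" (HTML only) does.
# Returns the number of characters written.
sub save_page {
    my ($url, $file) = @_;

    my $ua  = LWP::UserAgent->new(timeout => 30);
    my $res = $ua->get($url);
    die "GET $url failed: ", $res->status_line, "\n" unless $res->is_success;

    # decoded_content undoes any Content-Encoding (e.g. gzip) and the
    # charset, so $html holds Perl characters rather than raw bytes.
    my $html = $res->decoded_content;
    die "Could not decode content from $url\n" unless defined $html;

    # Write the characters back out as UTF-8.
    open my $fh, '>:encoding(UTF-8)', $file or die "Cannot write $file: $!\n";
    print {$fh} $html;
    close $fh or die "Cannot close $file: $!\n";

    return length $html;
}

# Example (hypothetical URL):
# save_page('http://www.example.com/somepage/index.asp', 'index.html');
```

If the scraper only sees a frameset, as happens later in this thread, the page worth saving is usually the individual frame rather than the frameset itself.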

> Also, if there is a drop down + button to select content BUT in the
> HTML source no "submit" entry at all, how does one remote control a
> user selection without this post handle ?


If the page uses Javascript to dynamically post form contents, you will
have to figure out what the Javascript does and replicate it.
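Replicating such a script usually comes down to sending the same name/value pairs yourself. A sketch using HTTP::Request::Common; the action URL `results.asp` and the field name `choice` are hypothetical placeholders for whatever the page's script actually sends, which you would read from the script or from the requests your browser makes.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);
use LWP::UserAgent;

# Build the POST a page's script would send when the user picks an
# entry from the drop-down. @pairs is a flat name => value list, so
# the field order is preserved.
sub build_selection_request {
    my ($action, @pairs) = @_;
    return POST $action, \@pairs;   # application/x-www-form-urlencoded body
}

# Hypothetical action URL and field name.
my $req = build_selection_request(
    'http://www.example.com/somepage/results.asp',
    choice => 'GGCC',
);

# To actually send it (needs network access):
# my $res = LWP::UserAgent->new->request($req);
# print $res->decoded_content if $res->is_success;
```

If the script navigates with a GET instead, the same pairs go into the URL's query string rather than the request body.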

Sinan


--
A. Sinan Unur <(E-Mail Removed)>
(remove .invalid and reverse each component for email address)
clpmisc guidelines: <URL:http://www.rehabitation.com/clpmisc.shtml>

 
Jack
 
      02-20-2008
On Feb 20, 5:49 am, "A. Sinan Unur" <(E-Mail Removed)> wrote:
> Jack <(E-Mail Removed)> wrote in news:(E-Mail Removed):
>
> > Hi there, does anyone skilled in the art of LWP (or other perl module)
> > and screen scraping know how to do the equivalent of a "file", "save
> > as" html content ?

>
> http://search.cpan.org/~gaas/libwww-.../LWP/Simple.pm
>
> getstore($url, $file)
>
> http://search.cpan.org/~gaas/libwww-...pm#The_Respons...
>
> http://search.cpan.org/~gaas/libwww-...TP/Response.pm
>
> $r->content( $content )
>
>     This is used to get/set the raw content
>
> $r->decoded_content( %options )
>
>     This will return the content after any Content-Encoding and charsets
>     has been decoded.
>
> > Also, if there is a drop down + button to select content BUT in the
> > HTML source no "submit" entry at all, how does one remote control a
> > user selection without this post handle ?

>
> If the page uses Javascript to dynamically post form contents, you will
> have to figure out what the Javascript does and replicate it.
>
> Sinan
>
> --
> A. Sinan Unur <(E-Mail Removed)>
> (remove .invalid and reverse each component for email address)
> clpmisc guidelines: <URL:http://www.rehabitation.com/clpmisc.shtml>


Hi Sinan, the site uses ASP, no JS files. This is all there is in the
HTML:
<!--<SCRIPT>
//
</SCRIPT>-->
<FRAMESET ROWS="70,*" FRAMESPACING=0>
<FRAME NAME="header" SRC="./header_default.asp?
NoCache=2%2F20%2F2008+7%3A35%3A47+AM" SCROLLING="no" MARGINWIDTH="2"
MARGINHEIGHT="0">

<FRAME NAME="bodyx" SRC=
body.asp?centerin=GGCC
SCROLLING="auto" MARGINWIDTH="2" MARGINHEIGHT="2">


</FRAMESET>

</HTML>
 
A. Sinan Unur
 
      02-20-2008
Jack <(E-Mail Removed)> wrote in
news:(E-Mail Removed):

> On Feb 20, 5:49 am, "A. Sinan Unur" <(E-Mail Removed)> wrote:
>> Jack <(E-Mail Removed)> wrote in news:(E-Mail Removed):
>>
>> > Hi there, does anyone skilled in the art of LWP (or other perl
>> > module) and screen scraping know how to do the equivalent of a
>> > "file", "save as" html content ?

>>
>> http://search.cpan.org/~gaas/libwww-.../LWP/Simple.pm
>>
>> getstore($url, $file)
>>
>> http://search.cpan.org/~gaas/libwww-perl-5.808/lib/LWP.pm#The_Respons...
>>
>> http://search.cpan.org/~gaas/libwww-...TP/Response.pm
>>
>> $r->content( $content )
>>
>>     This is used to get/set the raw content
>>
>> $r->decoded_content( %options )
>>
>>     This will return the content after any Content-Encoding and
>>     charsets has been decoded.
>>
>> > Also, if there is a drop down + button to select content BUT in the
>> > HTML source no "submit" entry at all, how does one remote control a
>> > user selection without this post handle ?

>>
>> If the page uses Javascript to dynamically post form contents, you
>> will have to figure out what the Javascript does and replicate it.
>>
>> Sinan
>>
>> --
>> A. Sinan Unur <(E-Mail Removed)>


Do *not* quote sigs.

> Hi Sinan the site uses ASP, no JS files.. this is all there is in the
> html
> <!--<SCRIPT>
> //
> </SCRIPT>-->
> <FRAMESET ROWS="70,*" FRAMESPACING=0>
> <FRAME NAME="header" SRC="./header_default.asp?
> NoCache=2%2F20%2F2008+7%3A35%3A47+AM" SCROLLING="no" MARGINWIDTH="2"
> MARGINHEIGHT="0">
>
> <FRAME NAME="bodyx" SRCbody.asp?centerin=GGCC


I am assuming you retyped the source rather than copying and pasting it.
Please don't retype code.

> SCROLLING="auto" MARGINWIDTH="2" MARGINHEIGHT="2">


Oh, but there is more. How about them frames?

Anyway, this forum is for help with the Perl aspect of things. If you
need to learn HTML, there is a group for that as well.

Sinan

 
Gunnar Hjalmarsson
 
      02-20-2008
Jack wrote:
> this is all there is in the html
> <!--<SCRIPT>
> //
> </SCRIPT>-->
> <FRAMESET ROWS="70,*" FRAMESPACING=0>
> <FRAME NAME="header" SRC="./header_default.asp?
> NoCache=2%2F20%2F2008+7%3A35%3A47+AM" SCROLLING="no" MARGINWIDTH="2"
> MARGINHEIGHT="0">
>
> <FRAME NAME="bodyx" SRC=
> body.asp?centerin=GGCC
> SCROLLING="auto" MARGINWIDTH="2" MARGINHEIGHT="2">
>
>
> </FRAMESET>
>
> </HTML>


Then get the bodyx frame, not the frameset.
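One way to find each frame's URL without reading the markup by hand is to list the FRAME tags with HTML::TokeParser. A sketch; `frame_sources` is a hypothetical helper name, and the sample is the frameset Jack posted, trimmed, with each SRC value quoted and kept on one line.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Return a hash ref of frame NAME => SRC for every <frame> tag in $html.
sub frame_sources {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html);
    my %src;
    while (my $tag = $p->get_tag('frame')) {
        my $attr = $tag->[1];    # HTML::Parser lower-cases attribute names
        $src{ $attr->{name} // '' } = $attr->{src};
    }
    return \%src;
}

my $frameset = <<'HTML';
<FRAMESET ROWS="70,*" FRAMESPACING=0>
<FRAME NAME="header" SRC="./header_default.asp?NoCache=2%2F20%2F2008+7%3A35%3A47+AM" SCROLLING="no">
<FRAME NAME="bodyx" SRC="body.asp?centerin=GGCC" SCROLLING="auto">
</FRAMESET>
HTML

my $frames = frame_sources($frameset);
print "bodyx: $frames->{bodyx}\n";    # prints "bodyx: body.asp?centerin=GGCC"
```

The SRC values are relative, so they still have to be resolved against the frameset's own URL before they can be fetched.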

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
 
Jack
 
      02-20-2008
On Feb 20, 8:08 am, Gunnar Hjalmarsson <(E-Mail Removed)> wrote:
> Jack wrote:
> > this is all there is in the html
> >   <!--<SCRIPT>
> >    //
> >   </SCRIPT>-->
> >   <FRAMESET ROWS="70,*" FRAMESPACING=0>
> >    <FRAME NAME="header" SRC="./header_default.asp?
> > NoCache=2%2F20%2F2008+7%3A35%3A47+AM" SCROLLING="no" MARGINWIDTH="2"
> > MARGINHEIGHT="0">
>
> >    <FRAME NAME="bodyx" SRC=
> > body.asp?centerin=GGCC
> >    SCROLLING="auto" MARGINWIDTH="2" MARGINHEIGHT="2">

>
> > </FRAMESET>

>
> > </HTML>

>
> Then get the bodyx frame, not the frameset.
>
> --
> Gunnar Hjalmarsson
> Email: http://www.gunnar.cc/cgi-bin/contact.pl


How exactly does one get the bodyx frame? And, more importantly, how do
you auto-select from the select box when there is no mention of it, or
of a submit button, in the HTML for this ASP application?
Thank you,
Jack
 
Gunnar Hjalmarsson
 
      02-21-2008
Jack wrote:
> On Feb 20, 8:08 am, Gunnar Hjalmarsson <(E-Mail Removed)> wrote:
>> Jack wrote:
>>> this is all there is in the html
>>> <!--<SCRIPT>
>>> //
>>> </SCRIPT>-->
>>> <FRAMESET ROWS="70,*" FRAMESPACING=0>
>>> <FRAME NAME="header" SRC="./header_default.asp?
>>> NoCache=2%2F20%2F2008+7%3A35%3A47+AM" SCROLLING="no" MARGINWIDTH="2"
>>> MARGINHEIGHT="0">
>>> <FRAME NAME="bodyx" SRC=
>>> body.asp?centerin=GGCC
>>> SCROLLING="auto" MARGINWIDTH="2" MARGINHEIGHT="2">
>>> </FRAMESET>
>>> </HTML>

>>
>> Then get the bodyx frame, not the frameset.

>
> How exactly does one get the bodyx frame,


Assuming the URL of the frameset is
http://www.example.com/somepage/index.asp, you probably use the URL
http://www.example.com/somepage/body.asp?centerin=GGCC
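The same resolution can be done in code with the URI module, which applies the standard relative-URL rules (including dropping a leading "./"). A sketch; `frame_url` is a hypothetical helper, and the frameset address is Gunnar's example URL, not the real site.

```perl
use strict;
use warnings;
use URI;

# Resolve a frame's SRC attribute against the URL of the frameset page,
# as a browser does before it requests the frame.
sub frame_url {
    my ($frameset_url, $src) = @_;
    return URI->new_abs($src, $frameset_url)->as_string;
}

# Gunnar's example frameset URL.
my $base = 'http://www.example.com/somepage/index.asp';
print frame_url($base, 'body.asp?centerin=GGCC'), "\n";
# prints http://www.example.com/somepage/body.asp?centerin=GGCC
```

The resulting absolute URL can then be passed to getstore() or $ua->get() like any other page.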

> and more importantly how do
> you auto select from the select box when there is no such mention of
> it or a submit button in html for this ASP application.


As Sinan mentioned, you apparently need to learn some basics about HTML.
Asking questions in a Perl group is not the right way to do so.

Recommended reading: http://www.w3.org/TR/html4/present/frames.html
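On the HTML side: a drop-down sends nothing by itself. Only when its form is submitted (by a submit button or by a script) does the browser send the SELECT's name together with the chosen OPTION's value. A sketch that lists those values with HTML::TokeParser and builds the URL a browser would request for a GET form. The markup, `results.asp`, and the field name `choice` are hypothetical, and a form with method="post" would need a POST request instead.

```perl
use strict;
use warnings;
use HTML::TokeParser;
use URI;

# Return the first form's ACTION and a hash ref of SELECT name => [option values].
# An OPTION without a VALUE attribute submits its text, so use that instead.
sub select_options {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html);
    my ($action, $current, %options);
    while (my $tag = $p->get_tag(qw(form select option /select))) {
        my ($name, $attr) = @$tag;
        if ($name eq 'form') {
            $action //= $attr->{action};
        }
        elsif ($name eq 'select') {
            $current = $attr->{name} // '';
            $options{$current} = [];
        }
        elsif ($name eq '/select') {
            undef $current;
        }
        elsif (defined $current) {    # an <option> inside a <select>
            push @{ $options{$current} }, $attr->{value} // $p->get_trimmed_text;
        }
    }
    return ($action, \%options);
}

# Build the URL a browser would request for a GET form when the user
# picks $value in the drop-down named $field.
sub selection_url {
    my ($page_url, $action, $field, $value) = @_;
    my $uri = URI->new_abs($action, $page_url);
    $uri->query_form($field => $value);
    return $uri->as_string;
}

# Hypothetical markup: a drop-down and a plain (non-submit) button.
my $page = <<'HTML';
<form action="results.asp" method="get">
<select name="choice">
<option value="GGCC">GG Center</option>
<option>XYZ</option>
</select>
<input type="button" value="Go">
</form>
HTML

my ($action, $opts) = select_options($page);
print "choice: @{ $opts->{choice} }\n";    # prints "choice: GGCC XYZ"
print selection_url('http://www.example.com/somepage/body.asp', $action,
                    choice => 'XYZ'), "\n";
```

When the button only runs a script, the request it makes may not match the form's ACTION at all, so the script still has to be read, as Sinan said.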

 


Similar Threads
Thread Thread Starter Forum Replies Last Post
Re: How include a large array? Edward A. Falk C Programming 1 04-04-2013 08:07 PM
Adding contents on yaml file without overwriting actual contents Kamarulnizam Rahim Ruby 4 01-28-2011 09:10 AM
Email contents of webpage or Form on webpage w/o using Server scripting sifar Javascript 5 08-24-2005 05:47 PM
Save contents of iframe from parent's save button user ASP .Net 1 04-04-2005 07:44 PM
How to save lwp::useragent state? John Perl Misc 1 04-28-2004 01:30 PM


