How to save a webpage contents to a file ( with LWP )
Hi there,

Does anyone skilled in the art of LWP (or another Perl module) and screen scraping know how to do the equivalent of "File", "Save As" for HTML content? Some webpages aren't scrapeable, but when you save their content to a local file, it's available. Any ideas would be great.

Also, if there is a drop-down plus a button to select content, BUT the HTML source has no "submit" entry at all, how does one remote-control a user selection without this post handle?

Thanks in advance,
Jack
Re: How to save a webpage contents to a file ( with LWP )
Jack <jack_posemsky@yahoo.com> wrote in news:412be207-d043-4b9d-bd96-25294294d50e@u72g2000hsf.googlegroups.com:
> Hi there, does anyone skilled in the art of LWP (or other perl module)
> and screen scraping know how to do the equivalent of a "file", "save
> as" html content ?

http://search.cpan.org/~gaas/libwww-.../LWP/Simple.pm

getstore($url, $file)

http://search.cpan.org/~gaas/libwww-...esponse_Object

http://search.cpan.org/~gaas/libwww-...TP/Response.pm

$r->content( $content )

    This is used to get/set the raw content

$r->decoded_content( %options )

    This will return the content after any Content-Encoding and charsets
    has been decoded.

> Also, if there is a drop down + button to select content BUT in the
> HTML source no "submit" entry at all, how does one remote control a
> user selection without this post handle ?

If the page uses Javascript to dynamically post form contents, you will
have to figure out what the Javascript does and replicate it.

Sinan

--
A. Sinan Unur <1usa@llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)
clpmisc guidelines: <URL:http://www.rehabitation.com/clpmisc.shtml>
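For readers following along, here is a minimal sketch of the two approaches mentioned above: getstore from LWP::Simple for a raw copy, and decoded_content from the response object for a decoded copy. The subroutine names save_raw and save_decoded and the ".decoded" file suffix are made up for illustration and are not part of LWP. Note that this saves what the server sends, which may differ from a browser's "Save As" if the page builds its content with scripts.

```perl
use strict;
use warnings;
use LWP::Simple qw(getstore is_success);
use LWP::UserAgent;

# Save the raw response body to $file. Returns the HTTP status code.
# (save_raw is a hypothetical helper name, not an LWP function.)
sub save_raw {
    my ($url, $file) = @_;
    return getstore($url, $file);
}

# Save the body after Content-Encoding and charset decoding, written as UTF-8.
# Returns 1 on success, 0 otherwise. (save_decoded is also hypothetical.)
sub save_decoded {
    my ($url, $file) = @_;
    my $r = LWP::UserAgent->new->get($url);
    return 0 unless $r->is_success;
    my $html = $r->decoded_content;
    return 0 unless defined $html;
    open my $fh, '>:encoding(UTF-8)', $file or die "Cannot write $file: $!";
    print {$fh} $html;
    close $fh or die "Cannot close $file: $!";
    return 1;
}

# Usage: perl save_page.pl URL FILE
if (@ARGV == 2) {
    my ($url, $file) = @ARGV;
    my $status = save_raw($url, $file);
    die "Download failed with HTTP status $status\n" unless is_success($status);
    save_decoded($url, "$file.decoded")
        or warn "Could not save a decoded copy\n";
}
```

The raw copy is byte-for-byte what the server returned; the decoded copy is usually the more convenient one to feed into a parser.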
Re: How to save a webpage contents to a file ( with LWP )
On Feb 20, 5:49 am, "A. Sinan Unur" <1...@llenroc.ude.invalid> wrote:
> Jack <jack_posem...@yahoo.com> wrote in
> news:412be207-d043-4b9d-bd96-25294294d50e@u72g2000hsf.googlegroups.com:
>
> > Hi there, does anyone skilled in the art of LWP (or other perl module)
> > and screen scraping know how to do the equivalent of a "file", "save
> > as" html content ?
>
> http://search.cpan.org/~gaas/libwww-.../LWP/Simple.pm
>
> getstore($url, $file)
>
> http://search.cpan.org/~gaas/libwww-...pm#The_Respons...
>
> http://search.cpan.org/~gaas/libwww-...TP/Response.pm
>
> $r->content( $content )
>
>     This is used to get/set the raw content
>
> $r->decoded_content( %options )
>
>     This will return the content after any Content-Encoding and charsets
>     has been decoded.
>
> > Also, if there is a drop down + button to select content BUT in the
> > HTML source no "submit" entry at all, how does one remote control a
> > user selection without this post handle ?
>
> If the page uses Javascript to dynamically post form contents, you will
> have to figure out what the Javascript does and replicate it.
>
> Sinan
>
> --
> A. Sinan Unur <1...@llenroc.ude.invalid>
> (remove .invalid and reverse each component for email address)
> clpmisc guidelines: <URL:http://www.rehabitation.com/clpmisc.shtml>

Hi Sinan, the site uses ASP, no JS files. This is all there is in the html:

<!--<SCRIPT>
//
</SCRIPT>-->
<FRAMESET ROWS="70,*" FRAMESPACING=0>
<FRAME NAME="header" SRC="./header_default.asp?
NoCache=2%2F20%2F2008+7%3A35%3A47+AM" SCROLLING="no" MARGINWIDTH="2"
MARGINHEIGHT="0">
<FRAME NAME="bodyx" SRC= body.asp?centerin=GGCC
SCROLLING="auto" MARGINWIDTH="2" MARGINHEIGHT="2">
</FRAMESET>
</HTML>
Re: How to save a webpage contents to a file ( with LWP )
Jack <jack_posemsky@yahoo.com> wrote in
news:14c9e85d-9e1d-43ca-ae55-423ce6256df2@q78g2000hsh.googlegroups.com:

> On Feb 20, 5:49 am, "A. Sinan Unur" <1...@llenroc.ude.invalid> wrote:
>> Jack <jack_posem...@yahoo.com> wrote in
>> news:412be207-d043-4b9d-bd96-25294294d50e@u72g2000hsf.googlegroups.com:
>>
>> > Hi there, does anyone skilled in the art of LWP (or other perl
>> > module) and screen scraping know how to do the equivalent of a
>> > "file", "save as" html content ?
>>
>> http://search.cpan.org/~gaas/libwww-.../LWP/Simple.pm
>>
>> getstore($url, $file)
>>
>> http://search.cpan.org/~gaas/libwww-perl-5.808/lib/LWP.pm#The_Respons...
>>
>> http://search.cpan.org/~gaas/libwww-...TP/Response.pm
>>
>> $r->content( $content )
>>
>>     This is used to get/set the raw content
>>
>> $r->decoded_content( %options )
>>
>>     This will return the content after any Content-Encoding and
>>     charsets has been decoded.
>>
>> > Also, if there is a drop down + button to select content BUT in the
>> > HTML source no "submit" entry at all, how does one remote control a
>> > user selection without this post handle ?
>>
>> If the page uses Javascript to dynamically post form contents, you
>> will have to figure out what the Javascript does and replicate it.
>>
>> Sinan
>>
>> --
>> A. Sinan Unur <1...@llenroc.ude.invalid>

Do *not* quote sigs.

> Hi Sinan the site uses ASP, no JS files.. this is all there is in the
> html
> <!--<SCRIPT>
> //
> </SCRIPT>-->
> <FRAMESET ROWS="70,*" FRAMESPACING=0>
> <FRAME NAME="header" SRC="./header_default.asp?
> NoCache=2%2F20%2F2008+7%3A35%3A47+AM" SCROLLING="no" MARGINWIDTH="2"
> MARGINHEIGHT="0">
>
> <FRAME NAME="bodyx" SRCbody.asp?centerin=GGCC

I am assuming you retyped the source rather than copying and pasting it.
Please don't retype code.

> SCROLLING="auto" MARGINWIDTH="2" MARGINHEIGHT="2">

Oh, but there is more. How about them frames?

Anyway, this forum is for help with the Perl aspect of things. If you
need to learn HTML, there is a group for that as well.

Sinan

--
A. Sinan Unur <1usa@llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)
clpmisc guidelines: <URL:http://www.rehabitation.com/clpmisc.shtml>
Re: How to save a webpage contents to a file ( with LWP )
Jack wrote:
> this is all there is in the html
> <!--<SCRIPT>
> //
> </SCRIPT>-->
> <FRAMESET ROWS="70,*" FRAMESPACING=0>
> <FRAME NAME="header" SRC="./header_default.asp?
> NoCache=2%2F20%2F2008+7%3A35%3A47+AM" SCROLLING="no" MARGINWIDTH="2"
> MARGINHEIGHT="0">
>
> <FRAME NAME="bodyx" SRC=
> body.asp?centerin=GGCC
> SCROLLING="auto" MARGINWIDTH="2" MARGINHEIGHT="2">
>
> </FRAMESET>
>
> </HTML>

Then get the bodyx frame, not the frameset.

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
Re: How to save a webpage contents to a file ( with LWP )
On Feb 20, 8:08 am, Gunnar Hjalmarsson <nore...@gunnar.cc> wrote:
> Jack wrote:
> > this is all there is in the html
> > <!--<SCRIPT>
> > //
> > </SCRIPT>-->
> > <FRAMESET ROWS="70,*" FRAMESPACING=0>
> > <FRAME NAME="header" SRC="./header_default.asp?
> > NoCache=2%2F20%2F2008+7%3A35%3A47+AM" SCROLLING="no" MARGINWIDTH="2"
> > MARGINHEIGHT="0">
> >
> > <FRAME NAME="bodyx" SRC=
> > body.asp?centerin=GGCC
> > SCROLLING="auto" MARGINWIDTH="2" MARGINHEIGHT="2">
> >
> > </FRAMESET>
> >
> > </HTML>
>
> Then get the bodyx frame, not the frameset.
>
> --
> Gunnar Hjalmarsson
> Email: http://www.gunnar.cc/cgi-bin/contact.pl

How exactly does one get the bodyx frame, and more importantly, how do you auto-select from the select box when there is no mention of it or of a submit button in the HTML for this ASP application?

Thank you,
Jack
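On the select-box part of the question, one possible approach (in line with the earlier advice to replicate what the browser sends) is to build the request by hand. The sketch below is only an illustration under assumptions: the helper name build_select_request is made up, and the real field name, value, target URL and method must be read from the <select> and <form> tags in the frame's own HTML. The form may equally well be submitted with GET, and the centerin=GGCC query parameter already present in the frame URL may be what the drop-down changes, though the thread does not confirm that.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);
use LWP::UserAgent;

# Build the POST request a browser might send when a drop-down value is chosen.
# build_select_request is a hypothetical helper; the field names and values
# passed in are placeholders to be replaced with those found in the real form.
sub build_select_request {
    my ($action_url, %fields) = @_;
    return POST($action_url, [ %fields ]);
}

# Usage: perl select.pl ACTION_URL FIELD_NAME FIELD_VALUE
if (@ARGV == 3) {
    my ($url, $name, $value) = @ARGV;
    my $ua = LWP::UserAgent->new;
    my $r  = $ua->request( build_select_request($url, $name => $value) );
    if ($r->is_success) {
        my $body = $r->decoded_content;
        print defined $body ? $body : $r->content;
    }
    else {
        warn "Request failed: ", $r->status_line, "\n";
    }
}
```

Building the request separately from sending it makes it easy to print it with $req->as_string and compare it against what the browser actually sends.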
Re: How to save a webpage contents to a file ( with LWP )
Jack wrote:
> On Feb 20, 8:08 am, Gunnar Hjalmarsson <nore...@gunnar.cc> wrote:
>> Jack wrote:
>>> this is all there is in the html
>>> <!--<SCRIPT>
>>> //
>>> </SCRIPT>-->
>>> <FRAMESET ROWS="70,*" FRAMESPACING=0>
>>> <FRAME NAME="header" SRC="./header_default.asp?
>>> NoCache=2%2F20%2F2008+7%3A35%3A47+AM" SCROLLING="no" MARGINWIDTH="2"
>>> MARGINHEIGHT="0">
>>> <FRAME NAME="bodyx" SRC=
>>> body.asp?centerin=GGCC
>>> SCROLLING="auto" MARGINWIDTH="2" MARGINHEIGHT="2">
>>> </FRAMESET>
>>> </HTML>
>>
>> Then get the bodyx frame, not the frameset.
>
> How exactly does one get the bodyx frame,

Assuming the URL of the frameset is
http://www.example.com/somepage/index.asp, you probably use the URL
http://www.example.com/somepage/body.asp?centerin=GGCC

> and more importantly how do
> you auto select from the select box when there is no such mention of
> it or a submit button in html for this ASP application.

As Sinan mentioned, you apparently need to learn some basics about HTML.
Asking questions in a Perl group is not the right way to do so.

Recommended reading: http://www.w3.org/TR/html4/present/frames.html

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
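The URL arithmetic described above can be sketched with the URI module, which resolves a relative SRC attribute against the URL of the frameset page. The helper name frame_url is made up for illustration, and the frameset URL is the assumed example.com address from the post, not the real site.

```perl
use strict;
use warnings;
use URI;

# Resolve a FRAME's SRC attribute against the URL of the frameset page.
# (frame_url is a hypothetical helper name.)
sub frame_url {
    my ($src, $frameset_url) = @_;
    return URI->new_abs($src, $frameset_url)->as_string;
}

# Assumed frameset URL, taken from the example in the post above.
my $frameset = 'http://www.example.com/somepage/index.asp';

# Prints http://www.example.com/somepage/body.asp?centerin=GGCC
print frame_url('body.asp?centerin=GGCC', $frameset), "\n";
```

The resulting absolute URL can then be passed to getstore($url, $file) to save the frame's content rather than the frameset.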