saving a webpage's links to the hard disk
Is there a good place to find code that will help me save a webpage's links to the local drive after I have used urllib2 to retrieve the page? I often have to view these pages when I do not have access to the internet.
Re: saving a webpage's links to the hard disk
On Sun, 04 May 2008 01:33:45 -0300, Jetus <stevegill7@gmail.com> wrote:
> Is there a good place to look to see where I can find some code that
> will help me to save webpage's links to the local drive, after I have
> used urllib2 to retrieve the page?
> Many times I have to view these pages when I do not have access to the
> internet.

Don't reinvent the wheel; use wget:
http://en.wikipedia.org/wiki/Wget

--
Gabriel Genellina
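For anyone who prefers to drive wget from Python, here is a minimal sketch. It assumes GNU wget is installed and on the PATH; the helper names `build_wget_command` and `save_page` and the `mirror` directory are made up for illustration. `--page-requisites`, `--convert-links` and `--directory-prefix` are standard GNU wget options.

```python
import subprocess


def build_wget_command(url, dest_dir):
    """Return the argument list for a wget call that saves one page for offline viewing."""
    return [
        "wget",
        "--page-requisites",  # also fetch images, CSS and other files the page needs
        "--convert-links",    # rewrite links in the saved page to point at the local copies
        "--directory-prefix=" + dest_dir,  # save everything under dest_dir
        url,
    ]


def save_page(url, dest_dir):
    """Run wget; raises subprocess.CalledProcessError if wget reports a failure."""
    subprocess.run(build_wget_command(url, dest_dir), check=True)


# Example call (needs network access and wget):
#     save_page("http://python.org/", "mirror")
```

Building the argument list separately from running it keeps the command easy to inspect or log before anything is downloaded.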
Re: saving a webpage's links to the hard disk
On May 4, 12:33 am, "Gabriel Genellina" <gagsl-...@yahoo.com.ar> wrote:
> Don't reinvent the wheel and use wget
> http://en.wikipedia.org/wiki/Wget

A lot of the functionality is already present.

    import urllib
    urllib.urlretrieve( 'http://python.org/', 'main.htm' )

    from htmllib import HTMLParser
    from formatter import NullFormatter
    parser= HTMLParser( NullFormatter( ) )
    parser.feed( open( 'main.htm' ).read( ) )

    import urlparse
    for a in parser.anchorlist:
        print urlparse.urljoin( 'http://python.org/', a )

Output snipped:

    ...
    http://python.org/psf/
    http://python.org/dev/
    http://python.org/links/
    http://python.org/download/releases/2.5.2
    http://docs.python.org/
    http://python.org/ftp/python/2.5.2/python-2.5.2.msi
    ...
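The snippet in this post is Python 2: `htmllib` was removed in Python 3 and `urlparse` became `urllib.parse`. A rough Python 3 equivalent might look like the sketch below. `LinkCollector` and `extract_links` are illustrative names, not part of any library, and the sample HTML string stands in for a downloaded page so the example runs without network access.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, similar to htmllib's anchorlist."""

    def __init__(self):
        super().__init__()
        self.anchorlist = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.anchorlist.append(value)


def extract_links(html, base_url):
    """Return absolute URLs for all anchors found in html."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return [urljoin(base_url, a) for a in parser.anchorlist]


sample = '<a href="/psf/">PSF</a> <a href="http://docs.python.org/">Docs</a>'
for link in extract_links(sample, "http://python.org/"):
    print(link)
```

To run this against a live page, the HTML could be fetched with `urllib.request.urlopen(url).read()` and decoded before being passed to `extract_links`; `urllib.request.urlretrieve` is the Python 3 home of the `urlretrieve` call used in the post.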
Re: saving a webpage's links to the hard disk
On May 4, 7:22 am, castiro...@gmail.com wrote:
> A lot of the functionality is already present.
> [...]

How can I modify or add to the above code so that the linked files are saved to specified local directories, AND the saved webpage refers to the newly saved files in their respective directories? Thanks in advance for your help.
Re: saving a webpage's links to the hard disk
On May 7, 1:40 am, Jetus <stevegi...@gmail.com> wrote:
> How can I modify or add to the above code, so that the file references
> are saved to specified local directories, AND the saved webpage makes
> reference to the new saved files in the respective directories?
> Thanks for your help in advance.

You'd have to convert the filenames in the loop to file system paths; try writing them as-is after creating the directories with makedirs( ). You'd also have to replace the links in the saved file's contents, so your best bet might be prefixing them with localhost and spawning a small bounce-router.
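One way to read the first suggestion, sketched in Python 3 under stated assumptions: each URL is mapped to a path under a local root directory, the directories are created with `os.makedirs`, and the page's links are rewritten by plain string replacement. The names `url_to_local_path`, `save_bytes` and `rewrite_links`, the `root/host/path` layout, and the `index.html` fallback are all hypothetical choices. The replacement step only catches links written exactly as `href="..."` with the same string as in the mapping.

```python
import os
from urllib.parse import urlparse


def url_to_local_path(url, root):
    """Map a URL to a file path under root (assumed layout: root/host/path)."""
    parts = urlparse(url)
    path = parts.path.lstrip("/")
    # A URL with no path, or one ending in "/", names no file; assume index.html.
    if not path or path.endswith("/"):
        path += "index.html"
    return os.path.join(root, parts.netloc, *path.split("/"))


def save_bytes(data, local_path):
    """Write data to local_path, creating the parent directories first."""
    os.makedirs(os.path.dirname(local_path), exist_ok=True)
    with open(local_path, "wb") as f:
        f.write(data)


def rewrite_links(html, mapping):
    """Replace each href="original" in html with href="local path" from mapping."""
    for original, local in mapping.items():
        html = html.replace('href="%s"' % original, 'href="%s"' % local)
    return html


print(url_to_local_path("http://python.org/ftp/python/2.5.2/python-2.5.2.msi", "mirror"))
```

A real downloader would also need to sanitise paths (for example `..` segments and characters that are invalid in file names), which this sketch does not attempt.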
Re: saving a webpage's links to the hard disk
Jetus wrote:
> How can I modify or add to the above code, so that the file references
> are saved to specified local directories, AND the saved webpage makes
> reference to the new saved files in the respective directories?
> Thanks for your help in advance.

How about you *try* to do so - and if you have actual problems, you come back and ask for help? Alternatively, there's always guru.com

Diez
Re: saving a webpage's links to the hard disk
On May 7, 8:36 am, "Diez B. Roggisch" <de...@nospam.web.de> wrote:
> how about you *try* to do so - and if you have actual problems, you come
> back and ask for help? Alternatively, there's always guru.com

I've tried, to no avail. How does the open-source plug to Python look/work? Firefox was able to spawn Python in a toolbar in a distant land. Does it still? I believe under DOM, return a file named X that contains a list of changes to make to the page, or put it at the top of one, to be removed by Firefox.

At that point, X would pretty much be the last lexically sorted file in a pre-established directory. Files are really easy to create and add syntax to, if you create a bunch of them. Sector size was bouncing though, which brings that all the way up to file system.

    // Walk the page's links; where one matches the next entry in file A,
    // point it at the matching entry in file B.
    // (doc.links.length is an assumed loop bound.)
    int pyID = 0;
    for ( int docID = 0; docID < doc.links.length; docID++ ) {
        if ( doc.links[ docID ] == pythonfileA.links[ pyID ] ) {
            doc.links[ docID ].anchor = pythonfileB.links[ pyID ];
            pyID++;
        }
    }
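Read as Python, the loop in this post amounts to an in-order replacement: walk the page's links and, whenever one equals the next entry in the first list, swap in the entry at the same position in the second list. The sketch below is one interpretation; `replace_links_in_order`, `old_links` and `new_links` are hypothetical names standing in for `doc.links`, `pythonfileA.links` and `pythonfileB.links`, and plain strings are used instead of link objects.

```python
def replace_links_in_order(doc_links, old_links, new_links):
    """Return a copy of doc_links with in-order matches of old_links replaced by new_links."""
    result = list(doc_links)
    py_id = 0  # index of the next old link we are waiting to see
    for doc_id, link in enumerate(result):
        if py_id >= len(old_links):
            break  # every replacement has been applied
        if link == old_links[py_id]:
            result[doc_id] = new_links[py_id]
            py_id += 1
    return result


page_links = ["http://python.org/psf/", "http://python.org/dev/", "http://docs.python.org/"]
old = ["http://python.org/psf/", "http://docs.python.org/"]
new = ["psf/index.html", "docs/index.html"]
print(replace_links_in_order(page_links, old, new))
```

Because only the next expected entry is compared at each step, the two lists must name links in the same order in which they appear on the page; a dictionary lookup would lift that restriction.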