Velocity Reviews

Jetus 05-04-2008 04:33 AM

saving a webpage's links to the hard disk
 
Is there a good place to look to see where I can find some code that
will help me to save webpage's links to the local drive, after I have
used urllib2 to retrieve the page?
Many times I have to view these pages when I do not have access to the
internet.

Gabriel Genellina 05-04-2008 05:33 AM

Re: saving a webpage's links to the hard disk
 
On Sun, 04 May 2008 01:33:45 -0300, Jetus <stevegill7@gmail.com> wrote:

> Is there a good place to look to see where I can find some code that
> will help me to save webpage's links to the local drive, after I have
> used urllib2 to retrieve the page?
> Many times I have to view these pages when I do not have access to the
> internet.


Don't reinvent the wheel and use wget
http://en.wikipedia.org/wiki/Wget
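
As a rough sketch of what that could look like from Python: the helper below only builds a wget command line for saving one page together with the files it needs. The function name and the "offline" directory are made up for illustration; the options used (--page-requisites, --convert-links, --directory-prefix) are standard GNU wget options, and the sketch assumes wget is installed and on the PATH.

```python
def build_wget_command(url, dest_dir):
    """Return the argv list for saving one page for offline viewing.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    return [
        "wget",
        "--page-requisites",              # also fetch images, CSS, etc. that the page needs
        "--convert-links",                # rewrite links in the saved HTML to the local copies
        "--directory-prefix=" + dest_dir, # save everything under dest_dir
        url,
    ]
```

To actually run it, something like subprocess.run(build_wget_command('http://python.org/', 'offline'), check=True) should do; --convert-links is the part that makes the saved page refer to the local files.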

--
Gabriel Genellina


castironpi@gmail.com 05-04-2008 11:22 AM

Re: saving a webpage's links to the hard disk
 
On May 4, 12:33 am, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
wrote:
> Don't reinvent the wheel and use wget
> http://en.wikipedia.org/wiki/Wget
>
> --
> Gabriel Genellina


A lot of the functionality is already present.

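import urllib
import urlparse
from htmllib import HTMLParser
from formatter import NullFormatter

# Python 2 only: htmllib is not available in Python 3.
# Download the page to a local file.
urllib.urlretrieve( 'http://python.org/', 'main.htm' )
# Parse the saved file; the parser collects every link in anchorlist.
parser= HTMLParser( NullFormatter( ) )
parser.feed( open( 'main.htm' ).read( ) )
# Resolve each (possibly relative) link against the page URL.
for a in parser.anchorlist:
    print urlparse.urljoin( 'http://python.org/', a )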

Output snipped:

...
http://python.org/psf/
http://python.org/dev/
http://python.org/links/
http://python.org/download/releases/2.5.2
http://docs.python.org/
http://python.org/ftp/python/2.5.2/python-2.5.2.msi
...
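
For readers on Python 3, where htmllib no longer exists, roughly the same link extraction can be sketched with html.parser and urllib.parse. This is an approximation, not the code from the post: the class name AnchorCollector and the function absolute_links are invented here, and the sketch works on an HTML string rather than downloading anything.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collect the href of every <a> tag, similar to htmllib's anchorlist."""

    def __init__(self):
        super().__init__()
        self.anchorlist = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.anchorlist.append(value)


def absolute_links(html, base_url):
    """Return every link in html, resolved against base_url."""
    parser = AnchorCollector()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.anchorlist]
```

Feeding it the text of a downloaded page and the page's own URL should give a list like the output above.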

Jetus 05-07-2008 06:40 AM

Re: saving a webpage's links to the hard disk
 
On May 4, 7:22 am, castiro...@gmail.com wrote:
>
> A lot of the functionality is already present.
>
> import urllib
> urllib.urlretrieve( 'http://python.org/', 'main.htm' )
> from htmllib import HTMLParser
> from formatter import NullFormatter
> parser= HTMLParser( NullFormatter( ) )
> parser.feed( open( 'main.htm' ).read( ) )
> import urlparse
> for a in parser.anchorlist:
>     print urlparse.urljoin( 'http://python.org/', a )
>
> Output snipped:
>
> ...http://python.org/psf/http://python....thon-2.5.2.msi
> ...


How can I modify or add to the above code, so that the file references
are saved to specified local directories, AND the saved webpage makes
reference to the new saved files in the respective directories?
Thanks for your help in advance.

castironpi@gmail.com 05-07-2008 09:59 AM

Re: saving a webpage's links to the hard disk
 
On May 7, 1:40 am, Jetus <stevegi...@gmail.com> wrote:
> How can I modify or add to the above code, so that the file references
> are saved to specified local directories, AND the saved webpage makes
> reference to the new saved files in the respective directories?
> Thanks for your help in advance.


You'd have to convert each URL in the loop to a file system path; try
writing the files out as is, creating the directories with
os.makedirs( ). You'd also have to replace the link targets inside the
saved page, so your best bet might be prefixing them with localhost
and spawning a small bounce-router.
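
A minimal sketch of the first half of that (URL to file system path, plus makedirs), written for Python 3. The function names and the index.html convention for directory-style URLs are assumptions made for illustration, and the sketch ignores query strings and characters that are not valid in file names.

```python
import os
from urllib.parse import urlparse


def local_path_for(url, root_dir):
    """Map a URL to a path under root_dir, mirroring host and URL path."""
    parts = urlparse(url)
    path = parts.path
    if path == "" or path.endswith("/"):
        # Assumption: URLs that look like directories are stored as index.html.
        path += "index.html"
    # Drop empty, "." and ".." segments so the result stays inside root_dir.
    segments = [s for s in path.split("/") if s not in ("", ".", "..")]
    return os.path.join(root_dir, parts.netloc, *segments)


def ensure_parent_dir(file_path):
    """Create the directory that will hold file_path, if it is missing."""
    os.makedirs(os.path.dirname(file_path), exist_ok=True)
```

In the loop from the earlier post, each absolute link would be passed through local_path_for, then ensure_parent_dir, and then downloaded to that path.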

Diez B. Roggisch 05-07-2008 01:36 PM

Re: saving a webpage's links to the hard disk
 
Jetus wrote:

> How can I modify or add to the above code, so that the file references
> are saved to specified local directories, AND the saved webpage makes
> reference to the new saved files in the respective directories?
> Thanks for your help in advance.


how about you *try* to do so - and if you have actual problems, you come
back and ask for help? Alternatively, there's always guru.com

Diez

castironpi@gmail.com 05-08-2008 12:41 AM

Re: saving a webpage's links to the hard disk
 
On May 7, 8:36 am, "Diez B. Roggisch" <de...@nospam.web.de> wrote:
> how about you *try* to do so - and if you have actual problems, you come
> back and ask for help? Alternatively, there's always guru.com
>
> Diez


I've tried, to no avail. How does the open-source plug to Python
look/work? Firefox was able to spawn Python in a toolbar in a distant
land. Does it still? I believe under DOM, return a file named X that
contains a list of changes to make to the page, or put it at the top
of one, to be removed by Firefox. At that point, X would pretty much
be the last lexically sorted file in a pre-established directory. Files
are really easy to create and add syntax to, if you create a bunch of
them. Sector size was bouncing though, which brings that all the way
up to file system.

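// Pseudocode: pythonfileA lists the original links, pythonfileB the
// replacements, both in the order the links appear in the document.
int pyID= 0;
for( int docID= 0; docID< doc.links.length && pyID< pythonfileA.links.length; docID++ ) {
    if ( doc.links[ docID ]== pythonfileA.links[ pyID ] ) {
        doc.links[ docID ].anchor= pythonfileB.links[ pyID ];
        pyID++;
    }
}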
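
The same idea (swap each link in the page for its replacement) can be sketched in Python 3 on the saved HTML text itself. This is a simplification under stated assumptions: it uses a regular expression that only handles quoted href attributes, and the function name rewrite_links and the url_to_local mapping (absolute URL to local file path) are made up for illustration.

```python
import re
from urllib.parse import urljoin

# Matches href="..." or href='...'; unquoted attribute values are not handled.
HREF_RE = re.compile(r'(href\s*=\s*)(["\'])(.*?)\2', re.IGNORECASE)


def rewrite_links(html, base_url, url_to_local):
    """Replace each href found in url_to_local with its local path.

    url_to_local maps absolute URLs to local paths; links that are not
    in the mapping are left untouched.
    """
    def replace(match):
        prefix, quote, href = match.groups()
        absolute = urljoin(base_url, href)
        new_href = url_to_local.get(absolute, href)
        return prefix + quote + new_href + quote

    return HREF_RE.sub(replace, html)
```

The result would be written back over the saved page, so that it refers to the downloaded copies instead of the original site.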

