web crawling.
Hello,

I have been writing very simple Python programs that parse HTML and such, mainly just to get a better feel for the language. Here is my question: If I parsed an HTML page into all of the image files listed on that page, how could I request all of those images and download them into some specified folder? I am sure this is quite easy, but I am stuck.

Thank you very much.
Burgeoning Pythonista
Re: web crawling.
S Borg <spwpreston@gmail.com> wrote:
> Hello,
>
> I have been writing very simple Python programs that parse HTML and such, mainly just to get a better feel for the language. Here is my question: If I parsed an HTML page into all of the image files listed on that page, how could I request all of those images and download them into some specified folder? I am sure this is quite easy, but I am stuck.

There's a good crawler in the Demo directory of the Python source distribution, so download and unpack said sources and look there.

Alex
Re: web crawling.
S Borg wrote:
> Hello,
>
> I have been writing very simple Python programs that parse HTML and such, mainly just to get a better feel for the language. Here is my question: If I parsed an HTML page into all of the image files listed on that page, how could I request all of those images and download them into some specified folder? I am sure this is quite easy, but I am stuck.
>
> Thank you very much.
> Burgeoning Pythonista

http://sig.levillage.org/?p=588
Re: web crawling.
Use BeautifulSoup to get all the image tags out of the HTML.

You'll need to join the URLs of the images to the URL of the page (urlparse.urljoin, off the top of my head). If you look at BeautifulSoup you will see how to get the 'src' reference of each image tag.

All the best,

Fuzzyman
http://www.voidspace.org.uk/python/index.shtml
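For anyone finding this thread later, here is a rough sketch of the steps described above: pull the 'src' of each image tag, join it to the page URL, then fetch each image into a folder. To keep it runnable without installing anything, it swaps BeautifulSoup for the standard library's html.parser, and it is written for Python 3, where urljoin lives in urllib.parse rather than urlparse. The names ImageSrcParser, image_urls and download_images are made up for illustration, and the sketch assumes the page decodes as UTF-8 and that the image file names do not collide.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit
from urllib.request import urlopen


class ImageSrcParser(HTMLParser):
    """Collect the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; skip <img> tags without src.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def image_urls(html, page_url):
    """Return absolute URLs for every image referenced in the HTML text."""
    parser = ImageSrcParser()
    parser.feed(html)
    # urljoin resolves relative references against the page's own URL.
    return [urljoin(page_url, src) for src in parser.sources]


def download_images(page_url, folder):
    """Fetch the page, then save each image it references into folder."""
    with urlopen(page_url) as response:
        # Assumption: the page is UTF-8; undecodable bytes are replaced.
        html = response.read().decode("utf-8", errors="replace")
    os.makedirs(folder, exist_ok=True)
    saved = []
    for index, url in enumerate(image_urls(html, page_url)):
        # Use the last path component as the file name, with a fallback.
        name = os.path.basename(urlsplit(url).path) or "image_%d" % index
        target = os.path.join(folder, name)
        with urlopen(url) as response, open(target, "wb") as out:
            out.write(response.read())
        saved.append(target)
    return saved
```

Calling download_images with a page URL and a folder name should return the list of saved file paths. A real crawler would also want error handling for broken links and some politeness (delays, robots.txt), which this sketch leaves out.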
Re: web crawling.
Alex Martelli wrote:
> S Borg <spwpreston@gmail.com> wrote:
>
>> Hello,
>>
>> I have been writing very simple Python programs that parse HTML and such, mainly just to get a better feel for the language. Here is my question: If I parsed an HTML page into all of the image files listed on that page, how could I request all of those images and download them into some specified folder? I am sure this is quite easy, but I am stuck.
>
> There's a good crawler in the Demo directory of the Python source distribution, so download and unpack said sources and look there.
>
> Alex

Hm. Looks like that's:

Python-2.4.2/Tools/webchecker

See 'pydoc ./webchecker.py' for more info.

---J

--
(remove zeez if demunging email address)