Velocity Reviews


Michael 07-10-2003 05:38 PM

Parsing
 
I have been assigned a project to parse a webpage for data using
Python. I have finished only basic tutorials. Any suggestions as to
where I should go from here? Thanks in advance.

Simon Bayling 07-10-2003 08:04 PM

Re: Parsing
 
whatsupg21@hotmail.com (Michael) wrote in
news:e5fb8973.0307100938.13fcea56@posting.google.com:

> I have been assigned a project to parse a webpage for data using
> Python. I have finished only basic tutorials. Any suggestions as to
> where I should go from here? Thanks in advance.
>


Parsing? What are you looking for?
Do you have to download the page as well?

If it's a fairly simple thing to find, you could use something like:

>>> import urllib
>>> source = urllib.urlopen("http://www.google.com").readlines()
>>> for line in source:
...     if line.find("logo.gif") > -1:
...         print "Found google logo"
...
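
The session above uses Python 2 syntax (the print statement and urllib.urlopen). On Python 3, urlopen lives in urllib.request and returns bytes, so a rough equivalent might look like the sketch below. The helper names lines_containing and fetch_lines are made up for illustration, and the sample lines stand in for a downloaded page so the snippet runs without network access.

```python
from urllib.request import urlopen


def lines_containing(lines, needle):
    """Return the lines that contain the given substring."""
    return [line for line in lines if needle in line]


def fetch_lines(url):
    """Download a page and return its lines as text (needs network access)."""
    with urlopen(url) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace").splitlines()


# Hypothetical sample lines standing in for a real page.
sample = ['<img src="logo.gif">', '<p>hello</p>']
if lines_containing(sample, "logo.gif"):
    print("Found google logo")

# With network access, the same check against a live page would be:
#   lines_containing(fetch_lines("http://www.google.com"), "logo.gif")
```

Keeping the search separate from the download makes it easy to try the matching logic on a saved copy of the page first.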


If the data to find is more complicated, or you need to parse the HTML as
well, you should look at more string methods, maybe regular expressions
(import re)...
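
As a minimal sketch of the regular expression route (Python 3), the snippet below pulls the src value out of every img tag in a string. The sample markup, the pattern, and the function name find_image_sources are assumptions made for illustration. A pattern like this only copes with simple, well-behaved markup, so treat it as a starting point rather than a general HTML parser.

```python
import re

# Hypothetical sample markup; in practice this would be the downloaded page source.
SAMPLE_HTML = '<p><img src="logo.gif" alt="Logo"> <img src="/images/banner.png"></p>'

# Captures the value of a quoted src attribute inside an <img> tag.
IMG_SRC = re.compile(r'<img\b[^>]*?\bsrc\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def find_image_sources(html):
    """Return the src values of all <img> tags matched by the pattern."""
    return IMG_SRC.findall(html)


print(find_image_sources(SAMPLE_HTML))
```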

Cheers,
Simon.

Peter van Kampen 07-10-2003 10:07 PM

Re: Parsing
 
In article <e5fb8973.0307100938.13fcea56@posting.google.com>, Michael wrote:
> I have been assigned a project to parse a webpage for data using
> Python. I have finished only basic tutorials. Any suggestions as to
> where I should go from here? Thanks in advance.



Try to be a little more specific. Parse for what? Links? Images? Tags?

Anyway. A good start might be the HTMLParser that comes with the
batteries since 2.2 if I remember correctly. See

http://www.python.org/doc/current/li...r-example.html

for a tiny example.
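
For readers on Python 3, where the module is named html.parser, a small subclass along these lines might serve as a starting point. This is only a sketch: the class name LinkCollector, the helper collect_links, and the sample markup are invented for illustration and are not taken from the linked example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def collect_links(html):
    """Parse an HTML string and return the list of link targets found."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample markup standing in for a downloaded page.
SAMPLE = ('<html><body><a href="http://www.python.org/">Python</a> '
          '<a name="top">anchor</a> <A HREF="/doc/">Docs</A></body></html>')
print(collect_links(SAMPLE))
```

Other handlers such as handle_data and handle_endtag can be overridden the same way to pick out text or track nesting.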

PterK

--
Peter van Kampen
pterk -- at -- datatailors.com

