Velocity Reviews


Martin Kaspar 12-25-2010 05:58 PM

Python parser running Beautiful Soup only spits out one line of 10. What have I gotten wrong here?
 
Hello dear community,


I am trying to get a scraper up and running, and I keep running into
problems.

When I try what I have learned so far, I only get:
<strong>Schuldaten</strong>

Here is the code that I used:

import urllib2
from BeautifulSoup import BeautifulSoup
page = urllib2.urlopen("http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=799.601437941842&SchulAdresseMapDO=142323")
soup = BeautifulSoup(page)
table = soup.find('table', attrs={'class': 'bp_ergebnis_tab_info'})
first_td = soup.find('td')
text = first_td.renderContents()
trimmed_text = text.strip()
print trimmed_text


I run it in the template at http://scraperwiki.com/scrapers/new/python

See the target: http://www.schulministerium.nrw.de/B...seMapDO=142323

What have I gotten wrong?

Can anybody review the code?

Many thanks in advance.

Regards,
matze

John Nagle 12-25-2010 06:36 PM

Re: Python parser running Beautiful Soup only spits out one line of 10. What have I gotten wrong here?
 
Your program is doing what you asked it to do. It finds the
first table with class 'bp_ergebnis_tab_info'. Then it ignores
that result. Then it finds the first "td" item in the whole document,
and prints the contents of that. Then it exits. What did
you want it to do?
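The behaviour described above can be reproduced offline. This is a minimal sketch that swaps in the standard library's xml.etree.ElementTree for BeautifulSoup, so the made-up sample page must be well-formed XML; the markup and cell values are invented for illustration and the helper cell_text is hypothetical. It shows that find() stops at the first match (one line of output), while findall() on the located table returns every cell in document order.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed stand-in for the school page. ElementTree (unlike
# BeautifulSoup) cannot parse arbitrary real-world HTML.
SAMPLE_PAGE = """<html><body>
<table class="bp_ergebnis_tab_info">
  <tr><td><strong>Schuldaten</strong></td></tr>
  <tr><td>label 1</td><td>value 1</td></tr>
  <tr><td>label 2</td><td>value 2</td></tr>
</table>
</body></html>"""

root = ET.fromstring(SAMPLE_PAGE)


def cell_text(td):
    # Join all text inside the cell, including text inside child tags like <strong>.
    return "".join(td.itertext()).strip()


# Like soup.find('td'): searches the whole document and stops at the first match.
first_td = root.find(".//td")
print(cell_text(first_td))  # prints: Schuldaten

# Like table.findAll('td'): every cell inside the located table, in order.
table = root.find(".//table[@class='bp_ergebnis_tab_info']")
for td in table.findall(".//td"):
    print(cell_text(td))
```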

Try this. It prints out the TD items on each
row of the table, in order.

import urllib2
from BeautifulSoup import BeautifulSoup

page = urllib2.urlopen("http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=799.601437941842&SchulAdresseMapDO=142323")
soup = BeautifulSoup(page)
table = soup.find('table', attrs={'class': 'bp_ergebnis_tab_info'})
for row in table.findAll('tr'):      # for all TR items (table rows)
    for td in row.findAll('td'):     # for TD items in row
        text = td.renderContents().strip()
        print(text)
    print('-----')                   # mark end of row
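On Python 3, urllib2 no longer exists (its functionality moved to urllib.request) and BeautifulSoup 3 was superseded by the bs4 package. As a rough illustration only, the same row-by-row walk can be sketched with the standard library's html.parser.HTMLParser, which tolerates sloppy HTML such as an omitted </td>. The class TableRowCollector, the helper table_rows and the sample markup are hypothetical; nested tables inside the target table are not handled.

```python
from html.parser import HTMLParser


class TableRowCollector(HTMLParser):
    """Collect <td> text, row by row, from the first <table> whose class
    list contains target_class. Nested tables are not handled."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.in_table = False   # currently inside the target table?
        self.done = False       # target table already closed?
        self.rows = []          # one list of cell strings per <tr>
        self._cell = None       # text pieces of the open <td>, else None

    def _flush_cell(self):
        # Store the open cell, if any; this also covers an omitted </td>.
        if self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if not self.in_table:
            classes = (dict(attrs).get("class") or "").split()
            if tag == "table" and self.target_class in classes:
                self.in_table = True
        elif tag == "tr":
            self._flush_cell()
            self.rows.append([])
        elif tag == "td" and self.rows:
            self._flush_cell()
            self._cell = []

    def handle_endtag(self, tag):
        if not self.in_table:
            return
        if tag in ("td", "tr"):
            self._flush_cell()
        elif tag == "table":
            self._flush_cell()
            self.in_table = False
            self.done = True

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_rows(html, target_class="bp_ergebnis_tab_info"):
    collector = TableRowCollector(target_class)
    collector.feed(html)
    collector.close()
    return collector.rows


# Made-up markup: a navigation table first, then the target table,
# with one row that leaves its last </td></tr> unclosed.
SAMPLE = """<table class="nav"><tr><td>Menu</td></tr></table>
<table class="bp_ergebnis_tab_info">
<tr><td><strong>Schuldaten</strong></td></tr>
<tr><td>label 1</td><td>value 1
<tr><td>label 2</td><td>value 2</td></tr>
</table>"""

for row in table_rows(SAMPLE):
    for text in row:
        print(text)
    print('-----')  # mark end of row
```

One design difference: renderContents() returns a cell's inner markup (which is why the original post printed `<strong>Schuldaten</strong>`), whereas this sketch keeps only the text.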

John Nagle





All times are GMT.