Velocity Reviews - Computer Hardware Reviews

python-parser running Beautiful Soup only spits out one line of 10. What have I gotten wrong here?

 
 
Martin Kaspar
12-25-2010
Hello dear Community,

I am trying to get a scraper up and running, and I keep running into
problems.

When I try what I have learned so far, I only get:
<strong>Schuldaten</strong>

Here is the code that I used:

import urllib2
from BeautifulSoup import BeautifulSoup
page = urllib2.urlopen("http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=799.601437941842&SchulAdresseMapDO=142323")
soup = BeautifulSoup(page)
table = soup.find('table', attrs={'class':'bp_ergebnis_tab_info'})
first_td = soup.find('td')
text = first_td.renderContents()
trimmed_text = text.strip()
print trimmed_text


I run it in the template at http://scraperwiki.com/scrapers/new/python

See the target: http://www.schulministerium.nrw.de/B...seMapDO=142323

What have I gotten wrong?

Can anybody review the code?

Many thanks in advance.

Regards,
matze
 
 
 
 
 
John Nagle
12-25-2010
Your program is doing what you asked it to do. It finds the
first table with class 'bp_ergebnis_tab_info'. Then it ignores
that result. Then it finds the first "td" item in the whole document
(soup.find searches from the top of the document, not from the table)
and prints the contents of that. Then it exits. What did
you want it to do?
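
To make the point concrete, here is a minimal sketch that runs on an inline HTML snippet rather than the live page. It assumes the newer bs4 package (Beautiful Soup 4) on Python 3, not the BeautifulSoup 3 / Python 2 setup used in this thread. The sample markup and the helper name first_cells are made up for illustration; only the class name 'bp_ergebnis_tab_info' comes from the original post.

```python
from bs4 import BeautifulSoup

# Made-up markup (an assumption, not the real page):
# a layout table comes first, the data table second.
SAMPLE_HTML = """
<table><tr><td><strong>Schuldaten</strong></td></tr></table>
<table class="bp_ergebnis_tab_info">
  <tr><td>Name</td><td>Example School</td></tr>
</table>
"""


def first_cells(html):
    """Return (first td in the whole document, first td inside the result table)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", attrs={"class": "bp_ergebnis_tab_info"})
    doc_td = soup.find("td")      # searches the entire document
    table_td = table.find("td")   # searches only inside the table
    return doc_td.decode_contents().strip(), table_td.decode_contents().strip()


doc_first, table_first = first_cells(SAMPLE_HTML)
print(doc_first)    # first cell found anywhere in the document
print(table_first)  # first cell found inside the class-matched table
```

Calling find on the table object instead of on soup is the only difference between the two lookups.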

Try this. It prints out the TD items on each
row of the table, in order.

import urllib2
from BeautifulSoup import BeautifulSoup
page = urllib2.urlopen("http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=799.601437941842&SchulAdresseMapDO=142323")
soup = BeautifulSoup(page)
table = soup.find('table', attrs={'class':'bp_ergebnis_tab_info'})
for row in table.findAll('tr'):        # for all TR items (table rows)
    for td in row.findAll('td'):       # for TD items in row
        text = td.renderContents().strip()
        print(text)
    print('-----')                     # mark end of row
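
For anyone on Python 3, the same row-by-row walk might look roughly like the sketch below. It assumes the bs4 package (Beautiful Soup 4) and works on a made-up inline snippet, so nothing is fetched from the network. The helper name table_rows and the sample data are hypothetical; the class name is the one from the post. It also returns an empty list when no matching table is found, which the code above does not guard against.

```python
from bs4 import BeautifulSoup

# Made-up sample data (an assumption, not the real page content).
SAMPLE_HTML = """
<table class="bp_ergebnis_tab_info">
  <tr><td>Name</td><td> Example School </td></tr>
  <tr><td>City</td><td>Example Town</td></tr>
</table>
"""


def table_rows(html, css_class="bp_ergebnis_tab_info"):
    """Return one list of cell texts per table row, or [] if no table matches."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", attrs={"class": css_class})
    if table is None:  # find() returns None when nothing matches
        return []
    rows = []
    for tr in table.find_all("tr"):
        # get_text(strip=True) drops the tags and surrounding whitespace
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        rows.append(cells)
    return rows


rows = table_rows(SAMPLE_HTML)
for cells in rows:
    print(" | ".join(cells))
```

Returning lists instead of printing directly makes it easier to store the rows afterwards, for example in a scraper's datastore.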

John Nagle



 