Velocity Reviews - Computer Hardware Reviews

parsing tables with beautiful soup?

 
 
cjl
 
      03-21-2007
I am learning Python and Beautiful Soup, and I'm stuck.

A web page has a table that contains data I would like to scrape. The
table has a unique class, so I can use:

soup.find("table", {"class": "class_name"})

This isolates the table. So far, so good. Next, this table has a
certain number of rows (I won't know ahead of time how many), and each
row has a set number of cells (which will be constant).

I couldn't find example code showing how to loop through the contents of
the rows and cells of a table using Beautiful Soup. I'm guessing I
need an outer loop for the rows and an inner loop for the cells, but I
don't know how to iterate over the tags that I want. The Beautiful
Soup documentation is a little beyond me at this point.

Can anyone point me in the right direction?

thanks again,
cjl

 
cjl
 
      03-21-2007
This works:

for row in soup.find("table",{"class": "class_name"}):
    for cell in row:
        print cell.contents[0]

Is there a better way to do this?

-cjl

 
Duncan Booth
 
      03-22-2007
"cjl" wrote:

> This works:
>
> for row in soup.find("table",{"class": "class_name"}):
>     for cell in row:
>         print cell.contents[0]
>
> Is there a better way to do this?
>


It may work for the page you are testing against, but it wouldn't work if
your page contained valid HTML. You are assuming that the TR elements are
direct children of the TABLE, but HTML requires that the TR elements appear
inside THEAD, TBODY or TFOOT elements, so if anyone ever corrects the HTML
your code will break.
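
To make the failure mode above concrete, here is a minimal sketch. It assumes the modern bs4 package (Beautiful Soup 4) and Python 3, which are newer than what this thread was presumably written against, and the sample markup is invented for illustration. Once the rows sit inside a TBODY, a search limited to the table's direct children finds no TR elements, while a search over all descendants still finds them.

```python
from bs4 import BeautifulSoup  # assumption: Beautiful Soup 4 is installed

# Hypothetical markup: the rows are wrapped in an explicit <tbody>.
HTML = """
<table class="class_name">
  <tbody>
    <tr><td>a</td><td>b</td></tr>
    <tr><td>c</td><td>d</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")
table = soup.find("table", class_="class_name")

# Direct children only: the rows are one level down, inside <tbody>.
direct_rows = table.find_all("tr", recursive=False)

# All descendants: finds the rows regardless of nesting depth.
all_rows = table.find_all("tr")

print(len(direct_rows), len(all_rows))
```

Iterating over the table tag itself, as in the quoted loop, walks the same direct children, so with this markup it would visit the TBODY (and whitespace strings) rather than the rows.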

Something like this (untested) ought to work and be reasonably robust:

table = soup.find("table",{"class": "class_name"})
for row in table.findAll("tr"):
    for cell in row.findAll("td"):
        print cell.findAll(text=True)
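
For readers on current versions, the same nested loop might look like the sketch below. It assumes Python 3 and the bs4 package, where `find_all` is the current spelling of `findAll`. The function name `table_rows` and the sample markup are hypothetical and not from the thread. Instead of printing, it collects each row's cell text into a list.

```python
from bs4 import BeautifulSoup  # assumption: Beautiful Soup 4 is installed


def table_rows(html, class_name):
    """Return the text of each <td> cell, grouped by row (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_=class_name)
    if table is None:
        return []  # no table with that class on the page
    rows = []
    for row in table.find_all("tr"):  # outer loop: rows at any depth
        # inner loop: the <td> cells of this row, with surrounding whitespace stripped
        cells = [cell.get_text(strip=True) for cell in row.find_all("td")]
        if cells:  # skip rows that hold only <th> header cells
            rows.append(cells)
    return rows


# Invented sample page with a header section and a body section.
SAMPLE = """
<table class="class_name">
  <thead><tr><th>Fruit</th><th>Count</th></tr></thead>
  <tbody>
    <tr><td><b>apple</b></td><td> 3 </td></tr>
    <tr><td>pear</td><td>5</td></tr>
  </tbody>
</table>
"""

for cells in table_rows(SAMPLE, "class_name"):
    print(cells)
```

Because the number of rows is discovered by the search, nothing needs to be known ahead of time, and a constant number of cells per row simply shows up as lists of equal length.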

 
cjl
 
      03-22-2007
DB:

Thank you, that worked perfectly.

-CJL

 