Beautiful Soup iterator question....

 
 
cjl
      04-20-2007
P:

I am screen-scraping a table. The table has an unknown number of rows,
but each row has exactly 8 cells. I would like to extract the data
from the cells, but the first three cells in each row have their data
nested inside other tags.

So I have the following code:

for row in table.findAll("tr"):
    for cell in row.findAll("td"):
        print cell.contents[0]

This code prints out all the data, but of course the first three cells
still contain their unwanted tags.

I would like to do something like this:

for cell1, cell2, cell3, cell4, cell5, cell6, cell7, cell8 in row.findAll("td"):

Then treat each cell differently.

I can't figure this out. Can anyone point me in the right direction?

-CJL

 
Steve Holden
      04-20-2007
cjl wrote:
> P:
>
> I am screen-scraping a table. The table has an unknown number of rows,
> but each row has exactly 8 cells. I would like to extract the data
> from the cells, but the first three cells in each row have their data
> nested inside other tags.
>
> So I have the following code:
>
> for row in table.findAll("tr"):
>     for cell in row.findAll("td"):
>         print cell.contents[0]
>
> This code prints out all the data, but of course the first three cells
> still contain their unwanted tags.
>
> I would like to do something like this:
>
> for cell1, cell2, cell3, cell4, cell5, cell6, cell7, cell8 in
> row.findAll("td"):
>
> Then treat each cell differently.
>
> I can't figure this out. Can anyone point me in the right direction?
>

Did you try something like this (untested)?

cell1, cell2, cell3, cell4, cell5, \
cell6, cell7, cell8 = row.findAll("td")

No need for the "for" if you want to handle each cell differently: you
won't be iterating over them. And, as you saw, the "for" version doesn't
work unless row.findAll(...) returns a sequence of eight-item containers.
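
To make the difference concrete, here is a minimal sketch. It uses plain
lists as a stand-in for what row.findAll("td") returns, so the names
cells, rows and unpack_eight are illustrative assumptions, not part of
Beautiful Soup.

```python
# Stand-in for row.findAll("td"): a plain list of eight cell values.
cells = ["a", "b", "c", "d", "e", "f", "g", "h"]

# Plain unpacking binds one name per item; the counts must match exactly.
cell1, cell2, cell3, cell4, cell5, cell6, cell7, cell8 = cells

# The "for" form unpacks each *item* of the sequence it loops over, so it
# only works when every item is itself an eight-item sequence.
rows = [cells, cells]
first_column = []
for c1, c2, c3, c4, c5, c6, c7, c8 in rows:
    first_column.append(c1)


def unpack_eight(seq):
    """Hypothetical helper: unpack exactly eight items or raise ValueError."""
    a, b, c, d, e, f, g, h = seq
    return a, b, c, d, e, f, g, h
```

Calling unpack_eight on a row with seven or nine cells raises ValueError,
which is the failure you would hit on a malformed row.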

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
Recent Ramblings http://holdenweb.blogspot.com

 
Paul McGuire
      04-20-2007
On Apr 20, 2:05 pm, Steve Holden <(E-Mail Removed)> wrote:
<snip>
>
> did you try something like (untested)
>
> cell1, cell2, cell3, cell4, cell5, \
> cell6, cell7, cell8 = row.findAll("td")
>
> No need for the "for" if you want to handle each cell differently: you
> won't be iterating over them. And, as you saw, the "for" version doesn't
> work unless row.findAll(...) returns a sequence of eight-item containers.
>


One defensive approach for rows that might have too few or too many
cells is to pad the list with placeholders and then slice exactly the
number of elements you need from it.

cell1, cell2, cell3, cell4, cell5, \
cell6, cell7, cell8 = (row.findAll("td") + [None]*8)[:8]
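
The same pad-and-slice idea can be wrapped in a small helper. This is only
a sketch: pad_to is a hypothetical name, and it takes any sequence, so a
plain list stands in here for the result of row.findAll("td").

```python
def pad_to(seq, n, fill=None):
    """Return exactly n items: seq padded with fill, then truncated to n."""
    # Appending n fill values guarantees at least n items before slicing.
    return (list(seq) + [fill] * n)[:n]


# A short row is padded with None; a long row is cut off after n items.
short_row = pad_to(["a", "b", "c"], 8)
long_row = pad_to(list(range(10)), 8)

# The result always has n items, so unpacking into n names cannot fail.
cell1, cell2, cell3, cell4, cell5, cell6, cell7, cell8 = short_row
```

The padded cells are None, so check for that before touching a cell's
contents.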

-- Paul


 