Extracting text using Beautifulsoup

 
 
TC
      10-25-2009
Greetings all.

I'm working with data from 'http://www.finviz.com/quote.ashx?t=SRS'. I was able
to get the info using re, but I thought BeautifulSoup would be a more
elegant approach.
I'm having a bit of a problem, though...

Trying to extract text:

SMA20 -1.77%
SMA50 -9.73%

using the body=[...] text inside the title attribute of the label cell, as in
<td ... title="... body=[Distance from 20-Day Simple Moving Average] ...">

From:
----------------------- HTML snippet -----------------------
<td width="7%" class="snapshot-td2-cp" align="left"
title="cssbody=[tooltip_short_bdy] cssheader=[tooltip_short_hdr]
body=[Distance from 20-Day Simple Moving Average] offsetx=[10] offsety=[20]
delay=[300]">
SMA20
</td>
<td width="8%" class="snapshot-td2" align="left">
<b>
<span style="color:#aa0000;">
-1.77%
</span>
</b>
</td>
<td width="7%" class="snapshot-td2-cp" align="left"
title="cssbody=[tooltip_short_bdy] cssheader=[tooltip_short_hdr]
body=[Distance from 50-Day Simple Moving Average] offsetx=[10] offsety=[20]
delay=[300]">
SMA50
</td>
<td width="8%" class="snapshot-td2" align="left">
<b>
<span style="color:#aa0000;">
-9.73%
</span>
</b>
</td>
----------------------- HTML snippet -----------------------
Using:

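import urllib
from BeautifulSoup import BeautifulSoup

archives_url = 'http://www.finviz.com/quote.ashx?t=SRS'
archives_html = urllib.urlopen(archives_url).read()
soup = BeautifulSoup(archives_html)

# 'g' was never defined in the post; 'output.txt' is a placeholder name.
g = open('output.txt', 'w')

t = soup.findAll('table')
for table in t:
    g.write(str(table.name) + '\r\n')
    rows = table.findAll('tr')
    for tr in rows:
        g.write('\r\n\t')
        cols = tr.findAll('td')
        for td in cols:
            # find(name='title') looks for a child <title> tag, not the
            # title attribute of the <td>, so ret is 'None' and is never used.
            ret = str(td.find(name='title'))
            g.write('\t\t' + str(td) + '\r\n')
g.close()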

Total failure of course.
Any ideas?
Thanks in advance...
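
One possible way to get at the values, sketched with the standard library's html.parser rather than BeautifulSoup so that it runs without extra installs. The idea is to record each <td> with its title attribute and text, then pair every cell whose title mentions the moving average with the cell that follows it. The names CellCollector, extract_tooltip_values and marker are made up for this example, SAMPLE is the snippet from the post wrapped in a <table><tr> (an assumption about the surrounding markup), and the sketch assumes the cells do not contain nested tables.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Record the title attribute and text of every <td> in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []       # one [title, text] entry per <td>
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            title = dict(attrs).get("title") or ""
            # The attribute may be wrapped over several lines; collapse whitespace.
            self.cells.append([" ".join(title.split()), ""])

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        # Text inside nested <b>/<span> still arrives here while a <td> is open.
        if self._in_td:
            self.cells[-1][1] += data


def extract_tooltip_values(html, marker="Simple Moving Average"):
    """Return (label, value) pairs for cells whose title contains marker."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    cells = parser.cells
    pairs = []
    # Walk adjacent cells: a label cell is followed by its value cell.
    for (title, label), (_, value) in zip(cells, cells[1:]):
        if marker in title:
            pairs.append((label.strip(), value.strip()))
    return pairs


SAMPLE = """
<table><tr>
<td width="7%" class="snapshot-td2-cp" align="left"
title="cssbody=[tooltip_short_bdy] cssheader=[tooltip_short_hdr]
body=[Distance from 20-Day Simple Moving Average] offsetx=[10] offsety=[20]
delay=[300]">
SMA20
</td>
<td width="8%" class="snapshot-td2" align="left">
<b><span style="color:#aa0000;">
-1.77%
</span></b>
</td>
<td width="7%" class="snapshot-td2-cp" align="left"
title="cssbody=[tooltip_short_bdy] cssheader=[tooltip_short_hdr]
body=[Distance from 50-Day Simple Moving Average] offsetx=[10] offsety=[20]
delay=[300]">
SMA50
</td>
<td width="8%" class="snapshot-td2" align="left">
<b><span style="color:#aa0000;">
-9.73%
</span></b>
</td>
</tr></table>
"""

if __name__ == "__main__":
    for label, value in extract_tooltip_values(SAMPLE):
        print(label, value)
```

The same matching logic should carry over to BeautifulSoup: look at each td's title attribute (not a child tag named title), and when it contains the body=[...] text, read the text of the next td sibling.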

 