Velocity Reviews


TC 10-25-2009 07:13 PM

Extracting text using Beautifulsoup
 
Greetings all.

Working with data from 'http://www.finviz.com/quote.ashx?t=SRS', I was able
to get the info using re; however, I thought BeautifulSoup would be a more
elegant approach.
I'm having a bit of a problem, though...

I'm trying to extract this text:

SMA20 -1.77%
SMA50 -9.73%

by matching the body entry inside the title attribute of the label cell, as in
<td ... title="... body=[Distance from 20-Day Simple Moving Average] ...">

From:
-----------------------HTML snippet-----------------------
<td width="7%" class="snapshot-td2-cp" align="left"
title="cssbody=[tooltip_short_bdy] cssheader=[tooltip_short_hdr]
body=[Distance from 20-Day Simple Moving Average] offsetx=[10] offsety=[20]
delay=[300]">
SMA20
</td>
<td width="8%" class="snapshot-td2" align="left">
<b>
<span style="color:#aa0000;">
-1.77%
</span>
</b>
</td>
<td width="7%" class="snapshot-td2-cp" align="left"
title="cssbody=[tooltip_short_bdy] cssheader=[tooltip_short_hdr]
body=[Distance from 50-Day Simple Moving Average] offsetx=[10] offsety=[20]
delay=[300]">
SMA50
</td>
<td width="8%" class="snapshot-td2" align="left">
<b>
<span style="color:#aa0000;">
-9.73%
</span>
</b>
</td>
-----------------------HTML snippet-----------------------
Using:

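import urllib
from BeautifulSoup import BeautifulSoup

archives_url = 'http://www.finviz.com/quote.ashx?t=SRS'
archives_html = urllib.urlopen(archives_url).read()
soup = BeautifulSoup(archives_html)

# The post never showed how g was opened; 'output.txt' is an assumed name.
g = open('output.txt', 'w')

t = soup.findAll('table')
for table in t:
    g.write(str(table.name) + '\r\n')
    rows = table.findAll('tr')
    for tr in rows:
        g.write('\r\n\t')
        cols = tr.findAll('td')
        for td in cols:
            # find(name='title') looks for a nested <title> tag, not the
            # td's title attribute, so ret is 'None' and is never written.
            ret = str(td.find(name='title'))
            g.write('\t\t' + str(td) + '\r\n')
g.close()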

Total failure, of course.
Any ideas?
Thanks in advance...
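
For what it's worth, here is a minimal sketch of one way this might be done. It assumes the newer bs4 package (BeautifulSoup 4) on Python 3 rather than the BeautifulSoup 3 import used above, and it parses a trimmed copy of the snippet from this post instead of fetching the live page. The names SAMPLE_HTML and extract_sma are made up for illustration. The idea is to select each td whose title attribute contains the body=[Distance from N-Day Simple Moving Average] text, then read the value from the next td sibling.

```python
import re

# Assumption: bs4 (BeautifulSoup 4), not the BeautifulSoup 3 module from the post.
from bs4 import BeautifulSoup

# Trimmed copy of the HTML snippet from the post, so no network access is needed.
SAMPLE_HTML = """
<table><tr>
<td width="7%" class="snapshot-td2-cp" align="left"
title="cssbody=[tooltip_short_bdy] cssheader=[tooltip_short_hdr]
body=[Distance from 20-Day Simple Moving Average] offsetx=[10] offsety=[20]
delay=[300]">
SMA20
</td>
<td width="8%" class="snapshot-td2" align="left">
<b><span style="color:#aa0000;">
-1.77%
</span></b>
</td>
<td width="7%" class="snapshot-td2-cp" align="left"
title="cssbody=[tooltip_short_bdy] cssheader=[tooltip_short_hdr]
body=[Distance from 50-Day Simple Moving Average] offsetx=[10] offsety=[20]
delay=[300]">
SMA50
</td>
<td width="8%" class="snapshot-td2" align="left">
<b><span style="color:#aa0000;">
-9.73%
</span></b>
</td>
</tr></table>
"""


def extract_sma(html):
    """Return {label: value} for cells whose title attribute mentions an SMA distance."""
    soup = BeautifulSoup(html, "html.parser")
    # Match against the title attribute's value, not a child tag named title.
    pattern = re.compile(r"body=\[Distance from \d+-Day Simple Moving Average\]")
    results = {}
    for label_td in soup.find_all("td", attrs={"title": pattern}):
        # The value sits in the td immediately following the label td.
        value_td = label_td.find_next_sibling("td")
        if value_td is not None:
            label = label_td.get_text(strip=True)
            results[label] = value_td.get_text(strip=True)
    return results


sma = extract_sma(SAMPLE_HTML)
for label, value in sma.items():
    print(label, value)
```

On the live page the markup may differ from the snippet above, so the regular expression and the next-sibling assumption would need checking against the actual HTML.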


