Velocity Reviews

TC 10-25-2009 07:13 PM

Extracting text using BeautifulSoup
 
Greetings all.

Working with data from 'http://www.finviz.com/quote.ashx?t=SRS', I was able
to get the info using re; however, I thought BeautifulSoup would be a more
elegant approach.
I'm having a bit of a problem, though...

I'm trying to extract this text:

SMA20 -1.77%
SMA50 -9.73%

keyed on the body=[...] entry inside each label cell's title attribute, e.g.
<td ... title="... body=[Distance from 20-Day Simple Moving Average] ...">

From:
-----------------------HTML snippet------------------------------------------------------------
<td width="7%" class="snapshot-td2-cp" align="left"
title="cssbody=[tooltip_short_bdy] cssheader=[tooltip_short_hdr]
body=[Distance from 20-Day Simple Moving Average] offsetx=[10] offsety=[20]
delay=[300]">
SMA20
</td>
<td width="8%" class="snapshot-td2" align="left">
<b>
<span style="color:#aa0000;">
-1.77%
</span>
</b>
</td>
<td width="7%" class="snapshot-td2-cp" align="left"
title="cssbody=[tooltip_short_bdy] cssheader=[tooltip_short_hdr]
body=[Distance from 50-Day Simple Moving Average] offsetx=[10] offsety=[20]
delay=[300]">
SMA50
</td>
<td width="8%" class="snapshot-td2" align="left">
<b>
<span style="color:#aa0000;">
-9.73%
</span>
</b>
</td>
-----------------------HTML snippet------------------------------------------------------------
Using:

import urllib
from BeautifulSoup import BeautifulSoup

archives_url = 'http://www.finviz.com/quote.ashx?t=SRS'
archives_html = urllib.urlopen(archives_url).read()
soup = BeautifulSoup(archives_html)

# g is not defined in the original post; an output file is assumed here
g = open('output.txt', 'w')
t = soup.findAll('table')
for table in t:
    g.write(str(table.name) + '\r\n')
    rows = table.findAll('tr')
    for tr in rows:
        g.write('\r\n\t')
        cols = tr.findAll('td')
        for td in cols:
            # find(name='title') searches for a child <title> tag, not the
            # td's title attribute, so ret is always 'None' here
            ret = str(td.find(name='title'))
            g.write('\t\t' + str(td) + '\r\n')
g.close()

Total failure, of course.
Any ideas?
Thanks in advance...
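The body=[...] text is not a separate attribute: it is part of the title attribute string, which is why td.find(name='title') (a search for a child <title> element) comes back empty. Below is a minimal sketch of the pairing idea that uses Python 3's standard-library html.parser and re in place of BeautifulSoup, so it runs without third-party packages. It assumes, from the snippet above only, that each label cell carries body=[...] in its title and that its value is the text of the very next <td>; BODY_RE, SnapshotParser and extract_snapshot are hypothetical names, and the live finviz layout may differ.

```python
import re
from html.parser import HTMLParser

# Matches the body=[...] field of a title attribute. The leading \b stops it
# from matching inside "cssbody=[...]", which appears earlier in the same title.
BODY_RE = re.compile(r'\bbody=\[([^\]]*)\]')


class SnapshotParser(HTMLParser):
    """Pairs each label <td> (one whose title holds body=[...]) with the
    stripped text of the <td> that follows it (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.results = {}        # description -> (label, value)
        self._in_td = False
        self._text = []
        self._desc = None        # body=[...] text of the current <td>, if any
        self._pending = None     # (description, label) still waiting for a value

    def handle_starttag(self, tag, attrs):
        if tag == 'td':
            self._in_td = True
            self._text = []
            title = dict(attrs).get('title') or ''
            match = BODY_RE.search(title)
            self._desc = match.group(1) if match else None

    def handle_data(self, data):
        # Also collects text from nested tags such as <b><span>.
        if self._in_td:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != 'td' or not self._in_td:
            return
        self._in_td = False
        text = ''.join(self._text).strip()
        if self._desc is not None:
            # Label cell: hold it until the next cell supplies the value.
            self._pending = (self._desc, text)
        elif self._pending is not None:
            desc, label = self._pending
            self.results[desc] = (label, text)
            self._pending = None


def extract_snapshot(html):
    """Returns {description: (label, value)} for every label/value cell pair."""
    parser = SnapshotParser()
    parser.feed(html)
    parser.close()
    return parser.results


if __name__ == '__main__':
    sample = (
        '<td class="snapshot-td2-cp" title="cssbody=[tooltip_short_bdy] '
        'cssheader=[tooltip_short_hdr] body=[Distance from 20-Day Simple '
        'Moving Average] offsetx=[10]">SMA20</td>'
        '<td class="snapshot-td2"><b><span>-1.77%</span></b></td>'
        '<td class="snapshot-td2-cp" title="cssbody=[tooltip_short_bdy] '
        'body=[Distance from 50-Day Simple Moving Average]">SMA50</td>'
        '<td class="snapshot-td2"><b><span>-9.73%</span></b></td>'
    )
    for label, value in extract_snapshot(sample).values():
        print(label, value)
    # prints:
    # SMA20 -1.77%
    # SMA50 -9.73%
```

Keying the results by the body=[...] description rather than by the SMA20/SMA50 label mirrors the lookup described in the post; either part of the tuple can be used once the pairs are collected.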



All times are GMT.