Velocity Reviews - Computer Hardware Reviews

need start point for getting html info from web

nephish@xit.net
      10-31-2005
Hey there,

I have a small app that needs to get information from a few tables
on different websites. I have looked at urllib and httplib. The sites
I need data from mostly have it in tables, which I think should make
it easier. Can anyone suggest a good starting point for learning how
to do this, or know of a link to a good how-to?
Thanks,
sk

Mike Meyer
      10-31-2005
(E-Mail Removed) writes:
> I have a small app that needs to get information from a few tables
> on different websites. I have looked at urllib and httplib. The sites
> I need data from mostly have it in tables, which I think should make
> it easier. Can anyone suggest a good starting point for learning how
> to do this, or know of a link to a good how-to?


Don't have a link to a how-to, but you're halfway there. urllib (and
urllib2) will get the HTML text from the websites. Pulling data out of
it depends on the nature of the HTML. If it's well-structured XHTML,
you can use your favorite XML library. If it's well-structured HTML,
you can try htmllib, but it's pretty primitive. If it's not
well-structured, you can use BeautifulSoup; I've used it to pull data
from tables. The problem with any of this is that your code depends on
the structure (or lack thereof) of the HTML you're scraping. If they
change it, your code breaks.

<mike
--
Mike Meyer <(E-Mail Removed)> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
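A minimal Python 3 sketch of the two steps described above, using only the standard library: `urllib.request` (where urllib/urllib2 functionality lives in Python 3) fetches the page, and `xml.etree.ElementTree` plays the role of the "XML library" for pages that really are well-formed XHTML. The function names and the optional `expected_header` check are illustrative assumptions, not from the thread; the check is one way to make a layout change fail loudly instead of silently returning the wrong columns.

```python
import urllib.request
import xml.etree.ElementTree as ET


def fetch_html(url, timeout=10):
    """Download a page and decode it with the charset the server declares."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # Fall back to UTF-8 when Content-Type carries no charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def _local(tag):
    """Strip an XML namespace: '{http://www.w3.org/1999/xhtml}td' -> 'td'."""
    # Comments/processing instructions have non-string tags; treat as no match.
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def parse_xhtml_table(xhtml, expected_header=None):
    """Return the rows of the first <table> as lists of cell text.

    Only works on well-formed XML (real XHTML). Typical sloppy HTML,
    e.g. unclosed <td> tags or &nbsp; entities, makes ET.fromstring
    raise ET.ParseError.
    """
    root = ET.fromstring(xhtml)
    table = next((el for el in root.iter() if _local(el.tag) == "table"), None)
    if table is None:
        raise ValueError("no <table> element found")

    rows = []
    for tr in table.iter():  # also finds rows inside <thead>/<tbody>
        if _local(tr.tag) != "tr":
            continue
        rows.append([
            # Join all text inside the cell and collapse runs of whitespace.
            " ".join("".join(cell.itertext()).split())
            for cell in tr
            if _local(cell.tag) in ("td", "th")
        ])

    # Scrapers break when the site changes its layout; fail loudly here.
    if expected_header is not None and (not rows or rows[0] != expected_header):
        raise ValueError(f"unexpected table header: {rows[0] if rows else None!r}")
    return rows
```

Typical use would be `parse_xhtml_table(fetch_html(url), expected_header=[...])`, where `url` and the header names are whatever the target site actually uses. For HTML that isn't well-formed XML, this approach won't parse at all, which is where a more forgiving parser comes in.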
nephish@xit.net
      10-31-2005
Yeah, I know I'm going to have to write a bunch of code, because the
values I want come from several different sites. Ah, well, I just
wanted to know the easiest way to get started. I'll check out
Beautiful Soup; I think I've heard it mentioned before.
Thanks,
shawn

Paul McGuire
      10-31-2005
<(E-Mail Removed)> wrote in message
news:(E-Mail Removed) ups.com...
> hey there,
>
> I have a small app that needs to get information from a few tables
> on different websites. I have looked at urllib and httplib. The sites
> I need data from mostly have it in tables, which I think should make
> it easier. Can anyone suggest a good starting point for learning how
> to do this, or know of a link to a good how-to?
> thanks,
> sk
>

pyparsing comes with a simple HTML scraper example for extracting the NIST
NTP servers from an HTML table. pyparsing is also fairly tolerant of
"unclean" HTML. Download pyparsing at http://pyparsing.sourceforge.net.

-- Paul
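pyparsing is a separate download, and its bundled scraper example is not reproduced here. As a rough standard-library analogue of tolerating "unclean" markup (this swaps in Python 3's `html.parser`, not pyparsing), the sketch below collects table cells even when `</td>`, `</tr>` or `</table>` are missing. The `TableScraper` class and `scrape_tables` function are hypothetical names, and nested tables are deliberately out of scope.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect every <table> on a page as a list of rows of cell text.

    Tolerates common sloppiness: a new <td>/<th> ends the previous cell,
    a new <tr> ends the previous row, and a row left open at the end of
    input is flushed on close(). Nested tables are not supported.
    """

    def __init__(self):
        # convert_charrefs=True (the default) turns &amp;, &nbsp; etc. into text.
        super().__init__(convert_charrefs=True)
        self.tables = []   # each table is a list of rows; each row a list of str
        self._row = None   # cells of the row currently being built
        self._cell = None  # text fragments of the cell currently being built

    def _end_cell(self):
        if self._cell is not None and self._row is not None:
            # Collapse whitespace, including non-breaking spaces.
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _end_row(self):
        self._end_cell()
        if self._row is not None and self.tables:
            self.tables[-1].append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._end_row()  # the previous table may have lacked </table>
            self.tables.append([])
        elif tag == "tr":
            self._end_row()
            self._row = []
        elif tag in ("td", "th"):
            self._end_cell()
            if self._row is None:  # cell without a <tr>: start a row anyway
                self._row = []
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def close(self):
        super().close()
        self._end_row()  # flush a row left open at end of input


def scrape_tables(html):
    """Parse an HTML string and return all of its tables."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.tables
```

For example, `scrape_tables("<table><tr><td>x<td>y")` still yields one table with the row `["x", "y"]`. Like any scraper, it stays tied to the page's layout, so the row and column positions you pick out of `tables` need rechecking when the site changes.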


alex_f_il@hotmail.com
      10-31-2005
You can easily do it with SW Explorer Automation
(http://home.comcast.net/~furmana/SWIEAutomation.htm).
The program creates an automation API for any Web application that
uses HTML and DHTML and works with Microsoft Internet Explorer. The Web
application becomes programmatically accessible from any .NET language.


The tool has a Visual Table Data Extractor, which lets you visually
define the table structure. The table then becomes accessible from code
as a DataTable class. You can develop the extraction script in hours
with the tool.



Similar Threads
Thread Thread Starter Forum Replies Last Post
Share-Point-2010 ,Share-Point -2010 Training , Share-point-2010Hyderabad , Share-point-2010 Institute Saraswati lakki ASP .Net 0 01-06-2012 06:39 AM
I need help with point-to-point T1 and ip route statement bk2007 Cisco 0 08-30-2007 03:02 AM
Web Server via Point-to-Point cyberjanitor1 Cisco 0 03-27-2007 04:28 PM
Scenario 5: IS-IS routing on Frame Relay Multi-point and Point-to-Point David Sudjiman Cisco 0 06-08-2006 09:11 AM
need a start point for wsdl nephish@xit.net Python 3 03-07-2006 05:18 AM