HTML parsing
I need to parse the following HTML page and extract the TV listing data using VC++:

http://tvlistings.zap2it.com/tvlistings/ZCGrid.do

Is there a good way to extract the data? Is it easy for VC++ to call a Perl script and apply some regular expressions? Since the HTML page is not well-formed XML, I cannot use an XML parser, right? Are there any other good ways to extract data from an HTML page?
Re: HTML parsing
worlman385@yahoo.com wrote:
: I need to parse the following HTML page and extract TV listing data
: using VC++
: http://tvlistings.zap2it.com/tvlistings/ZCGrid.do
: any good way to extract the data?
: is easy for VC++ to call PERL script and do some regular expression?
: since the HTML page is not XML well formed, I cannot use a XML parser
: right?
: any other good ways to extract HTML page data?

Perl, HTML::Parser (my spelling is right but the case may be wrong).

    #!perl
    use strict;
    use HTML::Parser;
    ... perl code, etc...

As an aside, this is also an excellent tool for SAX-like parsing of XML. It has an XML mode that expects properly balanced tags and so on, and though it doesn't handle all XML features, HTML::Parser comes with almost all distributions of Perl. That means any script that uses it can work with almost any installation of Perl, even if you can't install anything additional (a real lifesaver in a controlled environment).
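On the question of calling a Perl script from VC++, one possible approach is to start the script as a child process and read what it prints. The sketch below assumes a hypothetical script named extract_listings.pl that takes an HTML file name and writes the extracted listings to standard output; the helper names run_command and perl_command are made up for illustration. It uses _popen on Windows and popen elsewhere.

```cpp
#include <cassert>
#include <cstdio>
#include <stdexcept>
#include <string>

// VC++ spells the POSIX pipe functions with a leading underscore.
#ifdef _WIN32
#define POPEN _popen
#define PCLOSE _pclose
#else
#define POPEN popen
#define PCLOSE pclose
#endif

// Runs a shell command and returns everything it wrote to standard output.
std::string run_command(const std::string& command) {
    assert(!command.empty());
    FILE* pipe = POPEN(command.c_str(), "r");
    if (pipe == nullptr) {
        throw std::runtime_error("could not start: " + command);
    }
    std::string output;
    char buffer[4096];
    std::size_t n;
    // Read the child's output in chunks until the pipe is closed.
    while ((n = std::fread(buffer, 1, sizeof buffer, pipe)) > 0) {
        output.append(buffer, n);
    }
    PCLOSE(pipe);
    return output;
}

// Builds the command line for a Perl script that takes one file argument.
// Assumption: neither path contains a double-quote character.
std::string perl_command(const std::string& script,
                         const std::string& html_file) {
    return "perl \"" + script + "\" \"" + html_file + "\"";
}
```

With these helpers, run_command(perl_command("extract_listings.pl", "page.html")) would return whatever the script prints, and the C++ side can then split that text into fields. Keeping the parsing in Perl and the output format simple (for example one listing per line) keeps the C++ side small.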
Re: HTML parsing
worlman385@yahoo.com wrote:
> I need to parse the following HTML page and extract TV listing data
> using VC++
>
> http://tvlistings.zap2it.com/tvlistings/ZCGrid.do
>
> any good way to extract the data?
>
> is easy for VC++ to call PERL script and do some regular expression?
>
> since the HTML page is not XML well formed, I cannot use a XML parser
> right?
>
> any other good ways to extract HTML page data?

Pass the page through HTML Tidy, which produces well-formed XHTML. Then use XSLT to extract what you need.

///Peter
--
XML FAQ: http://xml.silmaril.ie/
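The Tidy-then-XSLT route can also be driven from a VC++ program by running the two command-line tools as a pipeline. The sketch below only builds the command string. It assumes the tidy and xsltproc executables are installed and on the PATH (xsltproc is just one XSLT processor, chosen here for illustration), and the file names page.html and listings.xsl are hypothetical.

```cpp
#include <cassert>
#include <string>

// Wraps a path in double quotes so that spaces survive the shell.
// Assumption: the path itself contains no double-quote characters.
std::string quoted(const std::string& path) {
    return "\"" + path + "\"";
}

// Builds a two-stage pipeline:
//   1. tidy converts the HTML file into XHTML (-asxhtml) and writes
//      numeric rather than named entities (-numeric), so the result can
//      be parsed without the XHTML DTD.
//   2. xsltproc applies the stylesheet to that XHTML, read from standard
//      input ("-"), without loading the DTD (--novalid).
std::string tidy_xslt_command(const std::string& html_file,
                              const std::string& stylesheet) {
    assert(!html_file.empty() && !stylesheet.empty());
    return "tidy -quiet -asxhtml -numeric " + quoted(html_file) +
           " | xsltproc --novalid " + quoted(stylesheet) + " -";
}
```

The resulting string can be passed to system() or _popen(). Note that Tidy puts its XHTML output in the XHTML namespace, so the stylesheet would need to match elements through a prefix bound to http://www.w3.org/1999/xhtml.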