HTML parsing

 
 
worlman385@yahoo.com
      03-16-2008

I need to parse the following HTML page and extract TV listing data
using VC++:

http://tvlistings.zap2it.com/tvlistings/ZCGrid.do

Is there a good way to extract the data?

Is it easy for VC++ to call a Perl script and apply some regular
expressions?

Since the HTML page is not well-formed XML, I cannot use an XML parser,
right?

Are there any other good ways to extract data from an HTML page?
 
Malcolm Dew-Jones
      03-16-2008
Use Perl with HTML::Parser (my spelling is right, but the case may be wrong).

#!perl
use strict;
use warnings;
use HTML::Parser;
... perl code, etc...

As an aside, this is also an excellent tool for SAX-like parsing of XML.
It has an XML mode that expects properly balanced tags and so on.
Although it doesn't handle all XML features, HTML::Parser comes with
almost all distributions of Perl, which means that any script that uses
it can work with almost any Perl installation, even if you can't install
anything additional (a real lifesaver in a controlled environment).
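
The event-driven (SAX-like) style described above can be sketched roughly as follows. This is not HTML::Parser itself: it uses Python's standard-library html.parser as a stand-in with a similar start-tag / text / end-tag handler model. The sample markup, the "listing" class name, and the CellCollector class are all assumptions made up for illustration; the real structure of the listings page is not known here.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell, one list per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def _end_cell(self):
        # Flush the open cell (if any) into the current row.
        if self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None

    def _end_row(self):
        # Flush the open row (if any), closing a dangling cell first.
        self._end_cell()
        if self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()        # tolerate a missing </tr>
            self._row = []
        elif tag == "td" and self._row is not None:
            self._end_cell()       # tolerate a missing </td>
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td":
            self._end_cell()
        elif tag == "tr":
            self._end_row()


# Hypothetical, deliberately sloppy markup: an unquoted attribute,
# an unclosed <br>, and a missing </td>.
SAMPLE = """
<table class=listing>
<tr><td>7:00 PM<td>News<br></td></tr>
<tr><td>8:00 PM</td><td>Movie &amp; Talk</td></tr>
</table>
"""

parser = CellCollector()
parser.feed(SAMPLE)
parser.close()
print(parser.rows)
```

The handlers never require the input to be well formed, which is why this style copes with real-world HTML where an XML parser would stop at the first error.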

 
Peter Flynn
      03-16-2008
Pass the page through HTML Tidy, which produces well-formed XHTML.
Then use XSLT to extract what you need.

///Peter
--
XML FAQ: http://xml.silmaril.ie/
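
A rough sketch of the second half of that pipeline, under several assumptions: the page has already been cleaned up by Tidy (for instance with its -asxhtml and -numeric options), the XHTML string below is a made-up stand-in for that output, and the "listing" class name is hypothetical. Instead of XSLT, the sketch uses the limited XPath subset in Python's standard-library xml.etree.ElementTree, simply to show that an ordinary XML parser works once the markup is well formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for Tidy's output: well-formed XHTML whose
# elements live in the XHTML namespace.
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Listings</title></head>
  <body>
    <table class="listing">
      <tr><td>7:00 PM</td><td>News<br /></td></tr>
      <tr><td>8:00 PM</td><td>Movie &amp; Talk</td></tr>
    </table>
  </body>
</html>"""

# Prefix-to-URI map so the path expressions can name XHTML elements.
NS = {"x": "http://www.w3.org/1999/xhtml"}

root = ET.fromstring(XHTML)

listings = []
for row in root.findall(".//x:table[@class='listing']/x:tr", NS):
    # itertext() gathers text from the cell and any nested elements.
    cells = ["".join(td.itertext()).strip() for td in row.findall("x:td", NS)]
    listings.append(tuple(cells))

print(listings)
```

The namespace map matters: Tidy's XHTML output declares the XHTML namespace, so unprefixed paths such as ".//table" would match nothing.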
 