Velocity Reviews - Computer Hardware Reviews

Convert HTML to XML

 
 
Ninja Li
Guest
Posts: n/a
 
      11-16-2009
Hi,

I tried to parse an HTML page using HTML::TreeBuilder, but it is a
little cumbersome. Is there an easier way to parse HTML, say by converting
it to XML? Which Perl module and methods should I use?

Thanks in advance.

Nick
 
 
 
 
 
Ninja Li
Guest
Posts: n/a
 
      11-16-2009
On Nov 16, 11:43 am, Ben Morrow <(E-Mail Removed)> wrote:
>
> It's not clear what you're trying to do once you've parsed it, but if
> you want an XML DOMish interface then XML::LibXML will quite happily
> parse HTML.
>
> Ben


I am trying to filter the HTML to extract the earnings data (e.g. symbol,
company, event, and time; link: http://www.earnings.com/conferencecall.asp?client=cb)
and put it in a text file.
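
Ben's XML::LibXML suggestion might look like the minimal sketch below. The sample markup and the column layout are assumptions made for illustration, since the thread does not show the real structure of the page. `load_html` with `recover => 2` asks libxml2 to repair broken markup quietly; the resulting tree can be queried with XPath or serialized as XML.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample markup; the real page layout is not shown in the thread.
my $html = <<'HTML';
<html><body>
<table>
<tr><th>Symbol</th><th>Company</th><th>Event</th><th>Time</th></tr>
<tr><td>ABC</td><td>Example Corp</td><td>Q3 Earnings Call</td><td>10:00 AM</td></tr>
</table>
</body></html>
HTML

# Parse as HTML (not XML); recover => 2 repairs bad markup without warnings.
my $doc = XML::LibXML->load_html(string => $html, recover => 2);

# Collect every table row that contains data cells, one array ref per row.
my @rows;
for my $tr ($doc->findnodes('//tr[td]')) {
    my @cells = map {
        my $text = $_->textContent;
        $text =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        $text;
    } $tr->findnodes('./td');
    push @rows, \@cells;
}

# Tab-separated output, suitable for redirecting into a text file.
print join("\t", @$_), "\n" for @rows;

# The same tree serialized as XML, if an XML copy of the page is wanted.
my $xml = $doc->documentElement->toString;
```

The XPath expression `//tr[td]` is deliberately loose; on the real page it would probably need narrowing to the one table that holds the earnings data.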
 
 
 
 
 
Ninja Li
Guest
Posts: n/a
 
      11-16-2009
On Nov 16, 2:04 pm, Lawrence Statton <(E-Mail Removed)> wrote:
> Ninja Li <(E-Mail Removed)> writes:
>
> HTML::TreeBuilder really is the "right" tool for parsing HTML you get
> from the web. One of its major strengths is that it can generate reasonable
> parse trees from even unreasonable HTML.
>
> Keep in mind that scraping earnings.com's website may be in violation of
> their terms of use, and you should make sure you have appropriate
> permission before doing that in an automated way.
>
> --L


Thanks for your help and concern. We are a client of the website and
are trying to move from an Excel-based program to Perl.
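
For comparison, Lawrence's point about HTML::TreeBuilder coping with sloppy markup can be sketched as follows. The input here is an invented fragment with no html/body wrapper and with the last row and the table left unclosed; it is an assumption for illustration, not the real page. `as_XML` also gives a rough answer to the original HTML-to-XML question.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical fragment: no <html>/<body>, and the final row and the
# table itself are never closed.
my $html = <<'HTML';
<table>
<tr><th>Symbol</th><th>Company</th></tr>
<tr><td>ABC</td><td>Example Corp</td>
HTML

# new_from_content parses the string and finishes the tree in one call.
my $tree = HTML::TreeBuilder->new_from_content($html);

# look_down walks the tree; keep only rows that actually hold <td> cells.
my @rows;
for my $tr ($tree->look_down(_tag => 'tr')) {
    my @cells = map { $_->as_trimmed_text } $tr->look_down(_tag => 'td');
    push @rows, \@cells if @cells;
}

print join("\t", @$_), "\n" for @rows;

# Dump the repaired tree as XML-style markup.
my $xml = $tree->as_XML;

# Release the tree explicitly, as the HTML::Element docs recommend.
$tree->delete;
```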
 
 
sln@netherlands.com
Guest
Posts: n/a
 
      11-18-2009
On Mon, 16 Nov 2009 13:40:49 -0800 (PST), Ninja Li <(E-Mail Removed)> wrote:

>On Nov 16, 2:04 pm, Lawrence Statton <(E-Mail Removed)> wrote:
>> Ninja Li <(E-Mail Removed)> writes:
>>
>> HTML::TreeBuilder really is the "right" tool for parsing HTML you get
>> from the web. One of its major strengths is that it can generate reasonable
>> parse trees from even unreasonable HTML.
>>
>> Keep in mind that scraping earnings.com's website may be in violation of
>> their terms of use, and you should make sure you have appropriate
>> permission before doing that in an automated way.
>>
>> --L

>
>Thanks for your help and concern. We are a client of the website and
>are trying to move from an Excel-based program to Perl.


I looked at the source of the page you linked.
I hope that's not a violation and the Feds aren't gonna come get me.

I wouldn't call it scraping, would you? I'd guess Yaaahooei/Googleballs
own the web, 'cause they do it all the time.

I've heard there is a Perl module that will turn table data into some
kind of hash for you. I have personal software (written by me) that sucks
table data out of HTML/XML like buttaa. Unfortunately, you can't get it.

Look for that module on CPAN or somewhere.

-sln
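
The post does not name the module. One CPAN module that fits the description is HTML::TableExtract, which is only a guess at what sln meant. A minimal sketch, assuming the table's header cells read Symbol, Company, Event and Time (the real headers are not shown in the thread):

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical sample markup standing in for the real page.
my $html = <<'HTML';
<html><body>
<table>
<tr><th>Symbol</th><th>Company</th><th>Event</th><th>Time</th></tr>
<tr><td>ABC</td><td>Example Corp</td><td>Q3 Earnings Call</td><td>10:00 AM</td></tr>
</table>
</body></html>
HTML

# Select tables by header text; columns come back in the order listed here.
my $te = HTML::TableExtract->new(
    headers => [ 'Symbol', 'Company', 'Event', 'Time' ],
);
$te->parse($html);

# Turn each data row into a hash keyed by field name.
my @records;
for my $table ($te->tables) {
    for my $row ($table->rows) {
        my %record;
        @record{qw(symbol company event time)} = @$row;
        push @records, \%record;
    }
}

# One tab-separated line per record, ready to redirect into a text file.
for my $r (@records) {
    print join("\t", @{$r}{qw(symbol company event time)}), "\n";
}
```

Matching on header text rather than table position means the extraction should keep working if the page adds or reorders layout tables around the data.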
 