Velocity Reviews - Computer Hardware Reviews

Parsing HTML with XML::LibXML

 
 
John7481
      08-12-2004
Hello everybody,

I'm working on a database project that needs a Perl script to parse an
HTML file, pick out a few items, and insert them into a database.

Could somebody share ideas or real-world experience doing this kind of
parsing with XML::LibXML?

Thanks in advance
AR
 
Abhinav
      08-12-2004
John7481 wrote:
> Hello everybody,
>
> A database project is targeted to use a perl script to parse the html
> file and picking few items from html file, it will insert those items
> into database.


Assuming that your HTML is XML compliant (i.e. XHTML), you could try using
XPath. It does a great job of finding specific information. An
XPath-capable module such as XML::LibXML or XML::XPath /may/ already be
on your Perl 5.8 system; if not, it can be installed from CPAN.

There is an introductory tutorial on http://w3schools.org
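
To make the XPath suggestion concrete, here is a minimal sketch using XML::LibXML. The sample markup, the "items" table id, and the "name"/"price" class names are made up for illustration; a real script would read the XHTML from a file and adapt the expressions. Note that XHTML elements sit in a namespace, so the sketch registers a prefix (x, chosen arbitrarily) before querying.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical XHTML input; a real script would read this from a file.
my $xhtml = <<'END';
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <table id="items">
      <tr><td class="name">Widget</td><td class="price">9.99</td></tr>
      <tr><td class="name">Gadget</td><td class="price">19.50</td></tr>
    </table>
  </body>
</html>
END

my $doc = XML::LibXML->new->parse_string($xhtml);

# XHTML elements live in a namespace, so register a prefix for XPath.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(x => 'http://www.w3.org/1999/xhtml');

# Collect one [name, price] pair per table row.
my @rows;
for my $tr ($xpc->findnodes('//x:table[@id="items"]/x:tr')) {
    my $name  = $xpc->findvalue('x:td[@class="name"]',  $tr);
    my $price = $xpc->findvalue('x:td[@class="price"]', $tr);
    push @rows, [$name, $price];
}

print join("\t", @$_), "\n" for @rows;
```

Each entry in @rows is then ready to be handed to whatever database layer the project uses.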

[SNIP]

HTH

--

Abhinav
 
ko
      08-13-2004
John7481 wrote:
> Hello everybody,
>
> A database project is targeted to use a perl script to parse the html
> file and picking few items from html file, it will insert those items
> into database.
>
> Could somebody explain their ideas or real experiences to do such
> parsing job using libXML?
>
> Thanks in advance
> AR


These articles should help get you started:

http://www.stonehenge.com/merlyn/PerlJournal/col02.html
http://www.stonehenge.com/merlyn/PerlJournal/col03.html

The articles are titled 'Cleaning up your HTML', but it's the same
concept: identifying tags and attributes.

After you initialize the parser object, call its recover() method if
you're not sure whether you're dealing with well-formed HTML.
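
As a rough sketch of that advice, assuming XML::LibXML's HTML parsing entry point parse_html_string (parse_html_file is the file-based equivalent), the following turns on recover() and then pulls items out with XPath. The sample markup is invented for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical tag-soup input: unclosed <p> and <li> elements.
my $html = '<html><body><p>First<p>Second<ul><li>one<li>two</ul></body></html>';

my $parser = XML::LibXML->new;
$parser->recover(1);   # keep going when the markup is not well-formed

# The HTML parser builds a tree even though this is not valid XML.
my $doc = $parser->parse_html_string($html);

# Documents from the HTML parser carry no namespace, so plain XPath works.
my @items = map { $_->textContent } $doc->findnodes('//li');
print "$_\n" for @items;
```

Unlike the XHTML route, no namespace prefix is needed here, which keeps the XPath expressions shorter.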

HTH - keith
 