Velocity Reviews - Computer Hardware Reviews

extract data from xhtml

 
 
Damo_Suzuki
 
      12-07-2006
Hi,
I am in the process of extracting data from an HTML document. I used
JTidy to convert it to XHTML. Now that I have the XHTML, how can I
extract data from it? Say I wanted to extract a node with the tag
<h2 class="r">.......</h2>. Does anyone know how, or have sample code,
to achieve this? I've been banging my head against a brick wall for a
few days now trying to do this.
Thanks

 
 
 
 
 
Flo 'Irian' Schaetz
 
      12-07-2006
"Damo_Suzuki" wrote:

> I am in the process of extracting data from an HTML document. I used
> JTidy to convert it to XHTML. Now that I have the XHTML, how can I
> extract data from it?


As a valid XHTML document is well-formed XML, you should be able to parse
it with either a DOMParser or a SAXParser. Searching for them on Google
should bring up plenty of examples of how to use them.
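
To make the DOM route concrete, here is a minimal sketch using only the JDK's own javax.xml.parsers classes. The class name H2Extractor, the method extractH2Text and the sample markup are made up for illustration; the real input would be the XHTML produced by JTidy. It parses the document into a DOM tree and collects the text of every h2 element whose class attribute is "r".

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper class, not part of JTidy or the JDK.
public class H2Extractor {

    /** Returns the trimmed text of every h2 element whose class attribute equals cssClass. */
    static List<String> extractH2Text(String xhtml, String cssClass) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));

        List<String> result = new ArrayList<>();
        // getElementsByTagName matches on the element name as written in the markup.
        NodeList headings = doc.getElementsByTagName("h2");
        for (int i = 0; i < headings.getLength(); i++) {
            Element h2 = (Element) headings.item(i);
            if (cssClass.equals(h2.getAttribute("class"))) {
                result.add(h2.getTextContent().trim());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample input standing in for the tidied page.
        String xhtml =
                "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head><title>t</title></head>"
              + "<body><h2 class=\"r\">First result</h2><h2>Other</h2>"
              + "<h2 class=\"r\">Second result</h2></body></html>";
        for (String text : extractH2Text(xhtml, "r")) {
            System.out.println(text);
        }
    }
}
```

The class check here is an exact string comparison, so an element with several class names (class="r big") would need a slightly looser test.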

Flo

 
 
 
 
 
Damo_Suzuki
 
      12-07-2006
Hi,
Now that it's in XHTML, can I use DocumentBuilder to extract data from
it? I don't want to write the XHTML to a file. My code looks like this:

tidy.parse(in, System.out);

DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setNamespaceAware(true);
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document doc = builder.parse(XXXXXXXXXX);

In the parse method, 'in' is the file I want to extract data from. It's
fetched straight off the web, "JTidied" and output to the console. Can
I somehow use this as the parameter where all the X's are in the
DocumentBuilder parse method?
Thanks
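
One way to avoid the file, sketched below under some assumptions: send the tidied output to an in-memory ByteArrayOutputStream instead of System.out, then hand those bytes to DocumentBuilder.parse through a ByteArrayInputStream. So that the sketch runs without JTidy on the classpath, the tidy step is replaced by a hypothetical stand-in (fakeTidy) that just copies its input; in the real program that line would be the tidy.parse(in, buffer) call from the post, with the buffer in place of System.out. The class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical class name; only the java.io and javax.xml calls are real APIs.
public class TidyToDom {

    // Stand-in for the JTidy step (assumption): copies input to output unchanged.
    // In the real program this would be: tidy.parse(in, out);
    static void fakeTidy(InputStream in, OutputStream out) throws IOException {
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
    }

    /** Runs the (stand-in) tidy step into memory and parses the result as a DOM tree. */
    static Document toDocument(InputStream in) throws Exception {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        fakeTidy(in, buffer); // captures the XHTML in memory instead of printing it

        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        domFactory.setNamespaceAware(true);
        DocumentBuilder builder = domFactory.newDocumentBuilder();
        // This is what goes where the X's are: an InputStream over the buffered bytes.
        return builder.parse(new ByteArrayInputStream(buffer.toByteArray()));
    }

    public static void main(String[] args) throws Exception {
        // Made-up, already well-formed input standing in for the downloaded page.
        String page = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
                    + "<h2 class=\"r\">Hello</h2></body></html>";
        Document doc = toDocument(
                new ByteArrayInputStream(page.getBytes(StandardCharsets.UTF_8)));
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

Two caveats worth checking against your own setup. If the tidied output carries an XHTML DOCTYPE, the parser may try to download the DTD from the network, which can be slow or fail. Also, as far as I know JTidy's Tidy class has a parseDOM method that returns an org.w3c.dom.Document directly, which might make the round trip through a buffer unnecessary; the JTidy Javadoc is the place to confirm that.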

 
 
 
 