Extracting text (cross platform)

 
 
Debbie
 
      07-16-2006
Is there a standard way to extract text from a web page without using
innerText/innerHTML?

It's an academic exercise, and we've been advised that we can't use
Internet Explorer DOM extensions that are not part of the W3C DOM.

Thanks,

Debbie

 
 
 
 
 
Martin Honnen
 
      07-16-2006


Debbie wrote:
> Is there a standard way to extract text from a web page without using
> innerText/innerHTML?
>
> It's an academic exercise, and we've been advised that we can't use
> Internet Explorer DOM extensions that are not part of the W3C DOM.


Then use the W3C DOM. Text sits in text nodes, which are leaf nodes of
the DOM tree, and each text node has a property named nodeValue that
gives you the text in that node. The data property returns the same
string.

If you want the text in an element, you have two options. You can go
through its child nodes and concatenate their text, recursing down the
tree until you reach the text nodes. Or, depending on your needs and
requirements, you can use the W3C DOM Level 3 property named textContent,
which Mozilla has supported for quite some time and which at least Opera
now supports as well.
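A minimal sketch of the recursive approach, with textContent used where
it is available. The function names getText and getElementText are made
up for illustration, and the textNode/elementNode helpers only build
plain objects that stand in for real DOM nodes so the example runs
outside a browser; in a page you would pass a real element instead.

```javascript
// Node type values defined by the W3C DOM (Node.ELEMENT_NODE and so on).
var ELEMENT_NODE = 1;
var TEXT_NODE = 3;
var CDATA_SECTION_NODE = 4;

// Hypothetical helper: walk childNodes recursively and concatenate the
// nodeValue of every text (or CDATA) node found below the given node.
function getText(node) {
  if (node.nodeType === TEXT_NODE || node.nodeType === CDATA_SECTION_NODE) {
    return node.nodeValue;
  }
  var text = "";
  var children = node.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    text += getText(children[i]);
  }
  return text;
}

// Hypothetical helper: use DOM Level 3 textContent when the node has it,
// otherwise fall back to the manual walk above.
function getElementText(element) {
  if (typeof element.textContent === "string") {
    return element.textContent;
  }
  return getText(element);
}

// Stand-ins for DOM nodes (assumption: only the properties used above).
function textNode(value) {
  return { nodeType: TEXT_NODE, nodeValue: value, data: value, childNodes: [] };
}
function elementNode(children) {
  return { nodeType: ELEMENT_NODE, nodeValue: null, childNodes: children };
}

// Models <p>Hello <b>wide <i>world</i></b>!</p>
var sample = elementNode([
  textNode("Hello "),
  elementNode([textNode("wide "), elementNode([textNode("world")])]),
  textNode("!")
]);

var sampleText = getElementText(sample); // "Hello wide world!"
```

In a browser the call would look like
getElementText(document.getElementById("someId")), where "someId" is a
placeholder for an id in your own page.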
Then there is the W3C DOM Level 2 Range API, which also lets you get the
text in a range: position the range on an element node and call toString
on the range, e.g.
var range = document.createRange();
range.selectNodeContents(someNode);
var text = range.toString();
Mozilla and Opera 8 and later support the Range API.
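Since not every browser implements the Range API, it may be worth
testing for it before calling it. A small sketch, assuming a
hypothetical wrapper named getTextViaRange that takes the document and
the node and returns null when createRange is missing, so the caller
can fall back to walking the child nodes:

```javascript
// Hypothetical wrapper around the DOM Level 2 Range calls shown above.
// Returns the text inside `node`, or null if `doc` has no createRange.
function getTextViaRange(doc, node) {
  if (!doc || typeof doc.createRange !== "function") {
    return null; // Range API not available; caller should fall back
  }
  var range = doc.createRange();
  range.selectNodeContents(node); // range now spans the node's contents
  return range.toString();        // text covered by the range
}

// In a page: var text = getTextViaRange(document, someNode);
```

The null return is just one possible convention; throwing an error or
falling back inside the function would work as well.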

--

Martin Honnen
http://JavaScript.FAQTs.com/
 
 
 
 
 
Debbie
 
      07-16-2006
Thank you, Martin, that does just what I was looking for.

Regards,

Debbie

 