Velocity Reviews - Computer Hardware Reviews

Extracting Data from IE

 
 
chris_j_adams@hotmail.com
      10-30-2006
Hi,

I'm slowly discovering the world of JavaScript, so I'm not sure I'm
attacking this problem in the right manner, thus if I'm in the wrong
newsgroup, my apologies.

What I'm trying to do is extract some news items from a web site. To
do this, I'm using Microsoft Word VBA and using the following bit of
script:

'// Open web site
IeApp.Navigate "http://www.radioaustralia.net.au/francais/stories/s1776501.htm"
Do: Loop Until IeApp.ReadyState = READYSTATE_COMPLETE

'// Find text to extract
txtTitle = IeApp.Document.GetElementByID("a2title").innerhtml
txt = IeApp.Document.GetElementByID("a2copy").innerhtml

When extracting the text (i.e. "txt"), I seem to get more than just the
body text that I'm after, and the resulting junk is difficult to
remove. I've looked at the object model, but I'm not really sure what I
should be looking for, so I'm wondering if anyone here can spare a bit
of time to provide a pointer. For example, is there a tag that would
more easily refer to the required text?

Many thanks in advance if you can share some advice or guidance.
Regards,
Chris Adams

 
Martin Honnen
      10-30-2006
chris_j_adams@hotmail.com wrote:

> I'm slowly discovering the world of JavaScript, so I'm not sure I'm
> attacking this problem in the right manner, thus if I'm in the wrong
> newsgroup, my apologies.
>
> What I'm trying to do is extract some news items from a web site. To
> do this, I'm using Microsoft Word VBA and using the following bit of
> script:
>
> '// Open web site
> IeApp.Navigate "http://www.radioaustralia.net.au/francais/stories/s1776501.htm"
> Do: Loop Until IeApp.ReadyState = READYSTATE_COMPLETE
>
> '// Find text to extract
> txtTitle = IeApp.Document.GetElementByID("a2title").innerhtml
> txt = IeApp.Document.GetElementByID("a2copy").innerhtml
>
> When extracting the text (i.e. "txt"), I seem to get more than just the
> body text that I'm after, and the resulting junk is difficult to
> remove.


So you are not using JavaScript at all; you are automating Internet
Explorer with VBA. The IE object model for HTML documents is documented
here:
<http://msdn.microsoft.com/library/default.asp?url=/workshop/author/dhtml/reference/dhtml_reference_entry.asp>

You might be after the |innerText| property instead of the |innerHTML|
property of element objects. Or you might want to look at specific child
or descendant nodes of an element you have found with getElementById.
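
A minimal sketch of the |innerText| suggestion, reusing the URL and the
"a2copy" id from the original post. The Sub name is hypothetical, IeApp is
assumed to be created with late binding (so the literal 4 stands in for
READYSTATE_COMPLETE), and the DoEvents wait loop is my own variation on the
poster's loop, not part of the original script:

```vba
Sub ExtractStoryText()
    Dim IeApp As Object
    Dim copyDiv As Object
    Dim txt As String

    ' Late binding: no reference to Microsoft Internet Controls is needed
    Set IeApp = CreateObject("InternetExplorer.Application")
    IeApp.Navigate "http://www.radioaustralia.net.au/francais/stories/s1776501.htm"

    ' Wait for the page to load; 4 is the value of READYSTATE_COMPLETE
    Do While IeApp.Busy Or IeApp.ReadyState <> 4
        DoEvents
    Loop

    Set copyDiv = IeApp.Document.getElementById("a2copy")
    If Not copyDiv Is Nothing Then
        ' innerText is the rendered text only; innerHTML includes the markup
        txt = copyDiv.innerText
        Debug.Print txt
    End If

    IeApp.Quit
End Sub
```

If |innerText| still returns more than the story body, the unwanted text is
probably inside the same div, and narrowing down to specific child or
descendant nodes is the next thing to try.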

For instance,
IeApp.Document.getElementById("a2copy")
gives you a div element object, which in turn has other nodes (e.g. a
table element) as child nodes. Once you have an element node, you can
access its |firstChild|, |lastChild|, and |childNodes| collection, or
call |getElementsByTagName| on the element to find descendant elements
of a certain tag name.
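
The traversal described above could look roughly like the following. This is
a sketch only: DumpElement is a hypothetical helper name, and which tag
actually holds the story text on that page is something you would have to
find out by inspecting the output of the first loop.

```vba
' Hypothetical helper: lists the direct children of an element, then prints
' the text of every descendant element with the given tag name.
Sub DumpElement(ByVal el As Object, ByVal tagName As String)
    Dim i As Long
    Dim matches As Object

    If el Is Nothing Then Exit Sub

    ' Direct children: nodeName shows what each child is (TABLE, P, #text, ...)
    For i = 0 To el.childNodes.Length - 1
        Debug.Print i, el.childNodes.Item(i).nodeName
    Next i

    ' Descendants with the given tag name, at any depth below el
    Set matches = el.getElementsByTagName(tagName)
    For i = 0 To matches.Length - 1
        Debug.Print matches.Item(i).innerText
    Next i
End Sub
```

It might be called as
DumpElement IeApp.Document.getElementById("a2copy"), "td"
where "td" is only a guess based on the table inside the div. With nested
tables, the same text can be printed more than once.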

--

Martin Honnen
http://JavaScript.FAQTs.com/
 