Reading an HTML document & extracting content

 
 
Cognizance
      05-23-2005
Hi gang,

I'm an ASP developer by trade, but I've had to create client side
scripts with JavaScript many times in the past. Simple things, like
validating form elements and such.

Now I've been assigned the task of extracting content from a given HTML
page. If anyone's familiar with the Yahoo! Store order confirmation
screen, I need to be able to grab the total amount from the table to
the right-hand side. (Sample File:
http://www.2beyourself.com/t/sample.html)

If you view the source, the total sits in a table, surrounded by ugly
HTML. The value I want to retrieve is wrapped in b tags. Originally I was
thinking of using innerHTML or innerText to extract the value, but
we have no control over this part of the Yahoo! Store page, so we
can't make that work!

So after talking with peers, we thought of reading in the entire HTML
page and using a regular expression to extract the value.
Something along the lines of: '\<b\>[0-9]+\.[0-9]{2}\<\/b\>'

I'm not sure how to accomplish this, or whether this solution is even a
good one. Could someone please point me in the right direction? If you
have something better, I'm all ears! (eyes) If the regular expression
is a good solution, I need to find out how to read in the entire
HTML doc and then parse out that piece.

Any tips and suggestions will be greatly appreciated!!

And I hope your week is starting off right. ^^

 
McKirahan
      05-23-2005
RegEx would be better but this works:

<html>
<head>
<title>Total.htm</title>
<script type="text/javascript">
function total() {
  var sURL = "http://www.2beyourself.com/t/sample.html";
  try {
    // Synchronous GET (IE only); kept inside the try so a failed request is caught too
    var oXML = new ActiveXObject("Microsoft.XMLHTTP");
    oXML.open("GET", sURL, false);
    oXML.send();
    var sXML = oXML.responseText;
    // Find Total's label
    var iTAG = sXML.indexOf("<b>Total:</b>");
    if (iTAG < 0) { alert("Total label not found in " + sURL); return; }
    var sVAL = sXML.substr(iTAG);
    // Find Total's decimal point and keep the two digits after it
    var iDOT = sVAL.indexOf(".");
    if (iDOT < 0) { alert("Total value not found in " + sURL); return; }
    sVAL = sVAL.substr(0, iDOT + 3);
    // Find Total's start (just after the last ">" before the number)
    iTAG = sVAL.lastIndexOf(">");
    sVAL = sVAL.substr(iTAG + 1);
    // Show Total's value
    alert(sVAL);
  } catch (e) {
    alert(sURL + " not found!");
  }
}
</script>
</head>
<body onload="total()">
</body>
</html>
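
For comparison, here is a rough sketch of the regular-expression route. It works on a string that has already been read in (for example the response text above). The helper name extractTotal and the sample markup are made up for illustration; I am assuming the amount follows the "<b>Total:</b>" label and is itself wrapped in b tags with two decimal places, as described in the question. If the real page includes a currency symbol or thousands separators, the pattern would need adjusting.

```javascript
// Hypothetical helper: returns the amount after the "<b>Total:</b>" label,
// or null when the expected markup is not present.
function extractTotal(html) {
  // Label, then (lazily) any markup, then <b>digits.dd</b>.
  // [\s\S] is used instead of "." so the match can span line breaks.
  var re = /<b>Total:<\/b>[\s\S]*?<b>\s*([0-9]+\.[0-9]{2})\s*<\/b>/i;
  var m = re.exec(html);
  return m ? m[1] : null;
}

// Made-up sample markup, only loosely modelled on an order table.
var sample =
  "<tr><td><b>Subtotal:</b></td><td>10.00</td></tr>\n" +
  "<tr><td><b>Total:</b></td><td><b>12.34</b></td></tr>";

var result = extractTotal(sample); // "12.34"
```

Anchoring the pattern on the label keeps it from picking up some other bold number elsewhere on the page, which a bare <b>...</b> pattern could do.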



 