Velocity Reviews - Computer Hardware Reviews

count occurance of a word/string in the body of an HTML page

 
 
Question Boy
Guest
Posts: n/a
 
      08-27-2009
I'm trying to find an easy way to count how many time a given word
appear on a webpage. For instance, I would like to be able to count
the number of occurance of the word 'Accepted', how would I go about
this?

Thank you,

QB
 
 
 
 
 
Thomas 'PointedEars' Lahn
Guest
Posts: n/a
 
      08-27-2009
Question Boy wrote:
> I'm trying to find an easy way to count how many time a given word
> appear on a webpage. For instance, I would like to be able to count
> the number of occurance of the word 'Accepted', how would I go about
> this?


You would read the FAQ of this newsgroup and find the `textContent' and
`innerText' properties, and the properties and methods of String and RegExp
objects, described in the documentation referred to there.

<http://jibbering.com/faq/#posting>


PointedEars
--
Danny Goodman's books are out of date and teach practices that are
positively harmful for cross-browser scripting.
-- Richard Cornford, cljs, <cife6q$253$1$(E-Mail Removed)> (2004)
 
 
 
 
 
SAM
Guest
Posts: n/a
 
      08-27-2009
On 8/27/09 8:16 PM, Question Boy wrote:
> I'm trying to find an easy way to count how many time a given word
> appear on a webpage. For instance, I would like to be able to count
> the number of occurance of the word 'Accepted', how would I go about
> this?
>
> Thank you,
>
> QB


<script type="text/javascript">

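function counter(w) {
  // Note: innerHTML is the markup, so tag and attribute text is searched too
  var t = document.body.innerHTML;
  // Match w only where it is followed by whitespace or punctuation
  var r = new RegExp(w + '(?=[\\s.,;—)"”\'-]+)', 'gi');
  var m = t.match(r); // match returns null when nothing is found
  var count = m ? m.length : 0;
  alert(count + ' strings "' + w + '"');
}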

</script>
</head>
<body>
<p>Enter the word to count : <input id="word"> then
<a href="javascript:counter(document.getElementById('word').value)">
click me</a></p>
<p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Morbi a
wisi. Mauris vulputate rutrum arcu. Sed varius. Vestibulum ante ipsum
primis in faucibus orci luctus et ultrices posuere cubilia Curae; In
dui. Aenean et turpis. Duis a sapien hendrerit turpis tempor feugiat.
Nulla facilisi. Praesent in mauris et ipsum aliquam commodo. Aenean ac
nunc. In sit amet elit. Morbi diam. Quisque sodales eleifend urna.
Aliquam suscipit velit in nunc. </p>
<p>Vestibulum id magna. Nulla ante pede, sodales non, scelerisque vel,
condimentum at, leo. Vestibulum diam. Pellentesque habitant morbi
tristique senectus et netus et malesuada fames ac turpis egestas. Nam
ullamcorper, wisi vitae aliquet aliquam, dolor arcu cursus magna, non
tincidunt nibh nibh vel sapien. Nulla feugiat elit eget urna. Nullam a
metus. Donec tempus sapien eu orci. Sed pulvinar, nunc in luctus
convallis, lacus ante gravida felis, ac sollicitudin turpis nulla
viverra justo. Fusce nunc dui, porta lacinia, tristique et, suscipit
vestibulum, lectus. Nunc fringilla sapien. Proin sed leo at velit
tincidunt sagittis. Nam mollis tincidunt mauris. Aliquam ipsum nulla,
rutrum id, pulvinar sit amet, pellentesque at, neque. </p>
<p>Curabitur ante. Praesent sit amet nibh facilisis est commodo
pulvinar. Duis auctor. Ut commodo volutpat massa. Aenean nec erat eget
erat adipiscing imperdiet. Curabitur ipsum. Quisque sem lacus, fermentum
ut, suscipit non, pulvinar pretium, wisi. Integer libero mauris,
ultricies vel, mattis at, luctus id, ipsum. Vestibulum porttitor, mi sit
amet vehicula bibendum, wisi sapien egestas purus, sit amet feugiat
dolor diam non diam. Sed quis nisl in nisl nonummy hendrerit. Sed ipsum
lorem, commodo congue, interdum sed, pretium at, nulla. Nulla facilisi.
Curabitur ipsum. Cras aliquam libero vel tellus. </p>
</body>


--
sm
 
 
Lasse Reichstein Nielsen
Guest
Posts: n/a
 
      08-28-2009
SAM <(E-Mail Removed)> writes:

> On 8/27/09 8:16 PM, Question Boy wrote:
>> I'm trying to find an easy way to count how many time a given word
>> appear on a webpage. For instance, I would like to be able to count
>> the number of occurance of the word 'Accepted', how would I go about
>> this?
>> Thank you,
>> QB

>
> <script type="text/javascript">
>
> function counter(w) {
> var t = document.body.innerHTML;
> var r = new RegExp ( w+'(?=[\\s.,;—)"”\'-]+)', 'gi');


Using regexps is generally a good idea when working with strings.

I'm not sure exactly what this regexp is trying to match, but it
seems like "the word followed by some non-word character".
It still matches any other word that the word is a suffix of,
e.g., counting the word "to", you would still get a count from
"tomato".

Much more direct to search for RegExp("\\b"+w+"\\b").
Possibly test that "w" contains only word characters.
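
A minimal sketch of the \b approach, with the caveat that the helper names
escapeRegExp and countWord are made up for illustration and do not come from
this thread. The word is escaped before it goes into the pattern, and the "g"
flag is added so that match returns every hit rather than only the first:

```javascript
// Hypothetical helper: escape characters that are special in a regexp,
// so that a search word such as "a.b" is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Count whole-word occurrences of `word` in `text`, ignoring case.
// Assumes `word` starts and ends with an ASCII word character,
// since \b only knows about [A-Za-z0-9_].
function countWord(text, word) {
  var re = new RegExp("\\b" + escapeRegExp(word) + "\\b", "gi");
  var hits = text.match(re); // null when there is no match
  return hits ? hits.length : 0;
}

countWord("I like to eat a tomato, too.", "to"); // 1: only the whole word "to"
```

Escaping is an alternative to rejecting words that contain non-word
characters; either way the pattern cannot be changed by the input.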

> var count = t.match(r).length;
> alert(count + ' strings "'+w+'"');
> }



/L
--
Lasse Reichstein Holst Nielsen
'Javascript frameworks is a disruptive technology'

 
 
SAM
Guest
Posts: n/a
 
      08-28-2009
On 8/28/09 7:02 AM, Lasse Reichstein Nielsen wrote:
> SAM <(E-Mail Removed)> writes:
>
>> On 8/27/09 8:16 PM, Question Boy wrote:
>>> I'm trying to find an easy way to count how many time a given word
>>> appear on a webpage. For instance, I would like to be able to count
>>> the number of occurance of the word 'Accepted', how would I go about
>>> this?
>>> Thank you,
>>> QB

>> <script type="text/javascript">
>>
>> function counter(w) {
>> var t = document.body.innerHTML;
>> var r = new RegExp ( w+'(?=[\\s.,;—)"”\'-]+)', 'gi');

>
> Using regexps is generally a good idea when working with strings.
>
> I'm not sure exactly what this regexp is trying to match, but it
> seems like "the word followed by some non-word character".
> It still matches any other word that the word is a suffix of,
> e.g., counting the word "to", you would still get a count from
> "tomato".


I tested with 'ac' on the demo I proposed earlier, and it did seem to
count only the word 'ac'.

> Much more direct to search for RegExp("\\b"+w+"\\b").
> Possibly test that "w" contains only word characters.


No, because \b treats é, è, à, ù, etc. (non-ASCII characters) as
word boundaries.
Even if it is very rare for a French word to end with two 'é', or
for a word to exist both with and without an 'é' at the end, what about
other languages?

Anyway, your RegExp does not seem to catch the word 'à':
<http://cjoint.com/data/iCmshTUkPm_cpte_un_mot_fr.htm>
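
For what it is worth, here is a sketch of a boundary test that treats
accented letters as letters. It is an assumption-laden example: the name
countWordUnicode is hypothetical, and it relies on the "u" flag, \p{...}
property escapes and lookbehind (ES2018), which no browser of 2009 offers.

```javascript
// Sketch only: needs an engine with the "u" flag, \p{...} property
// escapes and lookbehind support.
function countWordUnicode(text, word) {
  // Escape regexp metacharacters so the word is matched literally.
  var escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // A hit must not be preceded or followed by a letter, digit or underscore.
  var re = new RegExp(
    "(?<![\\p{L}\\p{N}_])" + escaped + "(?![\\p{L}\\p{N}_])",
    "giu"
  );
  var hits = text.match(re); // null when there is no match
  return hits ? hits.length : 0;
}

var sample = "Il va à Paris, déjà à midi.";
countWordUnicode(sample, "à");          // counts the two standalone "à"
(sample.match(/\bà\b/g) || []).length;  // plain \b finds none of them
```

The final "à" of "déjà" is not counted by the first call because it is
preceded by a letter.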

>> var count = t.match(r).length;
>> alert(count + ' strings "'+w+'"');
>> }



--
sm
 
 
Question Boy
Guest
Posts: n/a
 
      08-28-2009
On Aug 27, 2:59 pm, Thomas 'PointedEars' Lahn <(E-Mail Removed)>
wrote:
> Question Boy wrote:
> > I'm trying to find an easy way to count how many time a given word
> > appear on a webpage. For instance, I would like to be able to count
> > the number of occurance of the word 'Accepted', how would I go about
> > this?

>
> You would read the FAQ of this newsgroup and find the `textContent' and
> `innerText' properties, and the properties and methods of String and RegExp
> objects, described in the documentation referred to there.
>
> <http://jibbering.com/faq/#posting>
>
> PointedEars
> --
> Danny Goodman's books are out of date and teach practices that are
> positively harmful for cross-browser scripting.
> -- Richard Cornford, cljs, <cife6q$253$1$(E-Mail Removed)> (2004)





Thank you for the link! I will take a serious look at it over the
course of the coming days.
 
 
Dr J R Stockton
Guest
Posts: n/a
 
      08-28-2009
In comp.lang.javascript message
<aec1b339-3206-4aa8-b374-7943f02aee3f@c29g2000yqd.googlegroups.com>,
Thu, 27 Aug 2009 11:16:27, Question Boy <(E-Mail Removed)> posted:
>I'm trying to find an easy way to count how many time a given word
>appear on a webpage. For instance, I would like to be able to count
>the number of occurance of the word 'Accepted', how would I go about
>this?


No, occurrences.

If the Web page is not yours, you can take a copy of the source and work
on that, so one can assume source to be available. However,
straightforwardly counting words in the source is not going to give,
reliably, the right answer. The word may appear in a comment, or within
HTML tags, or in JavaScript or VBScript; and code may write it
conditionally or repeatedly. The word may be in an undisplayed or
hidden part of the page. The word may be generated by included script,
and not be in the source at all. The word may be computed - consider
what document.write( ['mk'+'op', '\x44um'].reverse().join("")+"f" )
might give.

You wrote "appear on a webpage". Display the web page, use Select All
and Copy; then paste it into something which can count words. I think
MS Word can do it; alternatively, you can paste it into a textarea and
match its value property with a well-chosen RegExp. See in my
<URL:http://www.merlyn.demon.co.uk/js-valid.htm>.

You will need to be very careful to see that you implement an
appropriate definition of a word. Will, for example, the word "Accep-
ted" be found? If looking for "paw", should it be found in "cat's-paw"?

Given what you wrote above, should you also be looking for alternative
spellings?
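
One way to make the definition of a word explicit, sketched here with
illustrative names (countToken and both sample patterns are assumptions, not
taken from this thread), is to cut the text into tokens using a pattern that
is itself the definition, and then count exact matches. Changing the pattern
changes the answer:

```javascript
// Count tokens equal to `word` (ignoring case), where `wordPattern`
// (a global regexp) defines what counts as one word.
function countToken(text, word, wordPattern) {
  var tokens = text.match(wordPattern) || []; // match gives null for no tokens
  var count = 0;
  for (var i = 0; i < tokens.length; i++) {
    if (tokens[i].toLowerCase() === word.toLowerCase()) {
      count++;
    }
  }
  return count;
}

var text = "The cat's-paw knot; a paw print.";
// Definition A: a word is a run of letters, so "cat's-paw" is three words.
countToken(text, "paw", /[A-Za-z]+/g);                    // 2
// Definition B: apostrophes and hyphens may join letters into one word.
countToken(text, "paw", /[A-Za-z]+(?:['-][A-Za-z]+)*/g);  // 1
```

Neither pattern handles a word hyphenated across a line break; that would
need a further rule of its own.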

--
(c) John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v6.05 MIME.
Web <URL:http://www.merlyn.demon.co.uk/> - FAQish topics, acronyms, & links.
Proper <= 4-line sig. separator as above, a line exactly "-- " (SonOfRFC1036)
Do not Mail News to me. Before a reply, quote with ">" or "> " (SonOfRFC1036)
 
 
Pherdnut
Guest
Posts: n/a
 
      08-29-2009
On Aug 27, 1:16 pm, Question Boy <(E-Mail Removed)> wrote:
> I'm trying to find an easy way to count how many time a given word
> appear on a webpage. For instance, I would like to be able to count
> the number of occurance of the word 'Accepted', how would I go about
> this?
>
> Thank you,
>
> QB


RegEx is kind of a big gun for this problem. General rule of thumb: If
you don't need logic or loops, stick to plain-vanilla string methods.
Learn RegEx though. It's very powerful. It's just not typically as
efficient as regular string methods for simple problems. Once you
start hauling out a bunch of conditions and nested for loops, though,
you are usually better off with RegEx.

The string split function is handy if you just need the number of
occurrences. It is probably much faster than a loop or a RegEx-specific
method. Here would be my approach to your problem.

var splitBySearchWord = (document.body.textContent).split('Accepted');
alert(splitBySearchWord.length - 1);

That splits all the text in the body into the pieces that lie
between occurrences of 'Accepted'. The length of the array will be the
number of occurrences + 1, since there will be one string before every
occurrence and one bonus string in the array after the last occurrence.
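
The same idea wrapped in a function, as a rough sketch (the name
countBySplit is made up for illustration). Note that it counts substrings
exactly as typed, so it is case-sensitive and does not check word
boundaries:

```javascript
// Count non-overlapping occurrences of `needle` in `text` by splitting.
// This counts substrings, not whole words.
function countBySplit(text, needle) {
  if (needle === "") {
    return 0; // splitting on "" would yield one piece per character
  }
  // n occurrences cut the text into n + 1 pieces.
  return text.split(needle).length - 1;
}

countBySplit("Accepted, rejected, Accepted.", "Accepted"); // 2
countBySplit("nothing to see", "Accepted");                // 0
```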

If you think I just did your homework for you, you might want to test
in IE first. I recommend quirksmode.org if you start to get frustrated
with this or any other Microsoft-being-run-by-a-pack-of-gits-related
problems in the future.

 
 
Dr J R Stockton
Guest
Posts: n/a
 
      08-30-2009
In comp.lang.javascript message
<c6cd16fe-1e26-430f-9326-0c95d68ecfee@e27g2000yqm.googlegroups.com>,
Fri, 28 Aug 2009 19:08:46, Pherdnut <(E-Mail Removed)> posted:
>On Aug 27, 1:16 pm, Question Boy <(E-Mail Removed)> wrote:
>> I'm trying to find an easy way to count how many time a given word
>> appear on a webpage. For instance, I would like to be able to count
>> the number of occurance of the word 'Accepted', how would I go about
>> this?


>var splitBySearchWord = (document.body.textContent).split('Accepted');
>alert(splitBySearchWord.length - 1);


Method .split with a string cannot reliably find words;
"A frantic anteater will eat an infant ant".split("ant").length-1
gives 4 (FF3.0.13).


The textContent approach apparently (in FF3) does not pick up words
appearing within <input type=text> or <textarea></textarea>, and so does
not answer the question as asked - "appear on a webpage".

Whether copy'n'paste picks up such words is browser-dependent : IE8 yes,
FF3.0.13 no.

Apparently, document.body.textContent fails in IE8.

Actually, JavaScript cannot do the job as asked completely, since words
can appear in images.
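
A partial sketch of working around two of the points above, under stated
assumptions: collectPageText is a hypothetical name, innerText is used as the
fallback where textContent is missing, and only text inputs are added.
Textareas are deliberately left out, because their initial text is already
part of textContent and would be counted twice.

```javascript
// Gather text a script can see: the body text plus the current values
// of text inputs. `doc` is normally `document`.
function collectPageText(doc) {
  var body = doc.body;
  // textContent is missing in IE8 and older; innerText is the fallback.
  var text = typeof body.textContent === "string"
    ? body.textContent
    : body.innerText || "";
  var fields = doc.getElementsByTagName("input");
  for (var i = 0; i < fields.length; i++) {
    // The type property reads "text" even when the attribute is omitted.
    if (fields[i].type === "text") {
      text += " " + fields[i].value;
    }
  }
  return text;
}
```

The returned string can then be searched with whichever matching method is
chosen. Words in images, and text hidden by style sheets, are still not
handled.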

--
(c) John Stockton, nr London UK. ?@merlyn.demon.co.uk BP7, Delphi 3 & 2006.
<URL:http://www.merlyn.demon.co.uk/> TP/BP/Delphi/&c., FAQqy topics & links;
<URL:http://www.bancoems.com/CompLangPascalDelphiMisc-MiniFAQ.htm> clpdmFAQ;
NOT <URL:http://support.codegear.com/newsgroups/>: news:borland.* Guidelines
 
 
Bart Lateur
Guest
Posts: n/a
 
      08-31-2009
Dr J R Stockton wrote:

>Method .split with a string cannot reliably find words;
> "A frantic anteater will eat an infant ant".split("ant").length-1
>gives 4 (FF3.0.13).


This particular piece of code can be fixed with a regex:

"A frantic anteater will eat an infant ant".split(/\bant\b/).length-1


But the rest of your comments still apply.

--
Bart.
 
 
 
 