Trouble Counting Words, Sentences and Paragraphs

 
 
Max Norman
07-22-2009
I'm working on the first example application from 'Learning Ruby, from
Novice to Professional,' a text analyzer that counts the number of
characters (with and without spaces), lines, words, sentences and
paragraphs in a text document. Unfortunately, I've run into trouble: the
numbers don't seem to be coming out right.

Attached is the text file, for testing purposes, and below is the
source:

lines = File.readlines("text.txt")
line_count = lines.size
text = lines.join

puts "#{line_count} lines."

total_characters = text.length
puts "#{total_characters} characters."

total_characters_nonspaces = text.gsub(/\s+/, '').length
puts "#{total_characters_nonspaces} characters, excluding spaces."

word_count = text.split.length
puts "#{word_count} words."

paragraph_count = text.split(/\n\n/).length
puts "#{paragraph_count} paragraphs."

sentence_count = text.split(/\.|\? |!/).length
puts "#{sentence_count} sentences."

--

Here are the results I get:
42 lines.
6446 characters.
5315 characters, excluding spaces.
1130 words.
2 paragraphs.
44 sentences.

Any and all help/advice would be appreciated.

Attachments:
http://www.ruby-forum.com/attachment/3890/text.txt

--
Posted via http://www.ruby-forum.com/.

 
7stud --
07-22-2009
Max Norman wrote:
> Unfortunately, I've run into trouble: the
> numbers don't seem to be coming out right.
>


What are your suspicions?

Max Norman
07-22-2009
7stud -- wrote:
> Max Norman wrote:
>> Unfortunately, I've run into trouble: the
>> numbers don't seem to be coming out right.
>
> What are your suspicions?


My concern stems from the paragraph count: the application reports only
two paragraphs, but the document is segmented into a score more.

7stud --
07-22-2009
Max Norman wrote:
> 7stud -- wrote:
>> Max Norman wrote:
>>> Unfortunately, I've run into trouble: the
>>> numbers don't seem to be coming out right.
>>
>> What are your suspicions?
>
> My concern stems from the paragraph count: the application reports only
> two paragraphs, but the document is segmented into a score more.


The code treats two consecutive newlines as the separator between
paragraphs, so it expects text that looks like this:


hello world.
other text.

goodbye world.
other text.

This is what your text file looks like to me:

{\rtf1\ansi\ansicpg1252\cocoartf949\cocoasubrtf460
{\fonttbl\f0\fnil\fcharset0 Verdana;}
{\colortbl;\red255\green255\blue255;}
\margl1440\margr1440\vieww9000\viewh8400\viewkind0 \deftab720
\pard\pardeftab720\ql\qnatural \f0\fs24 \cf0 Among other public
buildings in a certain town, which for many reasons it will be prudent
to refrain from mentioning, and to which I will assign no fictitious
name, there is one anciently common to most towns, great or small: to
wit, a workhouse; and in this workhouse was born; on a day and date
which I need not trouble myself to repeat, inasmuch as it can be of no
possible consequence to the reader, in this stage of the business at all
events; the item of mortality whose name is prefixed to the head of this
chapter.\ \ For a long time after it was ushered into this world of
sorrow and trouble, by the parish surgeon, it remained a matter of
considerable doubt whether..
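
As a rough illustration of the point above (a sketch, not the attached file itself): splitting on /\n\n/ only finds a break where two newlines sit next to each other. The rtf_like string below is an assumption about what the RTF source contains, inferred from the "\ \" between "chapter." and "For a long time" in the excerpt. If each line break is preceded by a backslash, no two newlines are ever adjacent.

```ruby
# Plain text: paragraphs separated by an empty line (two adjacent newlines).
plain = "hello world.\nother text.\n\ngoodbye world.\nother text.\n"

# Hypothetical RTF-style text: each break is a backslash followed by a newline,
# so the newlines are never adjacent.
rtf_like = "hello world.\\\n\\\ngoodbye world.\\\n"

plain_count = plain.split(/\n\n/).length
rtf_count   = rtf_like.split(/\n\n/).length

puts "#{plain_count} paragraphs in the plain sample."
puts "#{rtf_count} paragraph(s) in the RTF-style sample."
```

With these two samples, the plain text splits into two pieces while the RTF-style text stays in one piece, which is consistent with an undercount like the one reported.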

Max Norman
07-22-2009
This is what the text file should look like:
Among other public buildings in a certain town, which for many reasons
it will be prudent to refrain from mentioning, and to which I will
assign no fictitious name, there is one anciently common to most towns,
great or small: to wit, a workhouse; and in this workhouse was born; on
a day and date which I need not trouble myself to repeat, inasmuch as it
can be of no possible consequence to the reader, in this stage of the
business at all events; the item of mortality whose name is prefixed to
the head of this chapter.

For a long time after it was ushered into this world of sorrow and
trouble, by the parish surgeon, it remained a matter of considerable
doubt whether the child would survive to bear any name at all; in which
case it is somewhat more than probable that these memoirs would never
have appeared; or, if they had, that being comprised within a couple of
pages, they would have possessed the inestimable merit of being the most
concise and faithful specimen of biography, extant in the literature of
any age or country.

Although I am not disposed to maintain that the being born in a
workhouse, is in itself the most fortunate and enviable circumstance
that can possibly befall a human being, I do mean to say that in this
particular instance, it was the best thing for Oliver Twist that could
by possibility have occurred. The fact is, that there was considerable
difficulty in inducing Oliver to take upon himself the office of
respiration,--a troublesome practice, but one which custom has rendered
necessary to our easy existence; and for some time he lay gasping on a
little flock mattress, rather unequally poised between this world and
the next: the balance being decidedly in favour of the latter. Now, if,
during this brief period, Oliver had been surrounded by careful
grandmothers, anxious aunts, experienced nurses, and doctors of profound
wisdom, he would most inevitably and indubitably have been killed in no
time. There being nobody by, however, but a pauper old woman, who was
rendered rather misty by an unwonted allowance of beer; and a parish
surgeon who did such matters by contract; Oliver and Nature fought out
the point between them. The result was, that, after a few struggles,
Oliver breathed, sneezed, and proceeded to advertise to the inmates of
the workhouse the fact of a new burden having been imposed upon the
parish, by setting up as loud a cry as could reasonably have been
expected from a male infant who had not been possessed of that very
useful appendage, a voice, for a much longer space of time than three
minutes and a quarter.

As Oliver gave this first proof of the free and proper action of his
lungs, the patchwork coverlet which was carelessly flung over the iron
bedstead, rustled; the pale face of a young woman was raised feebly from
the pillow; and a faint voice imperfectly articulated the words, 'Let me
see the child, and die.'

The surgeon had been sitting with his face turned towards the fire:
giving the palms of his hands a warm and a rub alternately. As the young
woman spoke, he rose, and advancing to the bed's head, said, with more
kindness than might have been expected of him:

'Oh, you must not talk about dying yet.'

'Lor bless her dear heart, no!' interposed the nurse, hastily
depositing in her pocket a green glass bottle, the contents of which she
had been tasting in a corner with evident satisfaction.

'Lor bless her dear heart, when she has lived as long as I have, sir,
and had thirteen children of her own, and all on 'em dead except two,
and them in the wurkus with me, she'll know better than to take on in
that way, bless her dear heart! Think what it is to be a mother, there's
a dear young lamb do.'

Apparently this consolatory perspective of a mother's prospects failed
in producing its due effect. The patient shook her head, and stretched
out her hand towards the child.

The surgeon deposited it in her arms. She imprinted her cold white lips
passionately on its forehead; passed her hands over her face; gazed
wildly round; shuddered; fell back--and died. They chafed her breast,
hands, and temples; but the blood had stopped forever. They talked of
hope and comfort. They had been strangers too long.

'It's all over, Mrs. Thingummy!' said the surgeon at last.

'Ah, poor dear, so it is!' said the nurse, picking up the cork of the
green bottle, which had fallen out on the pillow, as she stooped to take
up the child. 'Poor dear!'

'You needn't mind sending up to me, if the child cries, nurse,' said the
surgeon, putting on his gloves with great deliberation. 'It's very
likely it WILL be troublesome. Give it a little gruel if it is.' He put
on his hat, and, pausing by the bed-side on his way to the door, added,
'She was a good-looking girl, too; where did she come from?'

'She was brought here last night,' replied the old woman, 'by the
overseer's order. She was found lying in the street. She had walked some
distance, for her shoes were worn to pieces; but where she came from, or
where she was going to, nobody knows.'

The surgeon leaned over the body, and raised the left hand. 'The old
story,' he said, shaking his head: 'no wedding-ring, I see. Ah!
Good-night!'

The medical gentleman walked away to dinner; and the nurse, having once
more applied herself to the green bottle, sat down on a low chair before
the fire, and proceeded to dress the infant.

What an excellent example of the power of dress, young Oliver Twist was!
Wrapped in the blanket which had hitherto formed his only covering, he
might have been the child of a nobleman or a beggar; it would have been
hard for the haughtiest stranger to have assigned him his proper station
in society. But now that he was enveloped in the old calico robes which
had grown yellow in the same service, he was badged and ticketed, and
fell into his place at once--a parish child--the orphan of a
workhouse--the humble, half-starved drudge--to be cuffed and buffeted
through the world--despised by all, and pitied by none.

Oliver cried lustily. If he could have known that he was an orphan, left
to the tender mercies of church-wardens and overseers, perhaps he would
have cried the louder.

7stud --
07-22-2009
Try this:


require 'pp'

lines = File.readlines("text.txt")
pp lines
puts "----"

text = lines.join

paragraph_count = text.split(/\n\n/).length
puts "#{paragraph_count} paragraphs."

What do you see?

Max Norman
07-22-2009
I solved the problem by saving the text as plain text in TextMate.
TextEdit had saved the file as rich text (RTF), preserving the formatting
from the website I copied the text from.
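
A small guard along these lines could catch the same mistake earlier. This is only a sketch: the helper names rtf? and count_paragraphs are made up for illustration, and it assumes an RTF file begins with the "{\rtf" header seen in the excerpt earlier in the thread.

```ruby
# Hypothetical helper: true when the text begins with an RTF header.
def rtf?(text)
  text.start_with?("{\\rtf")
end

# Count paragraphs, refusing input that looks like RTF rather than plain text.
def count_paragraphs(text)
  raise ArgumentError, "input looks like RTF; save it as plain text first" if rtf?(text)
  text.split(/\n\n/).length
end

sample = "First paragraph.\n\nSecond paragraph.\n"
puts "#{count_paragraphs(sample)} paragraphs."
```

Raising an error is one possible design choice. Printing a warning and carrying on would also work, but the counts would still be wrong for an RTF file.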