newbie's question on the text file processing?

 
 
Jim
      12-07-2003
Hello,

I am learning Perl and have run into a problem. I would like to process
a text file and calculate the frequency of each word in it. The analysis
should be case insensitive, and all punctuation marks other than hyphens,
apostrophes, and plus and minus signs should be replaced by a space. As I
am a newbie, I have no idea how to write a regular expression that extracts
the correct words one by one from the file. Can anyone help me finish the
script?



 
 
 
 
 
A. Sinan Unur
      12-07-2003
"Jim" <(E-Mail Removed)> wrote in
news:bqvj4l$d2h$(E-Mail Removed)99.com:

> Hello,
>
> I am learning Perl and I have come across something. I would like to
> process the text file and calculate the word frequency in it. All
> analysis is case insensitive and all punctuation marks other than
> hyphens, apostrophe and plus and minus signs were substituted by the
> space.As I am a new bie, I have no idea of how to write a complex
> regular expression to extract the correct word one by one from the
> file.


This smells of homework or some other blatant attempt to make others do
your work for you.

> Can anyone help me finish the script?


Show us what you have done so far and ask specific questions.

--
A. Sinan Unur
(E-Mail Removed)
Remove dashes for address
Spam bait: (E-Mail Removed)
 
 
 
 
 
ww
      12-07-2003
hint: what does open() do?
hint: what does join(split()) do?
hint: what does grep() return?
hint: I don't know how to solve your problem.

-w w



On Mon, 8 Dec 2003 00:05:51 +0800, "Jim" <(E-Mail Removed)>
wrote:

>Hello,
>
>I am learning Perl and I have come across something. I would like to process
>the text file and calculate the word frequency in it. All analysis is case
>insensitive and all punctuation marks other than hyphens, apostrophe and
>plus and minus signs were substituted by the space.As I am a new bie, I have
>no idea of how to write a complex regular expression to extract the correct
>word one by one from the file. Can anyone help me finish the script?
>
>


 
 
Tad McClellan
      12-07-2003
Jim <(E-Mail Removed)> wrote:

> I would like to process
> the text file and calculate the word frequency in it.


my %words;
while ( <> ) {
    # count every run of word characters on the current line
    $words{$1}++ while /(\w+)/g;
}
# print the count first, then the word
printf "%9d %s\n", $words{$_}, $_ for sort keys %words;


--
Tad McClellan SGML consulting
(E-Mail Removed) Perl programming
Fort Worth, Texas
 
 
Jim
      12-07-2003
while(my $line = <FILE>) {
    $line =~ s/[\+\-\']/_/g;
    $line = lc $line;
    my @array = ($line =~ /\b\w+\b/g);
    foreach(@array) {
        $wordFreq{$_}++;
    }
}

Is this correct? I am not sure whether the code fulfills the requirement.

Jim


 
 
Jürgen Exner
      12-07-2003
Jim wrote:
> while(my $line = <FILE>) {
> $line =~ s/[\+\-\']/_/g;
> $line = lc $line;
> my @array = ($line =~ /\b\w+\b/g);
> foreach(@array) {
> $wordFreq{$_}++;
> }
> }
>
> Is this correct? But I am not sure if the code fulfill the
> requirement.


How can we say? You don't tell us what the code is supposed to do (i.e. what
those ominous requirements are that you refer to without actually telling
us), what kind of problems you have with that code, or why you believe it
is not correct. Just "question on text file processing" is a bit vague,
don't you think?

Posting your code is good, but it is not sufficient.
Please
- specify the requirement
- explain what the code is supposed to do (or what you think the code is
  doing)
- explain what the code is actually doing and how this differs from
  what you expect it to do
- quote literally any warning or error message you are getting
Then we may be able to help you more.

jue


 
 
John W. Krahn
      12-07-2003
Jim wrote:
>
> I am learning Perl and I have come across something. I would like to process
> the text file and calculate the word frequency in it. All analysis is case
> insensitive and all punctuation marks other than hyphens, apostrophe and
> plus and minus signs were substituted by the space.As I am a new bie, I have
> no idea of how to write a complex regular expression to extract the correct
> word one by one from the file. Can anyone help me finish the script?


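my %words;
while ( <> ) {
    # replace everything except letters, digits, ', + and - with a space
    s/[^[:alnum:]'+-]/ /g;
    # count each remaining word, lower-cased
    $words{ lc() }++ for /\S+/g;
}

print "$_\t$words{$_}\n" for sort keys %words;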
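
For anyone who wants to try the two steps above without a file, here is a minimal sketch that applies the same substitution and word split to a string. The helper name count_words and the sample text are made up for illustration, and sorting by descending count is an extra that nobody in the thread asked for.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): count words in a string using
# the same two steps as the loop above.
sub count_words {
    my ($text) = @_;
    my %count;
    for my $line ( split /\n/, $text ) {
        # Replace everything except letters, digits, ', + and - with a space.
        ( my $clean = $line ) =~ s/[^[:alnum:]'+-]/ /g;
        # Each run of non-space characters is a word; fold case before counting.
        $count{ lc $_ }++ for $clean =~ /\S+/g;
    }
    return \%count;
}

# Made-up sample text.
my $count = count_words("The cat's C++ book.\nthe CAT'S e-mail, the end!\n");

# Most frequent first; ties broken alphabetically.
for my $word ( sort { $count->{$b} <=> $count->{$a} or $a cmp $b } keys %$count ) {
    print "$word\t$count->{$word}\n";
}
```

Note that a hyphen or plus sign standing on its own between spaces would also be counted as a "word" by this approach; whether that matters depends on the input.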



John
--
use Perl;
program
fulfillment
 
 
Brian McCauley
      12-11-2003

> Subject: newbie's question on the text file processing?


Please put the subject of your post in the Subject of your post. If
in doubt try this simple test. Imagine you could have been bothered
to have done a search before you posted. Next imagine you found a
thread with your subject line. Would you have been able to recognise
it as the same subject?

Note: the words "newbie" and "question" are red-flag words in subject
lines.

"Jim" <(E-Mail Removed)> writes:

[ No context - Please don't overtrim ]

> while(my $line = <FILE>) {
> $line =~ s/[\+\-\']/_/g;
> $line = lc $line;
> my @array = ($line =~ /\b\w+\b/g);
> foreach(@array) {
> $wordFreq{$_}++;
> }
> }
>
> Is this correct? But I am not sure if the code fulfill the requirement.


I don't see why you do s/[\+\-\']/_/g.

If I read the requirement correctly, you want to treat hyphen, plus and
apostrophe as distinct word characters, not replace them with underscores.
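
A small sketch of the difference, assuming a made-up sample line. The two helper names are hypothetical: the first follows the quoted substitution, the second simply allows the three characters inside a word.

```perl
use strict;
use warnings;

# Words found by the quoted approach: the three characters become
# underscores first, then \w+ picks up the result.
sub words_with_underscores {
    my ($line) = @_;
    $line =~ s/[\+\-\']/_/g;
    return lc($line) =~ /\w+/g;
}

# Words found when the three characters are simply allowed inside a word.
sub words_with_class {
    my ($line) = @_;
    return lc($line) =~ /[-+'\w]+/g;
}

my $sample = "Don't use the well-known C++ idiom";   # made-up sample text
print join( ' ', words_with_underscores($sample) ), "\n";
print join( ' ', words_with_class($sample) ),       "\n";
```

With the substitution, "Don't" is counted under the key don_t, so the original spelling never appears in the hash.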

The leading \b in /\b\w+\b/ is redundant because // always favours the
earliest possible match.

The trailing \b in /\b\w+\b/ is redundant because + is greedy.
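
The redundancy is easy to check. This sketch runs the pattern with both anchors, with only the trailing one, and with none against a made-up sample line and compares the resulting lists.

```perl
use strict;
use warnings;

my $line = "it's a well-known fact: 2+2 = 4";   # made-up sample text

my @both     = $line =~ /\b\w+\b/g;   # both anchors, as in the quoted code
my @trailing = $line =~ /\w+\b/g;     # leading \b dropped
my @neither  = $line =~ /\w+/g;       # no anchors at all

print "with \\b:    @both\n";
print "without \\b: @neither\n";
print "same lists\n" if "@both" eq "@trailing" and "@both" eq "@neither";
```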

BTW the variable @array is redundant - you could just use the
expression directly in the argument of foreach().

while(my $line = <FILE>) {
    $wordFreq{$_}++ for lc($line) =~ /[-+'\w]+/g;
}
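
For completeness, a sketch of the same loop as a self-contained script under strict and warnings. The helper name word_freq is hypothetical, and an in-memory filehandle with made-up text stands in for the FILE handle, which the thread never shows being opened.

```perl
use strict;
use warnings;

# Hypothetical helper: count words read from any open filehandle.
sub word_freq {
    my ($fh) = @_;
    my %wordFreq;
    while ( my $line = <$fh> ) {
        $wordFreq{$_}++ for lc($line) =~ /[-+'\w]+/g;
    }
    return \%wordFreq;
}

# An in-memory filehandle stands in for a real file here. For a file on
# disk, open it by name instead and check the result of open the same way.
my $text = "Top-down, top-down and bottom-up.\nIt's +1 or -1.\n";
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
my $freq = word_freq($fh);
close $fh;

print "$_\t$freq->{$_}\n" for sort keys %$freq;
```

Passing the handle as an argument keeps the counting code independent of where the text comes from, which also makes it easy to test with short strings.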
 