regular expression for wc

 
 
Zeh Mau
 
      04-23-2007
Please go to this thread:

http://groups.google.de/group/regex/...aaafcd30?hl=de

Thanks for your support,

Zeh Mau

 
Thomas J.
      04-23-2007
REs are not able to "count", so the answer must be: no.

However, they may help you to separate words the way "wc" does, but you
have to count those words yourself (in your program).

Thomas
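
To make that division of labour concrete, here is a minimal sketch in which the regexes only find the newlines and words, and plain Perl does the counting. The helper name wc_counts is made up for this illustration, and "word" is assumed to mean a run of non-whitespace characters, which is roughly how wc treats it.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns (lines, words, chars).
sub wc_counts {
    my ($text) = @_;
    # "() = ..." forces list context, so the scalar on the left receives
    # the number of matches rather than a true/false value.
    my $lines = () = $text =~ /\n/g;     # one match per newline
    my $words = () = $text =~ /\S+/g;    # one match per run of non-whitespace
    my $chars = length $text;            # no regex needed for this one
    return ($lines, $words, $chars);
}

my ($lines, $words, $chars) = wc_counts("Hello world\nsecond line here\n");
print "$lines $words $chars\n";
```

The regex never produces a number here; it only delimits the things to be counted.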

 
Zeh Mau
      04-23-2007
Hello Thomas,

I use LEX to count the matches of the REs, so I only have to define
the correct REs, but I don't know what they should look like.

Zeh

 
Mirco Wahab
      04-23-2007
Thomas J. wrote:
> REs are not able to "count".
>
> so the Answer must be: No.
>
> However they may help you to separate words like "wc", but you have to
> count those words by yourself (your program).


First shot:
<===

use strict;
use warnings;

my $text='Hello,

is it possible to create a regular expression,
which does exactly the same as the UNIX tool wc,
which means counting
lines, words and all signs of a file?

Thanks,
Zeh Mau';

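# Counters that are updated from code blocks embedded in the regex.
my %count = (lines=>0, words=>0, characters=>0);
my $re = qr/(?:
\b(?{$count{words}+=0.25})   # apparently runs four times per word, hence 0.25
|
\n(?{++$count{lines}})       # one per newline
|
.(?{++$count{characters}})   # every other character
)
/xms;

# Scan the whole text; all counting happens inside the regex.
1 while $text =~ /$re/g;

print "$_ => $count{$_}\n" for keys %count;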

<===

Needs some more thinking (I will look
at it again this evening).

Regards

M.
 
Zeh Mau
      04-23-2007
> Well, that's quite rude.

Sorry, I did not know where to reach the most people,
so I chose the groups that seemed reasonable to me. I hope I
have not offended anyone by doing so.


 
Zeh Mau
      04-23-2007
> If you restrict yourself to what the regular expression engine can do without
> falling back to Perl, then the answer is "no", for a very simple reason:
> you can only match what is present in the string you match against. And
> usually, the number of lines, words, or characters isn't present in
> the file.


In LEX, I can specify

%%
\n {CountLines++;}

to get the number of lines: every match increments the variable
CountLines.

But how can I separate whole words from the rest of the text?

Zeh
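
One way to picture the answer is a lex-style rule set with three rules: a newline, a run of non-blank characters (a word), and a run of blanks. Below is a rough Perl sketch of that idea, using \G and the /gc flags so that each rule is tried at the current position, much like a scanner would. The helper name scan_counts and the exact rule set are assumptions made for illustration, not something posted in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: walks the text rule by rule, like a tiny scanner.
sub scan_counts {
    my ($text) = @_;
    my %n = (lines => 0, words => 0, chars => 0);
    pos($text) = 0;
    while (pos($text) < length $text) {
        # \G anchors at the current position; /c keeps pos() on failure.
        if ($text =~ /\G\n/gc) {                 # rule 1: a newline
            $n{lines}++;
            $n{chars}++;
        }
        elsif ($text =~ /\G([^ \t\n]+)/gc) {     # rule 2: a whole word
            $n{words}++;
            $n{chars} += length $1;
        }
        elsif ($text =~ /\G([ \t]+)/gc) {        # rule 3: blanks between words
            $n{chars} += length $1;
        }
        else {
            last;                                # not reachable with these rules
        }
    }
    return \%n;
}

my $n = scan_counts("Hello world\nsecond line here\n");
print "$n->{lines} $n->{words} $n->{chars}\n";
```

In LEX terms, rule 2 would correspond to a pattern such as [^ \t\n]+ with an action that increments a word counter and adds the match length to the character counter.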

 
Mirco Wahab
      04-23-2007
Mirco Wahab wrote:
>
> Needs some more thinking (I will look
> at it again this evening).


As Abigail mentioned in another post,
Perl's regexes allow code assertions,
so this task isn't too hard.

The following should work as a
poor man's wc:

[wc.pl] ==>

use strict;
use warnings;

my %wc = (lines=>1, words=>0, chars=>0);
my $re = qr/ \b (?{ $wc{words} += 0.25 })
| \n (?{ $wc{lines} ++ })
| . (?{ $wc{chars} ++ })
/x;

my $text = do { local $/; <> };   # slurp all input at once

print map "$wc{$_} $_, ", keys %wc
if () = $text =~ /$re/g;

<==


Regards

M.
 
Ala Qumsieh
      04-23-2007
Zeh Mau wrote:

> Please go to this thread:
>
> http://groups.google.de/group/regex/...aaafcd30?hl=de


If you want to recreate wc in Perl, it has already been done for you:

http://ppt.cvs.sourceforge.net/*chec...ppt/ppt/bin/wc

--Ala

 
anno4000@radom.zrz.tu-berlin.de
      04-24-2007
Mirco Wahab <wahab-> wrote in comp.lang.perl.misc:
> Mirco Wahab wrote:
> >
> > Needs some more thinking (I will look
> > at it again this evening).
>
> As Abigail mentioned in another post,
> Perl's regexes allow code assertions,
> so this task isn't too hard.
>
> The following should work as a
> poor man's wc:
>
> [wc.pl] ==>
>
> use strict;
> use warnings;
>
> my %wc = (lines=>1, words=>0, chars=>0);
> my $re = qr/ \b (?{ $wc{words} += 0.25 })
> | \n (?{ $wc{lines} ++ })
> | . (?{ $wc{chars} ++ })
> /x;
>
> my $text = do { local$/; <> };
>
> print map "$wc{$_} $_, ", keys %wc
> if () = $text =~ /$re/g;


Nice.

I don't understand why it finds four /\b/ for each word, but that's
apparently what happens.

You're initializing the line count to one. For me, that makes it
come out one high.

The character count will be missing the line feeds. Make the
second alternative

| \n (?{ $wc{lines} ++; $wc{chars} ++})

Anno
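
For comparison, here is a sketch that folds both corrections in (line count starting at zero, newlines counted as characters) and sidesteps the 0.25 trick by detecting the start of a word with a lookbehind instead of \b. This is only one possible variant, not the version from the thread; the sub name regex_wc and the use of a package variable for the counters are choices made for this illustration.

```perl
use strict;
use warnings;

# Package variable, so the code blocks do not rely on closure behaviour.
our %wc;

# Every alternative consumes exactly one character and ends in its code
# block, so each block runs once per character it accounts for.
my $re = qr/ \n          (?{ $wc{lines}++; $wc{chars}++ })
           | (?<!\S) \S  (?{ $wc{words}++; $wc{chars}++ })
           | .           (?{ $wc{chars}++ })
           /x;

# Hypothetical helper: returns a hash reference with the three counts.
sub regex_wc {
    my ($text) = @_;
    local %wc = (lines => 0, words => 0, chars => 0);
    1 while $text =~ /$re/g;     # all counting happens inside the regex
    my %copy = %wc;
    return \%copy;
}

my $r = regex_wc("Hello world\nsecond line here\n");
print "$r->{lines} $r->{words} $r->{chars}\n";
```

The second alternative matches a non-whitespace character that is not preceded by another one, which is the first character of a word, so the word counter can be a plain integer.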
 
Mirco Wahab
      04-24-2007
anno4000@radom.zrz.tu-berlin.de wrote:
> Mirco Wahab <wahab-> wrote in comp.lang.perl.misc:
>> my %wc = (lines=>1, words=>0, chars=>0);
>> my $re = qr/ \b (?{ $wc{words} += 0.25 })
>> | \n (?{ $wc{lines} ++ })
>> | . (?{ $wc{chars} ++ })
>> /x;

> I don't understand why it finds four /\b/ for each word, but that's
> apparently what happens.


I struggled with this too, but each word has two ends,
and the first character *in front* of a word is
/on a word boundary/, as is the first character
*of the word*. That makes four \b's.

> You're initializing the line count to one. For me, that makes it
> come out one high.


If you have any text, you are already on line #1;
that's why I changed this. What you see is probably
the final \n of the text.

> The character count will be missing the line feeds. Make the
> second alternative
>
> | \n (?{ $wc{lines} ++; $wc{chars} ++})


OK, you are possibly right. But - I did take them out
because "word processors" don't count them (checked in
Word 97 under wine).

Regards & Thanks

Mirco
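
On the four-\b question above, a quick check without any code block may help. Counting plain /\b/ matches in list context gives two boundary positions per word, one at each end. The helper name boundary_count is invented for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: number of word-boundary positions in a string.
sub boundary_count {
    my ($text) = @_;
    # List-context match with /g; the count of (empty) matches is assigned.
    my $n = () = $text =~ /\b/g;
    return $n;
}

print boundary_count("foo bar"), "\n";   # two words
```

If that holds, the 0.25 increment suggests the code block runs twice at each boundary. A plausible reason, stated here as an assumption and not verified against the engine source, is that after an empty match the engine retries at the same position, runs the block again, and only then rejects the second empty match and moves on to the next alternative.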
 