Velocity Reviews

Zeh Mau 04-23-2007 12:02 PM

regular expression for wc
 
Please go to this thread:

http://groups.google.de/group/regex/...aaafcd30?hl=de

Thanks for your support,

Zeh Mau


Thomas J. 04-23-2007 02:06 PM

Re: regular expression for wc
 
REs are not able to "count",

so the answer must be: no.

However, they can help you separate words the way "wc" does, but you
have to count those words yourself (in your program).

Thomas
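
A minimal sketch of that division of labour, assuming a "word" is a run of non-whitespace characters as in wc (the sample string is made up for illustration): the regex only finds the words, and the surrounding Perl does the counting.

```perl
use strict;
use warnings;

# Sample input, invented for this sketch.
my $text = "one two  three\nfour\n";

# In list context m//g returns every match; assigning that list to
# the empty list in scalar context yields the number of matches.
my $words = () = $text =~ /\S+/g;

print "words => $words\n";   # prints "words => 4"
```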


Zeh Mau 04-23-2007 02:21 PM

Re: regular expression for wc
 
Hello Thomas,

I use LEX to count the matches of the REs, so I only have to define
the correct REs, but I don't know what they should look like.

Zeh


Mirco Wahab 04-23-2007 02:52 PM

Re: regular expression for wc
 
Thomas J. wrote:
> REs are not able to "count",
>
> so the answer must be: no.
>
> However, they can help you separate words the way "wc" does, but you
> have to count those words yourself (in your program).


First shot:
<===

use strict;
use warnings;

my $text='Hello,

is it possible to create a regular expression,
which does exactly the same as the UNIX tool wc,
which means counting
lines, words and all signs of a file?

Thanks,
Zeh Mau';

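my %count = (lines=>0, words=>0, characters=>0);

# Each alternative bumps a counter in an embedded code block.
# The \b block runs four times per word here, hence the 0.25.
my $re = qr/(?:
\b(?{$count{words}+=0.25})
|
\n(?{++$count{lines}})
|
.(?{++$count{characters}})
)
/xms;

# scalar-context //g: one match per iteration until the text is used up
1 while $text =~ /$re/g;

print "$_ => $count{$_}\n" for sort keys %count;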

<===

Needs some more thinking (I will look
at it again this evening ;-)

Regards

M.

Zeh Mau 04-23-2007 05:07 PM

Re: regular expression for wc
 
> Well, that's quite rude.

Sorry, I did not know where I would reach the most people,
so I chose the groups that seemed reasonable to me. I hope I
have not offended anyone by doing so :)



Zeh Mau 04-23-2007 05:12 PM

Re: regular expression for wc
 
> If you restrict yourself to what the regular expression engine can do
> without falling back to Perl, then the answer is "no", for a very simple
> reason: you can only match what is present in the string you match
> against. And usually, the number of lines, words, or characters isn't
> present in the file.


In LEX, I may specify
%%
\n {CountLines++;}

That gives me the number of lines, because every match increments the
variable CountLines.

But how can I separate whole words from the rest of the text?

Zeh
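
A LEX specification pairs patterns with actions, and roughly the same shape can be sketched in Perl with \G and the /gc flags. This is only an illustration, not a LEX file: the three token classes (newline, other whitespace, run of non-whitespace) are an assumption about what should count as a word, following wc, and the sample text is invented.

```perl
use strict;
use warnings;

my $text  = "Hello, world\nsecond line\n";   # sample input (assumption)
my %count = (lines => 0, words => 0, chars => 0);

# Each branch plays the role of one LEX rule: a pattern anchored at the
# current position (\G) plus an action. /c keeps pos() when a match fails,
# so the next branch is tried at the same position.
pos($text) = 0;
while (pos($text) < length($text)) {
    if ($text =~ /\G\n/gc) {                 # newline: a line and a character
        $count{lines}++;
        $count{chars}++;
    }
    elsif ($text =~ /\G([^\S\n]+)/gc) {      # blanks and tabs: characters only
        $count{chars} += length $1;
    }
    elsif ($text =~ /\G(\S+)/gc) {           # a whole word
        $count{words}++;
        $count{chars} += length $1;
    }
}

print "$count{lines} $count{words} $count{chars}\n";   # prints "2 4 25"
```

Every character falls into exactly one of the three classes, so one branch always advances pos() and the loop terminates.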


Mirco Wahab 04-23-2007 06:34 PM

Re: regular expression for wc
 
Mirco Wahab wrote:
>
> Needs some more thinking (I will look
> at it again this evening ;-)


As Abigail mentioned in another post,
Perl's regexes allow code assertions,
so this task isn't too hard.

The following should work as a
poor man's wc ;-)

[wc.pl] ==>

use strict;
use warnings;

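# lines starts at 1: any text is taken to begin on line 1
my %wc = (lines=>1, words=>0, chars=>0);

# the \b block runs four times per word, so each run adds a quarter word;
# \n is counted as a line only, not as a character
my $re = qr/ \b (?{ $wc{words} += 0.25 })
           | \n (?{ $wc{lines} ++ })
           | .  (?{ $wc{chars} ++ })
           /x;

# slurp all input at once
my $text = do { local $/; <> };

# the list-context match scans the whole text; true if anything matched
print map "$wc{$_} $_, ", sort keys %wc
    if () = $text =~ /$re/g;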

<==


Regards

M.

Ala Qumsieh 04-23-2007 08:22 PM

Re: regular expression for wc
 
Zeh Mau wrote:

> Please go to this thread:
>
> http://groups.google.de/group/regex/...aaafcd30?hl=de


If you want to recreate wc in Perl, it has already been done for you:

http://ppt.cvs.sourceforge.net/*chec...ppt/ppt/bin/wc

--Ala


anno4000@radom.zrz.tu-berlin.de 04-24-2007 10:00 AM

Re: regular expression for wc
 
Mirco Wahab <wahab-mail@gmx.de> wrote in comp.lang.perl.misc:
> Mirco Wahab wrote:
> >
> > Needs some more thinking (I will look
> > at it again this evening ;-)

>
> As Abigail mentioned in another post,
> Perl's regexes allow code assertions,
> so this task isn't too hard.
>
> The following should work as a
> poor man's wc ;-)
>
> [wc.pl] ==>
>
> use strict;
> use warnings;
>
> my %wc = (lines=>1, words=>0, chars=>0);
> my $re = qr/ \b (?{ $wc{words} += 0.25 })
> | \n (?{ $wc{lines} ++ })
> | . (?{ $wc{chars} ++ })
> /x;
>
> my $text = do { local$/; <> };
>
> print map "$wc{$_} $_, ", keys %wc
> if () = $text =~ /$re/g;


Nice.

I don't understand why it finds four /\b/ for each word, but that's
apparently what happens.

You're initializing the line count to one. For me, that makes it
come out one high.

The character count will be missing the line feeds. Make the
second alternative

| \n (?{ $wc{lines} ++; $wc{chars} ++})

Anno

Mirco Wahab 04-24-2007 10:53 AM

Re: regular expression for wc
 
anno4000@radom.zrz.tu-berlin.de wrote:
> Mirco Wahab <wahab-mail@gmx.de> wrote in comp.lang.perl.misc:
>> my %wc = (lines=>1, words=>0, chars=>0);
>> my $re = qr/ \b (?{ $wc{words} += 0.25 })
>> | \n (?{ $wc{lines} ++ })
>> | . (?{ $wc{chars} ++ })
>> /x;

> I don't understand why it finds four /\b/ for each word, but that's
> apparently what happens.


I struggled with this too, but each word has two ends,
and the first character *in front of* a word is
/on a word boundary/, as is the first character
*of the word*. That makes four \b's.
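
A small experiment may help separate two effects here. A plain /\b/g finds two boundaries per word, not four. As far as I can tell, the doubling comes from the code block: after a zero-length match, Perl retries at the same position and rejects a second empty match only once it reaches the end of the pattern, by which time the block has already run again. The sample string and the counter name $runs are made up for this sketch.

```perl
use strict;
use warnings;

my $text = "foo bar";

# Plain count of word boundaries: two per word.
my $boundaries = () = $text =~ /\b/g;

# The same boundaries, but with a counting code block and a consuming
# alternative, as in the posted pattern.
my $runs = 0;
my $re = qr/ \b (?{ $runs++ }) | . /x;
1 while $text =~ /$re/g;

# prints "boundaries => 4, block runs => 8"
print "boundaries => $boundaries, block runs => $runs\n";
```

With two words that is eight block runs, which is why a step of 0.25 per run comes out right.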

> You're initializing the line count to one. For me, that makes it
> come out one high.


If you have any text, you are already on line #1 when you start,
that's why I changed this. What you see is probably
the last \n of a text.

> The character count will be missing the line feeds. Make the
> second alternative
>
> | \n (?{ $wc{lines} ++; $wc{chars} ++})


OK, you are possibly right. But I took them out
because "word processors" don't count them (checked in
Word 97 under wine).

Regards & Thanks

Mirco
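
For comparison, wc itself counts newline characters as lines and includes them in the character count, so a final line without a trailing newline is not counted as a line. A minimal sketch of those conventions without code blocks, assuming plain ASCII input so that bytes and characters coincide (the sample string is invented):

```perl
use strict;
use warnings;

my $text = "a b\nc\n";   # sample input (assumption)

my $lines = ($text =~ tr/\n//);     # number of newline characters
my $words = () = $text =~ /\S+/g;   # whitespace-separated words
my $chars = length $text;           # every character, newlines included

print "$lines $words $chars\n";     # prints "2 3 6"
```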


All times are GMT.