Velocity Reviews - Computer Hardware Reviews


How find all overlapping pattern?

 
 
Peng Yu
Guest
Posts: n/a
 
      02-07-2011
$string="abcabcabc";
@findall = $string =~ /abcabc/g;
print scalar(@findall), "\n";

The above commands will print 1 rather than 2. Because there are two
overlapping 'abcabc', I'd like to get 2. I'm wondering what is the
correct way to find all overlapping regexes. (Note that I gave
'abcabc' as an example, but it could be any complex regex) Thanks!
 
 
 
 
 
jl_post@hotmail.com
Guest
Posts: n/a
 
      02-07-2011
On Feb 7, 8:31 am, Peng Yu <(E-Mail Removed)> wrote:
> $string="abcabcabc";
> @findall = $string =~ /abcabc/g;
> print scalar(@findall), "\n";
>
> The above commands will print 1 rather than 2. Because there are two
> overlapping 'abcabc', I'd like to get 2. I'm wondering what is the
> correct way to find all overlapping regexes. (Note that I gave
> 'abcabc' as an example, but it could be any complex regex) Thanks!



Dear Peng Yu,

Here's one way to do it:

while ($string =~ m/(abcabc)/g)
{
    push @findall, $1;
    pos($string) = $-[0] + 1;
}

If you prefer to implement it in one line of code, you can do this:

push(@findall, $1) and pos($string) = $-[0] + 1
while $string =~ m/(abcabc)/g;

Here's the explanation of what is happening: Normally, m//g and
s///g both make additional matches AFTER (or right at) the end of the
previous match, meaning that you can't directly use them to find
overlapping patterns. However, inside a while($string =~ m//g) loop
you can manipulate the pos($string) variable to force m//g to begin
looking wherever you want -- or in your case, one character after the
start of the last match. (You have to start one (or more) characters
after, because if you started at (or before) the start of the last
match the loop would be infinite.)

As for the $-[0] variable, that's the first element of the @-
array, which you can look up with "perldoc -v @-". $-[0] is basically
the start of the last successful match, so ($-[0] + 1) would be the
earliest where you would want to continue your search for overlapping
patterns.
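
To make the roles of pos(), $-[0] and $+[0] concrete, here is a minimal sketch of the same loop that prints the offsets on each pass. The @starts array and the printf line are additions for illustration only; they are not part of the original suggestion.

```perl
use strict;
use warnings;

my $string = "abcabcabc";
my @starts;

while ($string =~ m/(abcabc)/g) {
    # $-[0] is the offset where the whole match began;
    # $+[0] is the offset just past its end, which is where pos() now points.
    push @starts, $-[0];
    printf "match at %d..%d, pos is %d\n", $-[0], $+[0], pos($string);

    # Rewind so the next search starts one character after this match began.
    pos($string) = $-[0] + 1;
}

print "start offsets: @starts\n";   # prints "start offsets: 0 3"
```

The first pass matches at offsets 0..6 and the second at 3..9; without the pos() assignment the second search would start at offset 6 and find nothing.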

I hope this helps, Peng Yu.

Cheers,

-- Jean-Luc
 
 
 
 
 
ccc31807
Guest
Posts: n/a
 
      02-07-2011
On Feb 7, 10:31 am, Peng Yu <(E-Mail Removed)> wrote:
> $string="abcabcabc";
> @findall = $string =~ /abcabc/g;
> print scalar(@findall), "\n";
>
> The above commands will print 1 rather than 2. Because there are two
> overlapping 'abcabc', I'd like to get 2. I'm wondering what is the
> correct way to find all overlapping regexes. (Note that I gave
> 'abcabc' as an example, but it could be any complex regex) Thanks!


You don't have to use a regular expression in a case like this. You
can use index($string, $substring, $position) in a loop, ending the
loop when $position is less than zero. This is how you might do it in
a language like C.

Sometimes, the simpler way is better.
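
A minimal sketch of that index() loop, assuming the thing being searched for is a fixed string rather than a regex. The helper name overlapping_offsets is hypothetical and used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the start offset of every (possibly
# overlapping) occurrence of a fixed $substring in $string.
sub overlapping_offsets {
    my ($string, $substring) = @_;
    return () if $substring eq '';   # an empty needle would match everywhere

    my @offsets;
    my $position = index($string, $substring);
    while ($position >= 0) {
        push @offsets, $position;
        # Resume one character past the start of this hit, not past its end.
        $position = index($string, $substring, $position + 1);
    }
    return @offsets;
}

my @offsets = overlapping_offsets("abcabcabc", "abcabc");
print scalar(@offsets), "\n";   # prints 2
```

This only covers literal substrings; index() cannot express a general pattern.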

CC.
 
 
Ilya Zakharevich
Guest
Posts: n/a
 
      02-07-2011
On 2011-02-07, Peng Yu <(E-Mail Removed)> wrote:
> $string="abcabcabc";
> @findall = $string =~ /abcabc/g;
> print scalar(@findall), "\n";
>
> The above commands will print 1 rather than 2. Because there are two
> overlapping 'abcabc', I'd like to get 2. I'm wondering what is the
> correct way to find all overlapping regexes. (Note that I gave
> 'abcabc' as an example, but it could be any complex regex) Thanks!


Do not use RExes which "move the match point too far" (i.e., match
more than one character). In some situations 0-length match may cause
a problem (non-intuitive semantic), but if the REx is ALWAYS matching
0-length substring, the match rules are intuitive again.

So use /(?=(abcabc))/g.
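
As a hedged illustration of how this generalises beyond 'abcabc', the sketch below wraps an arbitrary pattern in a capturing look-ahead. The helper name all_overlapping is hypothetical, and it assumes the pattern passed in has no capture groups of its own (otherwise their captures would be mixed into the returned list).

```perl
use strict;
use warnings;

# Hypothetical helper: wrap any pattern in a zero-width look-ahead with
# one capture group, so the regex engine only ever advances one character.
sub all_overlapping {
    my ($string, $pattern) = @_;
    my @found = $string =~ /(?=($pattern))/g;
    return @found;
}

my @abc    = all_overlapping("abcabcabc", qr/abcabc/);
my @digits = all_overlapping("12345",     qr/\d{3}/);

print scalar(@abc), "\n";   # prints 2
print "@digits\n";          # prints "123 234 345"
```

Because the outer match is always zero-length, m//g moves forward one position at a time, and the inner capture group records what would have matched there.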

Hope this helps,
Ilya
 
 
C.DeRykus
Guest
Posts: n/a
 
      02-08-2011
On Feb 7, 8:46 am, ccc31807 <(E-Mail Removed)> wrote:
> On Feb 7, 10:31 am, Peng Yu <(E-Mail Removed)> wrote:
>
> > $string="abcabcabc";
> > @findall = $string =~ /abcabc/g;
> > print scalar(@findall), "\n";

>
> > The above commands will print 1 rather than 2. Because there are two
> > overlapping 'abcabc', I'd like to get 2. I'm wondering what is the
> > correct way to find all overlapping regexes. (Note that I gave
> > 'abcabc' as an example, but it could be any complex regex) Thanks!

>
> You don't have to use a regular expression in a case like this. You
> can use index($string, $substring, $position) in a loop, ending the
> loop which $position is less than zero. This is how you might do it in
> a language like C.
>
> Sometimes, the simpler way is better.
>


True in some cases but, IMO, a regex
is shorter and arguably much easier
here:


$_ = "abcabcabc";
($count, $pos ) = ( 0, 0 );


# regex
$count++ while /(?=abcabc)/g and ++$pos;

vs.

# index
while ($pos != -1) {
    $pos = index( $_, 'abcabc', $pos );
    $count++, $pos++ unless $pos == -1;
}

# and a trap lurks with this alternative
while ($pos != -1) {
    $pos = index( $_, 'abcabc', $pos );
    $count++ and $pos++ unless $pos == -1;
}
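
For readers wondering what the trap is: $count++ is a post-increment, so on the first hit it returns 0, which is false, and the "and" short-circuits before $pos++ runs. The same position is then found a second time. The sketch below runs both index() variants side by side; the variables $text, $comma_count and $and_count are added here for illustration.

```perl
use strict;
use warnings;

my $text = "abcabcabc";

# Comma version: both increments run on every hit.
my ($count, $pos) = (0, 0);
while ($pos != -1) {
    $pos = index($text, 'abcabc', $pos);
    $count++, $pos++ unless $pos == -1;
}
my $comma_count = $count;

# "and" version: on the first hit $count++ yields 0 (false), so $pos++
# is skipped and offset 0 is counted twice.
($count, $pos) = (0, 0);
while ($pos != -1) {
    $pos = index($text, 'abcabc', $pos);
    $count++ and $pos++ unless $pos == -1;
}
my $and_count = $count;

print "comma: $comma_count, and: $and_count\n";   # prints "comma: 2, and: 3"
```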

--
Charles DeRykus

 
 
Peter J. Holzer
Guest
Posts: n/a
 
      02-08-2011
On 2011-02-07 16:46, ccc31807 <(E-Mail Removed)> wrote:
> On Feb 7, 10:31 am, Peng Yu <(E-Mail Removed)> wrote:
>> $string="abcabcabc";
>> @findall = $string =~ /abcabc/g;
>> print scalar(@findall), "\n";
>>
>> The above commands will print 1 rather than 2. Because there are two
>> overlapping 'abcabc', I'd like to get 2. I'm wondering what is the
>> correct way to find all overlapping regexes. (Note that I gave
                                                 ^^^^^^^^^^^^^^^^
>> 'abcabc' as an example, but it could be any complex regex) Thanks!
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> You don't have to use a regular expression in a case like this. You
> can use index($string, $substring, $position) in a loop,


You did read what the OP wrote, did you?

hp

 
 
ccc31807
Guest
Posts: n/a
 
      02-08-2011
On Feb 8, 10:09 am, "Peter J. Holzer" <(E-Mail Removed)> wrote:
> You did read what the OP wrote, did you?


I did, and I thought about it. Several times in the past few weeks,
I've had problems with REs acting poorly, and used other means to do
what I needed to do, primarily index() and substr().

My point was not that an RE can always be replaced by built in
functions, but that an RE can sometimes be replaced by built in
functions.

CC.
 
 
jl_post@hotmail.com
Guest
Posts: n/a
 
      02-08-2011
> On Feb 7, 8:31 am, Peng Yu <(E-Mail Removed)> wrote:
>
> > $string="abcabcabc";
> > @findall = $string =~ /abcabc/g;
> > print scalar(@findall), "\n";

>
> > The above commands will print 1 rather than 2. Because there are two
> > overlapping 'abcabc', I'd like to get 2. I'm wondering what is the
> > correct way to find all overlapping regexes. (Note that I gave
> > 'abcabc' as an example, but it could be any complex regex) Thanks!



On Feb 7, 9:02 am, "(E-Mail Removed)" <(E-Mail Removed)>
replied:
>
>    Here's one way to do it:
>
>       while ($string =~ m/(abcabc)/g)
>       {
>          push @findall, $1;
>          pos($string) = $-[0] + 1;
>       }



Hmmm... after reading the other replies, I think that:

@findall = $string =~ /(?=abcabc)/g;

(which uses a positive look-ahead) is probably the cleaner solution.

Just my opinion.
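
One caveat worth illustrating: without a capture group inside the look-ahead, each match is zero-length, so the list holds empty strings. The count is still right, but the matched text is lost. A short sketch comparing the two forms (the array names @without and @with are chosen here for illustration):

```perl
use strict;
use warnings;

my $string = "abcabcabc";

# No capture group: m//g in list context returns the matched text,
# which is the empty string for a bare look-ahead.
my @without = $string =~ /(?=abcabc)/g;

# With a capture group inside the look-ahead, each element is $1.
my @with = $string =~ /(?=(abcabc))/g;

printf "without capture: %d elements, joined: '%s'\n",
    scalar(@without), join(',', @without);   # 2 elements, joined: ','
printf "with capture:    %d elements, joined: '%s'\n",
    scalar(@with), join(',', @with);         # 2 elements, joined: 'abcabc,abcabc'
```

So the uncaptured form is fine for counting, while the captured form is the one to use when the matched substrings themselves are needed.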

-- Jean-Luc
 
 
sln@netherlands.com
Guest
Posts: n/a
 
      02-08-2011
On Mon, 7 Feb 2011 22:24:25 +0000 (UTC), Ilya Zakharevich <(E-Mail Removed)> wrote:

>On 2011-02-07, Peng Yu <(E-Mail Removed)> wrote:
>> $string="abcabcabc";
>> @findall = $string =~ /abcabc/g;
>> print scalar(@findall), "\n";
>>
>> The above commands will print 1 rather than 2. Because there are two
>> overlapping 'abcabc', I'd like to get 2. I'm wondering what is the
>> correct way to find all overlapping regexes. (Note that I gave
>> 'abcabc' as an example, but it could be any complex regex) Thanks!

>
>Do not use RExes which "move the match point too far" (i.e., match
>more than one character). In some situations 0-length match may cause
>a problem (non-intuitive semantic), but if the REx is ALWAYS matching
>0-length substring, the match rules are intuitive again.
>
>So use /(?=(abcabc))/g.
>


s/ALWAYS/ONLY/

Nice, and the behavior should be the same if quantifiers and/or
assertions are added.
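
A small sketch of what happens once a quantifier is added, with one limitation to keep in mind: the look-ahead is tried once per starting position, so it reports at most one match per position (the one the engine prefers, here the longest because the quantifier is greedy), not every substring that could match there. The pattern (?:abc)+ is an example chosen for illustration.

```perl
use strict;
use warnings;

my $string = "abcabcabc";

# One result per starting offset where (?:abc)+ can match: 0, 3 and 6.
my @found = $string =~ /(?=((?:abc)+))/g;

print scalar(@found), "\n";      # prints 3
print join(' ', @found), "\n";   # prints "abcabcabc abcabc abc"
```

At offset 0 only "abcabcabc" is returned; the shorter candidates "abc" and "abcabc" that also start at offset 0 are not listed.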

-sln
 
 
 
 