Velocity Reviews - Computer Hardware Reviews


How to take action upon a pattern of nth occurrence?

 
 
Ross
 
      07-06-2005
Dear all,
For the sequence below (actually a single line), when I use the conditional
check

if ($line =~ /(.*)A{10,}(.*)/ ) {
$tmpline = $1;
}

to try to remove the substring after 10 or more consecutive A's, Perl seems to
match the last run of poly-A's and leave the earlier ones intact. What can I do?
In general, how do I take action upon a pattern of nth occurrence?

TCCTCAGTGGGAATTCGGCATTACGGCCGGGGCACCACAATGAATGATCATTTTC
TTCTTTGCTCTCCTTGCTATTGCTGCATGCAGCGCCTCTGCGCAGTTTGATGCTG
TTACTCAAGTTTACAGGCAATATCAGCTGCAGCCGCATCTCATGCTGCAGCAACA
GATGCTTAGCCCATGCGGTGAGTTCGTAAGGCAGCAGTGCAGCACAGTGGCAACC
CCCTTCTTCCAATCACCCGTGTTTCAACTGAGAAACTGCCAAGTCATGCAGCAGC
AGTGCTGCCAACAGCTCAGGATGATCGCGCAACAGTCTCACCGCCAGGCCATTAG
TAGTGTTCAGGCGATTGTGCAGCAGCTACAGCTACAACAGTTTGCTGGCGTCTAC
TTCGATCAGACTCAAGCTCAAGCCCAAGCTATGTTGGCCCTAAACTTGCTGTCAA
TATGCGGTATCTACCCAAGCTACAACACTGCTCCCTGTAGCATTCCCACCGTCGG
TGGTATCTGGTACTGAATTGTAGCAGTATAGTAGTACAGGAGAGAAAAATAAAGT
CATGCATCATCGTGTGTGACAAGTTGAAACATCGGGGTGATACAAATCTGAATAA
AAATGTCATGCAAGTTTAAACANNNNANANNNANNNNAAANAAAAAAAAAAAAAA
AAAANANAAAAAAAAAAAAAAAAAAAAAAAAAAANAAAAANAAAAAAAAAAAAAA
AAAAANNNNNNANANNNNNNAAAAAAAAAAAAAAAAANNNNNNNNNNGGGGGGGG
GGGGGGGCGGGAAGAAAAAAAAAAA



 
Gunnar Hjalmarsson
 
      07-06-2005
Ross wrote:
> For the sequence below (actually a single line), when I use the conditional
> check
>
> if ($line =~ /(.*)A{10,}(.*)/ ) {

------------------------^-^^^^
Why the comma?
Why do you have Perl capture the part you are not interested in?

> $tmpline = $1;
> }
>
> to try to remove the substring after 10 or more consecutive A's, Perl seems to
> match the last run of poly-A's and leave the earlier ones intact. What can I do?


Did you try making .* non-greedy?

/(.*?A{10})/

(including A{10} in the capturing parentheses matches your description).

Please read about greediness in "perldoc perlre".
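A quick way to see the difference is to run Ross's original pattern and Gunnar's non-greedy suggestion on the same input. The sketch below uses a made-up string with two poly-A runs of 12 A's each; it is an illustration, not Ross's data.

```perl
use strict;
use warnings;

# Made-up sequence: CCG, 12 A's, TTG, 12 A's, GG
my $line = 'CCG' . ('A' x 12) . 'TTG' . ('A' x 12) . 'GG';

# Ross's original pattern: the greedy (.*) first grabs the whole string and
# backtracks only as far as it must, so A{10,} lands in the LAST run.
my ($orig) = $line =~ /(.*)A{10,}(.*)/;

# Gunnar's suggestion: the non-greedy .*? grows one character at a time,
# so it stops at the FIRST run, and ten A's are inside the capture.
my ($gunnar) = $line =~ /(.*?A{10})/;

print "original: $orig\n";
print "gunnar:   $gunnar\n";
```

With this input the original pattern captures CCG, the first 12 A's, TTG and two more A's (the greedy .* even eats into the last run, leaving exactly ten A's for A{10,}), while the non-greedy pattern captures CCG followed by ten A's.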

> In general, how do I take action upon a pattern of nth occurrence?


Specifically, what is a "pattern of nth occurrence"?

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
 
Ross
 
      07-06-2005
Gunnar Hjalmarsson wrote in message
news:...
> Ross wrote:
>> For the sequence below (actually a single line), when I use the conditional
>> check
>>
>> if ($line =~ /(.*)A{10,}(.*)/ ) {

> ------------------------^-^^^^
> Why the comma?
> Why do you have Perl capture the part you are not interested in?
>
>> $tmpline = $1;
>> }
>>
>> to try to remove the substring after 10 or more consecutive A's, Perl seems
>> to match the last run of poly-A's and leave the earlier ones intact. What can
>> I do?

>
> Did you try making .* non-greedy?
>
> /(.*?A{10})/
>
> (including A{10} in the capturing parentheses matches your description).
>
> Please read about greediness in "perldoc perlre".
>
>> In general, how do I take action upon a pattern of nth occurrence?

>
> Specifically, what is a "pattern of nth occurrence"?


Dear Gunnar Hjalmarsson,
I am a beginner. From a Perl bible, {10,} means a minimum number with no
maximum. What I'm trying to specify with the pattern is the 1st
occurrence of poly-A's (10 or more in a row), and then retain only the
substring in front of it. To generalize, I wonder if there is a
convenient built-in syntax to test for the nth occurrence (I know a loop
could do the task). The .*, according to the bible, just represents a
substring of any characters. Thanks for your response.

--Ross
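One possible reading of "nth occurrence" is "the nth separate run of 10 or more A's". Under that assumption, a scalar-context m//g loop plus the @- array (start offsets of the most recent match) is one short way to do it. prefix_before_nth_run is a hypothetical helper name, not a Perl built-in.

```perl
use strict;
use warnings;

# Hypothetical helper: return the part of $seq in front of the $n-th
# (1-based) run of ten or more A's, or undef if there are fewer runs.
sub prefix_before_nth_run {
    my ($seq, $n) = @_;
    my $count = 0;
    # In scalar context m//g resumes where the previous match ended, and the
    # greedy A{10,} swallows a whole run, so each run is counted once.
    while ($seq =~ /A{10,}/g) {
        # $-[0] is the offset at which the current match starts
        return substr($seq, 0, $-[0]) if ++$count == $n;
    }
    return undef;
}

# Made-up sequence with two runs: 10 A's, then 11 A's
my $sample = 'CC' . ('A' x 10) . 'GG' . ('A' x 11) . 'TT';
for my $n (1 .. 3) {
    my $prefix = prefix_before_nth_run($sample, $n);
    print "run $n: ", (defined $prefix ? $prefix : '(no such run)'), "\n";
}
```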


 
Gunnar Hjalmarsson
 
      07-06-2005
Ross wrote:
> Gunnar Hjalmarsson wrote:
>> Ross wrote:
>>> For the sequence below (actually a single line), when I use the conditional
>>> check
>>>
>>> if ($line =~ /(.*)A{10,}(.*)/ ) {

>> -----------------------^-^^^^
>> Why the comma?
>> Why do you have Perl capture the part you are not interested in?
>>
>>> $tmpline = $1;
>>> }
>>>
>>> to try to remove the substring after 10 or more consecutive A's, Perl seems
>>> to match the last run of poly-A's and leave the earlier ones intact. What can
>>> I do?

>>
>> Did you try making .* non-greedy?
>>
>> /(.*?A{10})/
>>
>> (including A{10} in the capturing parentheses matches your description).
>>
>> Please read about greediness in "perldoc perlre".
>>
>>> In general, how do I take action upon a pattern of nth occurrence?

>>
>> Specifically, what is a "pattern of nth occurrence"?

>
> From a Perl bible, {10,} means a minimum number with no
> maximum.


I agree that {10,} means minimum, but {10} means exactly 10, not maximum
(which would have been written {0,10}). If your "bible" says something
else, please drop it and use the Perl documentation instead.

The reason I asked about the comma, i.e. matching at least 10, is that
it seems to be unnecessary considering what you are trying to do. But
it's not wrong. Just trying to avoid a redundant character.
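To make the {10} versus {10,} point concrete: when all you want is the text in front of the run, both quantifiers give the same prefix, because the match only needs to find where ten A's begin. They differ in how many A's the match itself consumes. A minimal sketch on made-up data:

```perl
use strict;
use warnings;

# Made-up sequence: one run of 14 A's
my $seq = 'GGT' . ('A' x 14) . 'C';

# The prefix in front of the run is the same either way...
my ($before_exact) = $seq =~ /(.*?)A{10}/;     # {10}   exactly ten
my ($before_min)   = $seq =~ /(.*?)A{10,}/;    # {10,}  ten or more
print "before A{10}:  $before_exact\n";
print "before A{10,}: $before_min\n";

# ...but the number of A's consumed by the match differs
my ($run_exact) = $seq =~ /(A{10})/;
my ($run_min)   = $seq =~ /(A{10,})/;
my ($run_max)   = ('A' x 14) =~ /^(A{0,10})/;  # {0,10} at most ten
printf "A{10}: %d  A{10,}: %d  A{0,10}: %d\n",
    length $run_exact, length $run_min, length $run_max;
```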

> What I'm trying to specify with the pattern is the 1st
> occurrence of poly-A's (10 or more in a row), and then retain only the
> substring in front of it.


That differs slightly from how you explained it in your original post,
so I'd better change my suggested regex to

/(.*?)A{10}/

Did you try it?

Did you read about greediness in "perldoc perlre"?
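The revised pattern /(.*?)A{10}/ can be wrapped in a small helper. trim_at_first_polyA is a hypothetical name, and returning the input unchanged when there is no poly-A run is an assumption, since the thread does not say what should happen then.

```perl
use strict;
use warnings;

# Hypothetical helper built on /(.*?)A{10}/: return everything in front of
# the first run of ten or more A's. Returning the input unchanged when there
# is no such run is an assumption; the thread does not specify it.
sub trim_at_first_polyA {
    my ($seq) = @_;
    # /s lets . cross newlines in case the sequence was not chomped
    return $seq =~ /(.*?)A{10}/s ? $1 : $seq;
}

my @inputs = (
    'TTGCC' . ('A' x 12) . 'NNN' . ('A' x 20),   # two runs: cut at the first
    'ACGTACGT',                                  # no run: left unchanged
);
print trim_at_first_polyA($_), "\n" for @inputs;
```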

> To generalize, I wonder if there is a
> convenient built-in syntax to test for the nth occurrence


And I still don't understand the meaning of "nth occurrence".

> The .*, according to the bible, just represents a
> substring of any characters


Not a newline, by default, according to the Perl documentation: . matches
any character except "\n" unless you add the /s modifier.
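The newline rule matters if the sequence still contains line breaks, for example when lines are read without chomp. A small sketch with a made-up two-line string shows what the /s modifier changes:

```perl
use strict;
use warnings;

# Made-up two-line input, as if a newline had been left in (no chomp)
my $text = "CCGT\nTT" . ('A' x 10) . 'G';

# Without /s, . refuses to match "\n", so no match can start before the
# newline; the engine retries at later offsets and the capture begins after it
my ($no_s) = $text =~ /(.*?)A{10}/;

# With /s, . matches any character, including "\n"
my ($with_s) = $text =~ /(.*?)A{10}/s;

# Show embedded newlines as \n when printing
for my $pair (['without /s', $no_s], ['with /s', $with_s]) {
    (my $shown = $pair->[1]) =~ s/\n/\\n/g;
    print "$pair->[0]: '$shown'\n";
}
```

Without /s the capture is only "TT"; with /s it is the whole "CCGT\nTT" prefix. Joining the lines with chomp before matching avoids the issue in a different way.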

Greediness, greediness...

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
 
A. Sinan Unur
 
      07-06-2005
Ross wrote in
news:dafs3h$b2l$:

> Dear all,
> For the sequence below (actually a single line), when I use the
> conditional
> check
>
> if ($line =~ /(.*)A{10,}(.*)/ ) {
> $tmpline = $1;
> }
>
> to try to remove the substring after 10 or more consecutive A's, Perl
> seems to match the last run of poly-A's and leave the earlier ones intact.
> What can I do? In general, how do I take action upon a pattern of nth
> occurrence?


It seems to me that you should be using index rather than regular
expressions, although I am not sure what you mean by "nth occurrence".

If I understand you correctly, you want to find the first string of at
least 10 As, and only keep the substring up to and including the last
character before that string of at least 10 As. That can be translated
directly to Perl in a very straightforward way:
#!/usr/bin/perl

use strict;
use warnings;

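my $s;

# Join the lines after __END__ into one string
while( <DATA> ) {
    chomp;
    $s .= $_;
}

# index returns -1 when there is no run of ten A's; keep everything then
my $pos = index $s, 'A' x 10;
my $r = $pos >= 0 ? substr( $s, 0, $pos ) : $s;

print "$r\n";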


__END__
TCCTCAGTGGGAATTCGGCATTACGGCCGGGGCACCACAATGAATGATCATTTTC
TTCTTTGCTCTCCTTGCTATTGCTGCATGCAGCGCCTCTGCGCAGTTTGATGCTG
TTACTCAAGTTTACAGGCAATATCAGCTGCAGCCGCATCTCATGCTGCAGCAACA
GATGCTTAGCCCATGCGGTGAGTTCGTAAGGCAGCAGTGCAGCACAGTGGCAACC
CCCTTCTTCCAATCACCCGTGTTTCAACTGAGAAACTGCCAAGTCATGCAGCAGC
AGTGCTGCCAACAGCTCAGGATGATCGCGCAACAGTCTCACCGCCAGGCCATTAG
TAGTGTTCAGGCGATTGTGCAGCAGCTACAGCTACAACAGTTTGCTGGCGTCTAC
TTCGATCAGACTCAAGCTCAAGCCCAAGCTATGTTGGCCCTAAACTTGCTGTCAA
TATGCGGTATCTACCCAAGCTACAACACTGCTCCCTGTAGCATTCCCACCGTCGG
TGGTATCTGGTACTGAATTGTAGCAGTATAGTAGTACAGGAGAGAAAAATAAAGT
CATGCATCATCGTGTGTGACAAGTTGAAACATCGGGGTGATACAAATCTGAATAA
AAATGTCATGCAAGTTTAAACANNNNANANNNANNNNAAANAAAAAAAAAAAAAA
AAAANANAAAAAAAAAAAAAAAAAAAAAAAAAAANAAAAANAAAAAAAAAAAAAA
AAAAANNNNNNANANNNNNNAAAAAAAAAAAAAAAAANNNNNNNNNNGGGGGGGG
GGGGGGGCGGGAAGAAAAAAAAAAA

--
A. Sinan Unur <>
(reverse each component and remove .invalid for email address)

comp.lang.perl.misc guidelines on the WWW:
http://mail.augustmail.com/~tadmc/cl...uidelines.html
 
Brandon
 
      07-06-2005
Try adding ? to the first .* to make (.*?). The * quantifier is greedy: it
takes as much as it can while still letting the rest of the pattern match, so
A{10,} ends up matching the last 10 consecutive A's, not the first.
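The effect of greediness can be checked by asking Perl where the A{10,} part of the match starts. This sketch, on a made-up string with two runs of 12 A's, reads the start offset of capture group 2 from the @- array.

```perl
use strict;
use warnings;

# Made-up sequence: CCG, 12 A's, TTG, 12 A's, GG
my $line = 'CCG' . ('A' x 12) . 'TTG' . ('A' x 12) . 'GG';

my %pattern = (
    'greedy'     => qr/(.*)(A{10,})/,
    'non-greedy' => qr/(.*?)(A{10,})/,
);

my %run;    # pattern name => [start offset, length] of the matched A run
for my $name (sort keys %pattern) {
    next unless $line =~ $pattern{$name};
    # @- holds start offsets of the last match; $-[2] is where group 2 began
    $run{$name} = [ $-[2], length $2 ];
    print "$name: A run starts at offset $run{$name}[0], length $run{$name}[1]\n";
}
```

With this input the greedy pattern reports the run at offset 20 with length 10, and the non-greedy one at offset 3 with length 12.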

Ross wrote in message
news:dafs3h$b2l$...
> Dear all,
> For the sequence below (actually a single line), when I use the conditional
> check
>
> if ($line =~ /(.*)A{10,}(.*)/ ) {
> $tmpline = $1;
> }
>
> to try to remove the substring after 10 or more consecutive A's, Perl seems to
> match the last run of poly-A's and leave the earlier ones intact. What can I
> do? In general, how do I take action upon a pattern of nth occurrence?
>
> TCCTCAGTGGGAATTCGGCATTACGGCCGGGGCACCACAATGAATGATCATTTTC
> TTCTTTGCTCTCCTTGCTATTGCTGCATGCAGCGCCTCTGCGCAGTTTGATGCTG
> TTACTCAAGTTTACAGGCAATATCAGCTGCAGCCGCATCTCATGCTGCAGCAACA
> GATGCTTAGCCCATGCGGTGAGTTCGTAAGGCAGCAGTGCAGCACAGTGGCAACC
> CCCTTCTTCCAATCACCCGTGTTTCAACTGAGAAACTGCCAAGTCATGCAGCAGC
> AGTGCTGCCAACAGCTCAGGATGATCGCGCAACAGTCTCACCGCCAGGCCATTAG
> TAGTGTTCAGGCGATTGTGCAGCAGCTACAGCTACAACAGTTTGCTGGCGTCTAC
> TTCGATCAGACTCAAGCTCAAGCCCAAGCTATGTTGGCCCTAAACTTGCTGTCAA
> TATGCGGTATCTACCCAAGCTACAACACTGCTCCCTGTAGCATTCCCACCGTCGG
> TGGTATCTGGTACTGAATTGTAGCAGTATAGTAGTACAGGAGAGAAAAATAAAGT
> CATGCATCATCGTGTGTGACAAGTTGAAACATCGGGGTGATACAAATCTGAATAA
> AAATGTCATGCAAGTTTAAACANNNNANANNNANNNNAAANAAAAAAAAAAAAAA
> AAAANANAAAAAAAAAAAAAAAAAAAAAAAAAAANAAAAANAAAAAAAAAAAAAA
> AAAAANNNNNNANANNNNNNAAAAAAAAAAAAAAAAANNNNNNNNNNGGGGGGGG
> GGGGGGGCGGGAAGAAAAAAAAAAA
Debo
 
      07-06-2005
On Wed, 6 Jul 2005, Ross wrote:
R> Dear all,
<snip code>
R> to try to remove the substring after 10 or more consecutive A's, Perl seems to
R> match the last run of poly-A's and leave the earlier ones intact. What can I do?
R> In general, how do I take action upon a pattern of nth occurrence?

Generally, if you're trying to do something that seems fairly common,
such as trimming a poly-A tail, it has already been done in BioPerl.

This seems to be somewhat along the lines of what you're trying to do:

http://doc.bioperl.org/bioperl-run/B...n/trimest.html

If that's not helpful, let me know and I'll see if I can dig up something
better.
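For readers who would rather not install BioPerl, the specific truncation Ross described (drop everything from the first run of ten A's onward) can also be written as a single substitution. This is a plain-Perl sketch on made-up data, not a reimplementation of the trimest wrapper linked above.

```perl
use strict;
use warnings;

# Made-up sequence: cut at the first run of ten A's
my $seq = 'TGCC' . ('A' x 15) . 'NNAAG';

# Copy first, then substitute on the copy so $seq is left untouched.
# s/A{10}.*//s deletes from the leftmost run of ten A's to the end of the
# string; /s lets .* run across any embedded newlines.
(my $trimmed = $seq) =~ s/A{10}.*//s;

print "$trimmed\n";
```

If the sequence contains no run of ten A's, the substitution does nothing and the copy equals the original.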

-Debo

 
Ross
 
      07-06-2005
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> my $s;
>
> while( <DATA> ) {
> chomp;
> $s .= $_;
> }


What is .= ?

It is very difficult to find an explanation on Google, as .= cannot be
searched for. Any suggestion for a novice? Also, does __END__ mark the
beginning of DATA? index can't be searched for either, as web search engines
interpret it as an ordinary word. Anyway, thanks for all the responders'
replies.
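.= is documented with the assignment operators in perlop: $s .= $x is shorthand for $s = $s . $x. In a top-level script, the text after __END__ can be read through the DATA filehandle, which is what the <DATA> loop relies on. The sketch below shows .= on its own and then the same chomp-and-append loop, using an in-memory filehandle as a stand-in for DATA so that it runs without a data section; the input lines are made up.

```perl
use strict;
use warnings;

# .= appends the right-hand string to the variable on the left
my $s = 'ACGT';
$s .= 'TTAA';          # same as: $s = $s . 'TTAA';
print "$s\n";

# Stand-in for the DATA filehandle: an in-memory file with made-up lines
my $text = "CCGG\nTTAA\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $joined = '';
while (my $row = <$fh>) {
    chomp $row;        # drop the trailing newline
    $joined .= $row;   # glue the lines into one string
}
close $fh;
print "$joined\n";
```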

>
> my $r = substr $s, 0, index $s, 'AAAAAAAAAA';
>
> print "$r\n";
>
>
> __END__
> TCCTCAGTGGGAATTCGGCATTACGGCCGGGGCACCACAATGAATGATCATTTTC
> TTCTTTGCTCTCCTTGCTATTGCTGCATGCAGCGCCTCTGCGCAGTTTGATGCTG
> TTACTCAAGTTTACAGGCAATATCAGCTGCAGCCGCATCTCATGCTGCAGCAACA
> GATGCTTAGCCCATGCGGTGAGTTCGTAAGGCAGCAGTGCAGCACAGTGGCAACC
> CCCTTCTTCCAATCACCCGTGTTTCAACTGAGAAACTGCCAAGTCATGCAGCAGC
> AGTGCTGCCAACAGCTCAGGATGATCGCGCAACAGTCTCACCGCCAGGCCATTAG
> TAGTGTTCAGGCGATTGTGCAGCAGCTACAGCTACAACAGTTTGCTGGCGTCTAC
> TTCGATCAGACTCAAGCTCAAGCCCAAGCTATGTTGGCCCTAAACTTGCTGTCAA
> TATGCGGTATCTACCCAAGCTACAACACTGCTCCCTGTAGCATTCCCACCGTCGG
> TGGTATCTGGTACTGAATTGTAGCAGTATAGTAGTACAGGAGAGAAAAATAAAGT
> CATGCATCATCGTGTGTGACAAGTTGAAACATCGGGGTGATACAAATCTGAATAA
> AAATGTCATGCAAGTTTAAACANNNNANANNNANNNNAAANAAAAAAAAAAAAAA
> AAAANANAAAAAAAAAAAAAAAAAAAAAAAAAAANAAAAANAAAAAAAAAAAAAA
> AAAAANNNNNNANANNNNNNAAAAAAAAAAAAAAAAANNNNNNNNNNGGGGGGGG
> GGGGGGGCGGGAAGAAAAAAAAAAA
>
> --
> A. Sinan Unur <>
> (reverse each component and remove .invalid for email address)
>
> comp.lang.perl.misc guidelines on the WWW:
> http://mail.augustmail.com/~tadmc/cl...uidelines.html



 
Debo
 
      07-06-2005
On Wed, 6 Jul 2005, Ross wrote:

R> > #!/usr/bin/perl
R> >
R> > use strict;
R> > use warnings;
R> >
R> > my $s;
R> >
R> > while( <DATA> ) {
R> > chomp;
R> > $s .= $_;
R> > }
R>
R> What is .= ?
R>
R> It is very difficult to find an explanation on Google, as .= cannot be
R> searched for.

If you have questions about Perl's operators, 'perldoc perlop' should help
you out.

-Debo
 
A. Sinan Unur
 
      07-06-2005
Ross wrote in news:dagp7h$gm1$1@justice.itsc.cuhk.edu.hk:

>> #!/usr/bin/perl
>>
>> use strict;
>> use warnings;
>>
>> my $s;
>>
>> while( <DATA> ) {
>> chomp;
>> $s .= $_;
>> }

>
> What is .= ?


perldoc perlop

>> --
>> A. Sinan Unur <>
>> (reverse each component and remove .invalid for email address)
>>
>> comp.lang.perl.misc guidelines on the WWW:
>> http://mail.augustmail.com/~tadmc/cl...uidelines.html


Please do not quote signatures unless you have something to say about
the signature itself. On the other hand, you would benefit from reading
the posting guidelines mentioned above. They contain invaluable
information on how you can help yourself, and help others help you.

With this message, you have given a strong signal that you are not
willing to do much work yourself. I hope, for your sake, that you will
send a strong signal in the opposite direction in your next post.

Sinan
 
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Re: How include a large array? | Edward A. Falk | C Programming | 1 | 04-04-2013 08:07 PM
change only the nth occurrence of a pattern in a string | TP | Python | 5 | 01-14-2009 10:08 PM
Algorithm to find nth largest or nth smallest in a range | Code4u | C++ | 4 | 07-13-2005 03:18 AM
Re:This is for the nth time I am posting. Is there no one to help! | ani | ASP .Net | 1 | 11-07-2003 09:44 PM
This is for the nth time I am posting. Is there no one to help! | ani | ASP .Net | 4 | 11-06-2003 03:30 PM


