regexp that seems not to work since 5.10

 
 
Sébastien Cottalorda
      11-18-2009
Hi all,
I use a regexp to split a network frame protocol like this.

#-------------------------------------------------------------------
#!/usr/bin/perl -w
use strict;
use constant ETX => chr( hex('03'));
use constant ACK => chr( hex('06'));
use constant NACK => chr( hex('15'));
my $endcar = ACK.'|'.NACK.'|'.ETX.'.{1}',
my $line = 'hello World'.ETX.'XHow are you today ?'.ETX.'XWell, not so bad.'.ETX.'X';
while ($line =~ s/([^$endcar]*$endcar)//){
my $buf = $1;
print $buf."\n";
}
print "$line\n";
exit;
#--------------------------------------------------------------------

With version 5.8.X, I used to have:
hello World
How are you today ?
Well, not so bad.

Now I have:
X
X
X
hello WorldHow are you today ?Well, not so bad.

Could someone help me solve this problem?

Thanks in advance for any help.
Cheers.
Sebastien
 
Sébastien Cottalorda
      11-18-2009
Sorry,

With version 5.8.X, I used to have:
hello World{ETX}X
How are you today ?{ETX}X
Well, not so bad.{ETX}X

 
C.DeRykus
      11-18-2009
On Nov 18, 3:05 am, Sébastien Cottalorda <(E-Mail Removed)> wrote:
> Hi all,
> I use a regexp to split a network frame protocol like this.
>
> #-------------------------------------------------------------------
> #!/usr/bin/perl -w
> use strict;
> use constant ETX => chr( hex('03'));
> use constant ACK => chr( hex('06'));
> use constant NACK => chr( hex('15'));
> my $endcar = ACK.'|'.NACK.'|'.ETX.'.{1}',

^
^
typo - trailing , instead of ;

> my $line = 'hello World'.ETX.'XHow are you today ?'.ETX.'XWell, not so
> bad.'.ETX.'X';
> while ($line =~ s/([^$endcar]*$endcar)//){

^
^
Did you know that alternation and quantifiers
aren't special in a character class..? The
| and {1} in $endcar aren't doing what you
might think at first glance. See perlrequick
or perlretut.
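
The point about character classes can be checked directly. The following is only a sketch, assuming the same constants as the original post (with the trailing comma replaced by a semicolon); the variable names `$class`, `@literal` and `$first` are made up for illustration. It shows that `|`, `.`, `{`, `1` and `}` become ordinary members of the negated class, and that outside the class the `|` splits the whole group into three alternatives, so the first thing that can match is ETX plus one character.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use constant ETX  => "\x03";
use constant ACK  => "\x06";
use constant NACK => "\x15";

# Same string as in the original post, with ';' instead of ','.
my $endcar = ACK . '|' . NACK . '|' . ETX . '.{1}';

# Inside [...] the characters | . { 1 } are plain members of the class,
# so the negated class refuses to match any of them.
my $class   = qr/[^$endcar]/;
my @literal = grep { $_ !~ $class } ('|', '.', '{', '1', '}');
print "excluded as literals: @literal\n";   # | . { 1 }

# Outside the class, the top-level | yields three alternatives:
#   [^...]*ACK   or   NACK   or   ETX followed by one character
my $line = 'hello World' . ETX . 'XHow are you today ?' . ETX . 'X';
my ($first) = $line =~ /([^$endcar]*$endcar)/;

# Print the match as hex bytes, since ETX is not printable.
printf "first match: %s\n",
    join ' ', map { sprintf '%02x', ord } split //, $first;   # 03 58
```

With no ACK or NACK in the input, only the third alternative can succeed, which is why each pass of the loop removes just ETX and the `X` after it.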

>     my $buf = $1;
>     print $buf."\n";}
>
> print "$line\n";
> exit;
> ...


--
Charles DeRykus
 
 
Sébastien Cottalorda
      11-18-2009
On 18 nov, 15:41, "C.DeRykus" <(E-Mail Removed)> wrote:
> On Nov 18, 3:05 am, Sébastien Cottalorda <(E-Mail Removed)> wrote:
> > Hi all,
> > I use a regexp to split a network frame protocol like this.

>
> > #-------------------------------------------------------------------
> > #!/usr/bin/perl -w
> > use strict;
> > use constant ETX => chr( hex('03'));
> > use constant ACK => chr( hex('06'));
> > use constant NACK => chr( hex('15'));
> > my $endcar = ACK.'|'.NACK.'|'.ETX.'.{1}';
> > my $line = 'hello World'.ETX.'XHow are you today ?'.ETX.'XWell, not so
> > bad.'.ETX.'X';
> > while ($line =~ s/([^$endcar]*$endcar)//){

>
> ^
> ^
> Did you know that alternation and quantifiers
> aren't special in a character class..? The
> | and {1} in $endcar aren't doing what you
> might think at first glance. See perlrequick
> or perlretut.
>
> >     my $buf = $1;
> >     print $buf."\n";}

>
> > print "$line\n";
> > exit;
> > ...

>
> --
> Charles DeRykus


I've tried these modifications.

With
my $endcar = ACK.'|'.NACK.'|'.ETX;
my $line = 'hello World'.ETX.'How are you today ?'.ETX.'Well, not so bad.'.ETX;
while ($line =~ s/([^($endcar)]*($endcar))//){
it works pretty well, but I cannot get it to work with ACK, NACK and ETX.'.'


I even tried this:
my $endcar = ACK.'|'.NACK.'|'.ETX.'.';
my $line = 'hello World'.ETX.'XHow are you today ?'.ETX.'XWell, not so bad.'.ETX.'X';
while ($line =~ s/([[:^cntrl:]]*($endcar))//){
and it works perfectly, but only as a special case: it assumes that the
delimiter characters are control characters.

But this regexp didn't work with:
my $endcar = ACK.'|'.NACK.'|'.ETX.'.';
my $line = STX.'hello World'.ETX.'X'.ACK.NACK.STX.'How are you today ?'.ETX.'X'.ACK.STX.'Well, not so bad.'.ETX.NACK;
Unfortunately I need to make that last sample work.

Does anyone have a clue?
Thanks in advance.
Sebastien
 
 
Dr.Ruud
      11-18-2009
Sébastien Cottalorda wrote:

> I use a regexp to split a network frame protocol like this.
>
> #-------------------------------------------------------------------
> #!/usr/bin/perl -w
> use strict;
> use constant ETX => chr( hex('03'));


Alternative:

use constant ETX => "\x{03}";


> use constant ACK => chr( hex('06'));
> use constant NACK => chr( hex('15'));
> my $endcar = ACK.'|'.NACK.'|'.ETX.'.{1}',


Why does that line end in a comma?

my ($ETX, $ACK, $NACK) = ("\x{03}", "\x{06}", "\x{15}");

my $endcar = "(?:$ETX|$ACK|$NACK)"; # alternation


Alternative:

my $endcar= "[$ETX$ACK$NACK]"; # charset


> while ($line =~ s/([^$endcar]*$endcar)//){


You are messing up character class and alternation there.


With your $endcar, this would work:

while ($line =~ s/(.*?(?:$endcar))//s){
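
A small sketch of how this loop behaves on the first sample, using the scalar delimiters from above. The helpers `split_frames` and `show` are hypothetical names introduced here, and the second call adds a trailing `.` to the terminator as an assumption about what the protocol needs (one check character after the delimiter).

```perl
#!/usr/bin/perl
use strict;
use warnings;

my ($ETX, $ACK, $NACK) = ("\x03", "\x06", "\x15");

# Hypothetical helper: strip the shortest prefix ending in $end until
# none is left; returns the frames and whatever remains of the input.
sub split_frames {
    my ($line, $end) = @_;
    my @frames;
    push @frames, $1 while $line =~ s/(.*?$end)//s;
    return (\@frames, $line);
}

# Make control characters visible, e.g. ETX becomes <03>.
sub show {
    my ($s) = @_;
    $s =~ s/([[:cntrl:]])/sprintf '<%02X>', ord $1/ge;
    return $s;
}

my $line = 'hello World' . $ETX . 'XHow are you today ?' . $ETX . 'X';

# Terminator only: the character after ETX is left for the next frame.
my ($plain, $rest_plain) = split_frames($line, qr/(?:$ETX|$ACK|$NACK)/);

# Terminator plus one trailing character (this also applies to ACK and
# NACK, which may not be what the protocol wants).
my ($dotted, $rest_dotted) = split_frames($line, qr/(?:$ETX|$ACK|$NACK)./s);

print 'plain:  ', join(' | ', map { show($_) } @$plain),  "  rest='$rest_plain'\n";
print 'dotted: ', join(' | ', map { show($_) } @$dotted), "  rest='$rest_dotted'\n";
```

In the first call the `X` after each ETX ends up at the front of the following frame and the last one is left in the input; in the second call every frame carries its own `X` and nothing remains.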



--
Ruud
 
 
Dr.Ruud
      11-18-2009
Ben Morrow wrote:

> I suspect what the OP wants here is
>
> my $endcar = "\x3\x6\x15";
>
> while ($line =~ s/([^$endcar]*[$endcar].//) {


That is more or less (count the half captures) what I assumed,
and I also assumed that he would find out the rest himself.

--
Ruud
 
 
sln@netherlands.com
      11-18-2009
On Wed, 18 Nov 2009 18:30:18 +0000, Ben Morrow <(E-Mail Removed)> wrote:

>
>Quoth "Dr.Ruud" <(E-Mail Removed)>:
>> Sébastien Cottalorda wrote:
>>
>> > I use a regexp to split a network frame protocol like this.
>> >
>> > #-------------------------------------------------------------------
>> > #!/usr/bin/perl -w
>> > use strict;
>> > use constant ETX => chr( hex('03'));

>>
>> Alternative:
>>
>> use constant ETX => "\x{03}";
>>
>>
>> > use constant ACK => chr( hex('06'));
>> > use constant NACK => chr( hex('15'));
>> > my $endcar = ACK.'|'.NACK.'|'.ETX.'.{1}',

>>
>> Why does that line end in a comma?
>>
>> my ($ETX, $ACK, $NACK) = ("\x{03}", "\x{06}", "\x{15}");
>>
>> my $endcar = "(?:$ETX|$ACK|$NACK)"; # alternation

>
>You've omitted the trailing '.{1}' (which is equivalent to just '.').
>
> my $endcar = "(?:$ETX|$ACK|$NACK).";

^
It seems reasonable that the OP meant a single character in the
alternation, given his: my $endcar = ACK.'|'.NACK.'|'.ETX.'.{1}',
Otherwise, as a group, it is concatenated like:
ACK|NACK|ETX.{1} or
(?:$ACK|$NACK|$ETX.)
where one alternative is ETX plus any character,
which is probably a mistake.

>
>> Alternative:
>>
>> my $endcar= "[$ETX$ACK$NACK]"; # charset

>
>As above.
>
>> > while ($line =~ s/([^$endcar]*$endcar)//){

>>
>> You are messing up character class and alternation there.
>>
>>
>> With your $endcar, this would work:
>>
>> while ($line =~ s/(.*?(?:$endcar))//s){

>
>That depends. /.*?/ is not always equivalent to a negated end condition,
>for instance /.*?>x/ will match all of ">>x" whereas /[^>]*>x/ will only
>match the last two characters. I suspect what the OP wants here is


But in this case it makes no sense to add characters after the end character,
since you want everything from the beginning up to that character, not a
match starting in the middle of the string. It's a complete sub-expression,
'.*?>', part of an alternation.

In that case, given ">>x":
/^.*?>x/
works, whereas
/^[^>]*>x/
doesn't.
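
Both observations about ">>x" can be verified with a few lines. This is just a sketch; the variable names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $s = '>>x';

# Unanchored: both patterns match, but from different start positions.
my ($lazy)    = $s =~ /(.*?>x)/;     # '>>x', match starts at offset 0
my ($negated) = $s =~ /([^>]*>x)/;   # '>x',  match starts at offset 1

# Anchored at the start: only the lazy form can step over the first '>'.
my $lazy_anchored    = $s =~ /^.*?>x/   ? 1 : 0;   # 1
my $negated_anchored = $s =~ /^[^>]*>x/ ? 1 : 0;   # 0

print "lazy='$lazy' negated='$negated'\n";
print "anchored: lazy=$lazy_anchored negated=$negated_anchored\n";
```

So the two forms agree only when the text before the terminator cannot itself contain the first character of the terminator.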

>
> my $endcar = "\x3\x6\x15";
>
> while ($line =~ s/([^$endcar]*[$endcar].//) {

while ($line =~ s/([^$endcar]*[$endcar].)//) {

>
>possibly with a /s modifier, since this is a binary protocol so random
>newlines seem likely.


Not if you take out the '.'

-sln
 
 
C.DeRykus
      11-19-2009
On Nov 18, 8:37 am, Sébastien Cottalorda <(E-Mail Removed)> wrote:
> On 18 nov, 15:41, "C.DeRykus" <(E-Mail Removed)> wrote:
> ...
>
> I even tried this:
> my $endcar = ACK.'|'.NACK.'|'.ETX.'.';
> my $line = 'hello World'.ETX.'XHow are you today ?'.ETX.'XWell, not so
> bad.'.ETX.'X';
> while ($line =~ s/([[:^cntrl:]]*($endcar))//){
> and it works perfectly but it's a particular case : I suppose that
> split caracters are controls.
>
> but this regexp didn't work with :
> my $endcar = ACK.'|'.NACK.'|'.ETX.'.';
> my $line = STX.'hello World'.ETX.'X'.ACK.NACK.STX.'How are you
> today ?'.ETX.'X'.ACK.STX.'Well, not so bad.'.ETX.NACK;
> Unfortunately I need to make that last sample to work.


Here's a closer cut I think since you were negating
the character class:

my $endcar = STX . '|' . ACK . '|' . NACK . '|' . ETX ;
while ($line =~ s/([[:cntrl:]]*($endcar))//){
...
}
print $line;


Case 1:
my $line = 'hello World'.ETX.'XHow are you today ?'.ETX.'XWell, not so bad.'.ETX.'X';

output: hello WorldXHow are you today ?XWell, not so bad.X

Case 2:
my $line = STX.'hello World'.ETX.'X'.ACK.NACK.STX.'How are you today ?'.ETX.'X'.ACK.STX.'Well, not so bad.'.ETX.NACK;

output: hello WorldXHow are you today ?XWell, not so bad.

--
Charles DeRykus
 
 
Sébastien Cottalorda
      11-19-2009
Found a solution with the help of Olivier Makinen.

use constant STX => chr( hex('02'));
use constant ETX => chr( hex('03'));
use constant ACK => chr( hex('06'));
use constant NACK => chr( hex('15'));
my $line = STX.'hello World'.ETX.'X'.ACK.NACK.STX.'How are you
today ?'.ETX.'X'.ACK.STX.'Well, not so bad.'.ETX.NACK;

my $noendcar = '[^' . ACK . ETX . NACK . ']';
my $endstring = '(' . ACK . '|' . ETX . '.|' . NACK . ')';
while ($line =~ s/$noendcar*$endstring//) {
print "buf=$&\n";
}
print "lastbuffer = $line\n";

I obtain:
buf={STX}hello World{ETX}X
buf={ACK}
buf={NACK}
buf={STX}How are you today ?{ETX}X
buf={ACK}
buf={STX}Well, not so bad.{ETX}X
buf={NACK}
lastbuffer = .... (empty)

It works perfectly.
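
For reference, a non-destructive variant of the same idea, offered only as a sketch. It uses a list-context `//g` match instead of `s///` and `$&`, and all names (`$body`, `$end`, `@frames`, `show`) are made up for illustration. One assumption is labelled in the code: the sample string here has an `X` after the last ETX so that the result matches the listing above. With the string exactly as posted (ETX directly followed by NACK), the `ETX.` alternative would take the NACK as its trailing character.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my ($STX, $ETX, $ACK, $NACK) = ("\x02", "\x03", "\x06", "\x15");

# ASSUMPTION: an 'X' follows every ETX, including the last one.
my $line = $STX . 'hello World' . $ETX . 'X' . $ACK . $NACK
         . $STX . 'How are you today ?' . $ETX . 'X' . $ACK
         . $STX . 'Well, not so bad.' . $ETX . 'X' . $NACK;

# Frame body: any run of characters that are not terminators.
my $body = qr/[^$ETX$ACK$NACK]*/;

# Frame end: ACK, NACK, or ETX plus one trailing character
# (/s lets that character be a newline as well).
my $end = qr/$ACK|$NACK|$ETX./s;

# \G makes each frame start exactly where the previous one ended;
# list-context //g returns all frames and leaves $line untouched.
my @frames = $line =~ /\G($body(?:$end))/g;

# Whatever follows the last complete frame (empty for this sample).
my $rest = substr $line, length join '', @frames;

# Make control characters visible, e.g. ETX becomes <03>.
sub show {
    my ($s) = @_;
    $s =~ s/([[:cntrl:]])/sprintf '<%02X>', ord $1/ge;
    return $s;
}

print 'buf=', show($_), "\n" for @frames;
print "lastbuffer = '", show($rest), "'\n";
```

Avoiding `$&` is a minor point: perlvar documents that on older perls its mere presence slows down every regexp match in the program.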
Thanks all for your help.
Sebastien
 