Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Perl > Perl Misc > Unexpected RegEx results


Unexpected RegEx results

 
 
QoS@domain.invalid.com
02-26-2007

Hello, I'm having some trouble solving this regular expression puzzle.
It is possible to solve the issue using some if statements, but I'm
curious why this is occurring.

The data involved looks similar to the following:

ALWAYSPRESENT:0008:0:OPTIONAL
OPTIONAL
OPTIONAL

The data always starts with a name.
This is followed by a colon, some numbers, a colon, some numbers and a colon,
all of which will be discarded.
Then there may or may not be some additional data after that.

Next there might be a newline followed by some optional data.
Finally there might be a newline followed by some optional data.

OK, here is my issue: when there is no third line to fill $4, the regex I'm
using puts the data I expect in $3 into $4 instead. So $4 contains data but
$3 does not, when I expected $3 to contain data and $4 to be empty.

Example troublesome data:

ALWAYSPRESENT:0008:0:PRESENT
PRESENT
NOTPRESENT

This is the offending RegEx.

$msg =~ /(.*?):.*?:.*?:(.*)\n*(^.*)\n*(^.*)/m;

Thanks for any assistance.


 
QoS@domain.invalid.com
02-26-2007

Jim Gibson <(E-Mail Removed)> wrote in message-id:
<260220071249048703%(E-Mail Removed)>
>
>In article <wyGEh.14260$sv6.3728@trndny08>, <(E-Mail Removed)>
>wrote:
>
>> [snip]
>>
>> This is the offending RegEx.
>>
>> $msg =~ /(.*?):.*?:.*?:(.*)\n*(^.*)\n*(^.*)/m;

>
>I can't follow your logic entirely, but I suspect that you simply have
>too many unqualified '*' characters in your regex (I count 6) and it is
>causing confusion. For example, '\n*' need not match any characters at
>all. Perhaps you want '\n?' or '\n+' there instead.
>
>In any case, please post a complete, runnable program and somebody,
>perhaps even me, will be able to help you.
>
>--
>Jim Gibson


Ok, here is an example that demonstrates the quirks.
Notice in the second printout that what was in $3 in the first
printout is now in $4 and $3 contains ''.

And thanks very much for giving this a go!

#!/usr/bin/perl
use strict;
use warnings;

my $data;

# Three lines: the header plus both optional lines.
$data = 'Some Text:0000:0:More Text'."\n".
        'Text text'."\n".
        'Text text text.'."\n";
reformat($data);

# Two lines: the last optional line is missing.
$data = 'Some Text:0000:0:More Text'."\n".
        'Text text'."\n";
reformat($data);

exit;

sub reformat
{
    my $msg = $_[0] || die "Invalid option in reformat\n";
    my $out;
    # $1 = name, $2 = data after the third colon,
    # $3 and $4 = the two optional lines (each anchored with ^ under /m).
    # With only two lines this leaves $3 empty and puts the second line in $4.
    $msg =~ /(.*?):.*?:.*?:(.*)\n*(^.*)\n*(^.*)/m;
    $out = "$1,".
           "1,".
           "00000000000,".
           "0000000,".
           "000,".
           "$2,".
           "$3,".
           "$4\n";
    print $out;
    print '=' x 55, "\n";
    return(1);
}



 
John W. Krahn
02-26-2007
(E-Mail Removed) wrote:
> [snip]
>
> This is the offending RegEx.
>
> $msg =~ /(.*?):.*?:.*?:(.*)\n*(^.*)\n*(^.*)/m;


$ perl -le'
my @x = ( <<ONE, <<TWO );
ALWAYSPRESENT:0008:0:OPTIONAL
OPTIONAL
OPTIONAL
ONE
ALWAYSPRESENT:0008:0:PRESENT
PRESENT
TWO

for ( @x ) {
print "1=$1 2=$2 3=$3 4=$4" if /(.*?):.*?:.*?:(.*)\n*(^.*)\n*(^.*)/m;
}
'
1=ALWAYSPRESENT 2=OPTIONAL 3=OPTIONAL 4=OPTIONAL
1=ALWAYSPRESENT 2=PRESENT 3= 4=PRESENT



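You are using the /m option and the ^ anchor, which tells perl that there
*must* be at least three lines even if there are only two lines. Under /m,
^ does not match after a newline that is the last character of the string,
so the engine backtracks until $3 matches an empty string and $4 can start
at the beginning of the second line.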
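The anchor rule can be checked on its own. This minimal sketch (the variable names are illustrative) uses John's two-line record as sample data and collects every offset where /^/m matches. perlre documents that under /m, ^ matches at the start of the string and after any newline, except a newline that is the last character of the string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two lines, each terminated by "\n", like John's second test record.
my $s = "ALWAYSPRESENT:0008:0:PRESENT\nPRESENT\n";

# Record every offset where /^/m matches. After a zero-length match,
# the next /g attempt refuses another empty match at the same spot,
# so the loop moves on and eventually ends.
my @starts;
push @starts, pos($s) while $s =~ /^/mg;

print "^ matches at offsets: @starts\n";
print "string length: ", length($s), "\n";
```

Offset 37, just past the final newline, is not reported. The only line start left for the last (^.*) group is therefore offset 29, and $3 has to shrink to an empty match there.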

$ perl -le'
my @x = ( <<ONE, <<TWO );
ALWAYSPRESENT:0008:0:OPTIONAL
OPTIONAL
OPTIONAL
ONE
ALWAYSPRESENT:0008:0:PRESENT
PRESENT
TWO

for ( @x ) {
print "1=$1 2=$2 3=$3 4=$4" if /(.*?):.*?:.*?:(.*)\n*(.*)\n*(.*)/;
}
'
1=ALWAYSPRESENT 2=OPTIONAL 3=OPTIONAL 4=OPTIONAL
1=ALWAYSPRESENT 2=PRESENT 3=PRESENT 4=
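As the original post notes, the problem can also be sidestepped without a single regex. The sketch below replaces the regex with split, which is a different technique from the one used in this thread. The helper name parse_record is hypothetical. It assumes each record is newline-separated, has at most three lines, and starts with name:digits:digits: followed by optional data. Unlike John's version, a missing line comes back as undef rather than an empty string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns (name, extra, line2, line3).
# Missing optional parts are undef instead of shifting into other fields.
sub parse_record {
    my ($msg) = @_;
    # split drops trailing empty fields, so a missing third line is undef.
    my ($head, $line2, $line3) = split /\n/, $msg;
    # name:digits:digits:data; LIMIT 4 keeps any colons inside the data.
    my ($name, undef, undef, $extra) = split /:/, $head, 4;
    return ($name, $extra, $line2, $line3);
}

for my $rec ("ALWAYSPRESENT:0008:0:OPTIONAL\nOPTIONAL\nOPTIONAL\n",
             "ALWAYSPRESENT:0008:0:PRESENT\nPRESENT\n") {
    my @f = parse_record($rec);
    print join(' ', map { defined $_ ? $_ : 'undef' } @f), "\n";
}
```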




John
--
Perl isn't a toolbox, but a small machine shop where you can special-order
certain sorts of tools at low cost and in short order. -- Larry Wall
 
Mirco Wahab
02-26-2007
(E-Mail Removed) wrote:
> my $data;
> $data = 'Some Text:0000:0:More Text'."\n".
> 'Text text'."\n".
> 'Text text text.'."\n";


That's better. Real data.

My first shot:


....
my $data='
Some Text:0000:0:More Text
Text text
Text text text
';

my $rg = qr/
^([^:]+) : \d+ : \d+ : ([^\n]+)?\n
(?: ^([^:\n]+?) \n)?
(?: ^([^:\n]+?) (?:\n|$) )?/mx;

if( $data =~ /$rg/ ) {
print join "\n", map defined $_?$_:'undef', ($1, $2, $3, $4);
}


Regards

M.
 
QoS@domain.invalid.com
02-26-2007

(E-Mail Removed) wrote in message-id:
<wyGEh.14260$sv6.3728@trndny08>
>

[Snip]

Thank you everybody for helping solve this little mystery.

Your solutions and workarounds are quite clever!
I was unaware of that 'm' option side-effect.



 
Mirco Wahab
02-26-2007
(E-Mail Removed) wrote:
> (E-Mail Removed) wrote in message-id:
> <wyGEh.14260$sv6.3728@trndny08>
> [Snip]
>
> Thank you everybody for helping solve this little mystery.
>
> Your solutions and workarounds are quite clever!
> I was unaware of that 'm' option side-effect.


I was under the impression your data
would not consist of just /one/ record
but rather a sequence of them, so
the regex would need to step through
(find) the records and return the
correct matches:

# Example: four record thing with "offending" structure ==>

my $morestuff='
ALWAYSPRESENT:0008:0:PRESENT
PRESENT
MAYBEPRESENT
Some Text 1:0000:0:More Text 1
Some Text 2:0000:0:More Text 2
Text22 text22 text22
Some Text 3:0000:0:More Text 3
Text3 text3
Text33 text33 text33
';
# and so on ...

# Now, the regex should identify them
# and step along ==>

my $rg = qr/ \s*
^([^:]+) : \d+ : \d+ : ([^\n]+)?\n
(?: ^([^:\n]+?) \n)?
(?: ^([^:\n]+?) (?:\n|$) )?/mx;

# This was the "shortest" thing I could find so
# far (within your constraints), the record-
# walking would be within a while ==>

while( $morestuff =~ /$rg/g ) {

printf "%s %s\n\t%s\n\t%s\n",
$1||'undef', $2||'undef',
$3||'undef', $4||'undef';

}

# ... which would give the correct matches.


Maybe I misunderstood your problem somehow,
but I found the task quite nice and interesting
(maybe somebody will write down a really simple
regular expression for that; not me, it's bedtime
in this country).

Regards

Mirco
 
Broke
03-07-2007
Mirco Wahab <(E-Mail Removed)> wrote:

Very good job, Mr. Wahab!
I didn't know the
secret of the qr in your
code yet and just learned it.
It's extremely useful.

Many thanks!
--
B.

> [snip]
>
> M.
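For readers who, like Broke, are new to qr//: it compiles a pattern into a regex object that can be matched directly with =~ or interpolated into a larger pattern, where it keeps its own modifiers. Below is a minimal sketch that reuses a simplified form of the first line of Mirco's pattern on lines from his sample data; the variable names are illustrative. The spaces in the header pattern are ignored only because of /x.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# qr// compiles the pattern once into a reusable regex object.
# Simplified from the first line of Mirco's pattern; /x ignores the spaces.
my $header = qr/^([^:]+) : \d+ : \d+ : ([^\n]*)$/mx;

# A qr object can be used directly on the right-hand side of =~ ...
my $line = "Some Text 1:0000:0:More Text 1";
if ($line =~ $header) {
    print "name=$1 extra=$2\n";
}

# ... or interpolated into a bigger pattern; it keeps its own /m and /x,
# while the outer pattern (no /x) matches "\n" and the next line literally.
my $record = qr/$header\n([^:\n]+)/;
my $two = "Some Text 2:0000:0:More Text 2\nText22 text22 text22";
if ($two =~ $record) {
    print "name=$1 extra=$2 next=$3\n";
}
```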

 

Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
How make regex that means "contains regex#1 but NOT regex#2" ?? | seberino@spawar.navy.mil | Python | 3 | 07-01-2008 03:06 PM
Page inherting from .master - unexpected results | Art | ASP .Net | 0 | 05-26-2006 01:13 AM
Unexpected results in matching a regex | Dave | C++ | 5 | 02-08-2006 07:11 PM
Unexpected performance results | Dave | C++ | 1 | 04-08-2004 07:06 PM
Re: unexpected results | Scott Lander | Perl | 0 | 07-07-2003 02:28 PM


