Unexpected regex Behavior

 
 
Mark Shelor
      05-14-2006
Is it true that defining $/ to an integer reference (to read
fixed-length records) affects the meaning of the end-of-string symbol
($) in regex's?

For example, let's say I'm reading 4096-byte chunks from a file, and
wish to do special processing if any chunk ends with the carriage-return
character (\015). So, I start with code that looks like:

local $/ = \4096;
while (defined (my $rec = <F>)) {
    while ($rec =~ /\015$/) {
        # do special processing ...
    }
    ...
}

Oddly, this doesn't seem to work. It ends up matching chunks that
contain, but don't necessarily end with, \015.

Instead, I have to do this:

local $/ = \4096;
while (defined (my $rec = <F>)) {
    while (substr($rec, -1) eq "\015") {
        # do special processing ...
    }
    ...
}

Any idea what's going on?

Thanks, Mark
 
 
MSG
      05-14-2006
Mark Shelor wrote:
> Is it true that defining $/ to an integer reference (to read
> fixed-length records) affects the meaning of the end-of-string symbol
> ($) in regex's?
>
> local $/ = \4096;
> while (defined (my $rec = <F>)) {
> while ($rec =~ /\015$/) {
> # do special processing ...

($) is not exactly the end-of-string symbol; it is the end-of-line symbol.
(\Z) or (\z) is the end-of-string symbol and should serve your purpose.

Also, I feel that "if" is better than a "while" loop (the 2nd one), since
you only want to match one \015 at the end of the string.

 
 
John W. Krahn
      05-14-2006
Mark Shelor wrote:
> Is it true that defining $/ to an integer reference (to read
> fixed-length records) affects the meaning of the end-of-string symbol
> ($) in regex's?


No, it is not true.

> For example, let's say I'm reading 4096-byte chunks from a file, and
> wish to do special processing if any chunk ends with the carriage-return
> character (\015). So, I start with code that looks like:
>
> local $/ = \4096;
> while (defined (my $rec = <F>)) {
> while ($rec =~ /\015$/) {
> # do special processing ...
> }
> ...
> }
>
> Oddly, this doesn't seem to work. It ends up matching chunks that
> contain, but don't necessarily end with, \015.
>
> Instead, I have to do this:
>
> local $/ = \4096;
> while (defined (my $rec = <F>)) {
> while (substr($rec, -1) eq "\015") {
> # do special processing ...
> }
> ...
> }
>
> Any idea what's going on?


perldoc perlre
[snip]
By default, the "^" character is guaranteed to match only the beginning
of the string, the "$" character only the end (or before the newline at
the end), and Perl does certain optimizations with the assumption that
the string contains only one line. Embedded newlines will not be
matched by "^" or "$". You may, however, wish to treat a string as a
multi-line buffer, such that the "^" will match after any newline
within the string, and "$" will match before any newline. At the cost
of a little more overhead, you can do this by using the /m modifier on
the pattern match operator. (Older programs did this by setting $*,
but this practice is now deprecated.)


So the regular expression will match with either "\015" or "\015\012" at the
end of the string. If you want it to only match at the end of the string use
/\015\z/ or the substr() expression.
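
As a rough illustration of the point above, here is a minimal sketch that reads
fixed-length records and applies all three tests to each one. The in-memory
input, the 4-byte record length (instead of 4096), and the names $data, $fh and
@rows are assumptions made up for this example; they are not from the original
code.

```perl
use strict;
use warnings;

# Hypothetical input: three 4-byte records, held in memory instead of a file.
my $data = "abc\015" . "ab\015\012" . "a\015bc";
open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";

my @rows;
{
    local $/ = \4;   # fixed-length reads; 4 instead of 4096 to keep it short
    while (defined(my $rec = <$fh>)) {
        push @rows, [
            ($rec =~ /\015$/  ? 1 : 0),            # end, or before a final newline
            ($rec =~ /\015\z/ ? 1 : 0),            # very end of the string only
            (substr($rec, -1) eq "\015" ? 1 : 0),  # last byte is CR
        ];
    }
}
close $fh;

printf "record %d: \$=%d \\z=%d substr=%d\n", $_ + 1, @{ $rows[$_] } for 0 .. $#rows;
```

Only the second record, which ends in "\015\012", separates the tests: /\015$/
accepts it, while /\015\z/ and substr() reject it. Setting $/ changes how the
records are read, not how the regex behaves.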



John
--
use Perl;
program
fulfillment
 
Mark Shelor
      05-14-2006
John W. Krahn wrote:
> Mark Shelor wrote:
>
>>Is it true that defining $/ to an integer reference (to read
>>fixed-length records) affects the meaning of the end-of-string symbol
>>($) in regex's?

>
>
> No, it is not true.
>
>
>>For example, let's say I'm reading 4096-byte chunks from a file, and
>>wish to do special processing if any chunk ends with the carriage-return
>>character (\015). So, I start with code that looks like:
>>
>>local $/ = \4096;
>>while (defined (my $rec = <F>)) {
>> while ($rec =~ /\015$/) {
>> # do special processing ...
>> }
>> ...
>>}
>>
>>Oddly, this doesn't seem to work. It ends up matching chunks that
>>contain, but don't necessarily end with, \015.
>>
>>Instead, I have to do this:
>>
>>local $/ = \4096;
>>while (defined (my $rec = <F>)) {
>> while (substr($rec, -1) eq "\015") {
>> # do special processing ...
>> }
>> ...
>>}
>>
>>Any idea what's going on?

>
>
> perldoc perlre
> [snip]
> By default, the "^" character is guaranteed to match only the beginning
> of the string, the "$" character only the end (or before the newline at
> the end), and Perl does certain optimizations with the assumption that
> the string contains only one line. Embedded newlines will not be
> matched by "^" or "$". You may, however, wish to treat a string as a
> multi-line buffer, such that the "^" will match after any newline
> within the string, and "$" will match before any newline. At the cost
> of a little more overhead, you can do this by using the /m modifier on
> the pattern match operator. (Older programs did this by setting $*,
> but this practice is now deprecated.)
>
>
> So the regular expression will match with either "\015" or "\015\012" at the
> end of the string. If you want it to only match at the end of the string use
> /\015\z/ or the substr() expression.



Now it all makes perfect sense. Thanks for citing the reference, and
thanks to you and MSG for the helpful replies.

As a side remark to MSG's response, both $ and \Z match *before* newline
at the end, so only /\015\z/ will work in this case.
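
A quick way to check this is to try all three anchors against one string that
ends in CR LF. This is only a small sketch; the sample string and the names
$crlf and %hit are made up for the illustration.

```perl
use strict;
use warnings;

my $crlf = "abc\015\012";   # hypothetical chunk ending in CR LF

# Which anchors still let \015 match when a newline follows it?
my %hit = (
    '$'  => ($crlf =~ /\015$/  ? 1 : 0),   # matches before the final newline
    '\Z' => ($crlf =~ /\015\Z/ ? 1 : 0),   # also matches before the final newline
    '\z' => ($crlf =~ /\015\z/ ? 1 : 0),   # matches only at the very end
);

print "$_ => $hit{$_}\n" for sort keys %hit;
```

Here $ and \Z both report a match and \z does not, which is why \z is the
anchor to use when the CR has to be the last byte of the chunk.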

Regards, Mark
 