John W. Krahn wrote:
> Mark Shelor wrote:
>
>>Is it true that defining $/ to an integer reference (to read
>>fixed-length records) affects the meaning of the end-of-string symbol
>>($) in regexes?
>
>
> No, it is not true.
>
>
>>For example, let's say I'm reading 4096-byte chunks from a file, and
>>wish to do special processing if any chunk ends with the carriage-return
>>character (\015). So, I start with code that looks like:
>>
>>local $/ = \4096;
>>while (defined (my $rec = <F>)) {
>>    while ($rec =~ /\015$/) {
>>        # do special processing ...
>>    }
>>    ...
>>}
>>
>>Oddly, this doesn't seem to work. It ends up matching chunks that
>>contain, but don't necessarily end with, \015.
>>
>>Instead, I have to do this:
>>
>>local $/ = \4096;
>>while (defined (my $rec = <F>)) {
>>    while (substr($rec, -1) eq "\015") {
>>        # do special processing ...
>>    }
>>    ...
>>}
>>
>>Any idea what's going on?
>
>
> perldoc perlre
> [snip]
> By default, the "^" character is guaranteed to match only the beginning
> of the string, the "$" character only the end (or before the newline at
> the end), and Perl does certain optimizations with the assumption that
> the string contains only one line. Embedded newlines will not be
> matched by "^" or "$". You may, however, wish to treat a string as a
> multi-line buffer, such that the "^" will match after any newline
> within the string, and "$" will match before any newline. At the cost
> of a little more overhead, you can do this by using the /m modifier on
> the pattern match operator. (Older programs did this by setting $*,
> but this practice is now deprecated.)
>
>
> So the regular expression will match with either "\015" or "\015\012" at the
> end of the string. If you want it to only match at the end of the string use
> /\015\z/ or the substr() expression.
Now it all makes perfect sense. Thanks for citing the reference, and
thanks to you and MSG for the helpful replies.
As a side remark to MSG's response, both $ and \Z can also match just
*before* a newline at the end of the string, so only /\015\z/ (or the
substr() test) will work in this case.
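
For the archives, here is a small sketch of the difference. The sample strings, the helper names (match_flags, cr_terminated_chunks) and the 4-byte record size are made up for illustration; the real code reads 4096-byte records from a file, while this reads from an in-memory filehandle.

```perl
use strict;
use warnings;

# Report which of the four tests consider $rec to "end with" \015.
# Helper name and return shape are illustrative only.
sub match_flags {
    my ($rec) = @_;
    return {
        dollar => ($rec =~ /\015$/  ? 1 : 0),  # end, or before a final \012
        Z      => ($rec =~ /\015\Z/ ? 1 : 0),  # same rule as $ without /m
        z      => ($rec =~ /\015\z/ ? 1 : 0),  # very end of string only
        substr => (substr($rec, -1) eq "\015" ? 1 : 0),
    };
}

# Read fixed-length records and return the indexes of those whose
# last byte really is \015. $/ is set to a reference to the record size.
sub cr_terminated_chunks {
    my ($data, $size) = @_;
    open my $fh, '<', \$data or die "cannot open in-memory file: $!";
    local $/ = \$size;
    my (@hits, $i);
    $i = 0;
    while (defined(my $rec = <$fh>)) {
        push @hits, $i if $rec =~ /\015\z/;
        $i++;
    }
    close $fh;
    return @hits;
}

# Hypothetical sample records.
for my $rec ("abc\015", "abc\015\012", "a\015bc") {
    my $f = match_flags($rec);
    (my $shown = $rec) =~ s/\015/\\015/g;
    $shown =~ s/\012/\\012/g;
    printf "%-14s  \$=%d  \\Z=%d  \\z=%d  substr=%d\n",
        $shown, @{$f}{qw(dollar Z z substr)};
}

# Three 4-byte records: "ab\015\012", "cd\015\012", "xyz\015".
my @hits = cr_terminated_chunks("ab\015\012cd\015\012xyz\015", 4);
print "records ending in \\015: @hits\n";
```

The record that ends in "\015\012" is the one that $ and \Z accept but \z and substr() reject, which would explain the matches on chunks that did not actually end with \015. Setting $/ to an integer reference only changes how much <F> returns per read; the anchors behave the same either way.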
Regards, Mark