Velocity Reviews


John Bokma 03-22-2013 10:21 PM

Issue: unexpected value in $2 (Perl 5.10.1)
 


In a program I am working on, $2 ends up with an unexpected value in the
following piece of code. When I run the tests, it works as expected, but
when I run the actual program (which currently parses 2 large XML files),
$2 gets a random value when the second XML file is parsed [1]:

use Carp qw(croak);
use Time::Local qw(timegm);

$iso_gmt =~ /^(\d{4})-(\d\d)-(\d\d)T(\d\d):(\d\d):(\d\d)Z$/
    or croak "Not a valid ISO GMT date time: '$iso_gmt'";
# Note: without the assignment below $2 might be set to a different
# value for an unknown reason (bug in XML::Parser C code?)
# perl, v5.10.1
#my $month = $2;
my $seconds_since_epoch;
my $error;
$seconds_since_epoch = timegm( $6, $5, $4, $3, $2 - 1, $1 - 1900 );

Example value: $iso_gmt = '2012-10-30T21:05:01Z';

Examples of errors:

Month '20129103' out of range 0..11
Month '11826511' out of range 0..11
Month '20079951' out of range 0..11
Month '15160655' out of range 0..11
Month '21972303' out of range 0..11
Month '10208591' out of range 0..11

0133254f - 20129103
00b4754f - 11826511
0132654f - 20079951
00e7554f - 15160655
014f454f - 21972303
009bc54f - 10208591

^^^ looks like there is a pattern: every value ends in 0x54f.

Uncommenting the assignment of $2 to $month removes the effect.
Reversing the parse order of the 2 XML files also removes the effect.

Does anyone have a suggestion (or two) on how to pinpoint what goes wrong
here (or is this a known bug?). It looks like memory accidentally gets
overwritten.


[1] using XML::Parser

--
John Bokma j3b

Blog: http://johnbokma.com/ Perl Consultancy: http://castleamber.com/
Perl for books: http://johnbokma.com/perl/help-in-ex...for-books.html

John Bokma 03-23-2013 02:58 AM

Re: Issue: unexpected value in $2 (Perl 5.10.1)
 
Ben Morrow <ben@morrow.me.uk> writes:

> Quoth John Bokma <john@castleamber.com>:
>>
>> The following piece of code assigns an unexpected value to $2 in a
>> program I am working on. When I run tests, it works as expected, but
>> when I run the actual program (which parses currently 2 large XML files)
>> $2 gets assigned a random value when the second XML file is parsed [1]:
>>
>> $iso_gmt =~ /^(\d{4})-(\d\d)-(\d\d)T(\d\d):(\d\d):(\d\d)Z$/
>> or croak "Not a valid ISO GMT date time: '$iso_gmt'";
>> # Note: without the assignment below $2 might be set to a different
>> # value for an unknown reason (bug in XML::Parser C code?)
>> # perl, v5.10.1
>> #my $month = $2;
>> my $seconds_since_epoch;
>> my $error;
>> $seconds_since_epoch = timegm( $6, $5, $4, $3, $2 - 1, $1 - 1900 );

>
> This code doesn't call XML::Parser. Where are you calling it?


My apologies for the unclear write-up. The above piece of code is called
by code that uses XML::Parser to parse two XML files. Since the issue
seems to depend on the order in which the XML files are parsed, I don't
think it's possible (or at least easy) to create a minimal test case
(otherwise I certainly would've posted one). Note: the tests that I run
via prove against the above code don't show the issue.

> Also, I would usually want to assign the return value of m// rather than
> relying on globals:


Yes, same here, usually ;-).

>> Examples of errors:
>>
>> Month '20129103' out of range 0..11
>> Month '11826511' out of range 0..11
>> Month '20079951' out of range 0..11
>> Month '15160655' out of range 0..11
>> Month '21972303' out of range 0..11
>> Month '10208591' out of range 0..11
>>
>> 0133254f - 20129103
>> 00b4754f - 11826511
>> 0132654f - 20079951
>> 00e7554f - 15160655
>> 014f454f - 21972303
>> 009bc54f - 10208591
>>
>> ^^^ looks like there is a pattern.

>
> Where does this output come from?


timegm throws the "Month ... out of range ..." error. I used a one-liner
to convert those values to hex (given above).
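The one-liner itself wasn't posted. A minimal equivalent sketch, assuming a plain sprintf with %08x (the array name @months is illustrative), which also makes the shared trailing hex digits easy to spot:

```perl
use strict;
use warnings;

# Month values taken from the timegm "out of range" errors above.
my @months = (20129103, 11826511, 20079951, 15160655, 21972303, 10208591);

# Format each value as 8 zero-padded lowercase hex digits, then the decimal.
my @lines = map { sprintf '%08x - %d', $_, $_ } @months;
print "$_\n" for @lines;
```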

[..]
> The output of Devel::Peek::Dump($2) at various points would be helpful.
> If you can it would also be worth trying with a perl built with
> -DDEBUGGING, to see if you get an assertion failure.


Thanks, will look into those two options. Right now I am not able to
reproduce it. Most likely due to code changes.
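A minimal sketch of how Ben's Devel::Peek::Dump($2) suggestion could be wired in, assuming the example timestamp from the first post. Devel::Peek is a core module and Dump() writes to STDERR; in the real program the same call would be repeated at several points (for example right before timegm()) and the dumps compared.

```perl
use strict;
use warnings;
use Devel::Peek qw(Dump);    # core module; Dump() prints to STDERR

my $iso_gmt = '2012-10-30T21:05:01Z';    # example value from the first post

$iso_gmt =~ /^(\d{4})-(\d\d)-(\d\d)T(\d\d):(\d\d):(\d\d)Z$/
    or die "Not a valid ISO GMT date time: '$iso_gmt'";

Dump($2);              # internals of $2 right after the match

my $month = $2;        # plain copy, for comparison with later dumps
print "month: $month\n";
```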


Klaus 03-23-2013 01:06 PM

Re: Issue: unexpected value in $2 (Perl 5.10.1)
 
On 23 mar, 00:44, Ben Morrow <b...@morrow.me.uk> wrote:
> Quoth John Bokma <j...@castleamber.com>:
> >     $iso_gmt =~ /^(\d{4})-(\d\d)-(\d\d)T(\d\d):(\d\d):(\d\d)Z$/
> >         or croak "Not a valid ISO GMT date time: '$iso_gmt'";


> Also, I would usually want to assign the return value of m// rather than
> relying on globals:
>     my ($year, $month, $day, $hour, $min, $sec) =
>         $iso_gmt =~ /.../;


I agree, but I would also add that it is perfectly safe to use $1, $2,
$3, ... in an assignment to lexical (my) variables immediately after
$iso_gmt =~ /.../ or croak "..."

For example, the following lines are perfectly safe:

$iso_gmt =~ /.../ or croak "...";
my ($year, $month, $day, $hour, $min, $sec)
= ($1, $2, $3, $4, $5, $6);

However, the following line is not safe:

> > $seconds_since_epoch = timegm( $6, $5, $4, $3, $2 - 1, $1 - 1900 );


I am very suspicious about the fact that $1, $2, $3, ... are passed by
reference to a subroutine timegm( $6, $5, $4, $3, $2 - 1, $1 - 1900 ).

Variables such as $1, $2, $3, ... have very complicated scoping rules.

Passing such variables by reference to a subroutine makes it very hard
to guarantee the integrity of $1, $2, $3, ...
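Putting Ben's and Klaus's suggestions together, a minimal sketch that never passes the match variables to another sub: the captures from a list-context match are copied into lexicals, and only those are handed to timegm. The sub name iso_gmt_to_epoch is hypothetical, and this assumes timegm is the one from Time::Local (whose "Month ... out of range 0..11" message looks like the errors in the first post).

```perl
use strict;
use warnings;
use Carp qw(croak);
use Time::Local qw(timegm);

# Hypothetical helper: convert 'YYYY-MM-DDTHH:MM:SSZ' to epoch seconds.
sub iso_gmt_to_epoch {
    my ($iso_gmt) = @_;

    # In list context m// returns the captures; copy them into lexicals
    # right away so $1..$6 are never passed to another sub.
    my ($year, $month, $day, $hour, $min, $sec) =
        $iso_gmt =~ /^(\d{4})-(\d\d)-(\d\d)T(\d\d):(\d\d):(\d\d)Z$/
        or croak "Not a valid ISO GMT date time: '$iso_gmt'";

    return timegm($sec, $min, $hour, $day, $month - 1, $year - 1900);
}

my $epoch = iso_gmt_to_epoch('2012-10-30T21:05:01Z');
print "$epoch\n";
```

The list assignment in boolean context yields the number of captures (0 on a failed match), so the "or croak" still fires on bad input.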

John Bokma 03-23-2013 03:01 PM

Re: Issue: unexpected value in $2 (Perl 5.10.1)
 
Klaus <klaus03@gmail.com> writes:

> However, the following line is not safe:
>
>> > $seconds_since_epoch = timegm( $6, $5, $4, $3, $2 - 1, $1 - 1900 );


Can you give me an example of calling a sub that shows this unsafe
behavior? And/or the scoping issues you mention?

I've always considered it perfectly safe to do something like:

$foo =~ /(.)/;
bar( $1 );

If not, I'd like to know the cases where this can go wrong. Thanks.


Klaus 03-23-2013 04:10 PM

Re: Issue: unexpected value in $2 (Perl 5.10.1)
 
On 23 mar, 16:01, John Bokma <j...@castleamber.com> wrote:
> Klaus <klau...@gmail.com> writes:
> > However, the following line is not safe:

>
> >> > $seconds_since_epoch = timegm( $6, $5, $4, $3, $2 - 1, $1 - 1900 );

>
> Can you give me an example of calling a sub that shows this unsafe
> behavior? And/or the scoping issues you mention?
>
> I've always considered it perfectly safe to do something like:
>
> $foo =~ /(.)/;
> bar( $1 );
>
> if not, I like to know cases this can go wrong. Thanks.


Here is an example where

Case 1 uses the safe bar($var)
Case 2 uses the unsafe bar($1)

==============================================
use strict;
use warnings;

my $foo = 'abc';

$foo =~ /^(..).$/ or die "Error 1";
my $var = $1;
print "Case 1: ", bar($var), "\n";

$foo =~ /^(..).$/ or die "Error 2";
print "Case 2: ", bar($1), "\n";

sub bar {
    my $z = '';

    $_[0] =~ /^(.).$/ or return "Error 3";
    $z .= $1;

    $_[0] =~ /^.(.)$/ or return "Error 4";
    $z .= $1;

    return "ok ('$z')";
}
==============================================

The output is:
==============================================
Case 1: ok ('ab')
Case 2: Error 4
==============================================

Klaus 03-23-2013 04:36 PM

Re: Issue: unexpected value in $2 (Perl 5.10.1)
 
On 23 mar, 16:01, John Bokma <j...@castleamber.com> wrote:
> Klaus <klau...@gmail.com> writes:
> > However, the following line is not safe:
> >> > $seconds_since_epoch = timegm( $6, $5, $4, $3, $2 - 1, $1 - 1900 );


> And/or the scoping issues you mention?


see perldoc perlre:

++ These special variables, like the %+ hash and the numbered match
++ variables ($1 , $2 , $3 , etc.) are dynamically scoped until the
++ end of the enclosing block or until the next successful match,
++ whichever comes first.

The scoping issue is that the execution of timegm($6, ...) falls in
the same dynamically scoped block as the previously executed $iso_gmt
=~ /.../ or croak "...";

timegm($6, ...) could itself run regular expressions, in which case a
successful match there overwrites whatever is contained in $6.

The issue is that $6 is aliased to the first parameter of
timegm($6, ...).

These cases are not necessarily wrong, but they are very difficult to
get right.
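One way to make a sub robust against being called as bar($1) is to copy @_ into lexicals before running any regex. A sketch, using a hypothetical bar_copy() that is otherwise Klaus's bar() from the previous post:

```perl
use strict;
use warnings;

# Same logic as bar(), but the argument is copied first, so later
# successful matches inside the sub cannot change the string being
# inspected, even when the caller passes $1.
sub bar_copy {
    my ($str) = @_;    # copy: no longer aliased to the caller's $1
    my $z;

    $str =~ /^(.).$/ or return "Error 3";
    $z = $1;

    $str =~ /^.(.)$/ or return "Error 4";
    $z .= $1;

    return "ok ('$z')";
}

my $foo = 'abc';
$foo =~ /^(..).$/ or die "Error 2";
my $result = bar_copy($1);
print "Case 2 (copied): $result\n";
```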

Rainer Weikusat 03-23-2013 06:49 PM

Re: Issue: unexpected value in $2 (Perl 5.10.1)
 
Klaus <klaus03@gmail.com> writes:

[...]

> ==============================================
> use strict;
> use warnings;
>
> my $foo = 'abc';
>
> $foo =~ /^(..).$/ or die "Error 1";
> my $var = $1;
> print "Case 1: ", bar($var), "\n";
>
> $foo =~ /^(..).$/ or die "Error 2";
> print "Case 2: ", bar($1), "\n";
>
> sub bar {
> my $z = '';
>
> $_[0] =~ /^(.).$/ or return "Error 3";
> $z .= $1;


'use warnings' wouldn't complain about .= with an undefined ('no value')
variable on the left-hand side, and even if this weren't the case, the
useless concatenation could simply be replaced with a proper assignment
in order to avoid the "cargo cult 'initialization'" (my $z = '').
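A quick check of the first claim, as a sketch: with warnings enabled, appending to an undefined lexical emits no "uninitialized" warning (perlsyn lists .= among the operators exempt from it). The $SIG{__WARN__} handler only collects warnings so they can be counted.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them, so they can be counted.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, @_ };

my $z;          # deliberately left undefined
$z .= 'a';      # .= on undef: no "uninitialized" warning
$z .= 'b';

print "z = '$z', warnings: ", scalar(@warnings), "\n";
```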

Peter J. Holzer 03-23-2013 07:31 PM

Re: Issue: unexpected value in $2 (Perl 5.10.1)
 
On 2013-03-23 16:36, Klaus <klaus03@gmail.com> wrote:
> On 23 mar, 16:01, John Bokma <j...@castleamber.com> wrote:
>> Klaus <klau...@gmail.com> writes:
>> > However, the following line is not safe:
>> >> > $seconds_since_epoch = timegm( $6, $5, $4, $3, $2 - 1, $1 - 1900 );

>
>> And/or the scoping issues you mention?

>
> see perldoc perlre:
>
> ++ These special variables, like the %+ hash and the numbered match
> ++ variables ($1 , $2 , $3 , etc.) are dynamically scoped until the
> ++ end of the enclosing block or until the next successful match,
> ++ whichever comes first.
>
> The scoping issue is that the execution of timegm($6, ...) falls in
> the same dynamically scoped block as the previously executed $iso_gmt
>=~ /.../ or croak "...";
>
> timegm($6, ...) can itself have regular expressions, in which case
> that regular expression overwrites whatever is contained in $6.


While that is theoretically possible, it doesn't seem to be the case
with timegm(). I just skimmed the source and don't see any use of
regexps there and frankly I can't think of any legitimate use timegm()
could have for regexps.


> The issue is that $6 is aliased to the first parameter of
> timegm($6, ...).


So I doubt that this is the issue. (Also, I think John would have been
able to extract a minimal test case if it were that easy.)

hp


--
_ | Peter J. Holzer | Curse of electronic word processing:
|_|_) | Sysadmin WSR | You keep filing away at your text until
| | | hjp@hjp.at | the parts of the sentence of the sentence no longer
__/ | http://www.hjp.at/ | fits together. -- Ralph Babel

Eric Pozharski 03-24-2013 10:15 AM

Re: Issue: unexpected value in $2 (Perl 5.10.1)
 
with <87ehf6cg8x.fsf@castleamber.com> John Bokma wrote:
> Klaus <klaus03@gmail.com> writes:
>
>> However, the following line is not safe:
>>
>>> > $seconds_since_epoch = timegm( $6, $5, $4, $3, $2 - 1, $1 - 1900 );

>
> Can you give me an example of calling a sub that shows this unsafe
> behavior? And/or the scoping issues you mention?


#!/usr/bin/perl

use strict;
use warnings;
use feature qw{ say };

sub holy_cosmos {
    my $input = shift @_;
    'holy_hallelujah' =~ m{holy_(.*)};
    say $1 // 'missing'
}

'holy_rainbow' =~ m{(.*)_(.*)};
say $1 // 'not found';
holy_cosmos $1;
say $1 // 'not found';

__END__

{2109:6} [0:0]% perl ~/foo.90HmzN.pl
holy
hallelujah
holy
{2374:7} [0:0]%

I doubt there are any such cases. Perhaps there were in prehistoric
Perls, or this is some XS code that directly manipulates variables in
another scope.

*CUT*

--
Torvalds' goal for Linux is very simple: World Domination
Stallman's goal for GNU is even simpler: Freedom

Randal L. Schwartz 03-25-2013 12:36 AM

Re: Issue: unexpected value in $2 (Perl 5.10.1)
 
>>>>> "Eric" == Eric Pozharski <whynot@pozharski.name> writes:

Eric> sub holy_cosmos {
Eric> my $input = shift @_;
Eric> 'holy_hallelujah' =~ m{holy_(.*)};
Eric> say $1 // 'missing'
Eric> }

This is bad. Very bad. If the match fails, it DOES NOT RESET the match
variables. So you have broken code from the get-go.
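A short sketch of the failure mode described above, reusing Eric's strings: a failed match inside a sub leaves $1 holding the caller's capture, so the // 'missing' fallback never fires. Testing the match result first avoids that. The sub names unchecked() and checked() are illustrative.

```perl
use strict;
use warnings;

# Like holy_cosmos(), but the match fails on purpose.
sub unchecked {
    'unholy' =~ m{holy_(.*)};    # fails: no "holy_" in the string
    return $1 // 'missing';      # $1 is still the caller's capture
}

# Only read $1 when the match is known to have succeeded.
sub checked {
    return 'unholy' =~ m{holy_(.*)} ? $1 : 'missing';
}

'holy_rainbow' =~ m{(.*)_(.*)};  # succeeds: $1 is 'holy'
my $u = unchecked();
my $c = checked();
print "unchecked: $u, checked: $c\n";
```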

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.posterous.com/ for Smalltalk discussion


All times are GMT.
