regexp problem in perl 5.6.1 and 5.8.4

 
 
Thomas Stauffer
      06-04-2004
I have done some Perl programming in the past but I am by no means an
expert. I am currently working on changing some code written some time
ago by an employee no longer with the company. The code is currently
running under 5.005.02. I am making changes and adding some ucs2 ->
utf8 conversion. I want to run the code under Perl 5.8.4 to take
advantage of Perl's internal Unicode support. At any rate, there is a
regular expression in the code that works fine under 5.005.02 but loops
under 5.6.1 and above. The following code illustrates the problem:

$orig_string = 'JKXXAF';

$regex = qr {\G
    # Match as many characters as possible
    # that can be passed thru as-is
    ([^\x00-\xFF]+)

    # Then try to match $A1 and next two bytes
    | (@..)

    # Otherwise just get the next byte
    | (.)
}sx;

print "regex = $regex\n";

while ($orig_string =~ /$regex/g) {
    print "\$1=$1\n";
    print "\$2=$2\n";
    print "\$3=$3\n";
}

The problem seems to be with the use of the \G assertion. If I take it
out, the regular expression works the same in all versions of Perl.
However, since I did not write the code and the programmer who did was
considerably more experienced with Perl than I am, I am hesitant to
simply remove it. I have been looking at this for several days
without success. My Perl expert suggested I post it to this forum. Any
help would be greatly appreciated.

Here are the details of the version of Perl I'm using:

Summary of my perl5 (revision 5 version 8 subversion 4) configuration:
Platform:
osname=solaris, osvers=2.8, archname=sun4-solaris
uname='sunos cwu21awu 5.8 generic_108528-29 sun4u sparc
sunw,sun-blade-100 '
config_args=''
hint=recommended, useposix=true, d_sigaction=define
usethreads=undef use5005threads=undef useithreads=undef
usemultiplicity=undef
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=undef use64bitall=undef uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='/opt/SUNWspro/bin/cc', ccflags =' -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64',
optimize='-O',
cppflags=''
ccversion='Sun WorkShop 6 update 2 C 5.3 Patch 111679-08
2002/05/09', gccversion='', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=4321
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t',
lseeksize=8
alignbytes=8, prototype=define
Linker and Libraries:
ld='/opt/SUNWspro/bin/cc', ldflags =' -L/usr/lib -L/usr/ccs/lib
-L/opt/SUNWspro/WS6U2/lib -L/usr/local/lib '
libpth=/usr/lib /usr/ccs/lib /opt/SUNWspro/WS6U2/lib /usr/local/lib
libs=-lsocket -lnsl -ldl -lm -lc
perllibs=-lsocket -lnsl -ldl -lm -lc
libc=/lib/libc.so, so=so, useshrplib=false, libperl=libperl.a
gnulibc_version=''
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' '
cccdlflags='-KPIC', lddlflags='-G -L/usr/lib -L/usr/ccs/lib
-L/opt/SUNWspro/WS6U2/lib -L/usr/local/lib'


Characteristics of this binary (from libperl):
Compile-time options: USE_LARGE_FILES
Built under solaris
Compiled at Apr 22 2004 16:07:19
@INC:
/usr/local/perl5/lib/5.8.4/sun4-solaris
/usr/local/perl5/lib/5.8.4
/usr/local/perl5/lib/site_perl/5.8.4/sun4-solaris
/usr/local/perl5/lib/site_perl/5.8.4
/usr/local/perl5/lib/site_perl

 
Anno Siegel
      06-05-2004
Thomas Stauffer <> wrote in comp.lang.perl.misc:
> I have done some Perl programming in the past but I am by no means an
> expert. I am currently working on changing some code written some time
> ago by an employee no longer with the company. The code is currently
> running under 5.005.02. I am making changes and adding some ucs2 ->
> utf8 conversion. I want to run the code under Perl 5.8.4 to take
> advantage of Perl's internal Unicode support. At any rate, there is a
> regular expression in the code that works fine under 5.005.02 but loops
> under 5.6.1 and above. The following code illustrates the problem:
>
> $orig_string = 'JKXXAF';
>
> $regex = qr {\G
> # Match as many characters as possible
> # that can be passed thru as-is
> ([^\x00-\xFF]+)
>
> # Then try to match $A1 and next two bytes
> | (@..)
>
> # Otherwise just get the next byte
> | (.)
> }sx;
>
> print "regex = $regex\n";
>
> while ($orig_string =~ /$regex/g) {
> print "\$1=$1\n";
> print "\$2=$2\n";
> print "\$3=$3\n";
> }
>
> The problem seems to be with the use of the \G attribute. If I take it
> out, the regular expression works the same in all versions of Perl.
> However, since I did not write the code and the programmer who did was
> considerably more experienced using Perl than I am, I am hesitant just
> to remove it. Anyhow, I have been looking at this for several days
> without success. My Perl expert suggested I post it to this forum. Any
> help would be greatly appreciated.


The \G is really not needed for the function of the loop. //g in scalar
context makes sure \G is implicitly matched before each match is attempted.

Note that adding \G only anchors the first alternative explicitly;
the second and third are free to match anywhere. One could argue
that scalar //g should still anchor the whole match, so the current
behavior would be a bug. In any case, the behavior in the presence of
both \G and //g appears to have changed.
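
To illustrate the precedence point, here is a minimal sketch. It is not
the original code: the string 'xxb' and both patterns are made up for
the demonstration. It compares an ungrouped and a grouped alternation
after \G. On a recent Perl I would expect the ungrouped form to skip
ahead and match the 'b', while the grouped form has to match at pos()
and fails. The older versions discussed in this thread may behave
differently.

```perl
use strict;
use warnings;

my $s = 'xxb';    # made-up test string

# Ungrouped: the pattern parses as (\G a) | (b), so only the first
# alternative is tied to pos(); the second may match further along.
pos($s) = 0;
my $ungrouped = $s =~ /\Ga|b/g ? pos($s) : undef;

# Grouped: \G now applies to the whole alternation, so the match
# has to begin exactly at pos(), which is 0 here.
pos($s) = 0;
my $grouped = $s =~ /\G(?:a|b)/g ? pos($s) : undef;

print "ungrouped: ", (defined $ungrouped ? "matched, pos=$ungrouped" : "no match"), "\n";
print "grouped:   ", (defined $grouped   ? "matched, pos=$grouped"   : "no match"), "\n";
```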

Adding non-capturing parentheses around the alternative fixes the
behavior:

my $regex = qr { \G
    (?:
        # Match as many characters as possible
        # that can be passed thru as-is
        ([^\x00-\xFF]+)

        # Then try to match $A1 and next two bytes
        | (@..)

        # Otherwise just get the next byte
        | (.)
    )
}sx;

I'd say you can safely leave the \G off. If you want to keep it, add
the grouping; otherwise it doesn't make much sense.
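
For what it's worth, here is a rough sketch of how the grouped version
could drive the loop while recording which alternative matched. The
input string 'JK@XXAF', the @tokens array and the "wide:", "at:" and
"byte:" labels are my own inventions for the example, and testing
defined() on the captures is just one way to tell the branches apart.

```perl
use strict;
use warnings;

my $regex = qr{ \G
    (?:
        ([^\x00-\xFF]+)   # run of characters above 0xFF
      | (\@..)            # an at-sign plus the next two characters
      | (.)               # any other single character
    )
}sx;

my $string = 'JK@XXAF';   # hypothetical input with one at-sign sequence
my @tokens;

# Scalar-context //g resumes at pos() on every iteration; the grouped
# \G keeps each match from skipping over anything.
while ($string =~ /$regex/g) {
    my $token = defined $1 ? "wide:$1"
              : defined $2 ? "at:$2"
              :              "byte:$3";
    push @tokens, $token;
}

print join(' ', @tokens), "\n";
```

Only one capture group is defined per iteration, so checking defined()
also avoids the undefined-value warnings that printing all of $1, $2 and
$3 unconditionally would produce under warnings.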

Anno
 