backreference oddity

 
 
poncenby
      10-06-2006
i have a file which has lines of text with fields separated by a space.
some of the fields are prefixed with a number and a space, like the
line below...

bar1 bar2 XX 10 bar3tooten
foo1 foo2 XX 15 foo3uptofifteen

as you can see, the numbers (10 and 15) are the length of the field
after the number.
so i want to use these numbers as length specifier to match the field
after the number, with a regex like either of these:

/(.+)\s(.+)\sXX\s([0-9)+)\s(.{$3})/
/(.+)\s(.+)\sXX\s([0-9)+)\s(.{\3})/

both regexs will make the program fall over when attempting to print
$4.

i've figured out a solution with a regex over two lines but am curious
why this doesn't work.

thanks in advance

poncenby

 
A. Sinan Unur
      10-06-2006
"poncenby" wrote:

> i have a file which has lines of text with fields separated by a
> space. some of the fields are prefixed with a number and a space, like
> the line below...
>
> bar1 bar2 XX 10 bar3tooten
> foo1 foo2 XX 15 foo3uptofifteen
>
> as you can see, the numbers (10 and 15) are the length of the field
> after the number.
> so i want to use these numbers as length specifier to match the field
> after the number, with a regex like either of these:
>
> /(.+)\s(.+)\sXX\s([0-9)+)\s(.{$3})/


You can't use the capture variable here, but your problem is ...

> /(.+)\s(.+)\sXX\s([0-9)+)\s(.{\3})/


Ahem ... Did you read the error message? Without any testing, I can see
that you should have [0-9]+ rather than the [0-9)+ you used above.

Have you read the posting guidelines yet? You should always post a short
but complete script that illustrates the problem, so others can try it
with the minimum of effort.

Sinan

anno4000@radom.zrz.tu-berlin.de
      10-07-2006
poncenby wrote in comp.lang.perl.misc:
> i have a file which has lines of text with fields separated by a space.
> some of the fields are prefixed with a number and a space, like the
> line below...
>
> bar1 bar2 XX 10 bar3tooten
> foo1 foo2 XX 15 foo3uptofifteen
>
> as you can see, the numbers (10 and 15) are the length of the field
> after the number.
> so i want to use these numbers as length specifier to match the field
> after the number, with a regex like either of these:
>
> /(.+)\s(.+)\sXX\s([0-9)+)\s(.{$3})/
> /(.+)\s(.+)\sXX\s([0-9)+)\s(.{\3})/
>
> both regexs will make the program fall over when attempting to print
> $4.


Earlier than that, as has been noted.

>
> i've figured out a solution with a regex over two lines but am curious
> why this doesn't work.


If a regex gets that big it's time to try something else. The
pack/unpack functions have a template that can deal with an embedded
length field. The following code shows how.

We first use split() to retrieve the three blank-separated variables
and the rest of the line. The rest starts with the length-delimited
field. We can use unpack to split off the length-delimited part
(the 'a3/a' template does that) and capture whatever is left over
after that ('a*'). I have added some extra noise at the line ends
to show that the length field is interpreted correctly. See
"perldoc -f pack" for the details.

while ( <DATA> ) {
    chomp;
    my ( $one, $two, $three, $rest ) = split ' ', $_, 4;
    my $four;
    ( $four, $rest ) = unpack 'a3/a a*', $rest;
    print "$one, $two, $three, $four, $rest\n";
}

__DATA__
bar1 bar2 XX 10 bar3tooten+some
foo1 foo2 XX 15 foo3uptofifteen+more
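
The 'a3/a' template reads the length from a fixed three characters, which fits the two-digit lengths plus a blank in the sample data. Where the number of digits can vary, a rough alternative is to split the length off as its own field and cut the payload with substr. This is only a sketch: the sub name parse_line is made up, and it assumes exactly one blank between the length and the field.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the three leading
# fields, the length-delimited field, and whatever trails it.
sub parse_line {
    my ($line) = @_;

    # Limit of 5 keeps everything after the length in one piece.
    my ( $one, $two, $three, $len, $rest ) = split ' ', $line, 5;

    # Reject lines with a missing or non-numeric length, or too little data.
    return unless defined $rest;
    return unless $len =~ /\A[0-9]+\z/;
    return unless length($rest) >= $len;

    my $four = substr $rest, 0, $len;    # the length-delimited field
    my $tail = substr $rest, $len;       # anything left over
    return ( $one, $two, $three, $four, $tail );
}

for my $line ( 'bar1 bar2 XX 10 bar3tooten+some',
               'foo1 foo2 XX 15 foo3uptofifteen+more' ) {
    print join( ', ', parse_line($line) ), "\n";
}
```

Note that split ' ' swallows the whole run of whitespace after the length, so a field that itself begins with a blank would lose it here; the unpack version does not have that limitation.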

Anno
 
Eric Amick
      10-07-2006
On Fri, 06 Oct 2006 22:50:13 GMT, "A. Sinan Unur" wrote:

>> /(.+)\s(.+)\sXX\s([0-9)+)\s(.{\3})/

>
>Ahem ... Did you read the error message? Without any testing, I can see
>that you should havve [0-9]+ rather than the [0-9)+ you used above.


Maybe this is version-dependent, but that won't do what the OP wants
even after fixing the syntax error with [0-9). Repeat counts in curly
brackets have to be constants. Try this and see what I mean:

perl -Mre=debug -e "/(.+)\s(.{\1})/"
--
Eric Amick
Columbia, MD
 
Bob Walton
      10-07-2006
poncenby wrote:
> i have a file which has lines of text with fields separated by a space.
> some of the fields are prefixed with a number and a space, like the
> line below...
>
> bar1 bar2 XX 10 bar3tooten
> foo1 foo2 XX 15 foo3uptofifteen
>
> as you can see, the numbers (10 and 15) are the length of the field
> after the number.
> so i want to use these numbers as length specifier to match the field
> after the number, with a regex like either of these:
>
> /(.+)\s(.+)\sXX\s([0-9)+)\s(.{$3})/

]-----------------------^
> /(.+)\s(.+)\sXX\s([0-9)+)\s(.{\3})/

]-----------------------^
>
> both regexs will make the program fall over when attempting to print
> $4.
>
> i've figured out a solution with a regex over two lines but am curious
> why this doesn't work.


Doesn't work because of the syntax error. And because the contents of
the {...} construction have to be literal digits or digits,digits .

For a one-liner, try something like:

use strict;
use warnings;
my $v;
while (<DATA>) {
    chomp;
    s/^(.+)\s(.+)\sXX\s(\d+)\s(.*)/$v=substr($4,0,$3);"$1 $2 XX $3 $4";/e;
    print "line:$_:\nv:$v:\n";
}
__END__
bar1 bar2 XX 10 bar3tootenblahblahblah
foo1 foo2 XX 15 foo3uptofifteenyadayadayada

(Data was padded to illustrate that it works.) The second expression in
the replacement block evaluates to the matched text again, so the
substitution leaves the line unchanged and only the assignment to $v
has any effect. Also, I anchored the start so it won't match starting
partway through a line. Generates:

D:\junk>perl junk574.pl
line:bar1 bar2 XX 10 bar3tootenblahblahblah:
v:bar3tooten:
line:foo1 foo2 XX 15 foo3uptofifteenyadayadayada:
v:foo3uptofifteen:

D:\junk>
....
> poncenby

--
Bob Walton
Email: http://bwalton.com/cgi-bin/emailbob.pl
 
A. Sinan Unur
      10-07-2006
Eric Amick wrote:

> On Fri, 06 Oct 2006 22:50:13 GMT, "A. Sinan Unur" wrote:
>
>>> /(.+)\s(.+)\sXX\s([0-9)+)\s(.{\3})/

>>
>>Ahem ... Did you read the error message? Without any testing, I can see
>>that you should havve [0-9]+ rather than the [0-9)+ you used above.

>
> Repeat counts in curly brackets have to be constants.


I knew that, of course.

Thanks for the correction. I focused on the most obvious error and missed
the other one.

Sinan
 
Dr.Ruud
      10-07-2006
poncenby wrote:

> i have a file which has lines of text with fields separated by a
> space. some of the fields are prefixed with a number and a space,
> like the line below...
>
> bar1 bar2 XX 10 bar3tooten
> foo1 foo2 XX 15 foo3uptofifteen
>
> as you can see, the numbers (10 and 15) are the length of the field
> after the number.


Are these meant for fields with embedded blanks? If not, see split().


> so i want to use these numbers as length specifier to match the field
> after the number, with a regex like either of these:
>
> /(.+)\s(.+)\sXX\s([0-9)+)\s(.{$3})/


In addition to the other comments: the "(.+)\s" might first match up to
the last space, and backtrack from there. Change to "(\S+)\s", or to
"(.+?)\s".
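
To make the difference visible, here is a small sketch. The test line with an extra leading field is invented, the helper name first_two is hypothetical, and the ^ anchor on the second pattern is an addition of this example rather than part of the suggestion above.

```perl
use strict;
use warnings;

# Invented input: one field too many in front of the XX marker.
my $line = 'extra bar1 bar2 XX 10 bar3tooten';

# Hypothetical helper: returns the first two captures, or an empty list.
sub first_two {
    my ( $re, $text ) = @_;
    return $text =~ $re ? ( $1, $2 ) : ();
}

# Greedy .+ backtracks until the rest fits, so $1 quietly absorbs a blank.
my @greedy = first_two( qr/(.+)\s(.+)\sXX\s/, $line );

# \S+ cannot cross a blank, so the malformed line is rejected instead.
my @strict = first_two( qr/^(\S+)\s(\S+)\sXX\s/, $line );

print 'greedy: ', join( '|', @greedy ), "\n";
print 'strict: ', ( @strict ? join( '|', @strict ) : 'no match' ), "\n";
```

On a well-formed line such as the samples in the original post both patterns capture the same two fields; the difference only shows up when the input has unexpected blanks.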

--
Affijn, Ruud

"Gewoon is een tijger."


 
Brian McCauley
      10-07-2006


On Oct 7, 1:58 am, Eric Amick <eric-am...@comcast.net> wrote:
> Repeat counts in curly
> brackets have to be constants. Try this and see what I mean:
>
> perl -Mre=debug -e "/(.+)\s(.{\1})/"


You can use (??{})

/ (.+) \s ( (??{ ".{$1}" }) )/x

But this is neither very readable nor very efficient.
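
Applied to the line layout from the original post, the (??{ }) idea might look like the sketch below. The sub name length_field is made up, the sample lines are the padded ones used elsewhere in the thread, and \S+ is used for the leading fields as an assumption about the data.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the length-delimited field, or undef
# when the line does not fit the expected layout.
sub length_field {
    my ($line) = @_;

    # The code inside (??{ }) runs during the match, when $3 already
    # holds the digits captured earlier in this same match. The string
    # it returns (for example ".{10}") is then used as a sub-pattern.
    return $line =~ /^(\S+)\s(\S+)\sXX\s([0-9]+)\s((??{ ".{$3}" }))/
        ? $4
        : undef;
}

for my $line ( 'bar1 bar2 XX 10 bar3tooten+some',
               'foo1 foo2 XX 15 foo3uptofifteen+more' ) {
    my $field = length_field($line);
    print defined $field ? "$field\n" : "no match\n";
}
```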

 
Ala Qumsieh
      10-07-2006
Eric Amick wrote:

> On Fri, 06 Oct 2006 22:50:13 GMT, "A. Sinan Unur" wrote:
>
>
>>>/(.+)\s(.+)\sXX\s([0-9)+)\s(.{\3})/

>>
>>Ahem ... Did you read the error message? Without any testing, I can see
>>that you should havve [0-9]+ rather than the [0-9)+ you used above.

>
>
> Maybe this is version-dependent, but that won't do what the OP wants
> even after fixing the syntax error with [0-9). Repeat counts in curly
> brackets have to be constants.


No. They can also be variables:

% perl -le '$_ = "aaa"; $c = 2; print $& if /a{$c}/'
aa
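
As far as I can tell, this works because a variable inside {...} is interpolated when the pattern is compiled, before any matching starts. A variable set beforehand (like $c) is fine, whereas $3 written in the same pattern still holds whatever the previous successful match left there, not the digits being captured right now. A two-step match gets around that. The following is a minimal sketch; the sub name length_field_two_step and the sample lines are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the length-delimited field, or nothing
# if the line does not fit the expected layout.
sub length_field_two_step {
    my ($line) = @_;

    # Step 1: capture the length and everything after it.
    my ( $len, $rest ) = $line =~ /^\S+\s\S+\sXX\s([0-9]+)\s(.*)/s
        or return;

    # Step 2: $len is an ordinary variable by now, so it is interpolated
    # into the quantifier before this second pattern is compiled.
    my ($field) = $rest =~ /^(.{$len})/s
        or return;

    return $field;
}

for my $line ( 'bar1 bar2 XX 10 bar3tooten+some',
               'foo1 foo2 XX 15 foo3uptofifteen+more' ) {
    my $field = length_field_two_step($line);
    print defined $field ? "$field\n" : "no match\n";
}
```

Very large lengths may exceed the quantifier limit of the regex engine, in which case substr on $rest is the safer way to do the second step.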

--Ala

 
Ala Qumsieh
      10-07-2006
Bob Walton wrote:

> Doesn't work because of the syntax error. And because the contents of
> the {...} construction have to be literal digits or digits,digits .


Not true. They can be variables. See my other post in this thread.

--Ala

 