negative backreference?

 
 
eric.hall@gmail.com
 
      03-04-2005
I'm a relative newbie with perl/regexp

I'm trying to write a rule for SpamAssassin that looks at the top-most
Received header, checks whether the HELO identifier and the reverse DNS
hostname are the same, and applies a weight accordingly.

It's easy to see if they are the same, using an internal debug header
and a backreference. Assume HEADER is of the form "rdns=hostname
helo=hostname" then the simple rule of:

HEADER =~ /rdns=(.*) helo=\1/

will match when they are the same. But I need to match when they are
different.

I've tried negative look-ahead of various forms, but nothing seems to
work correctly when backreferences are included. Is there a way out of
this hole?

Perl 5.8.1 on SuSE Linux Professional 9.0, if it matters.

Thanks
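
For anyone who wants to reproduce the equality case outside SpamAssassin, here is a minimal sketch built around the pattern quoted in the post. The helper name same_host and the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the captured rdns value when the text after
# "helo=" begins with that same value, otherwise undef.
# Note: nothing follows \1 in the pattern, so a helo value that merely
# starts with the rdns value also counts as a match.
sub same_host {
    my ($header) = @_;
    return $header =~ /rdns=(.*) helo=\1/ ? $1 : undef;
}

for my $header ( 'rdns=hostname1 helo=hostname1',
                 'rdns=hostname1 helo=hostname2' ) {
    my $host = same_host($header);
    print defined $host ? "same: $host\n" : "no match: $header\n";
}
```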

 
Anno Siegel
 
      03-04-2005
<(E-Mail Removed)> wrote in comp.lang.perl.misc:
> I'm a relative newbie with perl/regexp
>
> I'm trying to write a rule for SpamAssassin that looks at the top-most
> Received header, checks whether the HELO identifier and the reverse DNS
> hostname are the same, and applies a weight accordingly.
>
> It's easy to see if they are the same, using an internal debug header
> and a backreference. Assume HEADER is of the form "rdns=hostname
> helo=hostname" then the simple rule of:
>
> HEADER =~ /rdns=(.*) helo=\1/
>
> will match when they are the same. But I need to match when they are
> different.
>
> I've tried negative look-ahead of various forms, but nothing seems to
> work correctly when backreferences are included. Is there a way out of
> this hole?


What have you tried? Negative lookahead should work just fine for this.

Anno
 
eric.hall@gmail.com
 
      03-04-2005
I've tried wrapping just the "rdns" part in a negative look-ahead, and
I've tried wrapping the whole thing, and neither produces a match when
the values are different.

Here's the first test, using non-matching names:

#!/usr/bin/perl

$_ = "[ rdns=hostname1 helo=hostname2 ]";

if ( /^[^\]]+ (?!rdns=(.*) helo=\1)/ ) {
    print "got <$1>\n";
}

$ test.pl
got <>

That looks like it works, but it produces the same results ("got <>")
even when rdns and helo are the same, meaning that the test seems to
err out in all cases.

Am I trapping the wrong output or something?
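
A small script can make the behaviour visible. This is only a sketch: the helper try_match and the second sample string are additions for illustration. It reports where the match ended, which is also where the lookahead was tested. Because the greedy [^\]]+ can back off until the lookahead sits just before the closing "]", where "rdns=" is absent, the negative assertion succeeds for both inputs and group 1 is never set.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the pattern from the post and reports where
# the match ended and what, if anything, ended up in group 1.
sub try_match {
    my ($header) = @_;
    if ( $header =~ /^[^\]]+ (?!rdns=(.*) helo=\1)/ ) {
        return +{ end => $+[0], capture => $1 };
    }
    return;
}

for my $header ( '[ rdns=hostname1 helo=hostname2 ]',
                 '[ rdns=hostname1 helo=hostname1 ]' ) {
    my $result = try_match($header) or next;
    my $rest    = substr( $header, $result->{end} );
    my $capture = defined $result->{capture} ? $result->{capture} : 'undef';
    print "matched; lookahead was tested before <$rest>, \$1 is $capture\n";
}
```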

 
eric.hall@gmail.com
 
      03-04-2005
The pattern ^[^\]]+ rdns=(\S*) helo=(?!\1) returns hostname1, as needed.
From my admittedly limited understanding, it does not appear that the
negative look-ahead is being interpreted as such, and this gobbledygook
should not work.

I'm content to live with the mystery, but if somebody could explain it
or reference material that says why it works, I'd be appreciative.
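
One way to see why this pattern behaves as it does is to print both the capture and the full matched text. In this sketch (the helper check_header and the second sample string are assumptions for illustration), group 1 is captured outside the lookahead, so it survives the match, and (?!\1) is a zero-width test made right after "helo=", so the matched text stops there.

```perl
use strict;
use warnings;

my $differs = qr/^[^\]]+ rdns=(\S*) helo=(?!\1)/;

# Hypothetical helper: returns [ captured rdns, full matched text ] when the
# pattern matches, otherwise undef.
sub check_header {
    my ($header) = @_;
    return $header =~ $differs
        ? [ $1, substr( $header, $-[0], $+[0] - $-[0] ) ]
        : undef;
}

for my $header ( '[ rdns=hostname1 helo=hostname2 ]',
                 '[ rdns=hostname1 helo=hostname1 ]' ) {
    my $result = check_header($header);
    if ($result) {
        print "captured <$result->[0]>, matched text <$result->[1]>\n";
    }
    else {
        print "no match: $header\n";
    }
}
```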

 
Anno Siegel
 
      03-04-2005
<(E-Mail Removed)> wrote in comp.lang.perl.misc:

Please give an attribution and some context in your reply.

> I've tried wrapping just the "rdns" part in a negative look-ahead, and
> I've tried wrapping the whole thing, and neither produces a match when
> the values are different.
>
> Here's the first test, using non-matching names:
>
> #!/usr/bin/perl
>
> $_ = "[ rdns=hostname1 helo=hostname2 ]";
>
> if ( /^[^\]]+ (?!rdns=(.*) helo=\1)/ ) {
> print "got <$1>\n";
> }
>
> $ test.pl
> got <>
>
> That looks like it works, but it produces the same results ("got <>")
> even when rdns and helo are the same, meaning that the test seems to
> err out in all cases.


To me it doesn't look at all like it works, given that it failed to
capture anything in $1.

It's really rather simple. The test for equality can be done with just a
backreference, without lookahead:

for ( ( 'rdns=AAA helo=AAA', 'rdns=BBB helo=AAA') ) {
    print "got <$1>\n" if /rdns=(.*) helo=\1/;
}

That reports "AAA", the case where both are equal. Now turn the sense
of the test around, wrapping the backreference in a negative lookahead:

for ( ( 'rdns=AAA helo=AAA', 'rdns=BBB helo=AAA') ) {
    print "got <$1>\n" if /rdns=(.*) helo=(?!\1)/;
}

Now it reports "BBB". That's it.

Anno
 
Ilya Zakharevich
 
      03-05-2005
[A complimentary Cc of this posting was sent to
Anno Siegel
<(E-Mail Removed)-berlin.de>], who wrote in article <d0aq0m$a3t$(E-Mail Removed)-Berlin.DE>:
> That reports "AAA", the case where both are equal. Now turn the sense
> of the test around, wrapping the backreference in a negative lookahead:
>
> for ( ( 'rdns=AAA helo=AAA', 'rdns=BBB helo=AAA') ) {
> print "got <$1>\n" if /rdns=(.*) helo=(?!\1)/;
> }
>
> Now it reports "BBB". That's it.


Do not think so. One needs some anchor at the end. Something like

/rdns=(\w+) helo=(?!\1\b)/;

(mutatis mutandis). Having a different match than \w+ will lead to more
complicated stuff than \b... In a perfect life, one would use something
like my (proposed) onion rings:

/rdns=(\S*) helo=(?& \S* & (?!\1)/;

Hope this helps,
Ilya
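
The need for an anchor shows up as soon as one value is a prefix of the other. The sketch below compares the two patterns from this exchange; the third sample string and the helper names are additions for illustration. Without the anchor, "AAAB" is treated as if it were equal to "AAA", because \1 matches its first three characters and the negative lookahead therefore fails.

```perl
use strict;
use warnings;

my $unanchored = qr/rdns=(.*) helo=(?!\1)/;
my $anchored   = qr/rdns=(\w+) helo=(?!\1\b)/;

# Hypothetical helpers: true when the pattern reports "different".
sub differs_unanchored { my ($header) = @_; return $header =~ $unanchored ? 1 : 0 }
sub differs_anchored   { my ($header) = @_; return $header =~ $anchored   ? 1 : 0 }

for my $header ( 'rdns=AAA helo=AAA', 'rdns=BBB helo=AAA', 'rdns=AAA helo=AAAB' ) {
    printf "%-20s unanchored: %-9s anchored: %s\n",
        $header,
        ( differs_unanchored($header) ? 'different' : 'same' ),
        ( differs_anchored($header)   ? 'different' : 'same' );
}
```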
 
Anno Siegel
 
      03-05-2005
Ilya Zakharevich <(E-Mail Removed)> wrote in comp.lang.perl.misc:
> [A complimentary Cc of this posting was sent to
> Anno Siegel
> <(E-Mail Removed)-berlin.de>], who wrote in article
> <d0aq0m$a3t$(E-Mail Removed)-Berlin.DE>:
> > That reports "AAA", the case where both are equal. Now turn the sense
> > of the test around, wrapping the backreference in a negative lookahead:
> >
> > for ( ( 'rdns=AAA helo=AAA', 'rdns=BBB helo=AAA') ) {
> > print "got <$1>\n" if /rdns=(.*) helo=(?!\1)/;
> > }
> >
> > Now it reports "BBB". That's it.

>
> Do not think so. One needs some anchor at the end. Something like
>
> /rdns=(\w+) helo=(?!\1\b)/;


That's right.

> (mutatis mutandis). Having a different match than \w+ will lead to more
> complicated stuff than \b... In a perfect life, one would use something
> like my (proposed) onion rings:
>
> /rdns=(\S*) helo=(?& \S* & (?!\1)/;


Is that proposal available somewhere? I'm not sure how "(?&" is supposed
to work. Should the parens balance?

Anno
 
Ilya Zakharevich
 
      03-09-2005
[A complimentary Cc of this posting was sent to
Anno Siegel
<(E-Mail Removed)-berlin.de>], who wrote in article <d0c48i$39r$(E-Mail Removed)-Berlin.DE>:
> > (mutatis mutandis). Having a different match than \w+ will lead to more
> > complicated stuff than \b... In a perfect life, one would use something
> > like my (proposed) onion rings:
> >
> > /rdns=(\S*) helo=(?& \S* & (?!\1)/;


/rdns=(\S*) helo=(?& \S* & (?!\1))/

maybe even

/rdns=(\S*) helo=(?& \S* &! \1)/

> Is that proposal available somewhere?


I think so. google for it...

> I'm not sure how "(?&" is supposed to work.


A & B & C & D ...

B should match a substring of what A matched, C should match a
substring of what B matched etc... One can replace & by &! (negating
the following group). Actually, another, "anchored", flavor is useful
in other situations: one where B should match *exactly* the string
which A matched (and not a substring thereof). [example above uses
the second flavor]

It was never clear to me how to distinguish these two flavors; maybe
something as simple as && vs &...

> Should the parens balance?


Sure, thanks.

Yours,
Ilya
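
The onion-ring syntax in this thread is a proposal, and stock Perl does not give "(?&" this meaning (since 5.10, (?&NAME) recurses into a named group). The anchored reading of helo=(?& \S* &! \1) can be approximated with ordinary lookaheads, though. The sketch below is one such approximation, not the proposed feature itself: (?!\S) stands in for the end of the \S* run, much as \b did for \w+. The helper name and the sample hostnames are made up.

```perl
use strict;
use warnings;

# Approximation of "helo=(?& \S* &! \1)" in the anchored sense: the run of
# non-space characters after "helo=" must not be exactly \1.
my $differs = qr/rdns=(\S*) helo=(?!\1(?!\S))/;

# Hypothetical helper: returns the rdns value when helo differs, else undef.
sub rdns_if_different {
    my ($header) = @_;
    return $header =~ $differs ? $1 : undef;
}

for my $header (
    'rdns=mail.example.com helo=mail.example.com',
    'rdns=mail.example.com helo=mail.example.com.invalid',
    'rdns=mail.example.com helo=other.example.net',
) {
    my $rdns = rdns_if_different($header);
    print defined $rdns ? "differs: $header\n" : "same:    $header\n";
}
```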
 
Anno Siegel
 
      03-09-2005
Ilya Zakharevich <(E-Mail Removed)> wrote in comp.lang.perl.misc:
> [A complimentary Cc of this posting was sent to
> Anno Siegel
> <(E-Mail Removed)-berlin.de>], who wrote in article
> <d0c48i$39r$(E-Mail Removed)-Berlin.DE>:
> > > (mutatis mutandis). Having a different match than \w+ will lead to more
> > > complicated stuff than \b... In a perfect life, one would use something
> > > like my (proposed) onion rings:
> > >
> > > /rdns=(\S*) helo=(?& \S* & (?!\1)/;

>
> /rdns=(\S*) helo=(?& \S* & (?!\1))/
>
> maybe even
>
> /rdns=(\S*) helo=(?& \S* &! \1)/
>
> > Is that proposal available somewhere?

>
> I think so. google for it...


I tried. In the presence of a number of _State of the Onion_s the puns
overwhelmed me.

> > I'm not sure how "(?&" is supposed to work.

>
> A & B & C & D ...
>
> B should match a substring of what A matched, C should match a
> substring of what B matched etc... One can replace & by &! (negating
> the following group).


Ah... Now I'm getting the name too -- successive substrings.

> Actually, another, "anchored", flavor is useful
> in other situations: one where B should match *exactly* the string
> which A matched (and not a substring thereof). [example above uses
> the second flavor]


With infinitesimal onion rings...

> It was never clear to me how to distinguish these two flavors; maybe
> something as simple as && vs &...


Hard to remember which is which in a not-too-often-used construct.
How about =& ?

Anno
 
Ilya Zakharevich
 
      03-10-2005
[A complimentary Cc of this posting was sent to
Anno Siegel
<(E-Mail Removed)-berlin.de>], who wrote in article <d0nlnj$6c1$(E-Mail Removed)-Berlin.DE>:
> > > I'm not sure how "(?&" is supposed to work.

> >
> > A & B & C & D ...
> >
> > B should match a substring of what A matched, C should match a
> > substring of what B matched etc... One can replace & by &! (negating
> > the following group).

>
> Ah... Now I'm getting the name too -- successive substrings.
>
> > Actually, another, "anchored", flavor is useful
> > in other situations: one where B should match *exactly* the string
> > which A matched (and not a substring thereof). [example above uses
> > the second flavor]

>
> With infinitesimal onion rings...
>
> > It was never clear to me how to distinguish these two flavors; maybe
> > something as simple as && vs &...

>
> Hard to remember which is which in a not-too-often-used construct.
> How about =& ?


The idea of &= is appealing indeed. Uniformizing, it may become

&~ &= &!~ &!=

or just

& &= &! &!=

Hard to decide between these two...

Thanks,
Ilya
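
As a rough reading of the four flavors named above, the semantics can be imitated in two steps in plain Perl: first take the text that A matched, then test B against it. Everything in this sketch is hypothetical, including the %flavor table and the assumption that "&" means "B matches somewhere inside A's text" while "&=" means "B matches all of A's text". It only illustrates the description given in the thread.

```perl
use strict;
use warnings;

# Hypothetical emulation: $text is what A matched, $re is pattern B.
my %flavor = (
    '&'   => sub { my ( $text, $re ) = @_; return $text =~ /(?:$re)/       ? 1 : 0 },
    '&='  => sub { my ( $text, $re ) = @_; return $text =~ /\A(?:$re)\z/   ? 1 : 0 },
    '&!'  => sub { my ( $text, $re ) = @_; return $text !~ /(?:$re)/       ? 1 : 0 },
    '&!=' => sub { my ( $text, $re ) = @_; return $text !~ /\A(?:$re)\z/   ? 1 : 0 },
);

# Usage: A is the \S* run after "helo=", B is the literal rdns value (\1).
my ( $rdns, $helo ) = ( 'hostname1', 'hostname2' );
for my $op ( '&', '&=', '&!', '&!=' ) {
    my $ok = $flavor{$op}->( $helo, qr/\Q$rdns\E/ );
    print "\\S* $op \\1 : ", ( $ok ? 'accepts' : 'rejects' ), "\n";
}
```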
 