Velocity Reviews - Computer Hardware Reviews

Small confusion about negative lookbehind

 
 
david.karr@wamu.net
 
      05-30-2005
I'm writing a small test program to illustrate several aspects of
regular expressions. In the section illustrating "lookaround"s, I
found something I didn't understand. My testing is with JDK 1.4.2.

My candidate string is "ab".

The expressions I'm testing this string against are listed below, along
with whether the string matched or not:

a(?=b) // succeeds
(?=a)b // fails
(?<=a)b // succeeds
a(?<=b) // fails
(?<!x)b // succeeds
a(?<!x) // succeeds(!)
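A small harness along these lines should reproduce the six results. It assumes the test program used `Matcher.find()`, which looks for a match anywhere in the input. With `Matcher.matches()`, which requires the whole input to match, even `a(?=b)` would report a failure on "ab", because the look-ahead checks the "b" without consuming it. The class name `LookaroundTable` is made up for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundTable {
    static final String[] PATTERNS = {
        "a(?=b)", "(?=a)b", "(?<=a)b", "a(?<=b)", "(?<!x)b", "a(?<!x)"
    };

    // Maps each pattern to the substring find() matched in the input, or null if none.
    static Map<String, String> run(String input) {
        Map<String, String> results = new LinkedHashMap<>();
        for (String regex : PATTERNS) {
            Matcher m = Pattern.compile(regex).matcher(input);
            results.put(regex, m.find() ? m.group() : null);
        }
        return results;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : run("ab").entrySet()) {
            String verdict = e.getValue() == null
                    ? "fails"
                    : "succeeds, matched \"" + e.getValue() + "\"";
            System.out.println(e.getKey() + "  // " + verdict);
        }
    }
}
```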

Looking at these, I first wonder what exactly is the semantic
difference between a "lookbehind" and "lookahead" construct. The
syntactic difference is obvious, but why pattern 1 succeeds while
pattern 2 fails is a little hazy to me. The one that really bothers
me, however, is pattern 6. Despite my unclear picture of how these
are supposed to work, I was pretty certain that this pattern would
fail.

I could use some clarification of these constructs.

 
Lasse Reichstein Nielsen
 
      05-30-2005
(E-Mail Removed) writes:

> I'm writing a small test program to illustrate several aspects of
> regular expressions. In the section illustrating "lookaround"s, I
> found something I didn't understand. My testing is with JDK 1.4.2.


Hey, I didn't even know about look-behinds.

> My candidate string is "ab".
>
> The expressions I'm testing this string against are the following,
> which also lists whether the string matched or not

....
> Looking at these, I first wonder what exactly is the semantic
> difference between a "lookbehind" and "lookahead" construct.


Both are zero-width predicates, which means (kind of) that they match
not a character, but the position between characters. Think of a
string not just as a sequence of characters, but as alternating
characters and in-between positions. These positions are where the
cursor sits when you type (if you use a bar cursor, not a block,
obviously).

Regular expressions describe not only strings, but also the positions
between the characters in strings, e.g. "\b", which matches a position
at a word boundary (a word character on one side, a non-word character
or the start or end of the string on the other). The look-around
patterns work the same way.
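A small Java sketch of "\b" acting as a position matcher, using only `java.util.regex`; the class and method names are invented for this example. Every match it finds should be empty (start equals end), and for "hi there" the boundaries should fall at indices 0, 2, 3 and 8.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaries {
    private static final Pattern BOUNDARY = Pattern.compile("\\b");

    // Collects the index of every word boundary; each match is empty (start == end).
    static List<Integer> boundaries(String input) {
        List<Integer> result = new ArrayList<>();
        Matcher m = BOUNDARY.matcher(input);
        while (m.find()) {
            if (m.start() != m.end()) {
                throw new AssertionError("\\b should never consume characters");
            }
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(boundaries("hi there")); // [0, 2, 3, 8]
    }
}
```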

The exact predicate determines how the position is matched. For a
look-ahead, the zero-width position is matched if the characters
following it are matched by the look-ahead expression. For a
look-behind, the zero-width position is matched if the characters
preceding it are matched by the look-behind expression. Neither one
consumes any characters.
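The same idea, applied to "ab", can list every position where each bare look-around holds. This is a sketch with made-up names (`LookaroundPositions`, `positions`). The empty pattern is included as a baseline: it holds at every position, showing that a two-character string has three positions. The look-aheads then test the character to the right of a position, and the look-behinds the character to its left.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundPositions {
    // Indices in the input at which the zero-width regex holds.
    // After an empty match, find() resumes one position further on,
    // so each position is reported at most once.
    static List<Integer> positions(String regex, String input) {
        List<Integer> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("empty  " + positions("", "ab"));       // [0, 1, 2]
        // Look-aheads test the character to the right of the position...
        System.out.println("(?=a)  " + positions("(?=a)", "ab"));  // [0]
        System.out.println("(?=b)  " + positions("(?=b)", "ab"));  // [1]
        // ...look-behinds test the character to the left of it.
        System.out.println("(?<=a) " + positions("(?<=a)", "ab")); // [1]
        System.out.println("(?<=b) " + positions("(?<=b)", "ab")); // [2]
    }
}
```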

So, "a(?=b)" matches an "a" followed by a zero-width string which is
followed by a "b". The matched substring of "ab" is "a".
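To make the "matched substring" concrete, this sketch prints the match and its span, then shows why `Matcher.matches()` (whole-input matching) would disagree with `find()` here. The class and method names are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadMatch {
    // Returns "group [start, end)" for the first find() match, or null if none.
    static String describe(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() + " [" + m.start() + ", " + m.end() + ")" : null;
    }

    public static void main(String[] args) {
        // The look-ahead inspects the "b" but does not consume it.
        System.out.println(describe("a(?=b)", "ab"));         // a [0, 1)
        // matches() requires the whole input to be consumed, and the "b" never is.
        System.out.println(Pattern.matches("a(?=b)", "ab"));  // false
        System.out.println(Pattern.matches("a(?=b)b", "ab")); // true
    }
}
```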

"(?=a)b" matches a zero-width string which is followed by an "a",
followed by a "b". Since no position can be followed by both an "a"
and a "b", no string will match.
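Because the look-ahead does not advance, the "a" it checks and the "b" that follows in the pattern compete for the same character. A quick check over a few inputs (chosen only for illustration) should find no match at all, while letting the pattern consume the "a" before the "b" makes it satisfiable. The names here are made up.

```java
import java.util.regex.Pattern;

public class ImpossibleLookahead {
    // True if the regex matches anywhere in the input.
    static boolean foundIn(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String[] inputs = {"ab", "ba", "aab", "bbb"};
        for (String s : inputs) {
            // Always false: one position cannot start with both "a" and "b".
            System.out.println(s + ": " + foundIn("(?=a)b", s));
        }
        // Consuming the "a" first makes the pattern satisfiable.
        System.out.println("ab: " + foundIn("(?=a)ab", "ab")); // ab: true
    }
}
```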

"(?<=a)b" matches a zero-width string preceded by an "a", followed
by a "b". The matched substring of "ab" is "b".
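A sketch showing that the "a" is checked but not included in the match: the match starts at index 1, whereas a plain "ab" pattern starts at index 0. Inputs with no "a" right before the "b" find nothing. The names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookbehindMatch {
    // Start index of the first find() match, or -1 if there is none.
    static int firstMatchStart(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(firstMatchStart("(?<=a)b", "ab")); // 1: only the "b" is matched
        System.out.println(firstMatchStart("ab", "ab"));      // 0: here the "a" is consumed
        System.out.println(firstMatchStart("(?<=a)b", "cb")); // -1
        System.out.println(firstMatchStart("(?<=a)b", "b"));  // -1: nothing before the "b"
    }
}
```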

"a(?<=b)" matches an "a" followed by a zero-width string preceded by
a "b". The character just before that position is the "a" that was
just matched, so it can never be a "b", and the pattern fails on
every string.
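The look-behind is evaluated after the "a" has been consumed, so the character it sees is always that same "a". Contrasting `a(?<=b)` with `a(?<=a)` in a small sketch (names made up) makes this visible: the first never matches, the second matches wherever there is an "a".

```java
import java.util.regex.Pattern;

public class LookbehindSeesTheA {
    // True if the regex matches anywhere in the input.
    static boolean foundIn(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"ab", "ba", "bab"}) {
            System.out.println(s
                    + "  a(?<=b): " + foundIn("a(?<=b)", s)    // always false
                    + "  a(?<=a): " + foundIn("a(?<=a)", s));  // true whenever an "a" exists
        }
    }
}
```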

> (?<!x)b // succeeds


"(?<!x)b" matches a zero-width string not preceded by an "x",
followed by a "b". The matched substring of "ab" is "b".
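A sketch of the negative look-behind on a few inputs (names illustrative). It also shows that at the very start of the input, where there is no preceding character at all, "not preceded by an x" holds, so a lone "b" matches at index 0.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NegativeLookbehind {
    // Start index of the first find() match, or -1 if there is none.
    static int firstMatchStart(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(firstMatchStart("(?<!x)b", "ab")); // 1
        System.out.println(firstMatchStart("(?<!x)b", "xb")); // -1: the "b" follows an "x"
        System.out.println(firstMatchStart("(?<!x)b", "b"));  // 0: nothing precedes the "b"
    }
}
```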

> a(?<!x) // succeeds(!)


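"a(?<!x)" matches an "a" followed by a zero-width string not preceded
by an "x". The character just before that position is the "a" itself,
which is never an "x", so the look-behind always succeeds and the
pattern behaves exactly like "a". This matches the string "a", even
as a substring of "ab".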
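If the intent is to reject an "a" that comes right after an "x", the look-behind has to be placed before the "a". A sketch comparing the two placements (names made up): on "xa" only the original pattern 6 finds a match, while on "ab" both do.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookbehindPlacement {
    // Start index of the first find() match, or -1 if there is none.
    static int firstMatchStart(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        // After the "a": the look-behind sees the "a" itself, so it is always satisfied.
        System.out.println(firstMatchStart("a(?<!x)", "xa")); // 1
        // Before the "a": the look-behind sees the character preceding the "a".
        System.out.println(firstMatchStart("(?<!x)a", "xa")); // -1
        System.out.println(firstMatchStart("a(?<!x)", "ab")); // 0
        System.out.println(firstMatchStart("(?<!x)a", "ab")); // 0
    }
}
```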

/L
--
Lasse Reichstein Nielsen - (E-Mail Removed)
DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
'Faith without judgement merely degrades the spirit divine.'
 
hiwa
 
      05-31-2005
(E-Mail Removed) wrote in message news:<(E-Mail Removed) roups.com>...
> I'm writing a small test program to illustrate several aspects of
> regular expressions. In the section illustrating "lookaround"s, I
> found something I didn't understand. My testing is with JDK 1.4.2.
>
> My candidate string is "ab".
>
> The expressions I'm testing this string against are the following,
> which also lists whether the string matched or not
>
> a(?=b) // succeeds
> (?=a)b // fails
> (?<=a)b // succeeds
> a(?<=b) // fails
> (?<!x)b // succeeds
> a(?<!x) // succeeds(!)
>
a(?=b)   There is a 'b' after the 'a'           //succeeds, matches 'a' of "ab"
(?=a)b   The 'b' would also have to be an 'a'   //fails with "ab" (and any string)
(?<=a)b  There is an 'a' before the 'b'         //succeeds, matches 'b' of "ab"
a(?<=b)  The 'a' would also have to be a 'b'    //fails with "ab" (and any string)
(?<!x)b  There is no 'x' before the 'b'         //succeeds, matches 'b' of "ab"
a(?<!x)  The 'a' just matched is not an 'x'     //succeeds, matches 'a' of "ab"

 