How make regex that means "contains regex#1 but NOT regex#2" ??

 
 
seberino@spawar.navy.mil
      07-01-2008
I'm looking over the docs for the re module and can't find how to
"NOT" an entire regex.

For example:

How do I make a regex that means "contains regex#1 but NOT regex#2"?

Chris
 
Paul McGuire
      07-01-2008
On Jul 1, 2:34 am, "A.T.Hofkamp" <h...@se-162.se.wtb.tue.nl> wrote:
> On 2008-07-01, seber...@spawar.navy.mil <seber...@spawar.navy.mil> wrote:
>
> > I'm looking over the docs for the re module and can't find how to
> > "NOT" an entire regex.
>
> (?! R)
>
> > How make regex that means "contains regex#1 but NOT regex#2" ?
>
> (\1|(?!\2))
>
> should do what you want.
>
> Albert

I think the OP wants both A AND not B, not A OR not B. If the OP wants
to do re.match(A and not B), then I think this can be done as
((?!\2)\1), but if he really wants CONTAINS A and not B, then I think
this requires 2 calls to re.search. See test code below:

import re

def test(restr, instr):
    # re.match only tries the pattern at the start of instr
    print("%s match %s? %s" % (restr, instr, bool(re.match(restr, instr))))

a = "AAA"
b = "BBB"

# the pattern suggested in the quoted reply: A OR (not B here)
aAndNotB = "(%s|(?!%s))" % (a, b)

test(aAndNotB, "AAA")
test(aAndNotB, "BBB")
test(aAndNotB, "AAABBB")
test(aAndNotB, "zAAA")
test(aAndNotB, "CCC")

# (not B here) AND A
aAndNotB = "((?!%s)%s)" % (b, a)

test(aAndNotB, "AAA")
test(aAndNotB, "BBB")
test(aAndNotB, "AAABBB")
test(aAndNotB, "zAAA")
test(aAndNotB, "CCC")

def test2(arestr, brestr, instr):
    # two searches: A occurs somewhere and B occurs nowhere
    print("%s contains %s but NOT %s? %s" %
          (instr, arestr, brestr,
           bool(re.search(arestr, instr) and
                not re.search(brestr, instr))))

test2(a, b, "AAA")
test2(a, b, "BBB")
test2(a, b, "AAABBB")
test2(a, b, "zAAA")
test2(a, b, "CCC")

Prints:

(AAA|(?!BBB)) match AAA? True
(AAA|(?!BBB)) match BBB? False
(AAA|(?!BBB)) match AAABBB? True
(AAA|(?!BBB)) match zAAA? True
(AAA|(?!BBB)) match CCC? True
((?!BBB)AAA) match AAA? True
((?!BBB)AAA) match BBB? False
((?!BBB)AAA) match AAABBB? True
((?!BBB)AAA) match zAAA? False
((?!BBB)AAA) match CCC? False
AAA contains AAA but NOT BBB? True
BBB contains AAA but NOT BBB? False
AAABBB contains AAA but NOT BBB? False
zAAA contains AAA but NOT BBB? True
CCC contains AAA but NOT BBB? False


As we've all seen before, posters are not always precise about whether
they want match or search. Given that the OP used the word "contains",
I read that to mean "search". I'm not an RE pro by any means, but I
think the behavior that the OP wants is given in the last 5 tests (the
test2 calls), and I don't know how to do that in a single RE.

-- Paul
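
For the "contains A but NOT B" case, one commonly used single-pattern approach (not taken from the posts above) is to anchor a negative lookahead at the start of the string, so that B is ruled out everywhere before A is searched for. The sketch below assumes that both sub-patterns are simple enough to be wrapped in non-capturing groups (no numbered backreferences or inline global flags of their own); the helper name contains_a_not_b is made up for illustration.

```python
import re

def contains_a_not_b(a, b, text):
    """Return True if `text` contains a match for `a` and no match for `b`."""
    # \A pins the attempt to the start of the string.
    # (?!.*?(?:b)) fails if `b` matches anywhere after that point.
    # .*?(?:a) then requires `a` to match somewhere.
    pattern = r"\A(?!.*?(?:%s)).*?(?:%s)" % (b, a)
    # DOTALL lets '.' cross newlines so the whole string is covered.
    return re.search(pattern, text, re.DOTALL) is not None

for s in ("AAA", "BBB", "AAABBB", "zAAA", "CCC"):
    print(s, contains_a_not_b("AAA", "BBB", s))
```

For these five inputs the results agree with the two-call re.search version (True, False, False, True, False). Whether one pattern is preferable to two plain searches is a readability trade-off.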
 
Reedick, Andrew
      07-01-2008



Match 'foo.*bar', except when 'not' appears between foo and bar.


import re

s = 'fooAAABBBbar'
print("Should match:", s)
# each '.' may only consume a character that is not followed by 'not'
m = re.match(r'(foo(.(?!not))*bar)', s)
if m:
    print(m.groups())

print()

s = 'fooAAAnotBBBbar'
print("Should not match:", s)
m = re.match(r'(foo(.(?!not))*bar)', s)
if m:
    print(m.groups())


== Output ==
Should match: fooAAABBBbar
('fooAAABBBbar', 'B')

Should not match: fooAAAnotBBBbar





 
Reedick, Andrew
      07-01-2008



Fixed a bug with 'foonotbar'. Conceptually it breaks down into:

First_half_of_Regex#1(not Regex#2)(any_char_Not_followed_by_Regex#2)*Second_half_of_Regex#1
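
To see why the extra "(not Regex#2)" piece right after the first half matters, here is a small sketch comparing the earlier pattern with one that follows this breakdown, using 'foo', 'not' and 'bar' as the concrete pieces. The variable names are only for illustration.

```python
import re

# Earlier pattern: checks what follows each consumed character,
# but never checks the position immediately after 'foo'.
old = re.compile(r'(foo(.(?!not))*bar)')

# Pattern following the breakdown: an extra (?!not) right after 'foo'.
new = re.compile(r'(foo(?!not)(?:.(?!not))*bar)')

s = 'foonotbar'
print(bool(old.match(s)))  # True: 'not' directly after 'foo' slips through
print(bool(new.match(s)))  # False: rejected by the lookahead after 'foo'
```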

However, if possible, I would make it a two-pass match: match on
Regex#1, then throw away any matches that also match Regex#2. Two
passes are quicker to write and easier to understand. Easy to
understand == less chance of a bug. If you're worried about
performance, then a) a complicated regex may or may not be faster than
two simple regexes, and b) if you're passing that much data through a
regex, you're probably I/O bound anyway.
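
A minimal sketch of the two-pass idea, assuming the second regex should be tested against the text that the first regex matched. The function name two_pass and the sample list are illustrative only.

```python
import re

def two_pass(strings, regex1, regex2):
    """Keep strings that match regex1 unless the matched text also matches regex2."""
    first = re.compile(regex1)
    second = re.compile(regex2)
    kept = []
    for s in strings:
        m = first.search(s)                       # pass 1: find Regex#1
        if m and not second.search(m.group(0)):   # pass 2: reject if Regex#2 is inside
            kept.append(s)
    return kept

samples = ['foobar', 'fooAAABBBbar', 'fooAAAnotBBBbar',
           'fooAAAnotbar', 'foonotBBBbar', 'foonotbar']
print(two_pass(samples, r'foo.*bar', r'not'))
```

With these samples only 'foobar' and 'fooAAABBBbar' survive both passes.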


import re

ss = ('foobar', 'fooAAABBBbar', 'fooAAAnotBBBbar', 'fooAAAnotbar',
      'foonotBBBbar', 'foonotbar')

for s in ss:
    print(s, end=' ')
    # (?!not) right after 'foo' rejects strings where 'not' follows immediately
    m = re.match(r'(foo(?!not)(?:.(?!not))*bar)', s)
    if m:
        print(m.groups())
    else:
        print()


== output ==
foobar ('foobar',)
fooAAABBBbar ('fooAAABBBbar',)
fooAAAnotBBBbar
fooAAAnotbar
foonotBBBbar
foonotbar
