regular expression negate a word (not character)

 
 
Summercool
      01-26-2008

somebody who is a regular expression guru... how do you negate a word
and grep for all words that is

tire

but not

snow tire

or

snowtire

so for example, it will grep for

winter tire
tire
retire
tired

but will not grep for

snow tire
snow tire
some snowtires

need to do it in one regular expression

 
Summercool
      01-26-2008
On Jan 25, 5:16 pm, Summercool <(E-Mail Removed)> wrote:
> somebody who is a regular expression guru... how do you negate a word
> and grep for all words that is
>
> tire
>
> but not
>
> snow tire
>
> or
>
> snowtire


i could think of something like

/[^s][^n][^o][^w]\s*tire/i

but what if it is not snow but some 20 character-word, then do we need
to do it 20 times to negate it? any shorter way?

 
Ben Morrow
      01-26-2008
[newsgroups line fixed, f'ups set to clpm]

Quoth Summercool <(E-Mail Removed)>:
> On Jan 25, 5:16 pm, Summercool <(E-Mail Removed)> wrote:
> > somebody who is a regular expression guru... how do you negate a word
> > and grep for all words that is
> >
> > tire
> >
> > but not
> >
> > snow tire
> >
> > or
> >
> > snowtire

>
> i could think of something like
>
> /[^s][^n][^o][^w]\s*tire/i
>
> but what if it is not snow but some 20 character-word, then do we need
> to do it 20 times to negate it? any shorter way?


This is no good, since 'snoo tire' fails to match even though you want
it to. You need something more like

/ (?: [^s]... | [^n].. | [^o]. | [^w] | ^ ) \s* tire /ix

but that gets *really* tedious for long strings, unless you generate it.

Ben
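
Generating such an alternation for an arbitrary word could look like the Python sketch below. The helper name not_preceded_by is hypothetical. The pattern it builds differs from the expression above in two ways: it uses literal tails instead of dots, and the last character class also excludes whitespace. Without those changes, "." or [^w] can consume the blank between the two words, and "snow tire" still matches. It assumes each line is tested separately.

```python
import re

def not_preceded_by(word, target):
    """Compile a regex that finds `target` unless `word` (plus optional
    whitespace) comes directly before it. Hypothetical helper."""
    alternatives = []
    last = len(word) - 1
    for i, ch in enumerate(word):
        # Either the start of the line or a character other than word[i],
        # followed by the rest of the word. For the final character,
        # whitespace is excluded too, so the gap before the target cannot
        # stand in for "some other character".
        excluded = re.escape(ch) + (r"\s" if i == last else "")
        alternatives.append("(?:^|[^%s])%s" % (excluded, re.escape(word[i + 1:])))
    pattern = r"(?:%s)\s*%s" % ("|".join(alternatives), re.escape(target))
    return re.compile(pattern, re.IGNORECASE)

regex = not_preceded_by("snow", "tire")
for line in ["winter tire", "retire", "snoo tire",
             "snow tire", "snow   tire", "some snowtires"]:
    print("%-15s %s" % (line, "match" if regex.search(line) else "no match"))
```

The generated pattern grows with the length of the word, but nobody has to type it by hand.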

 
Mark Tolonen
      01-26-2008

"Summercool" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
>
> somebody who is a regular expression guru... how do you negate a word
> and grep for all words that is
>
> tire
>
> but not
>
> snow tire
>
> or
>
> snowtire
>
> so for example, it will grep for
>
> winter tire
> tire
> retire
> tired
>
> but will not grep for
>
> snow tire
> snow tire
> some snowtires
>
> need to do it in one regular expression
>


What you want is a negative lookbehind assertion:

>>> re.search(r'(?<!snow)tire','snowtire') # no match
>>> re.search(r'(?<!snow)tire','baldtire')

<_sre.SRE_Match object at 0x00FCD608>

Unfortunately you want variable whitespace:

>>> re.search(r'(?<!snow\s*)tire','snow tire')

Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
  File "C:\dev\python\lib\re.py", line 134, in search
    return _compile(pattern, flags).search(string)
  File "C:\dev\python\lib\re.py", line 233, in _compile
    raise error, v # invalid expression
error: look-behind requires fixed-width pattern
>>>


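Python's re module doesn't support lookbehind assertions that can vary in
width. Moving the whitespace outside the assertion doesn't work either,
because \s* can match the empty string: the match then simply starts at
'tire', where the preceding four characters are 'now ' rather than 'snow':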

>>> re.search(r'(?<!snow)\s*tire','snow tire')

<_sre.SRE_Match object at 0x00F93480>

Here's some code (not heavily tested) that implements a variable lookbehind
assertion, and a function to mark matches in a string to demonstrate it:

### BEGIN CODE ###

import re

def finditerexcept(pattern,notpattern,string):
    # Search for either pattern; trying notpattern first lets it consume
    # the text that must be excluded.
    for matchobj in re.finditer('(?:%s)|(?:%s)'%(notpattern,pattern),string):
        # Keep only the matches that are not the excluded form.
        if not re.match(notpattern,matchobj.group()):
            yield matchobj

def markexcept(pattern,notpattern,string):
    substrings = []
    current = 0

    # Copy the text between matches unchanged and bracket each match.
    for matchobj in finditerexcept(pattern,notpattern,string):
        substrings.append(string[current:matchobj.start()])
        substrings.append('[' + matchobj.group() + ']')
        current = matchobj.end()

    substrings.append(string[current:])
    return ''.join(substrings)

### END CODE ###

>>> sample='''winter tire

... tire
... retire
... tired
... snow tire
... snow tire
... some snowtires
... '''
>>> print markexcept('tire','snow\s*tire',sample)

winter [tire]
[tire]
re[tire]
[tire]d
snow tire
snow tire
some snowtires

--Mark
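
A possible single-pattern workaround for the fixed-width limit, offered as a sketch and not as part of the code above: put a second fixed-width lookbehind, (?<!\s), in front of (?<!snow). That pins the point where both assertions are checked to the start of the whitespace run, so \s* can no longer give up the blank to slip past the check. The name NOT_SNOW_TIRE is made up for this example, and each line is assumed to be tested on its own.

```python
import re

# (?<!\s)   : we are not in the middle of (or just after) whitespace
# (?<!snow) : the four characters before this point are not "snow"
# \s*tire   : optional whitespace, then the word we are looking for
NOT_SNOW_TIRE = re.compile(r'(?<!\s)(?<!snow)\s*tire', re.IGNORECASE)

lines = ['winter tire', 'tire', 'retire', 'tired', 'snowbird tire',
         'tired on a snow day', 'snow tire and regular tire',
         'snow tire', 'snow   tire', 'some snowtires']

for line in lines:
    found = NOT_SNOW_TIRE.search(line)
    print('%-28s %s' % (line, 'match' if found else 'no match'))
```

Unlike markexcept, the reported match includes any whitespace in front of "tire"; putting "tire" in a group would give the span of the word alone.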

 
Summercool
      01-26-2008
to add to the test cases, the regular expression must be able to grep


snowbird tire
tired on a snow day
snow tire and regular tire


 
bearophileHUGS@lycos.com
      01-26-2008
Summercool:
> to add to the test cases, the regular expression must be able to grep
> snow tire and regular tire


I presume that in this case only the second tire has to be found.

This is my first try:

text = """
tire
word tire word
word retire word
word tired word
snowbird tire word
tired on a snow day word
snow tire and regular tire word
word snow tire word
word snow tire word
word some snowtires word
"""

import re

def finder(text):
    patt = re.compile( r"\b (\w*) \s* (tire)", re.VERBOSE)
    for mo in patt.finditer(text):
        if not mo.group(1).endswith("snow"):
            yield mo.start(2)

for start in finder(text):
    print start

The (lazy) output is the starting position of each "tire" that matches:


1
11
28
43
63
73
120

Bye,
bearophile
 
Paddy
      01-26-2008
On Jan 26, 1:16 am, Summercool <(E-Mail Removed)> wrote:
> somebody who is a regular expression guru... how do you negate a word
> and grep for all words that is
>
> tire
>
> but not
>
> snow tire
>
> or
>
> snowtire
>
> so for example, it will grep for
>
> winter tire
> tire
> retire
> tired
>
> but will not grep for
>
> snow tire
> snow tire
> some snowtires
>
> need to do it in one regular expression


Try the answer here:
http://mail.python.org/pipermail/tut...st/024902.html
 
bearophileHUGS@lycos.com
      01-26-2008
Paddy:
> Try the answer here:
> http://mail.python.org/pipermail/tut...st/024902.html


But in the OP problem there can be variable-sized spaces in the
middle...

Bye,
bearophile
 
Ilya Zakharevich
      01-26-2008
[A complimentary Cc of this posting was sent to
Summercool
<(E-Mail Removed)>], who wrote in article <(E-Mail Removed)>:
> so for example, it will grep for
>
> winter tire
> tire
> retire
> tired
>
> but will not grep for
>
> snow tire
> snow tire
> some snowtires


This does not describe the problem completely. What about

thisnow tire
snow; tire

etc? Anyway, one of the obvious modifications of

(^ | \b(?!snow) \w+ ) \W* tire

should work.

Hope this helps,
Ilya
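
For reference, a quick Python check of the expression exactly as written above. It is compiled with re.VERBOSE so that the spaces in the pattern are ignored. Which modification was intended is not stated, so none is attempted here; the flags and the variable names are assumptions made for this sketch.

```python
import re

# The pattern from the post, unmodified; re.VERBOSE ignores the spaces.
pattern = re.compile(r'(^ | \b(?!snow) \w+ ) \W* tire',
                     re.VERBOSE | re.IGNORECASE)

cases = ['winter tire', 'tire', 'retire', 'tired',
         'snow tire', 'some snowtires',
         'thisnow tire', 'snow; tire', 'snowbird tire']

# Record whether each test line contains a match.
results = dict((line, bool(pattern.search(line))) for line in cases)
for line in cases:
    print('%-15s %s' % (line, 'match' if results[line] else 'no match'))
```

As written, (?!snow) rejects any word that merely starts with "snow", so "snowbird tire" is not matched while "thisnow tire" is; that is presumably where one of the modifications would come in.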

 
Greg Bacon
      01-28-2008
The code below at least passes your tests.

Hope it helps,
Greg

#! /usr/bin/perl

use warnings;
use strict;

use constant {
    MATCH    => 1,
    NO_MATCH => 0,
};

my @tests = (
    [ "winter tire" => MATCH ],
    [ "tire" => MATCH ],
    [ "retire" => MATCH ],
    [ "tired" => MATCH ],
    [ "snowbird tire" => MATCH ],
    [ "tired on a snow day" => MATCH ],
    [ "snow tire and regular tire" => MATCH ],
    [ " tire" => MATCH ],
    [ "snow tire" => NO_MATCH ],
    [ "snow tire" => NO_MATCH ],
    [ "some snowtires" => NO_MATCH ],
);

my $not_snow_tire = qr/
    ^ \s* tire |
    ([^w\s]|[^o]w|[^n]ow|[^s]now)\s*tire
/xi;

my $fail;
for (@tests) {
    my($str,$want) = @$_;
    my $got = $str =~ /$not_snow_tire/;
    my $pass = !!$want == !!$got;

    print "$str: ", ($pass ? "PASS" : "FAIL"), "\n";

    ++$fail unless $pass;
}

print "\n", (!$fail ? "PASS" : "FAIL"), "\n";

__END__

--
... all these cries of having 'abolished slavery,' of having 'preserved the
union,' of establishing a 'government by consent,' and of 'maintaining the
national honor' are all gross, shameless, transparent cheats -- so trans-
parent that they ought to deceive no one. -- Lysander Spooner, "No Treason"
 