Regular expression to match only strings NOT containing particular words

 
 
Dylan Nicholson
      10-19-2007
I can write a regular expression that will only match strings that are
NOT the word apple:

^([^a].*|a[^p].*|ap[^p].*|app[^l].*|apple.+)$

But is there a neater way, and how would I do it to match strings that
are NOT the word apple OR banana? Then what would be needed to match
only strings that do not CONTAIN the word "apple" or "banana" or
"cherry"?

I'd love it if the following worked:

^[^(apple)(banana)(cherry)]*$

But it appears the parentheses are ignored, as

^[(apple)(banana)(cherry)]*$

simply matches any string that consists entirely of the characters
a,b,c,e,h,l,n,r,p & y.

 
 
 
 
 
Jürgen Exner
      10-19-2007
Dylan Nicholson wrote:
> I can write a regular expression that will only match strings that are
> NOT the word apple:
>
> ^([^a].*|a[^p].*|ap[^p].*|app[^l].*|apple.+)$
>
> But is there a neater way, and how would I do it to match strings that
> are NOT the word apple OR banana? Then what would be needed to match
> only strings that do not CONTAIN the word "apple" or "banana" or
> "cherry"?


!(/apple/ or /banana/ or /cherry/)

jue


 
 
 
 
 
Stephane CHAZELAS
      10-19-2007
2007-10-18, 22:00(-07), Dylan Nicholson:
> I can write a regular expression that will only match strings that are
> NOT the word apple:
>
> ^([^a].*|a[^p].*|ap[^p].*|app[^l].*|apple.+)$
>
> But is there a neater way, and how would I do it to match strings that
> are NOT the word apple OR banana? Then what would be needed to match
> only strings that do not CONTAIN the word "apple" or "banana" or
> "cherry"?
>
> I'd love it if the following worked:
>
> ^[^(apple)(banana)(cherry)]*$
>
> But it appears the parentheses are ignored, as
>
> ^[(apple)(banana)(cherry)]*$
>
> simply matches any string that consists entirely of the characters
> a,b,c,e,h,l,n,r,p & y.


With perl regexps:

perl -ne 'print if /^(?:(?!apple|banana).)*$/'
or probably better:
perl -ne 'print if /^(?!.*(?:apple|banana))/'

But then, why not simply:

perl -ne 'print if !/apple|banana/'

Note that vim's regexps have an equivalent negative look-ahead
operator.

--
Stéphane
 
 
Rad [Visual C# MVP]
      10-19-2007
On Thu, 18 Oct 2007 22:00:28 -0700, Dylan Nicholson wrote:

>I can write a regular expression that will only match strings that are
>NOT the word apple:
>
>^([^a].*|a[^p].*|ap[^p].*|app[^l].*|apple.+)$
>
>But is there a neater way, and how would I do it to match strings that
>are NOT the word apple OR banana? Then what would be needed to match
>only strings that do not CONTAIN the word "apple" or "banana" or
>"cherry"?
>
>I'd love it if the following worked:
>
>^[^(apple)(banana)(cherry)]*$
>
>But it appears the parentheses are ignored, as
>
>^[(apple)(banana)(cherry)]*$
>
>simply matches any string that consists entirely of the characters
>a,b,c,e,h,l,n,r,p & y.


A simple way is to write a regex that matches apple, banana or cherry,
run the match, and then check the Success property of the Match object:
a string contains none of the words exactly when Success is false.

Run the following mini program:

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            // Unanchored alternation: succeeds if any of the words occurs anywhere.
            Regex r = new Regex("apple|banana|cherry");
            string[] strings =
                "apple,banana,cherry,applebanana,applebananacherry,fishapple,chips,chip and apple,apple pie".Split(',');
            foreach (string s in strings)
            {
                // Success == false means s contains none of the words.
                Console.WriteLine("{0} Match? {1}", s, r.Match(s).Success);
            }
            Console.ReadLine();
        }
    }
}

You should get this:

apple Match? True
banana Match? True
cherry Match? True
applebanana Match? True
applebananacherry Match? True
fishapple Match? True
chips Match? False
chip and apple Match? True
apple pie Match? True

--
http://bytes.thinkersroom.com
 
 
Michele Dondi
      10-19-2007
On Thu, 18 Oct 2007 22:00:28 -0700, Dylan Nicholson wrote:

>But is there a neater way, and how would I do it to match strings that
>are NOT the word apple OR banana? Then what would be needed to match
>only strings that do not CONTAIN the word "apple" or "banana" or
>"cherry"?


The general answer is that you should use separate regexen combined with
logical operators, or an explicit !~, but the subject of negating regexen
is discussed in some depth in the following thread at PerlMonks:

http://perlmonks.org/?node_id=588315


Michele
--
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,
 
 
A. Sinan Unur
      10-19-2007
Dylan Nicholson wrote:

[
newsgroup list trimmed, follow-ups set
There is no reason to cross-post to both c.l.p.misc and m.p.d.l.csharp
]

> I can write a regular expression that will only match strings that are
> NOT the word apple:
>
> ^([^a].*|a[^p].*|ap[^p].*|app[^l].*|apple.+)$
>
> But is there a neater way, and how would I do it to match strings that
> are NOT the word apple OR banana?



When you say "are not" rather than "do not contain", you are testing for
equality, which means you should not be using regular expressions at all.


unless ( $s eq 'apple' or $s eq 'banana' or $s eq 'cherry' ) {

....

}


> Then what would be needed to match only strings that do not
> CONTAIN the word "apple" or "banana" or "cherry"?


unless (
index( $s, 'apple' ) > -1
index( $s, 'banana' ) > -1
index( $s, 'cherry' ) > -1
) {

....

}

If you have a long list of words, you could use


#!/usr/bin/perl

use strict;
use warnings;

use List::MoreUtils qw( first_index );

my $text = <<EO_TEXT;
Sed ut perspiciatis unde omnis iste natus error
sit voluptatem accusantium doloremque laudantium,
totam rem aperiam, eaque ipsa quae ab illo
inventore veritatis et quasi architecto beatae
vitae dicta sunt explicabo. Nemo enim ipsam
voluptatem quia voluptas sit aspernatur aut odit
aut fugit, sed quia consequuntur magni dolores eos
qui ratione voluptatem sequi nesciunt. Neque porro
quisquam est, qui dolorem ipsum quia dolor sit
amet, consectetur, adipisci velit, sed quia non
numquam eius modi tempora incidunt ut labore et
dolore magnam aliquam quaerat voluptatem. Ut enim
ad minima veniam, quis nostrum exercitationem
ullam corporis suscipit laboriosam, nisi ut
aliquid ex ea commodi consequatur? Quis autem vel
eum iure reprehenderit qui in ea voluptate velit
esse quam nihil molestiae consequatur, vel illum
qui dolorem eum fugiat quo voluptas nulla pariatur
EO_TEXT

my @wordlist = qw( hello explicabo reprehenderit random );

unless ( -1 == first_index { index( $text, $_ ) > -1 } @wordlist ) {
print "One of the words in the word list appears in the text.\n";
}

__END__





--
A. Sinan Unur
(remove .invalid and reverse each component for email address)
clpmisc guidelines: <URL:http://www.augustmail.com/~tadmc/clpmisc.shtml>

 
 
A. Sinan Unur
      10-19-2007
"A. Sinan Unur" wrote in
news:Xns99CE6A93E8341asu1cornelledu@127.0.0.1:


>> Then what would be needed to match only strings that do not
>> CONTAIN the word "apple" or "banana" or "cherry"?

>
> unless (
> index( $s, 'apple' ) > -1
> index( $s, 'banana' ) > -1
> index( $s, 'cherry' ) > -1
> ) {


Oooops.

unless (
index( $s, 'apple' ) > -1
or index( $s, 'banana' ) > -1
or index( $s, 'cherry' ) > -1
) {

Sinan

--
A. Sinan Unur
(remove .invalid and reverse each component for email address)
clpmisc guidelines: <URL:http://www.augustmail.com/~tadmc/clpmisc.shtml>

 