Velocity Reviews


crm_114@mac.com 11-29-2007 09:47 PM

Negate a character sequence in a regular expression?
 
For the following string:

'cat sheep horse cat tac dog'

I would like to write a regular expression that matches any substring
that is prefixed by the word 'cat', is then followed by any characters
as long as those characters do not comprise the word 'cat', and then
finally suffixed by the string 'dog'. Therefore, this expression
should match the substring 'cat tac dog' in the above string.

Obviously, if I write an expression like:

irb(main):002:0> /cat.*dog/.match('cat sheep horse cat tac dog').to_s
=> "cat sheep horse cat tac dog"

it will match the entire string.

And the non-greedy Kleene star doesn't buy me anything either, since the
match still starts at the first 'cat' found:

irb(main):003:0> /cat.*?dog/.match('cat sheep horse cat tac dog').to_s
=> "cat sheep horse cat tac dog"
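
A minimal sketch of why the lazy quantifier does not help here: the engine reports the leftmost match, so both patterns start at the first 'cat', and laziness only changes where the match ends. The constant names and the second sample string are illustrative assumptions, not part of the original post.

```ruby
SAMPLE   = 'cat sheep horse cat tac dog'
TWO_DOGS = 'cat dog dog' # illustrative string, not from the post

GREEDY = /cat.*dog/
LAZY   = /cat.*?dog/

# Both matches begin at the same offset: the leftmost 'cat'.
puts GREEDY.match(SAMPLE).begin(0)
puts LAZY.match(SAMPLE).begin(0)

# Laziness only affects the right-hand end of the match.
puts TWO_DOGS[GREEDY] # runs to the last 'dog'
puts TWO_DOGS[LAZY]   # stops at the first 'dog'
```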


What I think I want to do is to negate a sequence of characters,
rather than just a character class, but I have looked around and not
found anything quite right.

Of course, there are ways of hacking this out, e.g. I could reverse
the string first and match 'god' followed by the first instance of
'tac', but I am hoping there is a more elegant way to do this with a
single regular expression.

Thanks--


James Moore 11-29-2007 11:06 PM

Re: Negate a character sequence in a regular expression?
 
I think this gets you close to where you want to be:

irb(main):045:0> 'cat sheep horse cat tac dog' =~ /cat(?!.*cat)(.*)dog/
=> 16
irb(main):046:0> $1
=> " tac "
irb(main):047:0>

The (?! bit is a negative (nonmatching) lookahead.
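
As a rough illustration of what the lookahead does: (?!.*cat) consumes no characters and succeeds only when no further 'cat' appears anywhere to the right, all the way to the end of the line. The helper name below is hypothetical.

```ruby
# Hypothetical helper: returns [start_offset, whole_match, capture],
# or nil when the pattern does not match.
def last_cat_match(text)
  m = /cat(?!.*cat)(.*)dog/.match(text)
  m && [m.begin(0), m[0], m[1]]
end

# The first 'cat' is rejected because another 'cat' follows it, so the
# match starts at the second 'cat'.
p last_cat_match('cat sheep horse cat tac dog')
```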

On Nov 29, 2007 2:07 PM, <crm_114@mac.com> wrote:
> For the following string:
>
> 'cat sheep horse cat tac dog'
>
> I would like to write a regular expression that matches any substring
> that is prefixed by the word 'cat', is then followed by any characters
> as long as those characters do not comprise the word 'cat', and then
> finally suffixed by the string 'dog'. Therefore, this expression
> should match the substring 'cat tac dog' in the above string.


--
James Moore | james@restphone.com
Ruby and Ruby on Rails consulting
blog.restphone.com


MonkeeSage 11-30-2007 12:26 AM

Re: Negate a character sequence in a regular expression?
 
On Nov 29, 3:47 pm, crm_...@mac.com wrote:
> For the following string:
>
> 'cat sheep horse cat tac dog'
>
> I would like to write a regular expression that matches any substring
> that is prefixed by the word 'cat', is then followed by any characters
> as long as those characters do not comprise the word 'cat', and then
> finally suffixed by the string 'dog'. Therefore, this expression
> should match the substring 'cat tac dog' in the above string.
>
> Obviously, if I write an expression like:
>
> irb(main):002:0> /cat.*dog/.match('cat sheep horse cat tac dog').to_s
> => "cat sheep horse cat tac dog"
>
> it will match the entire string.
>
> And the non-greedy Kleene doesn't buy me anything either since the
> expression matches the first cat found anyway:
>
> irb(main):003:0> /cat.*?dog/.match('cat sheep horse cat tac dog').to_s
> => "cat sheep horse cat tac dog"
>
> What I think I want to do is to negate a sequence of characters,
> rather than just a character class, but I have looked around and not
> found anything quite right.
>
> Of course, there are ways of hacking this out, e.g. I could reverse
> the string first and match 'god' followed by the first instance of
> 'tac', but I am hoping there is a more elegant way to do this with a
> single regular expression.
>
> Thanks--


If you just want the right-most match of the substring prefixed by
'cat' and suffixed by 'dog', this should work:

/.*(cat.*dog)/.match(s)[1]
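
A small sketch of that idea, assuming you only need the right-most 'cat ... dog' span. The leading greedy .* grabs as much as it can and then backtracks to the last 'cat' that is still followed by a 'dog'. The helper name is hypothetical, and it uses String#[] with a capture index so that a string without a match yields nil instead of raising.

```ruby
# Hypothetical helper: right-most span starting at 'cat' and ending at 'dog'.
def rightmost_cat_dog(text)
  text[/.*(cat.*dog)/, 1]
end

puts rightmost_cat_dog('cat sheep horse cat tac dog')
puts rightmost_cat_dog('cat sheep horse cat tac dog lion cat')
p rightmost_cat_dog('no animals here')
```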

Regards,
Jordan

yermej 11-30-2007 01:21 AM

Re: Negate a character sequence in a regular expression?
 
On Nov 29, 5:06 pm, James Moore <jamesthepi...@gmail.com> wrote:
> On Nov 29, 2007 2:07 PM, <crm_...@mac.com> wrote:
>
> > For the following string:
> >
> > 'cat sheep horse cat tac dog'
> >
> > I would like to write a regular expression that matches any substring
> > that is prefixed by the word 'cat', is then followed by any characters
> > as long as those characters do not comprise the word 'cat', and then
> > finally suffixed by the string 'dog'. Therefore, this expression
> > should match the substring 'cat tac dog' in the above string.
>
> I think this gets you closer to where you want to be:
>
> irb(main):045:0> 'cat sheep horse cat tac dog' =~ /cat(?!.*cat)(.*)dog/
> => 16
> irb(main):046:0> $1
> => " tac "
> irb(main):047:0>
>
> The (?! bit is a nonmatching lookahead.


Nonmatching (or negative) lookahead is what you want, and with some
adjustment of the capture you get:

'cat sheep horse cat tac dog' =~ /(cat(?!.*cat).*dog)/
=> 16
$1
=> "cat tac dog"

Daniel Sheppard 11-30-2007 01:24 AM

Re: Negate a character sequence in a regular expression?
 
> For the following string:
>
> 'cat sheep horse cat tac dog'
>
> I would like to write a regular expression that matches any substring
> that is prefixed by the word 'cat', is then followed by any characters
> as long as those characters do not comprise the word 'cat', and then
> finally suffixed by the string 'dog'. Therefore, this expression
> should match the substring 'cat tac dog' in the above string.


Working out negative regular expressions is normally best avoided.

One step is hard. Two steps is not:

x = 'cat sheep horse cat tac dog'
/(cat.*?dog)/.match(x) && $1.sub(/.*cat/,'cat')

Or if you want multiple matches:

x = 'cat sheep horse cat tac dog cat cat sheep dog'
x.scan(/cat.*?dog/).map {|x| x.sub(/.*cat/,'cat')}
=> ["cat tac dog", "cat sheep dog"]
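
The two-step approach can be wrapped up roughly like this. It is only a sketch: the method name and the block variable name are hypothetical, chosen so the block variable no longer shadows the outer x.

```ruby
# Hypothetical helper. Step 1: find each lazy 'cat ... dog' span.
# Step 2: inside each span, drop everything before the last 'cat'.
def cat_dog_spans(text)
  text.scan(/cat.*?dog/).map { |span| span.sub(/.*cat/, 'cat') }
end

p cat_dog_spans('cat sheep horse cat tac dog cat cat sheep dog')
p cat_dog_spans('cat sheep horse cat tac dog lion cat')
```

A trailing 'cat' with no 'dog' after it is simply never picked up by the scan, so it does not disturb the earlier matches.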

Dan.


Raul Parolari 12-01-2007 08:19 AM

Re: Negate a character sequence in a regular expression?
 
yermej wrote:
>
> Nonmatching (or negative) lookahead is what you want, and with some
> adjustment of the capture you get:
>
> 'cat sheep horse cat tac dog' =~ /(cat(?!.*cat).*dog)/
> => 16
> $1
> => "cat tac dog"


Negative lookaheads that contain '.*' are hard to comprehend (at least
for me). Adding a 'cat' at the end is enough to stop the regexp from
finding the 'cat tac dog' that should be matched:

'cat sheep horse cat tac dog lion cat' =~ /(cat(?!.*cat).*dog)/
=> nil

Also notice that, when it works, it always gives the last such expression
present:
'cat sheep horse cat tac dog lion cat dog' =~ /(cat(?!.*cat).*dog)/
=> 33
p $1 # => "cat dog"
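
The three cases can be checked side by side with a short script. This is just a sketch; the constant names are illustrative. Because (?!.*cat) looks all the way to the end of the line, any later 'cat' disqualifies every earlier one, whether or not a 'dog' follows it.

```ruby
LAST_CAT = /(cat(?!.*cat).*dog)/

CASES = [
  'cat sheep horse cat tac dog',              # original string
  'cat sheep horse cat tac dog lion cat',     # trailing 'cat', no 'dog' after it
  'cat sheep horse cat tac dog lion cat dog'  # trailing 'cat dog'
].freeze

# Print each input next to what the capture group returns (nil = no match).
CASES.each do |s|
  puts "#{s.inspect} -> #{s[LAST_CAT, 1].inspect}"
end
```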


Daniel Sheppard wrote:
> Working out negative regular expressions is normally best avoided.


I would say that it is true when they contain '.*' type expressions;
else they can be extremely useful.

> Or if you want multiple matches:
>
> x = 'cat sheep horse cat tac dog cat cat sheep dog'
> x.scan(/cat.*?dog/).map {|x| x.sub(/.*cat/,'cat')}
> => ["cat tac dog", "cat sheep dog"]


Very ingenious...

Raul
--
Posted via http://www.ruby-forum.com/.


yermej 12-01-2007 09:10 AM

Re: Negate a character sequence in a regular expression?
 
On Dec 1, 2:19 am, Raul Parolari <raulparol...@gmail.com> wrote:
> yermej wrote:
>
> > Nonmatching (or negative) lookahead is what you want, and with some
> > adjustment of the capture you get:

>
> > 'cat sheep horse cat tac dog' =~ /(cat(?!.*cat).*dog)/
> > => 16
> > $1
> > => "cat tac dog"

>
> Negative lookaheads that contain '.*' are hard to comprehend (at least
> for me). It is enough to add a 'cat' at the end and the regexp does not
> find any more the 'cat tac dog' that should be matched:
>
> 'cat sheep horse cat tac dog lion cat' =~ /(cat(?!.*cat).*dog)/
> => nil
>
> Also notice that, when it works, it will always give the last expression
> present:
> 'cat sheep horse cat tac dog lion cat dog' =~ /(cat(?!.*cat).*dog)/
> => 33
> p $1 # => "cat dog"
>
> Daniel Sheppard wrote:
> > Working out negative regular expressions is normally best avoided.

>
> I would say that it is true when they contain '.*' type expressions;
> else they can be extremely useful.
>
> > Or if you want multiple matches:

>
> > x = 'cat sheep horse cat tac dog cat cat sheep dog'
> > x.scan(/cat.*?dog/).map {|x| x.sub(/.*cat/,'cat')}
> > => ["cat tac dog", "cat sheep dog"]

>
> Very ingenious...
>
> Raul
> --
> Posted via http://www.ruby-forum.com/.


Thanks, Raul, for the clarification on that.

To the original poster, I apologize for the misinformation. I guess
when I'm not completely sure about such things, I should start a new
thread, but attempting to answer questions here (some of which I do
get right) has been a great help to me in my own learning. I think
that starting new threads on all occasions would get to be too much
and I probably wouldn't get many responses. In the future, I may just
keep quiet until I'm either sure I have the correct answer or at least
don't understand why my answer isn't correct.

MonkeeSage 12-01-2007 09:29 AM

Re: Negate a character sequence in a regular expression?
 
On Dec 1, 3:10 am, yermej <yer...@gmail.com> wrote:
>
> Thanks, Raul, for the clarification on that.
>
> To the original poster, I apologize for the misinformation. I guess
> when I'm not completely sure about such things, I should start a new
> thread, but attempting to answer questions here (some of which I do
> get right) has been a great help to me in my own learning. I think
> that starting new threads on all occasions would get to be too much
> and I probably wouldn't get many responses. In the future, I may just
> keep quiet until I'm either sure I have the correct answer or at least
> don't understand why my answer isn't correct.


Not to blow my own horn, but I think the behavior requested by the OP
is still /.*(cat.*dog)/ ("a regular expression that matches any substring
that is prefixed by the word 'cat', is then followed by any characters
as long as those characters do not comprise the word 'cat', and then
finally suffixed by the string 'dog'").

But don't feel like you have to be 100% correct in order to help out.
My answers are often bass-ackwards (not really a bragging point, heh).
But the point is, like you say, to learn from each other and help
where we can. None of us knows it all; all we can do is our best, and
pray it might help someone here or there. :)

Regards,
Jordan

Tanaka Akira 12-01-2007 09:35 AM

Re: Negate a character sequence in a regular expression?
 
In article <238b37f9-2df1-42b8-b821-dbb6c24a9d9e@s8g2000prg.googlegroups.com>,
crm_114@mac.com writes:

> For the following string:
>
> 'cat sheep horse cat tac dog'
>
> I would like to write a regular expression that matches any substring
> that is prefixed by the word 'cat', is then followed by any characters
> as long as those characters do not comprise the word 'cat', and then
> finally suffixed by the string 'dog'. Therefore, this expression
> should match the substring 'cat tac dog' in the above string.


% /usr/bin/ruby -e 'p /cat((?!cat).)*dog/.match("cat sheep horse cat tac dog").to_s'
"cat tac dog"
--
Tanaka Akira
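
A brief sketch of why this pattern works: the lookahead is applied before every single character between 'cat' and 'dog', so a character is consumed only if 'cat' does not start at that position. The constant and method names are hypothetical, and the group is written as non-capturing, (?: ... ), so that String#scan returns whole matches rather than the contents of the group.

```ruby
# Each '.' is guarded by (?!cat), so the span between the opening 'cat'
# and the closing 'dog' can never contain another 'cat'.
TEMPERED = /cat(?:(?!cat).)*dog/

# Hypothetical helper: all non-overlapping matches in the text.
def tempered_matches(text)
  text.scan(TEMPERED)
end

p tempered_matches('cat sheep horse cat tac dog')
p tempered_matches('cat sheep horse cat tac dog lion cat')
p tempered_matches('cat sheep horse cat tac dog cat cat sheep dog cat')
```

Unlike the (?!.*cat) variant, a 'cat' that appears after the closing 'dog' has no effect on the match.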


Raul Parolari 12-01-2007 08:22 PM

Re: Negate a character sequence in a regular expression?
 
yermej wrote:

> Thanks, Raul, for the clarification on that.
>
> To the original poster, I apologize for the misinformation.
> In the future, I may just keep quiet until I'm either sure I have
> the correct answer or at least
> don't understand why my answer isn't correct.


yermej,

I often follow your postings, which I find very interesting. But do not
take this hard; no one can be perfect all of the time.

In this case, it is very easy to fall into the alluring trap of the
negative lookahead with '.*' (see how many people keep posting a variant
of that solution).

However, I totally share the feeling of dismay when we give advice
that turns out not to be right (or just not completely right); we just
have to accept that we are fallible.

In this case, hats off to Daniel Sheppard, who showed us how to do it (I
repeat it here, since it got buried under all the emails that followed):

x = 'cat sheep horse cat tac dog cat cat sheep dog cat'

x.scan(/cat.*?dog/).map {|x| x.sub(/.*cat/,'cat')}

=> ["cat tac dog", "cat sheep dog"]

Absolutely brilliant

Raul



--
Posted via http://www.ruby-forum.com/.


