Ruby regex engine behavior question

 
 
Daniel Berger
Guest
Posts: n/a
 
      09-13-2004
I read this in a journal entry:

"[In the Ruby 1.6 regex engine] \G doesn't prohibit regex bump-along
(it's 'start of current match' rather than 'end of last match'), which
makes [it] relatively useless to write complex parsers with."

Can anyone comment on this? I'm not quite certain what he means. And
is it still the same in 1.8?

Regards,

Dan
 
 
ts
Guest
Posts: n/a
 
      09-13-2004
>>>>> "D" == Daniel Berger <(E-Mail Removed)> writes:

D> "[In the Ruby 1.6 regex engine] \G doesn't prohibit regex bump-along
^^^^^^^

are you sure of this ?

D> (it's 'start of current match' rather than 'end of last match'), which
D> makes relatively useless to write complex parsers with."


Guy Decoux





 
 
Daniel Berger
Guest
Posts: n/a
 
      09-14-2004
ts <(E-Mail Removed)> wrote in message news:<(E-Mail Removed)>...
> >>>>> "D" == Daniel Berger <(E-Mail Removed)> writes:

>
> D> "[In the Ruby 1.6 regex engine] \G doesn't prohibit regex bump-along
> ^^^^^^^
>
> are you sure of this ?
>
> D> (it's 'start of current match' rather than 'end of last match'), which
> D> makes relatively useless to write complex parsers with."
>
>
> Guy Decoux


No. That's why I'm asking. I'm merely quoting the entry I saw. Thoughts?

Dan
 
 
nobu.nokada@softhome.net
Guest
Posts: n/a
 
      09-14-2004
Hi,

At Tue, 14 Sep 2004 01:04:58 +0900,
Daniel Berger wrote in [ruby-talk:112395]:
> "[In the Ruby 1.6 regex engine] \G doesn't prohibit regex bump-along
> (it's 'start of current match' rather than 'end of last match'), which
> makes relatively useless to write complex parsers with."


I don't understand what he means either. The 'start' and the 'end'
should be the same, since a global match starts matching from the end
of the last match.

--
Nobu Nakada


 
 
Daniel Berger
Guest
Posts: n/a
 
      09-14-2004
ts <(E-Mail Removed)> wrote in message news:<(E-Mail Removed)>...
> >>>>> "D" == Daniel Berger <(E-Mail Removed)> writes:

>
> D> "[In the Ruby 1.6 regex engine] \G doesn't prohibit regex bump-along
> ^^^^^^^
>
> are you sure of this ?
>
> D> (it's 'start of current match' rather than 'end of last match'), which
> D> makes relatively useless to write complex parsers with."
>
>
> Guy Decoux


The OP has further clarified. To quote:

When trying to match abcde with /\Gx?/g, the first match is
successful, because no x is found but the question mark allows zero
characters to be consumed. This match ends after zero characters into
the string, that is, at start-of-string. In order to avoid infinite
loops on zero-length matches, the engine then retries the match one
position down the string.

In Perl, \G means end-of-last-match, and since end-of-last-match was
at start-of-string, \G can't possibly match at one character into the
string:

$ perl -le'$_="abcde"; s/\Gx?/!/g; print'
!abcde

In Ruby (both 1.6 and 1.8, I found), \G merely means
start-of-current-match, which, of course, is satisfiable at that
point:

$ ruby1.6 -e'puts "abcde".gsub(/\Gx?/,"!")'
!a!b!c!d!e!
$ ruby1.8 -e'puts "abcde".gsub(/\Gx?/,"!")'
!a!b!c!d!e!

Perl's \G is a powerful tool to write parsers because the regex engine
is prohibited from skipping characters to find a match — you can work
your way through a string with a multitude of patterns using /c (to
avoid resetting the end-of-last-match on match failure) applied
against the same string in turn, without them sabotaging each other.

End quote.

Thoughts?

Dan
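
For comparison, the "no skipping ahead" style of parsing described in the quote can be sketched in Ruby without relying on \G at all. This is only a minimal illustration, assuming the standard library's StringScanner: its scan method matches only at the scanner's current position and returns nil otherwise, so patterns tried in turn cannot bump along past unrecognized input. The token names and patterns below are hypothetical and not taken from the thread.

```ruby
require 'strscan'

# Tokenize +input+ by trying each pattern at the current scan position only.
# The token table is a made-up example, not something from the thread.
def tokenize(input)
  patterns = [
    [:space,  /\s+/],
    [:number, /\d+/],
    [:ident,  /[A-Za-z_]\w*/],
    [:op,     %r{[=+\-*/]}]
  ]

  scanner = StringScanner.new(input)
  tokens = []

  until scanner.eos?
    kind = nil
    text = nil

    patterns.each do |name, pattern|
      # scan is anchored at scanner.pos; it never searches further ahead.
      text = scanner.scan(pattern)
      if text
        kind = name
        break
      end
    end

    # No pattern matched here, so stop instead of skipping characters.
    raise ArgumentError, "unexpected input at offset #{scanner.pos}" unless kind

    tokens << [kind, text] unless kind == :space
  end

  tokens
end

p tokenize("x = 42 + y")
```

A failed scan leaves the position unchanged, which is roughly the role the quote assigns to Perl's /c modifier: the next pattern is tried at the same place.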
 
 
ts
Guest
Posts: n/a
 
      09-14-2004
>>>>> "D" == Daniel Berger <(E-Mail Removed)> writes:

D> In Perl, \G means end-of-last-match, and since end-of-last-match was
D> at start-of-string, \G can't possibly match at one character into the
D> string:

This is one way to say it; another is:

* on a zero-length match, Perl prohibits a second zero-length match

* on a zero-length match, Ruby moves its internal cursor


Guy Decoux
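
The cursor movement on the Ruby side can be seen with a small sketch that deliberately leaves \G out, so it only shows what happens after a zero-length match. The string and variable names are illustrative. Since "abc" contains no x, every match of /x?/ is empty, and the search resumes one character further along each time. The thread reports the same advancing behaviour when \G is present in Ruby 1.6 and 1.8.

```ruby
# /x?/ never finds an "x" in "abc", so every match is zero-length.
empty_matches = "abc".scan(/x?/)

# Record where each empty match starts.
positions = []
"abc".scan(/x?/) { positions << Regexp.last_match.begin(0) }

# Each empty match is replaced, and the skipped character is copied through.
replaced = "abc".gsub(/x?/, "!")

p empty_matches  # => ["", "", "", ""]
p positions      # => [0, 1, 2, 3]
p replaced       # => "!a!b!c!"
```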


 