attaching code to run on regular expression match

 
 
Eyal Oren
      10-19-2005
Hi,

I am parsing query expressions, using a regular expression with
several alternative capture groups in it, e.g. /(\w+)|(\w+)/.

I would like some code to execute on the first match (e.g.
constructing some object out of it) and some other code on the second
match (e.g. constructing some other object).

I can of course check the array of matches and find the non-nil
element, and decide which code to execute. But that becomes very
cumbersome with a large regex (with say 10 different matches).

So I would rather like to attach some code to a match directly, as one
does in parser generators, e.g.
/(\w+:do_method)|(\w+:do_other_method)/.

Would something like that be possible in Ruby? I tried searching but
I'm not sure what such a feature would be called.


 
Brian Schröder
      10-19-2005
Maybe you can refactor your regexp to be used with scan.

irb(main):001:0> "some words to change".scan(/\w+/) do | w | puts w.upcase end
SOME
WORDS
TO
CHANGE
=> "some words to change"

hth,
Brian

--
http://ruby.brian-schroeder.de/

Stringed instrument chords: http://chordlist.brian-schroeder.de/


 
Robert Klemme
      10-19-2005
Eyal Oren wrote:
> Would something like that be possible in Ruby? I tried searching but
> I'm not sure how such a feature would be called.


No, I don't think it's possible. You can do this:

string.scan(/(\w+)|(\w+)/) do |match|
  # inject yields the 1-based position of the first non-nil group
  case match.inject(1) { |pos, x| break pos if x; pos + 1 }
  when 1
    # code for group 1
  when 2
    # ...
  end
end

Kind regards

robert
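
The case statement above can also be swapped for a lookup table, so that each group's code sits next to the others in one place. This is only a minimal sketch: the pattern, the `HANDLERS` array and the `dispatch` method are made up for illustration, and it assumes a Ruby new enough for `->` lambdas and `Array#index` with a block.

```ruby
# Hypothetical handlers, listed in the same order as the capture groups.
HANDLERS = [
  ->(text) { text.to_i },    # group 1: a run of digits
  ->(text) { text.to_sym }   # group 2: a run of letters
]

# Illustrative pattern with one capture group per alternative.
PATTERN = /(\d+)|([a-z]+)/i

def dispatch(string)
  string.scan(PATTERN).map do |groups|
    idx = groups.index { |g| !g.nil? }   # position of the group that matched
    HANDLERS[idx].call(groups[idx])
  end
end

p dispatch("abc 42 de")
```

Adding an eleventh alternative then means adding one group and one handler, with no change to the dispatch code.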

 
Eyal Oren
      10-19-2005
On 19/10/05, Brian Schröder wrote:
> Maybe you can refactor your regexp to be used with scan.
>
> irb(main):001:0> "some words to change".scan(/\w+/) do | w | puts w.upcase end
> SOME
> WORDS
> TO
> CHANGE
> => "some words to change"

I am not sure that would help. I need to know which of the matches
occurred, because the actions are different for different matches (you
just 'puts' all matches).

In your example, with "Some words To change", say I want to print the
capitalised words normally, and print the others reversed. I can make
a regex that captures both kinds of word in two groups, but scan
wouldn't work because I wouldn't know if a match was from group one or
group two.

But AFAIK I cannot ask the resulting match which regex it was matched
by, so I still do not know what to do. I could of course test each
regex on the matched word again, but that is not efficient.
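
For what it's worth, the group that did not take part in a match comes back as nil, so the capitalised/reversed example can be done without testing each regex again. A small sketch of that idea follows; the `WORDS` pattern and the `rewrite` method are illustrative names, not something from the posts above.

```ruby
# Group 1 captures a capitalised word, group 2 any other word.
WORDS = /\b([A-Z]\w*)\b|\b(\w+)\b/

def rewrite(text)
  text.gsub(WORDS) do
    capitalised, other = $1, $2   # exactly one of the two is non-nil
    capitalised ? capitalised : other.reverse
  end
end

puts rewrite("Some words To change")   # prints "Some sdrow To egnahc"
```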


 
Pit Capitain
      10-19-2005
Eyal Oren wrote:
> So I would rather like to attach some code in a match directly, as one
> does in parsing generators, e.g.
> /(\w+:do_method)|(\w+:do_other_method)/.
>
> Would something like that be possible in Ruby? I tried searching but
> I'm not sure how such a feature would be called.


I'm sure I'm missing something, but wouldn't this work:

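string.scan(/(\w+)|(\w+)/) do |m1, m2|
  do_method(m1) if m1        # m1 is nil when the second alternative matched
  do_other_method(m2) if m2  # m2 is nil when the first alternative matched
end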

Maybe you can show us one of your complex regex?

Regards,
Pit


 
Eyal Oren
      10-19-2005
Thanks. That might work, but I think the problem lies in the unions of
the regexps that I use (see the example below).

Because of the unions, I don't really want to decide after the match
what to do with it, but rather state it in the constituent regexps
(e.g., I would like to say in the ImplicitWiki regexp what should
happen if it is encountered).


ExplicitWiki = /\[\[([^\]]+)\]\]/

# CamelCase followed by some non-word character, e.g. 'CamelCase.'
ImplicitWiki = /([A-Z]+[a-z]+[A-Z]+\w*)\W/

# <...>, no space inside brackets
Uri = /<([^<>]+)>/

# dc:title
Prefix = /(\w*):(\w+)/

# "hello"
Literal = /"([^"]*)"/

Wiki = Regexp.union ExplicitWiki, ImplicitWiki
Pred = Regexp.union Wiki, Uri, Prefix
Obj = Regexp.union Pred, Literal
Annotation = /(#{Pred})\s*(#{Obj})\s*\./

Variable = /(\?\w+)/
UriPattern = Regexp.union Variable, Pred
LiteralPattern = Regexp.union Variable, Obj
Query = /\[\?\s+#{UriPattern}\s+#{UriPattern}\s+#{LiteralPattern}\]/
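
One way to state the action right next to each constituent regexp, instead of deciding after a big union has matched, is to keep the regexps and their actions as pairs and try them in order. The sketch below swaps in StringScanner from the standard library for this; it is not what the definitions above do. The `RULES` table, the `tokenize` method and the symbol tags are placeholders, and only a few of the patterns are reused.

```ruby
require "strscan"

# Each rule pairs a constituent pattern with the action to run on a match.
# The actions here just tag the captured text; real ones would build objects.
RULES = [
  [/\[\[([^\]]+)\]\]/, ->(m) { [:explicit_wiki, m[1]] }],
  [/<([^<>]+)>/,       ->(m) { [:uri, m[1]] }],
  [/"([^"]*)"/,        ->(m) { [:literal, m[1]] }],
  [/\?(\w+)/,          ->(m) { [:variable, m[1]] }]
]

def tokenize(input)
  scanner = StringScanner.new(input)
  out = []
  until scanner.eos?
    next if scanner.skip(/\s+/)          # ignore whitespace between tokens
    # The first rule whose pattern matches at the current position wins.
    pattern, action = RULES.find { |re, _| scanner.scan(re) }
    raise ArgumentError, "unexpected input at #{scanner.pos}" unless pattern
    out << action.call(scanner)          # scanner[1] is the first capture group
  end
  out
end

p tokenize('[[Home Page]] <http://x> "hi" ?v')
```

Because each pattern is tried on its own, there is never a question of which group of a union matched.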

 
Pit Capitain
      10-19-2005
Eyal Oren wrote:
> thanks. that might work, but the problem is I think in the unions of
> the regexps that I use, see example:
>
> because of the unions, I don't really want to decide after the match
> what to do with it, but rather state it in the constituent regexp's
> (e.g., I would like to say in the ImplicitWiki regexp what should
> happen if it is encountered)

OK, thanks for your example. I think the regexp engine of Ruby 1.9
called Oniguruma supports something like named sub-expressions, which
might be what you need.

Regards,
Pit
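
For readers on Ruby 1.9 or later, here is a rough sketch of how named groups could drive the choice of action. The group names, the `ACTIONS` hash and the `tokens` method are invented for the example, and the pattern is a cut-down stand-in for the real ones.

```ruby
# One named group per alternative (the names are illustrative).
TOKEN = /(?<uri><[^<>]+>)|(?<literal>"[^"]*")|(?<variable>\?\w+)/

# Hypothetical actions, keyed by group name.
ACTIONS = {
  "uri"      => ->(s) { [:uri, s[1..-2]] },       # strip < and >
  "literal"  => ->(s) { [:literal, s[1..-2]] },   # strip the quotes
  "variable" => ->(s) { [:variable, s[1..-1]] }   # strip the leading ?
}

def tokens(input)
  result = []
  input.scan(TOKEN) do
    md = Regexp.last_match
    name = md.names.find { |n| md[n] }   # the one named group that matched
    result << ACTIONS[name].call(md[name])
  end
  result
end

p tokens('?x <http://a> "hi"')
```

This still decides after the match, but by name rather than by counting groups, which holds up better when patterns are combined.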


 
David Holroyd
      10-19-2005
On Wed, Oct 19, 2005 at 08:16:58PM +0900, Eyal Oren wrote:
> thanks. that might work, but the problem is I think in the unions of
> the regexps that I use, see example:
>
> because of the unions, I don't really want to decide after the match
> what to do with it, but rather state it in the constituent regexp's
> (e.g., I would like to say in the ImplicitWiki regexp what should
> happen if it is encountered)

I wrote the following a long time ago when I was new to Ruby. Maybe you
could use a similar pattern,

----------------------------------------------------------------------
# Perform (possibly) multiple global substitutions on a string.
# The regexps given as keys must not use capturing subexpressions
# '(...)'; non-capturing groups '(?:...)' are fine.
class MultiSub
  # hash has regular expression fragments (as strings) as keys, mapped
  # to Procs that will generate replacement text, given the matched value.
  def initialize(hash)
    @mash = []
    fragments = []
    hash.each do |key, val|
      # Wrap each fragment in its own capture group so that the group
      # position identifies which fragment matched.
      fragments << "(#{key})"
      @mash << val
    end
    @re = Regexp.new(fragments.join("|"))
  end

  # Perform a global multi-sub on the given text, modifying the passed
  # string 'in place'.
  def gsub!(text)
    text.gsub!(@re) { |match|
      # $~.to_a is [whole match, group 1, group 2, ...]; count up to the
      # first non-nil group to find the index of the Proc to call.
      idx = -1
      $~.to_a.each { |subexp|
        break unless idx == -1 || subexp == nil
        idx += 1
      }
      idx == -1 ? match : @mash[idx].call(match)
    }
  end
end

# example,

mailSub = proc { |match| "<a href=\"mailto:#{match}\">#{match}</a>" }
urlSub = proc { |match| "<a href=\"#{match}\">#{match}</a>" }

sub = MultiSub.new({
  '(?:mailto:)?[\w\.\-\+\=]+\@[\w\-]+(?:\.[\w\-]+)+\b' => mailSub,
  '\b(?:http|https|ftp):[^ \t\n<>"]+[\w/]' => urlSub
})

test = "...."
sub.gsub!(test)
puts test
----------------------------------------------------------------------

ta,
dave

--
http://david.holroyd.me.uk/


 
Kevin Ballard
      10-19-2005

Pit Capitain wrote:
> OK, thanks for your example. I think the regexp engine of Ruby 1.9
> called Oniguruma supports something like named sub-expressions, which
> might be what you need.


Oniguruma is indeed the regexp engine of Ruby, but are you sure named
subexpressions aren't already in Ruby? I thought they were, but I've
only actually used them in TextMate (an OS X text editor that uses
Oniguruma as its regex engine).

Hrm, I just tested and it does appear that named subexpressions aren't
in Ruby 1.8. That's interesting, because I thought Oniguruma supported
them quite a while ago.

 
Christophe Grandsire
      10-19-2005
Kevin Ballard wrote:

>
> Hrm, I just tested and it does appear that named subexpressions aren't
> in Ruby 1.8. That's interesting, because I thought Oniguruma supported
> them quite a while ago.
>


I thought Oniguruma was not yet the regex engine of Ruby, but would become it
from Ruby2 on (is it already the engine in Ruby 1.9?), i.e. it is not the regex
engine of Ruby 1.8.
--
Christophe Grandsire.

http://rainbow.conlang.free.fr

It takes a straight mind to create a twisted conlang.


 