Velocity Reviews - Computer Hardware Reviews

[Brainstorming Input] Ruby-Oniguruma interoperability on Named Groups

 
 
Wolfgang Nádasi-Donner
      07-30-2005
Let me first explain the reason for this message and what kind of message it is.

I have a vague idea for making regular expressions more readable and for making it possible to build libraries
of regular expressions. The hook is the named groups ('(?<name>...)') that are part of Oniguruma. The idea was
influenced by the ancient Snobol4 language and its '*' operator for unevaluated expressions.

This input is brainstorming material and not a change proposal, because it is not mature enough. I hope that
something like this will appear in Ruby at some point in the future.

Now the idea.

Ruby and Oniguruma should be extended somehow to allow Ruby objects (usually regular expressions) to be
registered with the class Regexp, so that they can be referenced later in regular expressions.

In detail: a regular expression that consists only of a named group definition (starts with '(?<name>') can be
registered by something like 'Regexp.register(/(?<example>a|b|c|d)/)', and be deleted by
'Regexp.remove('<example>')'. If the regular expression is assigned to a variable, that variable can be used;
how to handle this in the 'remove' case still has to be clarified. I used class methods for this example, but
it may be better to introduce named Regexp objects, created by something like '/(?<example>a|b|c|d)/.create'.
Some possibility for explicit deletion should exist, because the regex engine Oniguruma must know which
objects it has to take care of.

These objects can later be referenced in regular expressions by '\k<name>' or '\g<name>' as if they were
defined there.

This could make regular expressions much more readable, because one can build them from smaller parts,
one can build special libraries of regular expression parts that are usable in applications, and one can
use regular expression parts that were built by others without completely understanding their details.

I think this is worth thinking about.

Best regards, Wolfgang

--
Wolfgang Nádasi-Donner
(E-Mail Removed)


 
Brian Schröder
      07-30-2005
On 30/07/05, Wolfgang Nádasi-Donner <(E-Mail Removed)> wrote:
> Let me first explain the reason for this message and what kind of message it is.
>
> I have a vague idea for making regular expressions more readable and for making it possible to build
> libraries of regular expressions. The hook is the named groups ('(?<name>...)') that are part of Oniguruma.
> The idea was influenced by the ancient Snobol4 language and its '*' operator for unevaluated expressions.
>
> This input is brainstorming material and not a change proposal, because it is not mature enough. I hope that
> something like this will appear in Ruby at some point in the future.
>
> Now the idea.
>
> Ruby and Oniguruma should be extended somehow to allow Ruby objects (usually regular expressions) to be
> registered with the class Regexp, so that they can be referenced later in regular expressions.
>
> In detail: a regular expression that consists only of a named group definition (starts with '(?<name>') can
> be registered by something like 'Regexp.register(/(?<example>a|b|c|d)/)', and be deleted by
> 'Regexp.remove('<example>')'. If the regular expression is assigned to a variable, that variable can be
> used; how to handle this in the 'remove' case still has to be clarified. I used class methods for this
> example, but it may be better to introduce named Regexp objects, created by something like
> '/(?<example>a|b|c|d)/.create'. Some possibility for explicit deletion should exist, because the regex
> engine Oniguruma must know which objects it has to take care of.
>
> These objects can later be referenced in regular expressions by '\k<name>' or '\g<name>' as if they were
> defined there.
>
> This could make regular expressions much more readable, because one can build them from smaller parts,
> one can build special libraries of regular expression parts that are usable in applications, and one can
> use regular expression parts that were built by others without completely understanding their details.
>
> I think this is worth thinking about.
>
> Best regards, Wolfgang


Hello Wolfgang,

What is the difference from

example = "(?<example>a|b|c)"
regex = /#{example}|nothing/

except that you make Regexp hold the example variable and get a
parse test on the regexp? You can get both with something like
this:

bschroed@black:~/svn/projekte/ruby-things$ cat regexp.rb

class Regexp
  def self.register(name, regexp)
    self.new(regexp.to_s)  # parse test: raises RegexpError if invalid
    (@registered_res ||= {})[name] = regexp.to_s
  end

  def self.[](name)
    (@registered_res ||= {})[name]  # nil if nothing is registered under name
  end
end

Regexp.register(:example, 'a|b|c')

if /#{Regexp[:example]}|nothing/ =~ 'Well, that was just nothing'
  puts "Contains an example or nothing"
end

Regexp.register(:invalid, '(invalid(')
bschroed@black:~/svn/projekte/ruby-things$ ruby regexp.rb
Contains an example or nothing
regexp.rb:4:in `initialize': premature end of regular expression: /(invalid(/ (RegexpError)
        from regexp.rb:4:in `new'
        from regexp.rb:4:in `register'
        from regexp.rb:19


So it seems a very specialized wish to me.

Regards,

Brian

--
http://ruby.brian-schroeder.de/

Stringed instrument chords: http://chordlist.brian-schroeder.de/


 
Wolfgang Nádasi-Donner
      07-30-2005
>>>>> snip >>>>>
Can't we do that already?

example = /a|b|c|d/
mybigregex = /#{example}|foo/

If you need more scope, use constants.
>>>>> snap >>>>>


It is not the same, because you include the textual data (it is somewhat like using the C preprocessor). There
are two disadvantages:

1) During debugging and the like you don't see the structure you constructed; you have to work with the
final regular expression.

2) You cannot manage recursive constructs, which are possible using '\g<name>'. This is a standard part of
Oniguruma.
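
Both points can be seen in a current Ruby, where the Oniguruma-derived engine is built in (1.9 or later; `match?` needs 2.4). This is only an illustrative sketch, and the variable and group names are made up for the example:

```ruby
# 1) Interpolation pastes text: the composed regexp no longer shows
#    that it was built from a part called `example`.
example = /a|b|c|d/
big = /#{example}|foo/
puts big.source                    # prints (?-mix:a|b|c|d)|foo

# 2) A group that calls itself with \g<name> matches nested
#    parentheses, which plain textual interpolation cannot express.
balanced = /\A(?<paren>\((?:[^()]|\g<paren>)*\))\z/
puts balanced.match?("(a(b)c)")    # prints true
puts balanced.match?("(a(b)c")     # prints false
```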

--
Wolfgang Nádasi-Donner
(E-Mail Removed)


 
Dominik Bathon
      07-31-2005
On Sat, 30 Jul 2005 20:31:03 +0200, Wolfgang Nádasi-Donner
<(E-Mail Removed)> wrote:

> Now the idea.
>
> Ruby and Oniguruma should be extended somehow to allow Ruby objects
> (usually regular expressions) to be registered with the class Regexp,
> so that they can be referenced later in regular expressions.


I think I generally like the idea of composing regular expressions that way
...

> In detail: a regular expression that consists only of a named group
> definition (starts with '(?<name>') can be registered by something like
> 'Regexp.register(/(?<example>a|b|c|d)/)', and be deleted by
> 'Regexp.remove('<example>')'. If the regular expression is assigned to a
> variable, that variable can be used; how to handle this in the 'remove'
> case still has to be clarified. I used class methods for this example,
> but it may be better to introduce named Regexp objects, created by
> something like '/(?<example>a|b|c|d)/.create'. Some possibility for
> explicit deletion should exist, because the regex engine Oniguruma must
> know which objects it has to take care of.


... but I think registering all named groups in one global place is not a
good idea (even if you can unregister): what if two libraries use the same
group names? I think there would be many name clashes.

So here is another idea: let the caller manage the named groups.
Maybe in arrays or hashes. Something like:

groups = [/(?<example>a|b|c|d)/, /(?<example2>e|f|g)/]

or with hashes:

groups = { "example" => /a|b|c|d/, "example2" => /e|f|g/ }

or maybe in some specialized named-groups library class.

> These objects can later be referenced in regular expressions by
> '\k<name>' or '\g<name>' as if they were defined there.


To use those groups I would suggest something like:

/\k<example>/.with(groups)

Regexp#with would return the "composed" Regexp that can be used like any
other Regexp.

What do you think?

Dominik

Disclaimer: I do not really know how named groups work in Oniguruma, I just
wanted to point out that one global registry might be a bad idea.


 
Nikolai Weibull
      07-31-2005
Wolfgang Nádasi-Donner wrote:

[blurb about named groups in regular expressions]

I hope that the people responsible for the regular-expression code in
Ruby 2.0 read http://www.perl.com/pub/a/2002/06/04/apo5.html before
going along with a Perl-5-inspired syntax with hopelessly ugly
extensions (I'm sorry, but \k<name> and \g<name> are just horrendous).
Perl 6's way of defining grammars is quite neat and simple to
understand. I also have some ideas for a better syntax, which is
inspired by the aforementioned document, but I have yet to release
anything (it was part of my master's thesis),
nikolai

--
Nikolai Weibull: now available free of charge at http://bitwi.se/!
Born in Chicago, IL USA; currently residing in Gothenburg, Sweden.
main(){printf(&linux["\021%six\012\0"],(linux)["have"]+"fun"-97);}


 
Wolfgang Nádasi-Donner
      07-31-2005
"Dominik Bathon" <(E-Mail Removed)> wrote in message newsp.sur8hdg2d62ajc@localhost...
On Sat, 30 Jul 2005 20:31:03 +0200, Wolfgang Nádasi-Donner

>>>>> snip >>>>>

Disclaimer: I do not really know how named groups work in Oniguruma, ...
>>>>> snap >>>>>


I think Oniguruma is fairly stable and is used by other projects too, but I may be wrong about that. I
took the Oniguruma syntax 'as given'.

--
Wolfgang Nádasi-Donner
(E-Mail Removed)


 
Wolfgang Nádasi-Donner
      07-31-2005
"Nikolai Weibull" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...

>>>>> snip >>>>>

I hope that the people responsible for the regular-expression code in
Ruby 2.0 read http://www.perl.com/pub/a/2002/06/04/apo5.html before
going along with a Perl-5-inspired syntax with hopelessly ugly
extensions (I'm sorry, but \k<name> and \g<name> are just horrendous).
Perl 6's way of defining grammars is quite neat and simple to
understand. ...
>>>>> snap >>>>>


Is it realistic to submit change proposals against Oniguruma? As I understand it, it is a project in its
own right and is used in different projects, not only Ruby.

--
Wolfgang Nádasi-Donner
(E-Mail Removed)


 
Simon Strandgaard
      07-31-2005
On 7/31/05, Nikolai Weibull
<(E-Mail Removed)> wrote:
[snip]
> Perl 6's way of defining grammars is quite neat and simple to
> understand. I also have some ideas for a better syntax, which is
> inspired by the aforementioned document, but I have yet to release
> anything (it was part of my master's thesis),


Indeed.. perl6's new regexp/grammar syntax is sweet

--
Simon Strandgaard


 
Wolfgang Nádasi-Donner
      07-31-2005
>>>>> snip >>>>>
"Dominik Bathon" <(E-Mail Removed)> wrote in message newsp.sur8hdg2d62ajc@localhost...
On Sat, 30 Jul 2005 20:31:03 +0200, Wolfgang Nádasi-Donner
..
..
..
... but I think registering all named groups in one global place is not a
good idea (even if you can unregister): what if two libraries use the same
group names? I think there would be many name clashes.

So here is another idea: Let the caller manage the named groups himself.
Maybe in arrays or hashes. Something like:

groups = [/(?<example>a|b|c|d)/, /(?<example2>e|f|g)/]

or with hashes:

groups = { "example" => /a|b|c|d/, "example2" => /e|f|g/ }

or maybe in some specialized named-groups library class.

> These objects can later be referenced in regular expressions by
> '\k<name>' or '\g<name>' as if they were
> defined there.


To use those groups I would suggest something like:

/\k<example>/.with(groups)

Regexp#with would return the "composed" Regexp that can be used like any
other Regexp.

What do you think?
>>>>> snap >>>>>


First of all: I made a mistake. Please forget all the '\k<name>...' stuff. It is the same as '\1', '\2', ...,
which means it is a reference to the text matched by this group during the current matching process. We are
talking here about the '\g<name>...' reference only, which is a call to the group at match time. For
simple replacement before matching, the '#{...}' Ruby construct is still usable.
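
The difference between the two references can be checked in a current Ruby (1.9 or later; `match?` needs 2.4). A small sketch, with a made-up group name `digit`:

```ruby
# \k<digit> is a backreference: it must match the same text that the
# group matched. \g<digit> is a call: it runs the group's pattern again.
backref = /\A(?<digit>\d)\k<digit>\z/
call    = /\A(?<digit>\d)\g<digit>\z/

puts backref.match?("11")   # prints true  (same text twice)
puts backref.match?("12")   # prints false ("2" is not the captured "1")
puts call.match?("12")      # prints true  (any digit, then any digit)
```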

As I understand it, in the Ruby environment the class 'Regexp' must be changed, as well as
'Oniguruma' itself, because the engine must be able to find the predefined patterns during a match.

My suggestion is based on the prerequisite of minimal changes to Oniguruma and Ruby's Regexp class,
to make such changes acceptable and possible. This implies not changing existing things in Ruby and
Oniguruma. For that reason I prefer using '\g<name>' over some other notation, but those are only my
thoughts on it.

The idea of using hashes in Ruby and extending class Regexp with a 'with' method sounds very good.
This method is a candidate for building the connection to Oniguruma, which then knows where to search for a
'(?<paul>...)' expression if it isn't defined in the current regular expression but is referenced via
'\g<paul>' there.

The 'with' method could take a list of hashes as its parameter (or even multiple hashes as parameters),
because one may use more than one group of predefined patterns (this may happen if one uses a general
pattern library and a special one for the application).
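
Something close to this can be approximated in a current Ruby without touching the engine. The sketch below is hypothetical: `PatternLibrary.compose` is an invented helper, not an existing Ruby or Oniguruma API. It prepends each library entry as a named group quantified with `{0}`, so the group is defined but never matched at that position, and the main pattern can then call it with '\g<name>'. The main pattern and any part that itself uses '\g<name>' are passed as strings, because a Regexp literal that calls an undefined group does not compile.

```ruby
# Hypothetical helper (an assumption for illustration only).
module PatternLibrary
  # pattern:   String holding the main pattern, may use \g<name> calls
  # libraries: one or more hashes mapping group names to a Regexp or String
  def self.compose(pattern, *libraries)
    # Merge all libraries; entries in later hashes replace earlier ones.
    groups = libraries.reduce({}) { |all, lib| all.merge(lib) }
    # Define every part as (?<name>...){0}: present for calls, never matched here.
    definitions = groups.map { |name, part| "(?<#{name}>#{part}){0}" }.join
    Regexp.new(definitions + pattern)
  end
end

general = { "digit" => /[0-9]/, "sign" => /[+-]/ }   # a general library
special = { "integer" => '\g<sign>?\g<digit>+' }     # an application library

signed = PatternLibrary.compose('\A\g<integer>\z', general, special)
puts signed.match?("-42")   # prints true
puts signed.match?("4x2")   # prints false
```

Because each caller passes its own hashes, two libraries that happen to use the same group name do not collide unless they are combined in the same call, in which case the later hash wins.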

--
Wolfgang Nádasi-Donner
(E-Mail Removed)


 