Velocity Reviews


vikkous 04-24-2005 02:31 AM

Announcing Reg 0.4.0
 
I would like to announce the first version, 0.4.0, of Reg, the Ruby
Extended Grammar. Reg is a library for pattern matching in ruby data
structures. Reg provides Regexp-like match and match-and-replace for
all data structures (particularly Arrays, Objects, and Hashes), not
just Strings.

The Reg RubyForge project: http://rubyforge.org/projects/reg/

The Reg Tarball:
http://rubyforge.org/frs/download.ph...-0.4.0.tar.bz2

Reg is best thought of in analogy to regular expressions; Regexps are
special data structures for matching Strings; Regs are special data
structures for matching ANY type of ruby data (Strings included, using
Regexps).

This table compares syntax of reg and regexp for various constructs.
Keep in mind that all Regs are ordinary ruby expressions. The special
syntax is achieved by overriding ruby operators.
These abbreviations are used:
re,re1,re2 represent arbitrary regexp subexpressions,
r,r1,r2 represent arbitrary reg subexpressions
s,t represent any single character (perhaps appropriately escaped, if
the char is magical)


reg                 regexp                  #description

+[r1,r2,r3]         /re1re2re3/             #sequence
-[r1,r2]            (re1re2)                #subsequence
r.lit               \re                     #escaping a magical
regproc{r}          #{re}                   #dynamic inclusion
r1|r2 or :OR        (re1|re2) or [st]       #alternation
~r                  [^s]                    #negation (for scalar r and s)
r+0                 re*                     #zero or more matches
r+1                 re+                     #one or more matches
r-1                 re?                     #zero or one matches
r*n                 re{n}                   #exactly n matches
r*(n..m)            re{n,m}                 #at least n, at most m matches
r+n                 re{n,}                  #at least n matches
r-m                 re{,m}                  #at most m matches
OB                  .                       #a single item
OBS                 .*                      #zero or more items
BR[1,2]             \1,\2                   #backreference ***
r>>x or sub         sub,gsub                #search and replace ***
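
The quantifier rows in the table imply some kind of backtracking search once several variable-length pieces sit in the same array pattern. What follows is a minimal plain-Ruby sketch of that idea, not Reg's actual engine. It assumes each piece of a pattern is written as a hypothetical [matcher, min, max] triple (max of nil meaning unbounded), tests elements with ===, and uses a made-up method name, match_seq.

```ruby
# Toy backtracking matcher for arrays; an illustration only, not Reg itself.
# items: list of [matcher, min, max] triples; max may be nil for "unbounded".
# Returns true when the whole of data is consumed by the items, in order.
def match_seq(items, data, i = 0, pos = 0)
  # All items used up: succeed only if all of the data was consumed too.
  return pos == data.size if i == items.size

  matcher, min, max = items[i]
  max ||= data.size - pos

  # Greedily count how many consecutive elements this item accepts.
  count = 0
  count += 1 while count < max && pos + count < data.size && matcher === data[pos + count]

  # Try the longest run first, then back off one element at a time.
  count.downto(min) do |n|
    return true if match_seq(items, data, i + 1, pos + n)
  end
  false
end

# Rough analogue of +[OBS, Symbol.reg+3, OBS]: at least 3 symbols somewhere.
somewhere = [[Object, 0, nil], [Symbol, 3, nil], [Object, 0, nil]]
puts match_seq(somewhere, [1, :a, :b, :c, "x"])
puts match_seq(somewhere, [1, :a, :b, "x"])
```

The leading [Object, 0, nil] piece first swallows the whole array and then gives elements back until the symbol run fits, which is the same give-back behaviour a regexp engine shows for a leading .* in a pattern.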


Here are features of reg that don't have an equivalent in regexp:

r.la                    #lookahead ***
~-[]                    #subsequence negation w/lookahead ***
& or :AND               #all alternatives match
^ or :XOR               #exactly one of alternatives matches
+{r1=>r2}               #hash matcher
-{name=>r}              #object matcher
obj.reg                 #turn any ruby object into a reg that matches if obj.=== succeeds
/re/.sym                #a symbol regex
proceq(klass){rcode}    #a proc{} that responds to === by invoking the proc's call
OBS as un-anchor        #opposite of ^ and $ when placed at edges of a reg array (kinda cheesy)
name=r                  #named subexpressions

recursive matches via regvariables&regconstants ***

*** = not implemented yet.
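
Several of the entries above lean on Ruby's === (case equality) operator, the same one that case/when uses. As a small plain-Ruby illustration, independent of Reg, here is what === already does for a few built-in kinds of object. The helper name matches? and the sample values are made up for this sketch; current Ruby versions also define Proc#=== to call the proc, which is close in spirit to what proceq is described as doing.

```ruby
# Thin wrapper so the direction of the test is explicit:
# the matcher is the receiver of ===, the value is the argument.
def matches?(matcher, value)
  matcher === value
end

checks = [
  [Integer, 42],                        # Class#===  : is the value an instance of the class?
  [/baz/, "foobazbar"],                 # Regexp#=== : does the string match the regexp?
  [(5..9), 7],                          # Range#===  : does the range cover the value?
  [->(x) { x.respond_to?(:each) }, []], # Proc#===   : call the proc with the value
  [:k, :k]                              # Symbol#=== : plain equality
]

checks.each do |matcher, value|
  puts "#{matcher.inspect} === #{value.inspect} -> #{matches?(matcher, value)}"
end
```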


Reg is kind of hard to wrap your mind around, so here are some
examples:

Matches array containing exactly 2 elements; 1st is another array, 2nd
is integer:
+[Array,Integer]

Like above, but 1st is array of arrays of symbol
+[+[+[Symbol.reg+0]+0],Integer]

Matches array of at least 3 consecutive symbols and nothing else:
+[Symbol.reg+3]

Matches array with at least 3 symbols in it somewhere:
+[OBS, Symbol.reg+3, OBS]

Matches array of at most 6 strings starting with 'g'
+[/^g/-6] #no .reg necessary for regexp

Matches array of between 5 and 9 hashes containing a key :k pointing to
something non-nil:
+[ +{:k=>~nil.reg}*(5..9) ]

Matches an object with Integer instance variable @k and property (ie
method) foobar that returns a string with 'baz' somewhere in it:
-{:@k=>Integer, :foobar=>/baz/}

Matches array of 6 hashes with 6 as a value of at least one key,
followed by 18 objects with an attribute @s which is a String:
+[ +{OB=>6}*6, -{:@s=>String}*18 ]
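
To make the hash and object examples a little more concrete, here is a hand-written plain-Ruby sketch of the checks they describe, based only on my reading of the descriptions above and not on Reg's source. The method names hash_matches? and object_matches?, the Widget class, and the choice to treat names starting with @ as instance variables and all other names as method calls are assumptions made for illustration.

```ruby
# Every (key matcher => value matcher) pair in spec must be satisfied
# by at least one key/value pair of the hash, using === for both sides.
def hash_matches?(hash, spec)
  return false unless hash.is_a?(Hash)

  spec.all? do |key_matcher, value_matcher|
    hash.any? { |k, v| key_matcher === k && value_matcher === v }
  end
end

# Names beginning with "@" are read as instance variables;
# any other name is called as a public method on the object.
def object_matches?(obj, spec)
  spec.all? do |name, matcher|
    if name.to_s.start_with?("@")
      obj.instance_variable_defined?(name) && matcher === obj.instance_variable_get(name)
    else
      obj.respond_to?(name) && matcher === obj.public_send(name)
    end
  end
end

# Hypothetical class used only to exercise object_matches?.
class Widget
  def initialize(k, text)
    @k = k
    @text = text
  end

  def foobar
    @text
  end
end

# Rough analogues of +{OB=>6} and -{:@k=>Integer, :foobar=>/baz/}.
puts hash_matches?({ a: 1, b: 6 }, { Object => 6 })
puts object_matches?(Widget.new(3, "xbazx"), { :@k => Integer, :foobar => /baz/ })
```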


Status:
Some highly nested vector reg constructions still don't work quite
right. (For examples, search on eat_unworking in regtest.rb.) A number
of features are unimplemented at this point, most notably
backreferences and substitutions.


Jon Raphaelson 04-24-2005 03:51 AM

Re: Announcing Reg 0.4.0
 
Ok, I'm going to go out on a limb here and say HOLY GOD THIS IS AWESOME.

Sorry for the shouting.









Phil Tomson 04-24-2005 04:15 AM

Re: Announcing Reg 0.4.0
 

Wow.

Just curious: what kind of needs led you to develop this?

Phil





Peter Suk 04-24-2005 04:39 AM

Re: Announcing Reg 0.4.0
 

On Apr 23, 2005, at 9:34 PM, vikkous wrote:

> I would like to announce the first version, 0.4.0, of Reg, the Ruby
> Extended Grammar.


This is like too good/weird to be true.

--Peter

--
There's neither heaven nor hell, save what we grant ourselves.
There's neither fairness nor justice, save what we grant each other.




Mathieu Bouchard 04-24-2005 05:34 AM

Re: Announcing Reg 0.4.0
 

On Sun, 24 Apr 2005, vikkous wrote:

> I would like to announce the first version, 0.4.0, of Reg, the Ruby
> Extended Grammar. Reg is a library for pattern matching in ruby data
> structures. Reg provides Regexp-like match and match-and-replace for
> all data structures (particularly Arrays, Objects, and Hashes), not
> just Strings.


Can it also match on IO ? I'm particularly thinking of a stream
implementation that supports illimited pushback of characters...

Because if it does, then you've got a lexer system that is also good as
something else than just a damn lexer.

And by making regexps unified with the rest of the language, it brings
Ruby closer to the Icon language, isn't it?

Anyhow: Congratulations!

(this is really something I wish had existed in 2001 or so).

--
Mathieu Bouchard (Montréal QC)
http://artengine.ca/matju
The Diagram is the Program (TM)



vikkous 04-24-2005 07:47 AM

Re: Announcing Reg 0.4.0
 
That's a long story, and well worth telling.

A long time ago, I wanted a better regexp than regexp. My search ended
when I found an extremely obscure language called gema (the
general-purpose matcher). I'm guessing that I'm the only person to ever
take gema seriously. For a time, I became the world's foremost expert on
gema. Gema is designed around the idea that all computation can be
modeled as pattern and replacement. Everything in gema is pattern and
replacement... essentially everything is done with regexps. I was
fascinated with the idea. This seemed to me to be a much better model
for most programming problems, which typically involve reading input,
transforming it in some way, and writing it out again. Conventional
languages (starting with fortran, and including ruby) are based around
the idea of a program being a long string of formulas. This is great
for math-heavy stuff, but most programming is really about data
manipulation, not math.

But there was trouble in paradise. Gema was wonderful, but weird. The
syntax was cranky. The author had issued one version long ago then
disappeared. Gema code was hard to read, in part because
everythingwasalljammedtogether.
Ifyouinsertspacestomakeitmorereadable,itchangesthesemanticsofyourprogram.
There were strange problems that I never tracked down or fully
characterized. The only data-type was the string. You had to be an
expert at avoiding the invisible pitfalls of the language to get
anywhere. But I did get surprisingly far. I managed to coax gema into
becoming a true parser, and parsing a toy language.
I wanted to write a compiler in gema. Yes, the whole compiler. And
parsing the toy language was already straining its capabilities. It
wasn't the data model; I actually figured out how to model all other
data types using strings. A match-and-replace language is actually much
better suited to most compiler tasks than an algol-like formula
language.

Eventually, I abandoned gema, determined to recreate its glory in a
cleaner form. It was at about this time that I discovered ruby. The
successor to gema was ruma, the ruby matcher. Ruma would be basically
just like gema, but without the problems. Whitespace allowed between
tokens. Proper quotation mechanisms, including nested quotes. And the
language used in the actions (replacements) would be full ruby, instead
of gema's inadequate and crude action language.

Ruma got maybe halfway done... quite a ways, really. As part of ruma, I
needed a ruby lexer to make sense of the actions. This turned out to be
quite a lot harder than I had anticipated; I'm still working on that
lexer.

After grinding away at the lexer for a while, dreaming of ruma in the
meantime, I had a brainstorm. Ruma, like gema, was to be a string-based
language. It only operated on strings. In gema, that was just fine
because everything was strings and you just had to live with that. But
ruby has all these other types, a real type system. Wouldn't it be nice
to have those sophisticated search capabilities for other types too?
Well, since I proved to myself that all data types can be converted to
strings, why not convert the ruby data into strings and then match that
in ruma. Of course, it would be so much nicer to just do the matching
on the data in its original form....

The breakthrough came when I realized how malleable ruby really is. I
had become accustomed to c, which I still love, but in so many ways
it's so much more limited. I didn't really have to write my own parser
and lexer; ruby could do it all for me. I just had to override a bunch
of operators.

After that, it was simple. All I do is override the right operators,
and ruby does the parsing and hands me the match expressions in
already-parsed form. Reg is amazingly small in the end. Most of the
effort and code went into the array matcher, but at least as much
functionality is to be had from the hash and object matchers, which
were trivial.
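
As a rough illustration of the "override the right operators and let ruby do the parsing" idea, here is a toy sketch in plain Ruby. It is not Reg's code: the PatternOps module, the Wrap, Either and Negate classes, and the pat helper are all hypothetical names invented for this example. It only covers | and ~, and matching is delegated to ===.

```ruby
# Toy pattern classes; hypothetical names, not Reg's real implementation.
module PatternOps
  # pattern | pattern builds a node that matches when either side matches.
  def |(other)
    Either.new(self, other)
  end

  # ~pattern builds a node that matches when the inner pattern does not.
  def ~
    Negate.new(self)
  end
end

# Wraps any Ruby object and defers to that object's own === for matching.
class Wrap
  include PatternOps

  def initialize(obj)
    @obj = obj
  end

  def ===(value)
    @obj === value
  end
end

class Either
  include PatternOps

  def initialize(left, right)
    @left = left
    @right = right
  end

  def ===(value)
    @left === value || @right === value
  end
end

class Negate
  include PatternOps

  def initialize(inner)
    @inner = inner
  end

  def ===(value)
    !(@inner === value)
  end
end

# Helper playing a role loosely similar to obj.reg in the announcement.
def pat(obj)
  Wrap.new(obj)
end

# Ruby's own parser turns this expression into a small tree of objects.
symbol_or_string = pat(Symbol) | pat(String)
non_nil = ~pat(nil)

puts symbol_or_string === :a
puts symbol_or_string === 3
puts non_nil === nil
```

Because the pattern objects respond to ===, they can also be used directly in a case/when, with no separate parser or lexer involved.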


vikkous 04-24-2005 07:59 AM

Re: Announcing Reg 0.4.0
 
> Can it also match on IO ? I'm particularly thinking of a stream
> implementation that supports illimited pushback of characters...


I would very much like to do this, but right now, no. I'm not sure
exactly what would be involved in having the array matcher match files
as well; it seems like you might have to rip out the guts of the
backtracking engine to support it... but maybe not. Anyway, stay tuned
for a future release.

Just having the ability to compare regexps directly against files would
be really helpful in the construction of lexers of all sorts. Java has
this; why doesn't ruby?

> Because if it does, then you've got a lexer system that is also good as
> something else than just a damn lexer.


Lexers, parsers, and pattern matching languages get too short a shrift
in my opinion. There's really a lot more they could be used for, if
only people would see... of course, it doesn't help that almost all
existing tools of this kind are string-oriented, and hard to use for
other data.

> And by making regexps unified with the rest of the language, it brings
> Ruby closer to the Icon language, isn't it?


I wouldn't know... please let me know about regexp integration in icon;
maybe there are some features I can steal.


dm1 04-24-2005 08:46 AM

Re: Announcing Reg 0.4.0
 
vikkous wrote:

>
> Lexers, parsers, and pattern matching languages get too short a shrift
> in my opinion. There's really a lot more they could be used for, if
> only people would see... of course, it doesn't help that almost all
> existing tools of this kind are string-oriented, and hard to use for
> other data.
>


A small piece of example code could help to open the eyes of people who
don't see what could be done with Reg (like me).


Denis



Lyndon Samson 04-24-2005 12:04 PM

Re: Announcing Reg 0.4.0
 


It seems similar in spirit to JXPath for java which lets you use XPath
expressions to access objects, Hashes, Arrays, Maps etc which otherwise
is quite longwinded in java ( no snickering please ).

http://jakarta.apache.org/commons/jxpath/


--
Into RFID? www.rfidnewsupdate.com Simple, fast, news.




vikkous 04-25-2005 10:26 PM

Re: Announcing Reg 0.4.0
 
I included some small examples at the end of my initial post to try to
whet your appetite. Perhaps you can see applications of this kind of
thing to what you use ruby for? Searching for complicated patterns
within an arbitrary object graph is what Reg is about. If you have
complicated data, Reg may be a good choice for searching in it.
(Eventually it'll have search-and-replace, but that's not implemented
yet.)

Traditionally, parsing and pattern matching languages stop after the
parser stage of the compiler pipeline, but it seems to me that many
later compiler tasks are particularly well suited for pattern-matchers.
(They're never used because by this point, compiler data is in the form
of a parse tree, and most text-based pattern tools can't deal with that.)
Let's take the example of a simple optimization, like strength
reduction. This is where the compiler changes multiplication by a
constant power of two into a left shift.

The problem, in other words, is to search for nodes of the syntax tree
that look like this:
[<some expr>, :*, 4]

and turn them into this:
[<some expr>, :<<, 2]

In Reg, that would be:
+[expr, :*, -{:power_of_2?=>true}].sub{[BR[0], :<<, BR[2].log2]}
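
For comparison, here is the same strength reduction written by hand in plain Ruby, without Reg, which can help show what the pattern above is meant to express. This is a sketch under assumptions: the syntax tree is taken to be nested arrays of the form [left, operator, right], and the method names power_of_2? and reduce_strength are invented for this example.

```ruby
# True for positive integers with exactly one bit set (1, 2, 4, 8, ...).
def power_of_2?(n)
  n.is_a?(Integer) && n > 0 && (n & (n - 1)).zero?
end

# Walks a nested-array syntax tree and rewrites [expr, :*, 2**k]
# into [expr, :<<, k]; everything else is returned unchanged.
def reduce_strength(node)
  return node unless node.is_a?(Array)

  # Rewrite the children first so nested multiplications are handled too.
  node = node.map { |child| reduce_strength(child) }

  if node.size == 3 && node[1] == :* && power_of_2?(node[2])
    # For a power of two, bit_length - 1 is its base-2 logarithm.
    [node[0], :<<, node[2].bit_length - 1]
  else
    node
  end
end

tree = [[:a, :*, 8], :+, [:b, :*, 3]]
p reduce_strength(tree)
```

The hand-written version has to spell out the tree walk and the shape test itself; the Reg expression is intended to state the shape declaratively and leave the search to the matcher.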

My post "Lalr(n) parsing with reg" outlines how to twist Reg to
actually be a parser.


