Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Ruby > How would you design regexps in the integer domain?

Reply
Thread Tools

How would you design regexps in the integer domain?

 
 
Andreas Launila
Guest
Posts: n/a
 
      05-05-2008
I'm trying to come up with a clean way to specify regexps in the integer
domain. I.e. instead of describing a pattern of characters (as in normal
regexps) one describes patterns of integers ("17 followed by 3 or 15"
rather than "'a' followed by 'b' or 'c'").

My main objective is to make the syntax intuitive to Ruby users. I have
been toying around with a few different approaches, but I'm not sure if
any meet the goal. How would you design the syntax for regexps in the
integer domain?

The syntax does not need to support especially many operators:
* Kleene star ('*' in character regexps)
* "At least once" ('+' in character regexps)
* Alternation ('|' in character regexps, i.e. "a|b" being "'a' or 'b'")

Below are some of the approaches one could take.


== String representation

Many Ruby users have used character regexps defined as strings, so it
would seem like a good idea to make integer regexps as similar as
possible. A delimiter would be needed though, since "17" could otherwise
mean "1 followed by 7" or just "17". One delimiter could for instance be
a blank space. The following is how the pattern "Any number of (17 or, 1
followed by 5) followed by 4711" could look.

IntRegexp.new('(17|1 5)*4711')


== Method combination

In this approach the regexp is gradually built up by invoking methods.
One major question is what the methods should be named.

The following is how the above example could look.

first_part = IntRegexp.new(17).or(IntRegexp.new(1).followed_by 5)
first_part.any_number_of_times.followed_by 4711

Or with some different method names

first_part = IntRegexp.new(17) | (IntRegexp.new(1) + 5)
first_part.any_times + 4711

I'm unsure what sort of method names would strike a good balance between
verbosity and readability.


== Blocks

TextualRegexp[1] provides a block-based interface for specifying normal
regexps. Perhaps something similar could be done for integer regexps?
The following is the syntax of TextualRegexp but with integers.

regexp do
at_least_zero do
any f do
integer 17
group{ integer 1; integer 5 }
end
end
integer 4711
end

There's a fair amount of redundancy since there will never be any thing
other than "integer" specified.


[1] http://rubyforge.org/projects/texrex/

--
Andreas Launila


 
Reply With Quote
 
 
 
 
Robert Klemme
Guest
Posts: n/a
 
      05-05-2008
2008/5/5 Andreas Launila <ruby->:
> I'm trying to come up with a clean way to specify regexps in the integer
> domain. I.e. instead of describing a pattern of characters (as in normal
> regexps) one describes patterns of integers ("17 followed by 3 or 15"
> rather than "'a' followed by 'b' or 'c'").


Why do you need this, i.e. what advantages do you expect for a
particular "integer regexp" over classic regular expressions?

> My main objective is to make the syntax intuitive to Ruby users. I have
> been toying around with a few different approaches, but I'm not sure if
> any meet the goal. How would you design the syntax for regexps in the
> integer domain?
>
> The syntax does not need to support especially many operators:
> * Kleene star ('*' in character regexps)
> * "At least once" ('+' in character regexps)
> * Alternation ('|' in character regexps, i.e. "a|b" being "'a' or 'b'")
>
> Below are some of the approaches one could take.
>
>
> == String representation
>
> Many Ruby users have used character regexps defined as strings, so it
> would seem like a good idea to make integer regexps as similar as
> possible. A delimiter would be needed though, since "17" could otherwise
> mean "1 followed by 7" or just "17". One delimiter could for instance be
> a blank space. The following is how the pattern "Any number of (17 or, 1
> followed by 5) followed by 4711" could look.
>
> IntRegexp.new('(17|1 5)*4711')


You can as well do /(17|1 5)*4711/, can't you?

> == Method combination
>
> In this approach the regexp is gradually built up by invoking methods.
> One major question is what the methods should be named.
>
> The following is how the above example could look.
>
> first_part = IntRegexp.new(17).or(IntRegexp.new(1).followed_by 5)
> first_part.any_number_of_times.followed_by 4711


This looks ugly.

> Or with some different method names
>
> first_part = IntRegexp.new(17) | (IntRegexp.new(1) + 5)
> first_part.any_times + 4711
>
> I'm unsure what sort of method names would strike a good balance between
> verbosity and readability.
>
>
> == Blocks
>
> TextualRegexp[1] provides a block-based interface for specifying normal
> regexps. Perhaps something similar could be done for integer regexps?


The change would be easy, i.e. you just need to allow anything where
so far only raw strings were allowed and internally convert it via
#to_s.

> The following is the syntax of TextualRegexp but with integers.
>
> regexp do
> at_least_zero do
> any f do
> integer 17
> group{ integer 1; integer 5 }
> end
> end
> integer 4711
> end
>
> There's a fair amount of redundancy since there will never be any thing
> other than "integer" specified.
>
>
> [1] http://rubyforge.org/projects/texrex/


Apparently the project has gone and I could not prevent it since Ari
did not give me access to it. However, I posted an early version
here:
http://blade.nagaokaut.ac.jp/cgi-bin...by-talk/263785

Kind regards

robert

--
use.inject do |as, often| as.you_can - without end

 
Reply With Quote
 
 
 
 
Andreas Launila
Guest
Posts: n/a
 
      05-05-2008
Robert Klemme wrote:
> 2008/5/5 Andreas Launila <ruby->:
>> I'm trying to come up with a clean way to specify regexps in the integer
>> domain. I.e. instead of describing a pattern of characters (as in normal
>> regexps) one describes patterns of integers ("17 followed by 3 or 15"
>> rather than "'a' followed by 'b' or 'c'").

>
> Why do you need this, i.e. what advantages do you expect for a
> particular "integer regexp" over classic regular expressions?
>


The two types of regexps are not really comparable since they work in
different domains. I.e. classic regular expressions describe patterns in
arrays of characters rather than in arrays of integers. One could for
instance use an integer regexp to decide whether an array begins with 17
and ends with 8.

Specifically this is for specifying deterministic finite automatons,
with an integer alphabet, for an upcoming constraint in Gecode/R[1].

>>
>> IntRegexp.new('(17|1 5)*4711')

>
> You can as well do /(17|1 5)*4711/, can't you?
>


True, assuming that one then convert it back to its source to reparse.
It would probably make it even more familiar.


[1] http://gecoder.rubyforge.org/

--
Andreas Launila


 
Reply With Quote
 
Eivind Eklund
Guest
Posts: n/a
 
      05-05-2008
On Mon, May 5, 2008 at 11:51 AM, Andreas Launila <ruby-> wrote:
> Robert Klemme wrote:
> > 2008/5/5 Andreas Launila <ruby->:
> >> I'm trying to come up with a clean way to specify regexps in the integer
> >> domain. I.e. instead of describing a pattern of characters (as in normal
> >> regexps) one describes patterns of integers ("17 followed by 3 or 15"
> >> rather than "'a' followed by 'b' or 'c'").

> >
> > Why do you need this, i.e. what advantages do you expect for a
> > particular "integer regexp" over classic regular expressions?
> >

>
> The two types of regexps are not really comparable since they work in
> different domains. I.e. classic regular expressions describe patterns in
> arrays of characters rather than in arrays of integers. One could for
> instance use an integer regexp to decide whether an array begins with 17
> and ends with 8.
>
> Specifically this is for specifying deterministic finite automatons,
> with an integer alphabet, for an upcoming constraint in Gecode/R[1].


I'd implement this using a generic repeat operator instead of
hardcoding * and +, and using a generic comparator instead of
restricting to Integers.

The declaration for this would be something like

# Use repeat(0, nil, ...) for *
# Use repeat(1, nil, ...) for +
# Use repeat(0, 1, ...) for ?
def repeat(minimum, maximum, repeated_expression)
def or(*expressions)

So you'd convert your example "(17|1 5)*" to repeat(0, nil, or(17, [1, 5]))

Actually, as I thought of after I had written the above, I have
written code that does part of this already, and is generic - it's
available in types.rb (http://people.freebsd.org/~eivind/ruby/types/).
If you want it, I think I can extend that to support repeats tonight.
Feel free to use code from it under any license that absolves me of
legal blame (ie, the only reason I don't put it in the public domain
is because that expose me to legal liability.)

Eivind.

 
Reply With Quote
 
Robert Klemme
Guest
Posts: n/a
 
      05-05-2008
2008/5/5 Andreas Launila <ruby->:
> Robert Klemme wrote:
> > 2008/5/5 Andreas Launila <ruby->:
> >> I'm trying to come up with a clean way to specify regexps in the integer
> >> domain. I.e. instead of describing a pattern of characters (as in normal
> >> regexps) one describes patterns of integers ("17 followed by 3 or 15"
> >> rather than "'a' followed by 'b' or 'c'").

> >
> > Why do you need this, i.e. what advantages do you expect for a
> > particular "integer regexp" over classic regular expressions?

>
> The two types of regexps are not really comparable since they work in
> different domains. I.e. classic regular expressions describe patterns in
> arrays of characters rather than in arrays of integers. One could for
> instance use an integer regexp to decide whether an array begins with 17
> and ends with 8.


Ah! That bit of information was missing from the original posting.

> Specifically this is for specifying deterministic finite automatons,
> with an integer alphabet, for an upcoming constraint in Gecode/R[1].


I see. Then of course it's a different story. Unless you translate
the array of integers into a string and apply a textual regexp. But
you can still do that internally, i.e. shield that away as an
implementation detail.

I'd probably choose the approach we had taken with texrex because it
is easy to implement and has good usability (albeit it's a bit
verbose). I would base the decision ultimately on the users of such a
package, i.e. if they are familiar with regular expressions then the
string based approach might be the better choice (short and concise)
while for others the wordy approach might be better.

Kind regards

robert

--
use.inject do |as, often| as.you_can - without end

 
Reply With Quote
 
Andreas Launila
Guest
Posts: n/a
 
      05-05-2008
Eivind Eklund wrote:
>
> I'd implement this using a generic repeat operator instead of
> hardcoding * and +, and using a generic comparator instead of
> restricting to Integers.
>
> The declaration for this would be something like
>
> # Use repeat(0, nil, ...) for *
> # Use repeat(1, nil, ...) for +
> # Use repeat(0, 1, ...) for ?
> def repeat(minimum, maximum, repeated_expression)
> def or(*expressions)
>
> So you'd convert your example "(17|1 5)*" to repeat(0, nil, or(17, [1, 5]))
>


I like that idea, especially using the arrays to group integers. It
seems like a rather economic way to do it. I think "any" would fit
better than "or" though (since it can take more than two arguments). The
example in its entirety would then be

[repeat(0, nil, any(17, [1, 5])), 4711]

I would also consider swapping the position of repeated_expression to
the first argument of repeat, since it seems like a more natural order
to me ("repeat expression between 0 and 4 times"). I'm a bit split on
using nil for infinity, but that is minor.

I will need to explore the idea, but it feels like a good fit to me.
Thank you.

> Actually, as I thought of after I had written the above, I have
> written code that does part of this already, and is generic - it's
> available in types.rb (http://people.freebsd.org/~eivind/ruby/types/).
> If you want it, I think I can extend that to support repeats tonight.
> Feel free to use code from it under any license that absolves me of
> legal blame (ie, the only reason I don't put it in the public domain
> is because that expose me to legal liability.)
>


Thanks for the offer, but there's no need to add additional support. The
important part is the idea, which you have provided.

--
Andreas Launila

 
Reply With Quote
 
Phrogz
Guest
Posts: n/a
 
      05-05-2008
On May 5, 2:24*am, Andreas Launila <ruby-t...@lokorin.org> wrote:
> I'm trying to come up with a clean way to specify regexps in the integer
> domain. I.e. instead of describing a pattern of characters (as in normal
> regexps) one describes patterns of integers ("17 followed by 3 or 15"
> rather than "'a' followed by 'b' or 'c'").


You could specify it as a PEG[1], possibly using Treetop[2] in some
way.

The benefit of the PEG (re-usable sub-expressions) may not outweigh
the multi-line nature. However, you could do something like LPEG[3]
does. Instead of using standard PEG syntax come up with a custom
syntax via operators that allows you to describe an entire grammar in
a single Ruby expression. I'm personally not a fan of the syntax LPEG
uses; you could come up with a better one, perhaps using terse method
names instead of operators in some cases. (Perhaps
"pattern.atleast(5)" instead of LPEG's "pattern^5".)

[1] http://en.wikipedia.org/wiki/Parsing_expression_grammar
[2] http://treetop.rubyforge.org/
[3] http://www.inf.puc-rio.br/~roberto/lpeg.html

 
Reply With Quote
 
Phrogz
Guest
Posts: n/a
 
      05-05-2008
On May 5, 9:10*am, Phrogz <phr...@mac.com> wrote:
> The benefit of the PEG (re-usable sub-expressions) may not outweigh
> the multi-line nature.


To expand on this a little bit more, and use an LPEG-like custom
grammar entirely in Ruby, using PEG instead of regexp would allow you
to do something like:

header = IPEG[17,34,15,12,89,127]
break = IPEG[13,10]
close = IPEG[0]

body = (IPEG[12,43,13,break,57,75] | IPEG[18,13,31])
message_style_1 = header + body + close
message_style_2 = header + IPEG[108,103] + break + body + close
message = message_style_1 | message_style_2

result = message.match( my_array_of_numbers )

As you can see, this allows you to break up complex expressions into
named parts. This makes it both easier to understand, and DRYs up the
expression.

In the above made up syntax:
* "IPEG" stands for "Integer Parsing Expression Grammar"
* The IPEG.[] notation might represent a literal sequence of
integers
* The + operator would combine sequences
* The | operator provides alternation


 
Reply With Quote
 
ara.t.howard
Guest
Posts: n/a
 
      05-05-2008

On May 5, 2008, at 2:24 AM, Andreas Launila wrote:
> I'm trying to come up with a clean way to specify regexps in the
> integer
> domain. I.e. instead of describing a pattern of characters (as in
> normal
> regexps) one describes patterns of integers ("17 followed by 3 or 15"
> rather than "'a' followed by 'b' or 'c'").
>
> My main objective is to make the syntax intuitive to Ruby users. I
> have
> been toying around with a few different approaches, but I'm not sure
> if
> any meet the goal. How would you design the syntax for regexps in the
> integer domain?
>
> The syntax does not need to support especially many operators:
> * Kleene star ('*' in character regexps)
> * "At least once" ('+' in character regexps)
> * Alternation ('|' in character regexps, i.e. "a|b" being "'a' or
> 'b'")
>
> Below are some of the approaches one could take.



i personally would not bother making a new syntax, rather i'd simply
use the exiting one and add a new atom builder:



cfp:~ > cat a.rb
N = 0.chr

def N n
"#{ n }#{ N }"
end

module Enumerable
def nmatch pattern
match = pattern.match( select{|i| Fixnum === i}.join(N) )
if match
returned = []
captured = match.to_a.map{|capture| capture.split N}
returned << captured.first.map{|s| Float(s).to_i}
captured[1..-1].each do |list|
returned << Float(list.first).to_i
end
returned
else
nil
end
end
end

pattern = %r/#{ N 42 }#{ N 7 }(#{ N 17 }){2}(#{ N 1 }){1,}/

p [42, 7, 17, 17, 1, 1, 1].nmatch( pattern ).to_a



cfp:~ > ruby a.rb
[[42, 7, 17, 17, 1, 1], 17, 1]




a @ http://codeforpeople.com/
--
we can deny everything, except that we have the possibility of being
better. simply reflect on that.
h.h. the 14th dalai lama




 
Reply With Quote
 
Une Bévue
Guest
Posts: n/a
 
      05-05-2008
ara.t.howard <> wrote:

> we can deny everything, except that we have the possibility of being
> better. simply reflect on that.


which suppose their is a norm somewhere, ie. something able to measure
what is worst what is best...

--
Une Bévue
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Re: How include a large array? Edward A. Falk C Programming 1 04-04-2013 08:07 PM
an oddball scary kind of thing you would think would never happen richard Computer Support 4 01-31-2010 06:34 PM
How would you design scalable solution? Bryan Python 2 10-28-2009 06:20 AM
Web Design: Would you design a PDF by writing Postscript in Notepad? fgdg HTML 236 02-21-2007 09:35 PM
How Would You Design This Application? John Java 5 02-28-2006 09:49 AM



Advertisments
 



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57