Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Ruby > Parsing, BNF, TreeTop, GhostWheel, ...

Reply
Thread Tools

Parsing, BNF, TreeTop, GhostWheel, ...

 
 
Philipp Kempgen
Guest
Posts: n/a
 
      08-12-2010
Hi,

I'm looking for a parser which can be fed with some (A)BNF-style rules.

Gave TreeTop a try
---cut---------------------------------------------------------
grammar MathGrammar
rule additive
multitive '+' additive / multitive
end
rule multitive
primary '*' multitive / primary
end
rule primary
'(' additive ')' / number
end
rule number
[1-9] [0-9]*
end
end
---cut---------------------------------------------------------

as well as GhostWheel
---cut---------------------------------------------------------
MathParser = GhostWheel.build_parser {

rule( :additive , alt( :multitive,
seq( :multitive, '+', :additive )
))
rule( :multitive , alt( seq( rimary, '*', :multitive ),
rimary
))
rule( rimary , alt( seq( '(', :additive, ')' ),
:number
))
rule( :number , seq( /[1-9]/ , zplus( /0-9/ ) ))

parser( :main, seq(:additive, eof()) )
}
---cut---------------------------------------------------------

'1+1' parses fine.
However when I change the definition of "additive" from
multitive '+' additive / multitive
to
multitive / multitive '+' additive
it fails to parse.

Is this a problem with packrat/PEG parsers in general?
If so, which parser is more appropriate?
It should hand back a parse tree.

Memory consumption is not an issue.

Regards,
Philipp

 
Reply With Quote
 
 
 
 
Philipp Kempgen
Guest
Posts: n/a
 
      08-12-2010
Philipp Kempgen wrote:
> I'm looking for a parser which can be fed with some (A)BNF-style rules.

...
> If so, which parser is more appropriate?
> It should hand back a parse tree.


Alternatively I'd appreciate if Regexps could return *all*
occurrences of named capture groups inside repetitions etc.
instead of just the last match for each name. Feasible?

Regards,
Philipp

 
Reply With Quote
 
 
 
 
Iain Barnett
Guest
Posts: n/a
 
      08-12-2010

On 12 Aug 2010, at 19:52, Philipp Kempgen wrote:

> Philipp Kempgen wrote:
>> I'm looking for a parser which can be fed with some (A)BNF-style =

rules.
> ...
>> If so, which parser is more appropriate?
>> It should hand back a parse tree.

>=20
> Alternatively I'd appreciate if Regexps could return *all*
> occurrences of named capture groups inside repetitions etc.
> instead of just the last match for each name. Feasible?
>=20
> Regards,
> Philipp
>=20


I was trying to do the same thing and asked about repeated named =
captures.
http://osdir.com/ml/ruby-talk/2010-07/msg00361.html

It seems that using String#scan is the closest anything Ruby has, as the =
Oniguruma regex engine doesn't support it. I think it's a real shame as =
named capture groups are really useful.

Regards,
Iain



 
Reply With Quote
 
Robert Klemme
Guest
Posts: n/a
 
      08-13-2010
On 13.08.2010 00:50, Iain Barnett wrote:
>
> On 12 Aug 2010, at 19:52, Philipp Kempgen wrote:
>
>> Philipp Kempgen wrote:
>>> I'm looking for a parser which can be fed with some (A)BNF-style rules.

>> ...
>>> If so, which parser is more appropriate?
>>> It should hand back a parse tree.

>>
>> Alternatively I'd appreciate if Regexps could return *all*
>> occurrences of named capture groups inside repetitions etc.
>> instead of just the last match for each name. Feasible?


As I understand you want /f(o)+/ =~ "foo" to return ["o", "o"] as match
for the group (used normal capturing groups for simplicity).

> I was trying to do the same thing and asked about repeated named captures.
> http://osdir.com/ml/ruby-talk/2010-07/msg00361.html
>
> It seems that using String#scan is the closest anything Ruby has, as the Oniguruma regex engine doesn't support it. I think it's a real shame as named capture groups are really useful.


Regular expression engines generally do only return one match per group
- regardless of naming or not naming groups. I'm not up to date with
current Perl's regular expressions which are the only ones I can imagine
to be crazy enough to provide such a feature. Otherwise String#scan
is indeed the proper tool to get multiple matches. The example above
could be done like this

if /f(o+)/ =~ s
$1.scan /o/ do |match|
p match
end
end

or this

if /f(o+)/ =~ s
matches = $1.scan /o/
p matches
end

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
 
Reply With Quote
 
Iain Barnett
Guest
Posts: n/a
 
      08-13-2010

On 13 Aug 2010, at 11:15, Robert Klemme wrote:
>>=20

>=20
> Regular expression engines generally do only return one match per =

group - regardless of naming or not naming groups. I'm not up to date =
with current Perl's regular expressions which are the only ones I can =
imagine to be crazy enough to provide such a feature. =20

Net definitely offers it, which is ironic as none of the .Net devs I =
used to work with ever used regex.=20

Scan is a really poor alternative (for this kind of task), I think. =
Instead of just writing one regex which expresses what I want to do much =
better, I end up having to write a lot of extra code to get the same =
effect. Scan's cool for more simple things.

Regards
Iain


 
Reply With Quote
 
Philipp Kempgen
Guest
Posts: n/a
 
      08-13-2010
Robert Klemme wrote:
> On 13.08.2010 00:50, Iain Barnett wrote:
>> On 12 Aug 2010, at 19:52, Philipp Kempgen wrote:
>>
>>> Philipp Kempgen wrote:
>>>> I'm looking for a parser which can be fed with some (A)BNF-style rules.
>>> ...
>>>> It should hand back a parse tree.
>>>
>>> Alternatively I'd appreciate if Regexps could return *all*
>>> occurrences of named capture groups inside repetitions etc.
>>> instead of just the last match for each name. Feasible?

>
> As I understand you want /f(o)+/ =~ "foo" to return ["o", "o"] as match
> for the group (used normal capturing groups for simplicity).


FRUIT = '(?<FRUIT> Apple|Banana|Pear|Orange)'
FRUIT_COLLECTION = '(?<FRUIT_COLLECTION> ' << FRUIT << '*)'

re = Regexp.new( '^' << FRUIT_COLLECTION << '$', Regexp::EXTENDED )
re.match( 'PearBananaApple' )[:FRUIT]
=> "Apple"

I would want e.g. matchdata[:FRUIT] to be an Array
[ 'Pear', 'Banana', 'Apple' ]
and not just 'Apple' (the last FRUIT).

Actually something similar to a concrete syntax tree / parse tree
would be even better:

ROOT "PearBananaApple"
FRUIT_COLLECTION "PearBananaApple"
FRUIT "Pear"
FRUIT "Banana"
FRUIT "Apple"

>> the Oniguruma regex engine doesn't support it. I think it's a real shame as named capture groups are really useful.


Exactly.
Anyway thanks for your input.

Regards,
Philipp

 
Reply With Quote
 
Clifford Heath
Guest
Posts: n/a
 
      08-17-2010
(Sent previously, but it didn't appear. Sorry if you get this twice).

Philipp Kempgen wrote:
> Is this a problem with packrat/PEG parsers in general?


Yes. PEG parsers are greedy (iterate as far as possible and never
backtrack on that), and also take the first alternative that succeeds,
without trying subsequent alternatives. It's actually quite useful
and natural, once you get the hang of it.

Pure PEGs have no lookahead, as Treetop does. It's often necessary
though, in the light of the above.

Also, your approach will yield the wrong results when you add '-' into
the mix, because of associativity. "1-2-3" will calculate as 1-(2-3).
A better way is to use iteration, see

<http://github.com/nathansobo/treetop/blob/master/examples/lambda_calculus/arithmetic.treetop>

Clifford Heath.
 
Reply With Quote
 
Clifford Heath
Guest
Posts: n/a
 
      08-18-2010
Christian Surlykke wrote:
> You may want to look at: http://kanocc.rubyforge.org/


Actually, I think Citrus does a better job than Treetop (of which I'm primary maintainer and a serious user). It provides a BNF built on ruby functions and operators, and also a Treetop/PEG-like parser for grammar files as an alternative. A brief look at Kanocc indicates Citrus may be preferable over it also.

Citrus doesn't memoize by default however, you have to enable that. It also doesn't have support for semantic predicates, an important feature (for me) which I added to Treetop. Otherwise it is preferable in most ways, and I think Michael Jackson has done a sterling job. Kudos!

<http://github.com/mjijackson/citrus/>

Clifford Heath.
 
Reply With Quote
 
Philipp Kempgen
Guest
Posts: n/a
 
      08-19-2010
Clifford Heath wrote:
> Yes. PEG parsers are greedy (iterate as far as possible and never
> backtrack on that), and also take the first alternative that succeeds,
> without trying subsequent alternatives. It's actually quite useful
> and natural, once you get the hang of it.
>
> Pure PEGs have no lookahead, as Treetop does. It's often necessary
> though, in the light of the above.


What I wanted to do is to construct/generate a parser from a bunch
of (A)BNF rules (dozends) from an RFC so backtracking is a
requirement while lookahead is not. The grammar specification for
the parser (/ parser generator) does not need to be exactly like BNF
but should not require me to think about where lookahead is needed
as this manual conversion can be error-prone.
That seems to rule out PEG parsers.

Clifford Heath wrote:
> Christian Surlykke wrote:
>> You may want to look at: http://kanocc.rubyforge.org/

>
> Actually, I think Citrus does a better job than Treetop (of which I'm primary maintainer and a serious user). It provides a BNF built on ruby functions and operators, and also a Treetop/PEG-like parser for grammar files as an alternative. A brief look at Kanocc indicates Citrus may be preferable over it also.
>
> Citrus doesn't memoize by default however, you have to enable that. It also doesn't have support for semantic predicates, an important feature (for me) which I added to Treetop.


> <http://github.com/mjijackson/citrus/>


Thanks a lot for the pointers Clifford and Christian!
Will have a look as soon as time permits.


Philipp

 
Reply With Quote
 
Clifford Heath
Guest
Posts: n/a
 
      08-20-2010
Philipp Kempgen wrote:
> Clifford Heath wrote:
> What I wanted to do is to construct/generate a parser from a bunch
> of (A)BNF rules (dozends) from an RFC so backtracking is a
> requirement while lookahead is not. The grammar specification for
> the parser (/ parser generator) does not need to be exactly like BNF
> but should not require me to think about where lookahead is needed
> as this manual conversion can be error-prone.


Most of the BNFs in RFCs cannot be directly automated by any parser
generator. In addition, quite a few of them are slightly incorrect
and/or incomplete, but haven't been fixed because they're always
implemented by humans, who paper over the issues, and introduce new
errors instead

The upshot is that you're not going to find a direct implementation
that works without thinking.

> That seems to rule out PEG parsers.


On the contrary, I believe that a PEG parser might represent the most
direct mapping. PEGs do backtrack, just not on a successful rule or
iteration. You simply have to introduce enough lookahead to prevent
false success or excessive iteration... which is actually pretty easy
once you get the hang of it... the point here is that the *rest* of
the grammar is much more likely to look like the EBNF you started
with. Oh, yeah, you have to refactor any left-recursion too, which can
be difficult.

Clifford Heath.
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off




Advertisments