Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Ruby > [SUMMARY] Parsing JSON (#155)

Reply
Thread Tools

[SUMMARY] Parsing JSON (#155)

 
 
Benjohn Barnes
Guest
Posts: n/a
 
      02-19-2008
Clifford Heath wrote:
> James Gray wrote:
>> On Feb 8, 2008, at 12:27 PM, Eric Mahurin wrote:
>>>> Treetop was definitely the most popular
>>> But not nearly the fastest...

>> That's true, but I think the need for raw speed in parsers is seldom
>> the top priority.

>
> I was frankly amazed how much of the discussion was about speed.
> Besides, there's an awful lot of useful languages that you can't
> do with LL(1). Including Ruby and C...
>
> That said, Treetop is very slow, and we need to improve that.
> Fortunately, it's not very hard to.


I was reading about PEG parsers on the wikipedia page [1], and wondered
along to a site about packrat parsers [2].

One article there [3] seems to imply that while a packrat parser's
memoization gives linear theoretical time, for almost all real grammars
(and inputs, I presume) it's actually not necessary.

They suggest that for the cases where there is a an exponential time
problem, it's sensible to rely on user supplied memoization directives
in the grammar, or to add memoization selectively and automatically,
based on profiling information.

I'm not sure if that's any help, or if you've already come across these
articles. Hopefully they're of some interest though.

Cheers,
Benjohn


[1]http://en.wikipedia.org/wiki/Parsing_expression_grammar
[2]http://pdos.csail.mit.edu/~baford/packrat/
[3]http://www.mercury.csse.unimelb.edu.au/information/papers/packrat.pdf

 
Reply With Quote
 
 
 
 
Clifford Heath
Guest
Posts: n/a
 
      02-20-2008
Benjohn Barnes wrote:
> One article there [3] seems to imply that while a packrat parser's
> memoization gives linear theoretical time, for almost all real grammars
> (and inputs, I presume) it's actually not necessary.


There's a bit of a chicken/egg problem here.

Existing computer languages have been created to be easy to parse,
which means they require minimum lookahead- one or two tokens.

In part, that's because such languages are easier for humans to
parse as well, and in part because parser generators couldn't
produce efficient parsers for them... until now. Now that packrat
is on the scene, we're free to create more natural looking grammars.

> They suggest that for the cases where there is a an exponential time
> problem, it's sensible to rely on user supplied memoization directives
> in the grammar,


That requires a lot more user understanding (of where to memoize),
or a whole lot more analysis. ANTLR takes the 2nd approach, but it's
a difficult task that still causes many surprises. Treetop is slow,
but it's easily accessible to Joe Average without special skills and
without major surprises. I think it fills an important niche.

> or to add memoization selectively and automatically,
> based on profiling information.


That would be cool. Write an exponential-time parser, run it over
your test cases in learning mode, and then regenerate it to get a
fast parser.

Clifford Heath.
 
Reply With Quote
 
 
 
 
Benjohn Barnes
Guest
Posts: n/a
 
      03-02-2008
Clifford Heath wrote:
> Benjohn Barnes wrote:
>> One article there [3] seems to imply that while a packrat parser's
>> memoization gives linear theoretical time, for almost all real
>> grammars (and inputs, I presume) it's actually not necessary.

>
> There's a bit of a chicken/egg problem here.
>
> Existing computer languages have been created to be easy to parse,
> which means they require minimum lookahead- one or two tokens.
>
> In part, that's because such languages are easier for humans to
> parse as well, and in part because parser generators couldn't
> produce efficient parsers for them... until now. Now that packrat
> is on the scene, we're free to create more natural looking grammars.


That's an interesting suggestion - that languages are as they are to a
great extent because of the difficulty of implementing their parsers, so
existing languages don't need memoized parsing. Certainly a compelling
point. Do you see newer parsing opening up language design then?

I bumped in to a few suggestions of plugable language syntax while
reading about PEG, and I definitely like that idea.

Cheers,
B

 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Lib to generate XML/JSON[P] output from a DTD/XSD/JSON Schema/etc Acácio Centeno Python 1 02-15-2013 07:34 AM
I am facing an issue while decoding json string using json.loads sajuptpm Python 2 12-28-2012 07:16 AM
[ANN] Security Fix json-1.1.7 for json_pure and json gems Florian Frank Ruby 0 06-30-2009 05:18 PM
"JSON for ASP" at json.org Tuðrul Topuz ASP General 1 06-27-2008 11:37 PM
parsing json output Gowri Python 9 03-27-2008 05:35 AM



Advertisments
 



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57