Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Ruby > a little parsing challenge ☺

Reply
Thread Tools

a little parsing challenge ☺

 
 
Xah Lee
Guest
Posts: n/a
 
      07-17-2011
2011-07-16

folks, this one will be interesting one.

the problem is to write a script that can check a dir of text files
(and all subdirs) and reports if a file has any mismatched matching
brackets.

• The files will be utf-8 encoded (unix style line ending).

• If a file has mismatched matching-pairs, the script will display the
file name, and the line number and column number of the first
instance where a mismatched bracket occures. (or, just the char number
instead (as in emacs's “point”))

• the matching pairs are all single unicode chars. They are these and
nothing else: () {} [] “” ‹› «» 【】 〈〉 《》 「」『』
Note that ‘single curly quote’ is not consider matching pair here.

• You script must be standalone. Must not be using some parser tools.
But can call lib that's part of standard distribution in your lang.

Here's a example of mismatched bracket: ([)], (“[[”), ((, 】etc. (and
yes, the brackets may be nested. There are usually text between these
chars.)

I'll be writing a emacs lisp solution and post in 2 days. Ι welcome
other lang implementations. In particular, perl, python, php, ruby,
tcl, lua, Haskell, Ocaml. I'll also be able to eval common lisp
(clisp) and Scheme lisp (scsh), Java. Other lang such as Clojure,
Scala, C, C++, or any others, are all welcome, but i won't be able to
eval it. javascript implementation will be very interesting too, but
please indicate which and where to install the command line version.

I hope you'll find this a interesting “challenge”. This is a parsing
problem. I haven't studied parsers except some Wikipedia reading, so
my solution will probably be naive. I hope to see and learn from your
solution too.

i hope you'll participate. Just post solution here. Thanks.

Xah
 
Reply With Quote
 
 
 
 
Raymond Hettinger
Guest
Posts: n/a
 
      07-17-2011
On Jul 17, 12:47*am, Xah Lee <(E-Mail Removed)> wrote:
> i hope you'll participate. Just post solution here. Thanks.


http://pastebin.com/7hU20NNL


Raymond
 
Reply With Quote
 
 
 
 
Robert Klemme
Guest
Posts: n/a
 
      07-17-2011
On 07/17/2011 11:48 AM, Raymond Hettinger wrote:
> On Jul 17, 12:47 am, Xah Lee<(E-Mail Removed)> wrote:
>> i hope you'll participate. Just post solution here. Thanks.

>
> http://pastebin.com/7hU20NNL


Ruby solution: https://gist.github.com/1087583

Kind regards

robert
 
Reply With Quote
 
mhenn
Guest
Posts: n/a
 
      07-17-2011
Am 17.07.2011 15:20, schrieb Robert Klemme:
> On 07/17/2011 11:48 AM, Raymond Hettinger wrote:
>> On Jul 17, 12:47 am, Xah Lee<(E-Mail Removed)> wrote:
>>> i hope you'll participate. Just post solution here. Thanks.

>>
>> http://pastebin.com/7hU20NNL

>
> Ruby solution: https://gist.github.com/1087583


I acutally don't know Ruby that well, but it looks like your program
recognizes "[(])" as correct although it is not, because you translate
"[(])" to "(())" (which is indeed correct, but does not resemble the
input correctly anymore).

>
> Kind regards
>
> robert


 
Reply With Quote
 
Thomas Jollans
Guest
Posts: n/a
 
      07-17-2011
On Jul 17, 9:47*am, Xah Lee <(E-Mail Removed)> wrote:
> 2011-07-16
>
> folks, this one will be interesting one.
>
> the problem is to write a script that can check a dir of text files
> (and all subdirs) and reports if a file has any mismatched matching
> brackets.
>
> • The files will be utf-8 encoded (unix style line ending).
>
> • If a file has mismatched matching-pairs, the script will display the
> file name, and the *line number and column number of the first
> instance where a mismatched bracket occures. (or, just the char number
> instead (as in emacs's “point”))
>
> • the matching pairs are all single unicode chars. They are theseand
> nothing else: () {} [] “” ‹› «»【】 〈〉 《》 「」 『』
> Note that ‘single curly quote’ is not consider matching pair here.
>
> • You script must be standalone. Must not be using some parser tools.
> But can call lib that's part of standard distribution in your lang.
>
> Here's a example of mismatched bracket: ([)], (“[[”), ((,】etc. (and
> yes, the brackets may be nested. There are usually text between these
> chars.)
>
> I'll be writing a emacs lisp solution and post in 2 days. Ι welcome
> other lang implementations. In particular, perl, python, php, ruby,
> tcl, lua, Haskell, Ocaml. I'll also be able to eval common lisp
> (clisp) and Scheme lisp (scsh), Java. Other lang such as Clojure,
> Scala, C, C++, or any others, are all welcome, but i won't be able to
> eval it. javascript implementation will be very interesting too, but
> please indicate which and where to install the command line version.
>
> I hope you'll find this a interesting “challenge”. This is a parsing
> problem. I haven't studied parsers except some Wikipedia reading, so
> my solution will probably be naive. I hope to see and learn from your
> solution too.
>
> i hope you'll participate. Just post solution here. Thanks.


I thought I'd have some fun with multi-processing:

https://gist.github.com/1087682
 
Reply With Quote
 
Thomas Boell
Guest
Posts: n/a
 
      07-17-2011
On Sun, 17 Jul 2011 02:48:42 -0700 (PDT)
Raymond Hettinger <(E-Mail Removed)> wrote:

> On Jul 17, 12:47*am, Xah Lee <(E-Mail Removed)> wrote:
> > i hope you'll participate. Just post solution here. Thanks.

>
> http://pastebin.com/7hU20NNL


I'm new to Python. I think I'd have done it in a similar way (in any
language). Your use of openers/closers looks nice though. In the
initialization of openers, I guess you implicitly create a kind of
hash, right? Then the 'in' operator checks for the keys. That is elegant
because you have the openers and closers right next to each other, not
in separate lists.

But why do you enumerate with start=1? Shouldn't you start with index 0?


 
Reply With Quote
 
Robert Klemme
Guest
Posts: n/a
 
      07-17-2011
On 07/17/2011 03:55 PM, mhenn wrote:
> Am 17.07.2011 15:20, schrieb Robert Klemme:
>> On 07/17/2011 11:48 AM, Raymond Hettinger wrote:
>>> On Jul 17, 12:47 am, Xah Lee<(E-Mail Removed)> wrote:
>>>> i hope you'll participate. Just post solution here. Thanks.
>>>
>>> http://pastebin.com/7hU20NNL

>>
>> Ruby solution: https://gist.github.com/1087583

>
> I acutally don't know Ruby that well, but it looks like your program
> recognizes "[(])" as correct although it is not, because you translate
> "[(])" to "(())" (which is indeed correct, but does not resemble the
> input correctly anymore).


Right you are. The optimization breaks the logic. Good catch!

Kind regards

robert
 
Reply With Quote
 
Robert Klemme
Guest
Posts: n/a
 
      07-17-2011
On 07/17/2011 06:01 PM, Robert Klemme wrote:
> On 07/17/2011 03:55 PM, mhenn wrote:
>> Am 17.07.2011 15:20, schrieb Robert Klemme:
>>> On 07/17/2011 11:48 AM, Raymond Hettinger wrote:
>>>> On Jul 17, 12:47 am, Xah Lee<(E-Mail Removed)> wrote:
>>>>> i hope you'll participate. Just post solution here. Thanks.
>>>>
>>>> http://pastebin.com/7hU20NNL
>>>
>>> Ruby solution: https://gist.github.com/1087583

>>
>> I acutally don't know Ruby that well, but it looks like your program
>> recognizes "[(])" as correct although it is not, because you translate
>> "[(])" to "(())" (which is indeed correct, but does not resemble the
>> input correctly anymore).

>
> Right you are. The optimization breaks the logic. Good catch!


Turns out with a little possessiveness I can fix my original approach
which has the added benefit of not needing three passes through the file
(the two #tr's are obsolete now).

https://gist.github.com/1087583

Cheers

robert
 
Reply With Quote
 
Raymond Hettinger
Guest
Posts: n/a
 
      07-17-2011
On Jul 17, 8:49*am, Thomas Boell <(E-Mail Removed)> wrote:
> But why do you enumerate with start=1? Shouldn't you start with index 0?


The problem specification says that the the char number should match
the emacs goto-char function which is indexed from one, not from
zero. This is testable by taking the output of the program and
running it through emacs to see that the cursor gets moved exactly to
the location of the mismatched delimiter.


Raymond

 
Reply With Quote
 
rantingrick
Guest
Posts: n/a
 
      07-18-2011
On Jul 17, 2:47*am, Xah Lee <(E-Mail Removed)> wrote:
> 2011-07-16
>
> folks, this one will be interesting one.
>
> the problem is to write a script that can check a dir of text files
> (and all subdirs) and reports if a file has any mismatched matching
> brackets.
>
>[...]
>
> You script must be standalone. Must not be using some parser tools.
> But can call lib that's part of standard distribution in your lang.


I stopped reading here and did...

>>> from HyperParser import HyperParser # python2.x


....and called it a day. This module is part of the stdlib (idlelib
\HyperParser) so as per your statement it is legal (may not be the
fastest solution).

 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
a little parsing challenge ☺ Xah Lee Python 70 07-25-2011 05:57 AM
a little parsing challenge ☺ Xah Lee Perl Misc 29 07-21-2011 09:54 PM
a little challenge - reproduce this error Intransition Ruby 21 06-11-2011 02:16 PM
1 little 2 little 3 little Kennedys dale Digital Photography 0 03-23-2008 01:03 PM
[Q] little regexp challenge henon Ruby 5 10-17-2003 07:37 AM



Advertisments