regexp(ing) Backus-Naurish expressions ...

 
 
Robert Klemme
Guest
Posts: n/a
 
      03-11-2013
On 10.03.2013 23:21, Stefan Ram wrote:
> Robert Klemme <(E-Mail Removed)> writes:
>> What limitations would make me want to write a FSM instead by hand?

>
> It is a natural idea that the user may input simple
> arithmetic expressions with numeric literals, basic
> arithmetic, parentheses and algebraic signs when the
> program asks for a numeric value.


I am sorry but you are not answering the question.

Cheers

robert


--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
 
 
 
 
 
Robert Klemme
Guest
Posts: n/a
 
      03-11-2013
On 10.03.2013 23:54, Roedy Green wrote:
> Examples where regexes run out of steam:


I never said you can do everything with regexps. You said they are "quite
limited", to which I responded "I beg to differ: it's amazing what you
can do with them." I think you are talking completely past me.

> parsing Java, HTML, BAT language ... to do syntax colouring.


For that you need a context-free parser anyway and would not create a
FSM by hand.

> screen scraping, where what you want can appear in arbitrary order, be
> missing, or enclosed in a variety of delimiters.


Still, I haven't seen a single reason to create a FSM by hand.

> creating code to simulate the output of forms. You have to do it in
> stages. You pick out a string then you pick out strings of that


Regexps are for _parsing_ and not for _generating_.

Cheers

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
 
 
 
 
 
Robert Klemme
Guest
Posts: n/a
 
      03-11-2013
On 11.03.2013 00:24, Roedy Green wrote:
> On Sun, 10 Mar 2013 22:39:22 +0100, Robert Klemme
> <(E-Mail Removed)> wrote, quoted or indirectly quoted
> someone who said :
>
>> What limitations would make me want to write a FSM instead by hand?

>
> Compacting out nugatory space in HTML would be another example.


There are tools for processing tag based languages. Why would I want to
create a FSM by hand for that?

> Though they are quite complicated, I find FSMs very easy to write, and
> they almost always work first time. You can narrow your thinking to a
> tiny case and ignore the big picture quite safely.


Certainly you can write FSMs for a lot of things. But you were claiming
that a manual FSM should be used instead of a regexp engine; so the
question remains unanswered: why would anyone create a FSM by hand for
parsing?
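
To make the comparison concrete, here is a minimal sketch of the whitespace-compacting case done both ways. The class and method names are hypothetical, and it deliberately ignores everything that makes real HTML hard (<pre> blocks, attribute values, scripts): it only collapses each run of whitespace into a single space.

```java
/** Hypothetical illustration: collapse whitespace runs by hand and by regex. */
final class WhitespaceCompactor {
    // The two states of the hand-written machine.
    private enum State { TEXT, SPACE }

    // Same six characters that \s matches by default in java.util.regex.
    private static boolean isSpace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f' || c == 0x0B;
    }

    static String compactWithFsm(String in) {
        StringBuilder out = new StringBuilder(in.length());
        State state = State.TEXT;
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            switch (state) {
                case TEXT:
                    if (isSpace(c)) {
                        out.append(' ');      // emit one space for the whole run
                        state = State.SPACE;
                    } else {
                        out.append(c);
                    }
                    break;
                case SPACE:
                    if (!isSpace(c)) {
                        out.append(c);
                        state = State.TEXT;
                    }                         // otherwise swallow the extra whitespace
                    break;
            }
        }
        return out.toString();
    }

    static String compactWithRegex(String in) {
        return in.replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String html = "<p>a \n\t b</p>   <p>c</p>";
        System.out.println(compactWithFsm(html));
        System.out.println(compactWithRegex(html));
    }
}
```

For this narrow task the two methods produce the same string; the state machine only starts to pay for itself once the states have to carry context that a single pattern cannot.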

> In contrast, I find my regexes (of any complexity) nearly always have
> some unexpected behaviour, often one that does not show up immediately.


Well, that certainly depends on your familiarity with the tool. To me
this sounds suspiciously like NIH syndrome. I am so familiar with using
regular expressions of various kinds that it would not occur to me to
start writing a FSM for parsing by hand. That is such a waste of time.

> The other complicating factor is I use three different regex schemes
> in a day: Java, Funduc and SlickEdit. I keep borrowing syntax from
> one of the schemes other than the one I am using.


And how exactly do you implement a FSM in SlickEdit?

> Some day I will
> have to write replacements that use Java syntax.


Not sure what you mean by that.

Cheers

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
 
 
Arne Vajhøj
Guest
Posts: n/a
 
      03-11-2013
On 3/11/2013 4:08 PM, Robert Klemme wrote:
> On 11.03.2013 00:24, Roedy Green wrote:
>> On Sun, 10 Mar 2013 22:39:22 +0100, Robert Klemme
>> <(E-Mail Removed)> wrote, quoted or indirectly quoted
>> someone who said :
>>
>>> What limitations would make me want to write a FSM instead by hand?

>>
>> Compacting out nugatory space in HTML would be another example.

>
> There are tools for processing tag based languages. Why would I want to
> create a FSM by hand for that?
>
>> Though they are quite complicated, I find FSMs very easy to write, and
>> they almost always work first time. You can narrow your thinking to a
>> tiny case and ignore the big picture quite safely.

>
> Certainly you can write FSMs for a lot of things. But you were claiming
> that a manual FSM should be used instead of a regexp engine; so the
> question remains unanswered: why would anyone create a FSM by hand for
> parsing?


It sounds cool to claim to do so in a usenet thread!



>> The other complicating factor is I use three different regex schemes
>> in a day: Java, Funduc and SlickEdit. I keep borrowing syntax from
>> one of the schemes other than the one I am using.

>
> And how exactly do you implement a FSM in SlickEdit?
>
>> Some day I will
>> have to write replacements that use Java syntax.

>
> Not sure what you mean by that.


I think he is talking about writing a plugin with a 100%
Java compatible regex syntax.

Arne


 
 
Robert Klemme
Guest
Posts: n/a
 
      03-11-2013
On 03/11/2013 09:59 PM, Arne Vajhøj wrote:
> On 3/11/2013 4:08 PM, Robert Klemme wrote:


>> Certainly you can write FSMs for a lot of things. But you were claiming
>> that a manual FSM should be used instead of a regexp engine; so the
>> question remains unanswered: why would anyone create a FSM by hand for
>> parsing?

>
> It sounds cool to claim to do so in a usenet thread!
>
>


You've got a point there!

Cheers

robert

 
 
Joshua Cranmer 🐧
Guest
Posts: n/a
 
      03-11-2013
On 3/10/2013 5:54 PM, Roedy Green wrote:
> Examples where regexes run out of steam:
> parsing Java, HTML, BAT language ... to do syntax colouring.


Actually, all of those examples fall under the category of lexing, which
is very easy to do with regular expressions; the Python equivalent of
flex uses regular expressions internally to do the lexing. Basically,
what you'd have to do is this:

1. For each token, compute the regex that matches the token and enclose
it in a named capturing group
2. Combine the token regexes into a single regex using disjunctions
3. Run the large regex on the input string by continually finding
matches until it runs out of them.
4. For each match, use the named capturing group to do actions for that
part of the input string.
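
The four steps above can be sketched in Java roughly as follows. This is only an illustration: the class name, the token kinds and their patterns are made up for a toy expression language, and the loop uses lookingAt() at the current offset rather than repeated find() so that unrecognised characters are reported instead of silently skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical toy lexer; token names and patterns are illustrative only. */
final class RegexLexer {
    // Step 1: one named capturing group per token kind.
    private static final String[] KINDS = {"NUMBER", "IDENT", "OP", "WS"};

    // Step 2: combine the token regexes into a single regex with '|'.
    private static final Pattern TOKENS = Pattern.compile(
          "(?<NUMBER>\\d+(?:\\.\\d+)?)"
        + "|(?<IDENT>[A-Za-z_]\\w*)"
        + "|(?<OP>[-+*/=()])"
        + "|(?<WS>\\s+)");

    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKENS.matcher(input);
        int pos = 0;
        // Step 3: match repeatedly until the input is used up.
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Unexpected character at offset " + pos);
            }
            // Step 4: the group that took part in the match names the token kind.
            for (String kind : KINDS) {
                if (m.group(kind) != null) {
                    if (!kind.equals("WS")) {   // whitespace is recognised but dropped
                        out.add(kind + ":" + m.group(kind));
                    }
                    break;
                }
            }
            pos = m.end();
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("x = 3.14 * (y + 2)"));
    }
}
```

Named groups need Java 7 or later. A real lexer would attach an action or a token class to each group instead of building strings, but the control flow stays the same.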

> screen scraping, where what you want can appear in arbitrary order, be
> missing, or enclosed in a variety of delimiters.


([()<>,:;@])|(?:[^\\"]|\\.)*|\[(?:[^\\\]]|\\.)*\]|(?:\\.|[^
\t\r\n()<>,:;@["])+

That is an example of a production regular expression I use specifically
for tokenizing. Note in particular that I am matching two separate kinds
of string literals ("foo" and [foo]). The hard part here is that I'm
dealing with an idiot language that made comment-parsing context-free,
but I decided to say "to hell with this" and ignore that fact, banking
that it's a rare edge case I never have to deal with.

Granted, such large regular expressions can become extremely unwieldy
(said regex is actually composed out of about five lines of code plus
detailed comments above each part explaining what it does), but it's
still very simple to do in a regex.
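
As a rough idea of what "composed out of a few commented lines" can look like in Java, here is a sketch loosely modelled on the expression above. It is not the production code: the constant names are invented, the double quotes around the second alternative are an assumption (the posted text shows none), and the '[' inside the last character class is escaped because java.util.regex would otherwise read it as the start of a nested class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical tokenizer assembled from separately documented pieces. */
final class ComposedTokenizer {
    // A single special character, captured so callers can tell it apart.
    private static final String SPECIAL = "([()<>,:;@])";

    // A double-quoted string such as "foo"; a backslash escapes the next character.
    // (The surrounding quotes are an assumption, see the text above.)
    private static final String QUOTED = "\"(?:[^\\\\\"]|\\\\.)*\"";

    // A bracketed literal such as [foo], with the same escape rule.
    private static final String BRACKETED = "\\[(?:[^\\\\\\]]|\\\\.)*\\]";

    // A run of escaped characters or characters that do not start another token.
    private static final String ATOM = "(?:\\\\.|[^ \\t\\r\\n()<>,:;@\\[\"])+";

    private static final Pattern TOKEN =
        Pattern.compile(SPECIAL + "|" + QUOTED + "|" + BRACKETED + "|" + ATOM);

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {          // whitespace between tokens is simply skipped
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a,\"b c\",[d] <e@f>"));
    }
}
```

Each piece can be read and tested on its own, which takes most of the sting out of the final one-line pattern.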

--
Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth
 
 
Stefan Ram
Guest
Posts: n/a
 
      03-11-2013
Joshua Cranmer 🐧 <(E-Mail Removed)> writes:
>On 3/10/2013 5:54 PM, Roedy Green wrote:
>>parsing Java

>Actually, all of those examples fall under the category of lexing,


Parsing is not lexing; usually parsing comes after lexing.
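
A small sketch of the two stages, using the arithmetic expressions with parentheses and signs mentioned earlier in the thread. All names here are hypothetical and the grammar is the usual textbook one, not anything taken from a real program: a regular expression cuts the input into tokens, and a recursive-descent parser then handles the nesting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical two-stage evaluator: a regex lexer feeding a hand-written parser. */
final class LexThenParse {
    // Lexing: a regular expression is enough to cut the input into tokens.
    private static final Pattern TOKEN =
        Pattern.compile("\\s*(\\d+(?:\\.\\d+)?|[-+*/()])\\s*");

    static List<String> lex(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Bad input at offset " + pos);
            }
            tokens.add(m.group(1));
            pos = m.end();
        }
        return tokens;
    }

    // Parsing: arbitrarily deep nesting needs recursion over the token list.
    private final List<String> tokens;
    private int next = 0;

    private LexThenParse(List<String> tokens) {
        this.tokens = tokens;
    }

    static double evaluate(String input) {
        LexThenParse p = new LexThenParse(lex(input));
        double value = p.expr();
        if (p.next != p.tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + p.peek());
        }
        return value;
    }

    private String peek() {
        return next < tokens.size() ? tokens.get(next) : "";
    }

    // expr := term (('+' | '-') term)*
    private double expr() {
        double value = term();
        while (peek().equals("+") || peek().equals("-")) {
            String op = tokens.get(next++);
            double rhs = term();
            value = op.equals("+") ? value + rhs : value - rhs;
        }
        return value;
    }

    // term := factor (('*' | '/') factor)*
    private double term() {
        double value = factor();
        while (peek().equals("*") || peek().equals("/")) {
            String op = tokens.get(next++);
            double rhs = factor();
            value = op.equals("*") ? value * rhs : value / rhs;
        }
        return value;
    }

    // factor := '-' factor | '(' expr ')' | NUMBER
    private double factor() {
        String t = peek();
        if (t.isEmpty()) {
            throw new IllegalArgumentException("Unexpected end of input");
        }
        next++;
        if (t.equals("-")) {
            return -factor();
        }
        if (t.equals("(")) {
            double value = expr();
            if (!peek().equals(")")) {
                throw new IllegalArgumentException("Missing ')'");
            }
            next++;
            return value;
        }
        return Double.parseDouble(t); // NumberFormatException for a stray operator
    }

    public static void main(String[] args) {
        System.out.println(evaluate("2 * (3 + 4) - -1"));
    }
}
```

The lexer never needs to know whether a ')' is balanced; that check only becomes possible in the second stage.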

 
 
Eric Sosman
Guest
Posts: n/a
 
      03-11-2013
On 3/11/2013 6:00 PM, Joshua Cranmer 🐧 wrote:
> [...]
> ([()<>,:;@])|(?:[^\\"]|\\.)*|\[(?:[^\\\]]|\\.)*\]|(?:\\.|[^
> \t\r\n()<>,:;@["])+
>
> That is an example of a production regular expression I use specifically
> for tokenizing. [...]


As Ed Post noted nearly thirty years ago:

It has been observed that a TECO command sequence
more closely resembles transmission line noise
than readable text.
-- "Real Programmers Don't Use PASCAL"

Nobody I know of uses TECO any more, but regexes satisfy
people's craving for gibberish.

--
Eric Sosman
(E-Mail Removed)
 
 
Arne Vajhøj
Guest
Posts: n/a
 
      03-11-2013
On 3/11/2013 6:31 PM, Eric Sosman wrote:
> On 3/11/2013 6:00 PM, Joshua Cranmer 🐧 wrote:
>> [...]
>> ([()<>,:;@])|(?:[^\\"]|\\.)*|\[(?:[^\\\]]|\\.)*\]|(?:\\.|[^
>> \t\r\n()<>,:;@["])+
>>
>> That is an example of a production regular expression I use specifically
>> for tokenizing. [...]

>
> As Ed Post noted nearly thirty years ago:
>
> It has been observed that a TECO command sequence
> more closely resembles transmission line noise
> than readable text.
> -- "Real Programmers Don't Use PASCAL"
>
> Nobody I know of uses TECO any more, but regexes satisfy
> people's craving for gibberish.


$ edit/teco z.z
%Can't find file "Z.Z"
%Creating new file
*ex$$



(sorry - the only thing I know about TECO is how to exit)

Arne


 
 
Eric Sosman
Guest
Posts: n/a
 
      03-12-2013
On 3/11/2013 6:40 PM, Arne Vajhøj wrote:
> On 3/11/2013 6:31 PM, Eric Sosman wrote:
>>[...]
>> Nobody I know of uses TECO any more, but regexes satisfy
>> people's craving for gibberish.

>
> $ edit/teco z.z
> %Can't find file "Z.Z"
> %Creating new file
> *ex$$
>
>
>
> (sorry - the only thing I know about TECO is how to exit)


Perhaps the most important lesson of all!

--
Eric Sosman
(E-Mail Removed)
 