Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Perl > Perl Misc > regular expression module?

Reply
Thread Tools

regular expression module?

 
 
Walter Roberson
Guest
Posts: n/a
 
      04-20-2004
I remember reading on this newsgroup, maybe about a year ago, of
a module that did a Finite State Machine implementation of "standard"
regular expressions -- i.e., just what can be built out of
concatenation, alternation, and repetition, (and perhaps
some syntactic sugar such as charactre-classes), without the perl
extensions that lift perl's regex's into a different parser class
entirely.

Unfortunately, when I tried to find the posting again, I could
not, and I could not find anything like that in CPAN.

The matter came to my mind again today when someone mentioned
the Yapp module and parsing speeds -- one of the points that
can really slow down perl's regexes is the backtracking that it
does. If one happens to be working with input whose grammar is
not overly complex, then one might be perfectly happy with
faster but less flexible RE's and a perl equivilent of
yacc and lex.

Would anyone happen to recall the name and location of that
finite-state RE module? Something like that could potentially speed
up a number of my programs.
--
Is "meme" descriptive or perscriptive? Does the knowledge that
memes exist not subtly encourage the creation of more memes?
-- A Child's Garden Of Memes
 
Reply With Quote
 
 
 
 
Anno Siegel
Guest
Posts: n/a
 
      04-20-2004
Walter Roberson <(E-Mail Removed)-cnrc.gc.ca> wrote in comp.lang.perl.misc:
> I remember reading on this newsgroup, maybe about a year ago, of
> a module that did a Finite State Machine implementation of "standard"
> regular expressions -- i.e., just what can be built out of
> concatenation, alternation, and repetition, (and perhaps
> some syntactic sugar such as charactre-classes), without the perl
> extensions that lift perl's regex's into a different parser class
> entirely.
>
> Unfortunately, when I tried to find the posting again, I could
> not, and I could not find anything like that in CPAN.
>
> The matter came to my mind again today when someone mentioned
> the Yapp module and parsing speeds -- one of the points that
> can really slow down perl's regexes is the backtracking that it
> does.


Backtracking is certainly not a Perl-specific addition. Any regular
expression implementation must backtrack.

> If one happens to be working with input whose grammar is
> not overly complex, then one might be perfectly happy with
> faster but less flexible RE's and a perl equivilent of
> yacc and lex.
>
> Would anyone happen to recall the name and location of that
> finite-state RE module? Something like that could potentially speed
> up a number of my programs.


I don't know the module you have in mind, but if it's a re-implementation
of regular expressions in Perl, any speedup is unlikely.

Anno
 
Reply With Quote
 
 
 
 
Brian McCauley
Guest
Posts: n/a
 
      04-20-2004
http://www.velocityreviews.com/forums/(E-Mail Removed)-berlin.de (Anno Siegel) writes:

> Walter Roberson <(E-Mail Removed)-cnrc.gc.ca> wrote in comp.lang.perl.misc:
> > I remember reading on this newsgroup, maybe about a year ago, of
> > a module that did a Finite State Machine implementation of "standard"
> > regular expressions -- i.e., just what can be built out of
> > concatenation, alternation, and repetition, (and perhaps
> > some syntactic sugar such as charactre-classes), without the perl
> > extensions that lift perl's regex's into a different parser class
> > entirely.
> >
> > Unfortunately, when I tried to find the posting again, I could
> > not, and I could not find anything like that in CPAN.
> >
> > The matter came to my mind again today when someone mentioned
> > the Yapp module and parsing speeds -- one of the points that
> > can really slow down perl's regexes is the backtracking that it
> > does.

>
> Backtracking is certainly not a Perl-specific addition. Any regular
> expression implementation must backtrack.


No, the OP is right - "standard" regular expressions can be done with
a Finite State Machine.

I don't, however, recall the thread.

--
\\ ( )
. _\\__[oo
.__/ \\ /\@
. l___\\
# ll l\\
###LL LL\\
 
Reply With Quote
 
Anno Siegel
Guest
Posts: n/a
 
      04-20-2004
Brian McCauley <(E-Mail Removed)> wrote in comp.lang.perl.misc:
> (E-Mail Removed)-berlin.de (Anno Siegel) writes:
>
> > Walter Roberson <(E-Mail Removed)-cnrc.gc.ca> wrote in comp.lang.perl.misc:
> > > I remember reading on this newsgroup, maybe about a year ago, of
> > > a module that did a Finite State Machine implementation of "standard"
> > > regular expressions -- i.e., just what can be built out of
> > > concatenation, alternation, and repetition, (and perhaps
> > > some syntactic sugar such as charactre-classes), without the perl
> > > extensions that lift perl's regex's into a different parser class
> > > entirely.
> > >
> > > Unfortunately, when I tried to find the posting again, I could
> > > not, and I could not find anything like that in CPAN.
> > >
> > > The matter came to my mind again today when someone mentioned
> > > the Yapp module and parsing speeds -- one of the points that
> > > can really slow down perl's regexes is the backtracking that it
> > > does.

> >
> > Backtracking is certainly not a Perl-specific addition. Any regular
> > expression implementation must backtrack.

>
> No, the OP is right - "standard" regular expressions can be done with
> a Finite State Machine.


I may be confused, but I don't see the contradiction. The finite-state-
machine *does* the backtracking, no?

Anno
 
Reply With Quote
 
Walter Roberson
Guest
Posts: n/a
 
      04-20-2004
In article <c632jt$m3p$(E-Mail Removed)-Berlin.DE>,
Anno Siegel <(E-Mail Removed)-berlin.de> wrote:
:> No, the OP is right - "standard" regular expressions can be done with
:> a Finite State Machine.

:I may be confused, but I don't see the contradiction. The finite-state-
:machine *does* the backtracking, no?

No, no backtracking is needed.

There are two major representations of finite state machines, one "DFA"
(Deterministic Finite Automata) and the other one "NDFA"
(Non-Deterministic Finite Automata). NDFA are smaller to diagram out,
and are possibly more popular. It is common for beginners to believe
that NDFA can "do more" than DFA can, but they both have the same expressive
power, and there is a mechanical procedure to convert any NDFA to an
exactly equivilent DFA. Some implimentations of NDFA use
backtracking in order to avoid the (non-trivial) overhead of doing
the NDFA -> DFA conversion; those backtracking implimentations are
often faster on simple expressions, but when one has a more complex
expression, then it becomes better to do the conversion and run the
equivilent-DFA.

DFA's never need to backtrack. Once you -have- the DFA, it runs in
time linear to the length of the input: you read the next input
symbol, index that symbol in the transition table of the current
state, and that tells you the new state number. Repeat and by the
end of processing each symbol in the string exactly -once-,
you have either "accepted" the input (in which case the pattern matched)
or else you run out of input at a non-accepting state, in which case
the pattern match failed. The difficulty, the reason this isn't
used much more often, is that if you are starting with an NDFA, the
conversion into a DFA takes exponential time. Going the
regex -> NDFA -> DFA route is thus best suited for cases in which you
expect to "compile once, run often". Too expensive of a process if
you are farting around doing on-the-fly typing in of expressions to
be processed against short files, but well worth doing if you have a
big input file.... especially if you can separate out the compiled
DFA and "save" it so that you don't have to recompile it from run
to run or when other parts of the program change.

'yacc' is a well known program that does all the work of building
up compiled DFA, complete with some kind of state-table compression
so that the result is both space and time efficient. But yacc produces
C code. I seem to recall hearing of a yacc implimentation that
had an option for spitting out perl. Or was that a 'lex' program?
I do not remember clearly.
--
Studies show that the average reader ignores 106% of all statistics
they see in .signatures.
 
Reply With Quote
 
Uri Guttman
Guest
Posts: n/a
 
      04-20-2004
>>>>> "WR" == Walter Roberson <(E-Mail Removed)-cnrc.gc.ca> writes:

WR> 'yacc' is a well known program that does all the work of building
WR> up compiled DFA, complete with some kind of state-table compression
WR> so that the result is both space and time efficient. But yacc produces
WR> C code. I seem to recall hearing of a yacc implimentation that
WR> had an option for spitting out perl. Or was that a 'lex' program?
WR> I do not remember clearly.

search google for perl-byacc

uri
 
Reply With Quote
 
pkent
Guest
Posts: n/a
 
      04-20-2004
In article <c63ivi$5il$(E-Mail Removed)>,
http://www.velocityreviews.com/forums/(E-Mail Removed)-cnrc.gc.ca (Walter Roberson) wrote:

<snip interesting stuff about regexes>

> 'yacc' is a well known program that does all the work of building
> up compiled DFA, complete with some kind of state-table compression
> so that the result is both space and time efficient. But yacc produces
> C code. I seem to recall hearing of a yacc implimentation that
> had an option for spitting out perl. Or was that a 'lex' program?
> I do not remember clearly.


Would you be thinking of yapp? As in:

http://search.cpan.org/~fdesar/Parse.../Parse/Yapp.pm

specifically it says "The script yapp is a front-end to the Parse::Yapp
module and let you easily create a Perl OO parser from an input grammar
file." and "Parse::Yapp should compile a clean yacc grammar without any
modification..."

P

--
pkent 77 at yahoo dot, er... what's the last bit, oh yes, com
Remove the tea to reply
 
Reply With Quote
 
Walter Roberson
Guest
Posts: n/a
 
      04-20-2004
In article <(E-Mail Removed)>,
pkent <(E-Mail Removed)> wrote:
:In article <c63ivi$5il$(E-Mail Removed)>,
: (E-Mail Removed)-cnrc.gc.ca (Walter Roberson) wrote:

:<snip interesting stuff about regexes>

:> 'yacc' is a well known program that does all the work of building
:> up compiled DFA, complete with some kind of state-table compression
:> so that the result is both space and time efficient. But yacc produces
:> C code. I seem to recall hearing of a yacc implimentation that
:> had an option for spitting out perl. Or was that a 'lex' program?
:> I do not remember clearly.

:Would you be thinking of yapp? As in:

:http://search.cpan.org/~fdesar/Parse.../Parse/Yapp.pm

I was not, but thank you for the reference. I seem to recall seeing
[years ago] a version of yacc with an explicit perl option. I might
be able to find it again if I looked around.

But getting back to my original question: although Yapp does appear to
have a lot of functionality useful for my purposes, the module that
I am remembering as having seen mentioned, was, as best I recall,
just an RE (as opposed to regex) compiler and FSM driver, without the
full LALR parsing facilities of yacc/yapp. Useful if, for example,
one had a large number of simple RE's (or just plain strings) to match
against. join('|', @wordlist) is not very efficient to match against
in perl, even if one qr's it, as perl backtracks because it assumes
later portions of the expression might require the full regexp power.

Maybe I should be looking more closely at the regexp optimizer... but
boy does it produce ugly expressions!
--
Can a statement be self-referential without knowing it?
 
Reply With Quote
 
Jim Cochrane
Guest
Posts: n/a
 
      04-21-2004
In article <c640tf$nn9$(E-Mail Removed)>, Walter Roberson wrote:
> In article <(E-Mail Removed)>,
> pkent <(E-Mail Removed)> wrote:
>:In article <c63ivi$5il$(E-Mail Removed)>,
>: (E-Mail Removed)-cnrc.gc.ca (Walter Roberson) wrote:
>
>:<snip interesting stuff about regexes>
>
>:> 'yacc' is a well known program that does all the work of building
>:> up compiled DFA, complete with some kind of state-table compression
>:> so that the result is both space and time efficient. But yacc produces
>:> C code. I seem to recall hearing of a yacc implimentation that
>:> had an option for spitting out perl. Or was that a 'lex' program?
>:> I do not remember clearly.
>
>:Would you be thinking of yapp? As in:
>
>:http://search.cpan.org/~fdesar/Parse.../Parse/Yapp.pm
>
> I was not, but thank you for the reference. I seem to recall seeing
> [years ago] a version of yacc with an explicit perl option. I might
> be able to find it again if I looked around.
>
> But getting back to my original question: although Yapp does appear to
> have a lot of functionality useful for my purposes, the module that
> I am remembering as having seen mentioned, was, as best I recall,
> just an RE (as opposed to regex) compiler and FSM driver, without the
> full LALR parsing facilities of yacc/yapp. Useful if, for example,
> one had a large number of simple RE's (or just plain strings) to match
> against. join('|', @wordlist) is not very efficient to match against
> in perl, even if one qr's it, as perl backtracks because it assumes
> later portions of the expression might require the full regexp power.
>
> Maybe I should be looking more closely at the regexp optimizer... but
> boy does it produce ugly expressions!


Sorry to ask the obvious, but did you try some carefully-worded google
searches?

--
Jim Cochrane; http://www.velocityreviews.com/forums/(E-Mail Removed)
[When responding by email, include the term non-spam in the subject line to
get through my spam filter.]
 
Reply With Quote
 
Walter Roberson
Guest
Posts: n/a
 
      04-21-2004
In article <(E-Mail Removed)>,
Jim Cochrane <(E-Mail Removed)> wrote:
:In article <c640tf$nn9$(E-Mail Removed)>, Walter Roberson wrote:
:> But getting back to my original question: although Yapp does appear to
:> have a lot of functionality useful for my purposes, the module that
:> I am remembering as having seen mentioned, was, as best I recall,
:> just an RE (as opposed to regex) compiler and FSM driver, without the
:> full LALR parsing facilities of yacc/yapp.

:Sorry to ask the obvious, but did you try some carefully-worded google
:searches?

Yes, I spent a couple of hours looking for what I thought I remembered
existing, including examining the thread topic of everything in the
time range that I thought I had seen the particular message in.

It is possible that I did not happen upon the right keywords when I
did the search, but I did try a variety of approaches, and my ability
to elicit useful information from google is usually quite good.
--
"[...] it's all part of one's right to be publicly stupid." -- Dave Smey
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Seek xpath expression where an attribute name is a regular expression GIMME XML 3 12-29-2008 03:11 PM
C/C++ language proposal: Change the 'case expression' from "integral constant-expression" to "integral expression" Adem C++ 42 11-04-2008 12:39 PM
C/C++ language proposal: Change the 'case expression' from "integral constant-expression" to "integral expression" Adem C Programming 45 11-04-2008 12:39 PM
Matching abitrary expression in a regular expression =?iso-8859-1?B?bW9vcJk=?= Java 8 12-02-2005 12:51 AM
Dynamically changing the regular expression of Regular Expression validator VSK ASP .Net 2 08-24-2003 02:47 PM



Advertisments