Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Perl > Perl Misc > Does Perl combine multiple REs into a single automaton?

Reply
Thread Tools

Does Perl combine multiple REs into a single automaton?

 
 
Clint Olsen
Guest
Posts: n/a
 
      06-28-2004
Hi:

I posted earlier about how to speed up writing lexical analyzers in Perl.
With that effort in mind, I was curious to know if Perl combines multiple
patterns like:

if (/pat/) {
} elsif (/pat1/) {
....
} elsif (/pat2/) {
....
....
....
} else {
}

so that pat[\d]+ are in a sense combined via alternation with each branch
working like embedded action code?

The reason why I ask is that someone suggested I try to do this manually in
order to help speed up the pattern matching process (presumably using the
"(?{ code })" feature documented in perlre. Is it really faster to do it
this way?

When I'm in the debugger in Perl, I've noticed that the execution path gets
sort of muddied. I don't see Perl executing each match as a separate
statement. It's as if it jumps right to the code for the pattern match.
If that's the case, then there's not much of a compelling argument to embed
action code and cobmine REs.

Thanks,

-Clint
 
Reply With Quote
 
 
 
 
Anno Siegel
Guest
Posts: n/a
 
      06-28-2004
Clint Olsen <> wrote in comp.lang.perl.misc:
> Hi:
>
> I posted earlier about how to speed up writing lexical analyzers in Perl.
> With that effort in mind, I was curious to know if Perl combines multiple
> patterns like:
>
> if (/pat/) {
> } elsif (/pat1/) {
> ...
> } elsif (/pat2/) {
> ...
> ...
> ...
> } else {
> }
>
> so that pat[\d]+ are in a sense combined via alternation with each branch
> working like embedded action code?


That sounds extremely unlikely.

> The reason why I ask is that someone suggested I try to do this manually in
> order to help speed up the pattern matching process (presumably using the
> "(?{ code })" feature documented in perlre. Is it really faster to do it
> this way?


Only benchmarks can answer that. Out of interest, I made up an example
along the lines you sketched up there. The straight if/elsif/else came
out 50% faster than a regex with embedded code.

Anno
 
Reply With Quote
 
 
 
 
Clint Olsen
Guest
Posts: n/a
 
      06-29-2004
On 2004-06-29, Abigail <> wrote:
>
> No, that would be impossible. First of all, the different patterns may
> contain different set of parens, so $1 and friends would need to be set
> to different things. But more importantly, in the original code, if
> 'pat2' matches, but 'pat' and 'pat1' don't, $_ has been accessed three
> times. And $_ could be tied.


Ok, that makes sense. I didn't think it would be possible to combine them.
It's just that Perl behind the scenes must be doing some sort of weird
execution for these if/else/branches since they are never 'visited' in the
debugger. It just immediately jumps to the code block.

> It *may* be faster to write it as
>
> if (/pat([12]?)/) {
> if ($1 eq "") {...}
> elsif ($1 eq "1") {...}
> else {...}
> }
>
> You'd only use the regex engine was. However, you are using a more
> complicated pattern, and that means the optimizer can do less. Which
> might actually result in a slowdown. You'll have to benchmark to be sure.


I was wondering about that, too - Write a megapattern with capture buffers
and just test the capture buffers for which action to take...

FWIW, I just took the regular expression set for all the keywords of my
language and merged it with the reserved symbols into a larger pattern
separated by an alternation. Since the action code was identical, I
thought it would be a reasonable test. Unfortunately in my case, I didn't
notice any particular speed difference. As you said, this could be because
the pattern is slightly more complicated now or perhaps statistically
speaking the symbols just aren't seen often enough to make a difference...

Thanks,

-Clint
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Is hi-res + shrink better than taking lo-res? veg_all@yahoo.com Digital Photography 15 02-02-2007 08:30 AM
Simple question: combine a quoted string into a single token Squeamizh Ruby 2 08-07-2006 06:33 PM
Combine hash declaration/assignment into single statement? usenet@DavidFilmer.com Perl Misc 34 03-08-2006 03:09 AM
RES: RES: RES: Bare-bones Ruby Leiradella, Andre V Matos Da Cunha Ruby 0 12-29-2004 06:27 PM
RES: RES: Bare-bones Ruby Leiradella, Andre V Matos Da Cunha Ruby 1 12-29-2004 01:45 PM



Advertisments
 



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57