Velocity Reviews - Computer Hardware Reviews

A hash or array of regexp's?

 
 
Tim Shoppa
      03-28-2005
I often find myself with a list of things that I'm searching for. And
for each of the things I'm searching for, there's an action I want to
do.

Sometimes the "search for" pattern is just the first four characters in
the line, for example. Here things are easy: I build a hash with the
key being the four-character pattern, and the value being the
subroutine to execute. Works very nicely: get each line, use a
substr() to extract the first four characters, look them up in the
hash, and execute the correct subroutine. Very quick, very fast, very
idiomatic.
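
A minimal sketch of the fixed-prefix dispatch described above. The four-character tags (`HDR `, `DATA`) and the handler bodies are hypothetical, chosen only to illustrate the lookup:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical four-character tags and handlers, for illustration only.
my %dispatch = (
    'HDR ' => sub { return "header: $_[0]" },
    'DATA' => sub { return "data: $_[0]" },
);

# Look up the handler for a line by its first four characters.
sub handle_line {
    my ($line) = @_;
    my $key    = substr( $line, 0, 4 );
    my $action = $dispatch{$key} or return;    # no handler for this prefix
    return $action->($line);
}

print handle_line('DATA 1 2 3'), "\n";
```

A line whose prefix is not in the hash simply returns nothing, so the caller decides what "no match" means.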

But other times the patterns are not so easily handled. Often they are
true regexp's, matching variable repeats/patterns. This of course can
be handled with if matches and blocks to do the actions, but this
screams out to me as something that I ought to be able to handle using
a data structure which is something like a hash, using regexp's as
keys.

Pages 193/194 of the Camel book reveal how to loop over a bunch of
precompiled regexp's, using qr// to precompile the regexp's, and this
isn't bad. But it's not quite the same as a hash lookup. And it seems
to me that there ought to be an idiom, maybe a CPAN module, that makes
the whole operation look more like a hash lookup, because that's how I
think of it in my head, even though I know that regexp's aren't really
as quick or efficient as simple keys.
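
For comparison, a loop over precompiled patterns might look like the sketch below. This is not the book's code; the patterns and the helper name `first_match_index` are assumptions made for illustration:

```perl
use strict;
use warnings;

# Hypothetical patterns, each compiled once with qr//.
my @patterns = (
    qr/^\s*#/,                 # comment line
    qr/^(\w+)\s*=\s*(.*)$/,    # key = value
    qr/^\s*$/,                 # blank line
);

# Return the index of the first pattern that matches, or -1 if none does.
sub first_match_index {
    my ($line) = @_;
    for my $i ( 0 .. $#patterns ) {
        return $i if $line =~ $patterns[$i];
    }
    return -1;
}

print first_match_index('name = value'), "\n";
```

Unlike a hash lookup, the cost grows with the number of patterns, and the order of the list decides which pattern wins.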

So, is there a common perl idiom for dealing with this situation?
Maybe a CPAN module?

Tim.

 
xhoster@gmail.com
      03-28-2005
Tim Shoppa wrote:
> I often find myself with a list of things that I'm searching for. And
> for each of the things I'm searching for, there's an action I want to
> do.
>
> Sometimes the "search for" pattern is just the first four characters in
> the line, for example. Here things are easy: I build a hash with the
> key being the four-character pattern, and the value being the
> subroutine to execute. Works very nicely: get each line, use a
> substr() to extract the first four characters, look them up in the
> hash, and execute the correct subroutine. Very quick, very fast, very
> idiomatic.
>
> But other times the patterns are not so easily handled. Often they are
> true regexp's, matching variable repeats/patterns. This of course can
> be handled with if matches and blocks to do the actions, but this
> screams out to me as something that I ought to be able to handle using
> a data structure which is something like a hash, using regexp's as
> keys.
>
> Pages 193/194 of the Camel book reveal how to loop over a bunch of
> precompiled regexp's, using qr// to precompile the regexp's, and this
> isn't bad. But it's not quite the same as a hash lookup. And it seems
> to me that there ought to be an idiom, maybe a CPAN module, that makes
> the whole operation look more like a hash lookup, because that's how I
> think of it in my head, even though I know that regexp's aren't really
> as quick or efficient as simple keys.


Also, any given string can match many different regexes, while there is
exactly one hash key it can match. Trying to munge such a situation into a
hash-like idiom seems very misleading and just asking for trouble.

I'd just use an array of arrays, with each inner array being of length 2,
a regex/action pair.
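
A rough sketch of that array-of-pairs layout, which also shows one string matching several regexes. The rules and the helper name `run_all_matches` are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical regex/action pairs; each inner array has length 2.
my @rules = (
    [ qr/^\d+$/ => sub { "integer: $_[0]" } ],
    [ qr/^\d/   => sub { "starts with digit: $_[0]" } ],
    [ qr/^\w+$/ => sub { "word: $_[0]" } ],
);

# Run the action of every rule whose regex matches, in list order.
sub run_all_matches {
    my ($string) = @_;
    my @results;
    for my $rule (@rules) {
        my ( $re, $action ) = @$rule;
        push @results, $action->($string) if $string =~ $re;
    }
    return @results;
}

print "$_\n" for run_all_matches('42');
```

Here `42` triggers all three actions, which is exactly the ambiguity a hash-like interface would hide.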

Xho

Fabian Pilkowski
      03-29-2005
* Tim Shoppa wrote:

> I often find myself with a list of things that I'm searching for. And
> for each of the things I'm searching for, there's an action I want to
> do.
>
> Sometimes the "search for" pattern is just the first four characters in
> the line, for example. Here things are easy: I build a hash with the
> key being the four-character pattern, and the value being the
> subroutine to execute. Works very nicely: get each line, use a
> substr() to extract the first four characters, look them up in the
> hash, and execute the correct subroutine. Very quick, very fast, very
> idiomatic.
>
> But other times the patterns are not so easily handled. Often they are
> true regexp's, matching variable repeats/patterns. This of course can
> be handled with if matches and blocks to do the actions, but this
> screams out to me as something that I ought to be able to handle using
> a data structure which is something like a hash, using regexp's as
> keys.
>
> So, is there a common perl idiom for dealing with this situation?


I would do this with a flat array that holds a regex at every even index
and its callback at the index that follows, then iterate over the array
while skipping the callback elements.

#!/usr/bin/perl
use strict;
use warnings;

# Flat list: regex at each even index, its callback at the next index.
my @array = (
    qr/(line\s(\d)\2)/ => sub { print "match: $1\n" },
    # ...
);

while ( my $line = <DATA> ) {
    for ( my $i = 0; $i < $#array; $i += 2 ) {    # step over the pairs
        my ( $re, $sub ) = @array[ $i, $i + 1 ];
        $sub->() if $line =~ $re;                 # callback
    }
}
__DATA__
line 10
line 11
line 12


>
> Maybe a CPAN module?


The module Tie::HashRef works around the problem of stringified hash
keys. Perhaps it accepts a reference to a regex as a key; the documentation
doesn't say, and I haven't checked it yet.

regards,
fabian
 
Tim Shoppa
      03-29-2005
Fabian Pilkowski wrote:
> The module Tie::HashRef works around the problem


Thanks for the tip. It's not only a tied hash but also a useful
object-oriented approach to looking for matches. It takes "qr//" forms
directly as the key, with no need to stringify/destringify. And to answer
the other reply, the approach taken ("first match") works fine for my
purposes.

I know it's not really a hash (with all the efficiencies that would be
implied if it were), but I like to think in terms of a hash, and
Tie::HashRef works wonderfully for this.
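
To illustrate the "first match" behaviour without depending on any CPAN module, here is a minimal sketch of an object that stores qr// patterns in insertion order and returns the value of the first one that matches. The package name `RegexTable` and its `add`/`match` methods are hypothetical and are not the module's API:

```perl
use strict;
use warnings;

package RegexTable;    # hypothetical name, not a CPAN module

sub new {
    my ($class) = @_;
    return bless { rules => [] }, $class;
}

# Register a qr// pattern and its value; insertion order is lookup order.
sub add {
    my ( $self, $re, $value ) = @_;
    push @{ $self->{rules} }, [ $re, $value ];
    return $self;
}

# Return the value of the first pattern that matches, or undef if none does.
sub match {
    my ( $self, $string ) = @_;
    for my $rule ( @{ $self->{rules} } ) {
        return $rule->[1] if $string =~ $rule->[0];
    }
    return undef;
}

package main;

my $table = RegexTable->new;
$table->add( qr/^line\s+\d+$/ => sub { "numbered line" } )
      ->add( qr/^line/        => sub { "other line" } );

my $action = $table->match('line 12');
print $action->(), "\n" if $action;
```

Each lookup is still a linear scan over the patterns, so this only looks like a hash; the ordering of `add` calls resolves strings that match more than one pattern.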

Tim.

 