Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Perl > Perl Misc > Fastest way to find a match?

Reply
Thread Tools

Fastest way to find a match?

 
 
bukzor
Guest
Posts: n/a
 
      03-12-2008
Hi,

I'm trying to find the fastest way in perl to see if a name contains
another.

I've a list of 2704 names (aka "A")

I've another name (aka "B")

I need to know if any of A is contained in B.

A = foo foo1 foo2 foo3 foo45 ....
B = INCASE_foo2_YOUWANT
is a match

B = INCASE_YOURDONOTWANT
is not a match.

what would be the fastest way to check the 2704 possible values of
"A" ?

Thanks,


so far, I'm using

foreach $t (keys %A) {
$v = $B;
$v = s/$t//;
if ($v ne $B) {
print "MATCH"
}
}
 
Reply With Quote
 
 
 
 
Joost Diepenmaat
Guest
Posts: n/a
 
      03-12-2008
bukzor <(E-Mail Removed)> writes:

> Hi,
>
> I'm trying to find the fastest way in perl to see if a name contains
> another.
>
> I've a list of 2704 names (aka "A")
>
> I've another name (aka "B")


<snip>

> foreach $t (keys %A) {
> $v = $B;
> $v = s/$t//;
> if ($v ne $B) {
> print "MATCH"
> }
> }


Did you notice that doesn't work? $v = s/../../; assignes the number of
replacements of $_ into $v.

For an unordered list of strings of lengths >= 1, the fastest check
would probably be to use the index() function, though it may not be that
much more or even less fast than a regex match (NOT a replace!).

See perldoc -f index and perldoc perlretut.

--
Joost Diepenmaat | blog: http://joost.zeekat.nl/ | work: http://zeekat.nl/
 
Reply With Quote
 
 
 
 
Ben Morrow
Guest
Posts: n/a
 
      03-13-2008

Quoth Joost Diepenmaat <(E-Mail Removed)>:
> bukzor <(E-Mail Removed)> writes:
>
> > I'm trying to find the fastest way in perl to see if a name contains
> > another.
> >
> > I've a list of 2704 names (aka "A")
> >
> > I've another name (aka "B")

>
> <snip>
>
> > foreach $t (keys %A) {
> > $v = $B;
> > $v = s/$t//;
> > if ($v ne $B) {
> > print "MATCH"
> > }
> > }

>
> Did you notice that doesn't work? $v = s/../../; assignes the number of
> replacements of $_ into $v.
>
> For an unordered list of strings of lengths >= 1, the fastest check
> would probably be to use the index() function, though it may not be that
> much more or even less fast than a regex match (NOT a replace!).


For finding one name, index is best. For finding many, probably best is
something like

# this makes things considerably faster, but make sure your strings
# are all in the same Unicode state first. If necessary, Encode them
# all to UTF8.
use bytes;

for my $B (@B) {
study $B;
for my $A (@A) {
$B =~ /$A/ and print "MATCH";
}
}

as 'study' only remembers the last string studied. An alternative would
be to build a big regex from all the search strings joined with '|', but
that would probably end up slower due to having such a large compiled
regex.

Ben

 
Reply With Quote
 
John Bokma
Guest
Posts: n/a
 
      03-13-2008
bukzor <(E-Mail Removed)> wrote:

> Hi,
>
> I'm trying to find the fastest way in perl to see if a name contains
> another.
>
> I've a list of 2704 names (aka "A")
>
> I've another name (aka "B")
>
> I need to know if any of A is contained in B.
>
> A = foo foo1 foo2 foo3 foo45 ....
> B = INCASE_foo2_YOUWANT


As usual, it's hard to say without seeing some real examples of "A" and
"B". Are the parts in B always separated by _ ?

--
John

http://johnbokma.com/
 
Reply With Quote
 
Jürgen Exner
Guest
Posts: n/a
 
      03-13-2008
bukzor <(E-Mail Removed)> wrote:
>Hi,
>
>I'm trying to find the fastest way in perl to see if a name contains
>another.
>
>I've a list of 2704 names (aka "A")
>
>I've another name (aka "B")
>
>I need to know if any of A is contained in B.
>
>A = foo foo1 foo2 foo3 foo45 ....
>B = INCASE_foo2_YOUWANT
>is a match
>
>B = INCASE_YOURDONOTWANT
>is not a match.
>
>what would be the fastest way to check the 2704 possible values of
>"A" ?
>
>Thanks,
>
>
>so far, I'm using
>
>foreach $t (keys %A) {
> $v = $B;
> $v = s/$t//;
> if ($v ne $B) {


What does the string value of the number of matches have to do with the
original text of $B? This condition will always succeed unless $t and $B are
both '1'.

Maybe you meant to test the result of s/// directly?
if ($v =~ s/$t/) {

However, why do a s/// and awkwardly restore $v for each iteration in the
first place? A simple
if ($B =~ m/$_/) {
will do without all the temporary assignments, which cost time!


Having said that I strongly believe your code isn't doing what you think
it's doing in the first place. Initially you wrote
"if any of A is contained in B"
But your code is testing if B is a regular expression that matches any of A.
That is something very different.

I would imagine a simple index() is what you are looking for

foreach (keys %A) {
if (index($B, $_) > -1) {
print "FOUND";
}
}

This is probably also the fastest method, but you may want to run some
benchmarks.

jue
 
Reply With Quote
 
Michele Dondi
Guest
Posts: n/a
 
      03-13-2008
On Wed, 12 Mar 2008 16:34:08 -0700 (PDT), bukzor
<(E-Mail Removed)> wrote:

>I'm trying to find the fastest way in perl to see if a name contains
>another.

[...]
>so far, I'm using
>
>foreach $t (keys %A) {
> $v = $B;
> $v = s/$t//;
> if ($v ne $B) {


Funniest way I've seen thus far to check for a match!

> print "MATCH"


How 'bout

use List::Util 'first';

# ...

my @rx=map qr/\Q$_\E/ => @A;
say "match!" if first { $B ~~ $_ } @rx;

?

L::U is XS and should be pretty fast.


Michele
--
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,
 
Reply With Quote
 
bukzor
Guest
Posts: n/a
 
      03-13-2008
Thanks everyone for your thoughtful posts. The OP was psuedocode and
not to be taken literally.

The piece to check is indeed always surrounded by underscores. Here is
the solution from a coworker that I've implemented:


What about using a hash table? You could put the 2704 "A" names in
hash table. Then, break done "B" into all the possible parts that
could match something in "A" and see if there is something in the hash
table that matches.

$hash{"foo"} = 1;
$hash{"foo1"} = 1;
....

For "INCASE_foo2_YOUWANT", there is only one possible matcher after
you remove everything before the first underscore and after the last
underscore, so just check the hash table for the existence of "foo2".

For "INCASE_foo2_SOMETHING_YOUWANT", you could check both "foo2" and
"SOMETHING".

Wouldn't that be near instant?

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

before:
51.670u 0.100s 0:57.76 89.6% 0+0k 0+0io 1033pf+0w
after:
0.560u 0.040s 0:00.58 103.4% 0+0k 0+0io 1033pf+0w
I'm a happy camper...
 
Reply With Quote
 
jm
Guest
Posts: n/a
 
      03-15-2008
Michele Dondi a écrit :
> On Wed, 12 Mar 2008 16:34:08 -0700 (PDT), bukzor
> <(E-Mail Removed)> wrote:
>
>> I'm trying to find the fastest way in perl to see if a name contains
>> another.

> [...]
>> so far, I'm using
>>
>> foreach $t (keys %A) {
>> $v = $B;
>> $v = s/$t//;
>> if ($v ne $B) {

>
> Funniest way I've seen thus far to check for a match!
>
>> print "MATCH"

>
> How 'bout
>
> use List::Util 'first';
>
> # ...
>
> my @rx=map qr/\Q$_\E/ => @A;
> say "match!" if first { $B ~~ $_ } @rx;
>
> ?
>
> L::U is XS and should be pretty fast.
>
>
> Michele


I didnot find ~~ in man perlop.

Is this a perl operator?
or a L::U operator?
 
Reply With Quote
 
Ben Morrow
Guest
Posts: n/a
 
      03-15-2008

Quoth jm <(E-Mail Removed)>:
> Michele Dondi a écrit :
> >
> > use List::Util 'first';
> >
> > # ...
> >
> > my @rx=map qr/\Q$_\E/ => @A;
> > say "match!" if first { $B ~~ $_ } @rx;
> >
> > ?
> >
> > L::U is XS and should be pretty fast.

>
> I didnot find ~~ in man perlop.
>
> Is this a perl operator?
> or a L::U operator?


It's the new 'smart match' operator in perl 5.10. In the case above,
since $_ is always a regular expression, it is equivalent to =~; so

print "match!\n" if first { $B =~ $_ } @rx;

Ben

 
Reply With Quote
 
Dr.Ruud
Guest
Posts: n/a
 
      03-16-2008
bukzor schreef:

> I'm trying to find the fastest way in perl to see if a name
> contains another.
>
> I've a list of 2704 names (aka "A")
>
> I've another name (aka "B")
>
> I need to know if any of A is contained in B.


Considered fgrep?

--
Affijn, Ruud

"Gewoon is een tijger."
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
What the fastest algorithm for find max/min in an array of integers? Eugeny Myunster C Programming 16 04-28-2008 07:24 PM
fastest way to find the intersection of n lists of sets Prateek Python 11 04-30-2007 08:19 AM
how to find the fastest cpan mirror site? XiaotingHua Perl Misc 1 11-02-2004 10:33 AM
Fastest 5 mp Digital Camera ? Fastest 4 mp Digital Camera? photoguysept102004@yahoo.com Digital Photography 6 10-28-2004 11:33 AM
fastest way to access global vars? rghatol@unionbankph.com ASP .Net 1 03-02-2004 04:54 PM



Advertisments