Velocity Reviews - Computer Hardware Reviews


Curious regexp behavior

 
 
Derek Lewis
02-16-2005
On a whim, I just decided to try an experiment with regexps, to see how
they perform in two slightly different cases. I wanted to see how using
a single regexp object for many, many evaluations performed compared to
using the regexp literal within the loop.

The scripts I wrote searched through a words file that is 234937 lines
long.

Here are the scripts I wrote, to clarify:
First one:

total = 0
File.open( 'words', 'r' ) { |file|
  file.each_line { |line|
    word = line.chomp
    total += 1 if word =~ /[a-df-h][aeiou]{2}/
  }
}
puts total

Second one:

rexp = /[a-df-h][aeiou]{2}/
total = 0
File.open( 'words', 'r' ) { |file|
  file.each_line { |line|
    word = line.chomp
    total += 1 if word =~ rexp
  }
}
puts total


I expected the second one to be slightly faster, but was surprised to
see that it was actually slightly slower. I ran each one about 10-15
times and eyeballed an average. The results from each run after the
first were pretty consistent.
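Eyeballed averages are easy to skew, so here is a minimal sketch of a more controlled comparison using Ruby's standard Benchmark library. It assumes a synthetic in-memory word list in place of the original 'words' file (which isn't available here); the WORDS and PATTERN constants and the count_inline/count_with_variable method names are hypothetical.

```ruby
require 'benchmark'

# Assumption: a synthetic word list stands in for the 234937-line 'words' file.
WORDS = %w[bead cool fear goat hook apple zebra dial feet quiet] * 20_000

PATTERN = /[a-df-h][aeiou]{2}/

# Variant 1: the regexp literal is written inline inside the loop.
def count_inline(words)
  total = 0
  words.each { |word| total += 1 if word =~ /[a-df-h][aeiou]{2}/ }
  total
end

# Variant 2: the regexp is stored once and reached through a local variable.
def count_with_variable(words)
  rexp = PATTERN
  total = 0
  words.each { |word| total += 1 if word =~ rexp }
  total
end

# bmbm runs a rehearsal pass first, so warm-up effects are reported separately.
Benchmark.bmbm do |x|
  x.report('inline literal') { count_inline(WORDS) }
  x.report('local variable') { count_with_variable(WORDS) }
end
```

Both methods should report the same count; only the timings are of interest, and they will vary by Ruby version and machine.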

It's just a curiosity, but does anyone know what might cause them to be
'backwards' like that?

--
Derek Lewis

===================================================================
Java Web-Application Developer

Email : (E-Mail Removed)
Cellular : 778.898.5825
Website : http://www.lewisd.com

"If you've got a 5000-line JSP page that has "all in one" support
for three input forms and four follow-up screens, all controlled
by "if" statements in scriptlets, well ... please don't show it
to me. It's almost dinner time, and I don't want to lose my
appetite."
- Craig R. McClanahan


 
Charles Mills
02-16-2005
Derek Lewis wrote:
> It's just a curiosity, but does anyone know what might cause them to
> be 'backwards' like that?

I'll wager a guess. In the first version Ruby knows that
'/[a-df-h][aeiou]{2}/' is a regexp. In the second one Ruby doesn't
know if 'rexp' is a variable or a method, so it has to do one, maybe
two, lookups on every iteration before it dispatches String#=~.
Also, regexps are immutable, so Ruby doesn't allocate a new regexp on
every iteration, and storing the regexp in a variable has no effect in
that regard.
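The point about allocation can be checked directly. A minimal sketch, assuming CRuby; the variable names are illustrative only. It keeps every evaluated regexp alive in an array and compares object identity:

```ruby
# A regexp literal without interpolation is built once, when the code is
# compiled, so every evaluation of it yields the very same object.
literals = Array.new(3) { /[a-df-h][aeiou]{2}/ }
puts literals.all? { |r| r.equal?(literals.first) }   # prints true

# An interpolated literal, by contrast, is rebuilt on every evaluation
# (unless the /o modifier is used).
source = '[a-df-h][aeiou]{2}'
dynamic = Array.new(3) { /#{source}/ }
puts dynamic.map(&:object_id).uniq.size               # prints 3
```

So hoisting a plain literal into a variable saves no allocations; it only changes how the regexp is reached.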

-Charlie

 
Eric Hodel
02-16-2005

On 15 Feb 2005, at 17:16, Derek Lewis wrote:

> First one:
>
> total = 0
> File.open( 'words', 'r' ) { |file|
> file.each_line { |line|
> word = line.chomp
> total +=1 if word =~ /[a-df-h][aeiou]{2}/

                       ^^^^^^^^^^^^^^^^^^^^ inline regexp (part of the AST)
> }
> }
> puts total
>
> Second one:
>
> rexp = /[a-df-h][aeiou]{2}/
> total = 0
> File.open( 'words', 'r' ) { |file|
> file.each_line { |line|
> word = line.chomp
> total +=1 if word =~ rexp

                       ^^^^ variable lookup
> }
> }
> puts total
>


Inline regexps are much faster than looking up a variable and then
calling methods on the Regexp object.

--
Eric Hodel - (E-Mail Removed) - http://segment7.net
FEC2 57F1 D465 EB15 5D6E 7C11 332A 551C 796C 9F04



 
Ryan Davis
02-16-2005

On Feb 15, 2005, at 5:16 PM, Derek Lewis wrote:

> I expected the second one to be slightly faster, but was surprised to
> see that it was actually slightly slower. I ran each one about 10-15
> times, and eyeballed an average. The results from each run after the
> first were pretty consistant.
>
> It's just a curiosity, but does anyone know what might cause them to be
> 'backwards' like that?


Use ParseTree and you can see why!!!

<576> echo "a=/blah/; 's' =~ a" | parse_tree_show -f
(cut for readability)
[:lasgn, :a, [:lit, /blah/]],
[:call, [:str, "s"], :=~, [:array, [:lvar, :a]]]]]]]]
<577> echo "'s' =~ /blah/" | parse_tree_show -f
(cut for readability)
[:match3, [:lit, /blah/], [:str, "s"]]]]]]]

Basically, the inline regex avoids the lvar lookup and the call and
shoots straight into a match3 node. The lvar is probably not _that_
expensive, but method dispatch is not terribly cheap.

--
(E-Mail Removed) - http://blog.zenspider.com/
http://rubyforge.org/projects/ruby2c/
http://rubyforge.org/projects/parsetree/
http://www.zenspider.com/seattle.rb



 
Robert Klemme
02-16-2005

"Derek Lewis" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> It's just a curiosity, but does anyone know what might cause them to be
> 'backwards' like that?


Did you try the same with the matching reversed, i.e., "rexp =~ word"
instead of "word =~ rexp"? Did it make a difference?
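Robert's suggestion is easy to try. A minimal sketch, again assuming a synthetic word list in place of the 'words' file: it first checks that both operand orders give identical results, then times each with Benchmark.bm. No particular timing result is implied; it varies with Ruby version and machine.

```ruby
require 'benchmark'

# Assumption: a synthetic word list stands in for the original 'words' file.
words = %w[bead cool fear goat hook apple zebra dial feet quiet] * 20_000
rexp  = /[a-df-h][aeiou]{2}/

# Both orders return the same match index (or nil) for every word.
same = words.all? { |word| (word =~ rexp) == (rexp =~ word) }
puts same   # prints true

# String#=~ given a Regexp hands off to the same matcher Regexp#=~ uses,
# so the reversed form skips one hop.
Benchmark.bm(16) do |x|
  x.report('word =~ rexp') { words.each { |word| word =~ rexp } }
  x.report('rexp =~ word') { words.each { |word| rexp =~ word } }
end
```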

Kind regards

robert

 
William Morgan
02-16-2005
Excerpts from Ryan Davis's mail of 16 Feb 2005 (EST):
> Use ParseTree and you can see why!!!


Very nice answer.

Like the original poster, I found the behavior counterintuitive. Perhaps
this is because our assumptions come from the C model of the universe,
where using more local variables is typically faster and method dispatch
is not a problem.

I wonder what the merits would be of collecting equivalences like these
into some kind of post-hoc parse-tree optimization. Probably not great,
but it might be fun.
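The idea can be sketched on the nested-array form that ParseTree prints. This is a hypothetical toy, not a ParseTree feature, and all method names are made up: it inlines a local that is assigned exactly once to a regexp literal, then rewrites `recv =~ <regexp literal>` calls into :match3 nodes. A real optimizer would also have to show the variable is never reassigned through a loop or closure, and that String#=~ has not been redefined.

```ruby
# Hypothetical rewrite over ParseTree-style s-expressions (plain nested arrays).

# Gather the right-hand side of every local assignment, keyed by name.
def local_assignments(sexp, found = Hash.new { |h, k| h[k] = [] })
  return found unless sexp.is_a?(Array)
  found[sexp[1]] << sexp[2] if sexp[0] == :lasgn
  sexp.each { |child| local_assignments(child, found) }
  found
end

def regexp_literal?(sexp)
  sexp.is_a?(Array) && sexp[0] == :lit && sexp[1].is_a?(Regexp)
end

# Rewrite bottom-up: replace foldable :lvar nodes with their literal, then
# turn `recv =~ <regexp literal>` calls into :match3 nodes.
def rewrite(sexp, consts)
  return sexp unless sexp.is_a?(Array)
  node = sexp.map { |child| rewrite(child, consts) }
  return consts[node[1]] if node[0] == :lvar && consts.key?(node[1])
  if node[0] == :call && node[2] == :=~ &&
     node[3].is_a?(Array) && node[3][0] == :array && node[3].size == 2 &&
     regexp_literal?(node[3][1])
    return [:match3, node[3][1], node[1]]
  end
  node
end

def optimize(sexp)
  consts = {}
  local_assignments(sexp).each do |name, values|
    # Only fold a variable assigned exactly once, to a regexp literal.
    consts[name] = values.first if values.size == 1 && regexp_literal?(values.first)
  end
  rewrite(sexp, consts)
end

tree = [:block,
        [:lasgn, :a, [:lit, /blah/]],
        [:call, [:str, "s"], :=~, [:array, [:lvar, :a]]]]
p optimize(tree)
# prints [:block, [:lasgn, :a, [:lit, /blah/]], [:match3, [:lit, /blah/], [:str, "s"]]]
```

The now-dead :lasgn is left in place; removing it would need a separate dead-store pass.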

--
William <(E-Mail Removed)>


 
Derek Lewis
02-16-2005
On Wed, Feb 16, 2005 at 06:14:52PM +0900, Robert Klemme wrote:
>
>
> Did you try the same with the matching reversed, i.e., "rexp =~ word"
> instead of "word =~ rexp"? Did it make a difference?
>
> Kind regards
>
> robert
>


I did, actually, and it was very slightly faster. Still slower than an
inline regexp, however.

Thanks for the insightful answers, everyone. It's quite interesting to
find out how your favorite programming language works inside.

--
Derek Lewis
