Does Python optimize regexes?

 
 
Jason Smith
      06-29-2004
Hi. I just have a question about optimizations Python does when
converting to bytecode.

import re
for someString in someListOfStrings:
    if re.match('foo', someString):
        print someString, "matched!"

Does Python notice that re.match is called with the same expression, and
thus lift it out of the loop? Or do I need to always optimize by hand
using re.compile? I suspect so because the Python bytecode generator
would hardly know about a library function like re.compile, unlike e.g.
Perl, with builtin REs.

Thanks much for any clarification or advice.

--
Jason Smith
Open Enterprise Systems
Bangkok, Thailand
http://oes.co.th

Peter Otten
      06-29-2004
Jason Smith wrote:

> Hi. I just have a question about optimizations Python does when
> converting to bytecode.
>
> import re
> for someString in someListOfStrings:
>     if re.match('foo', someString):
>         print someString, "matched!"
>
> Does Python notice that re.match is called with the same expression, and
> thus lift it out of the loop? Or do I need to always optimize by hand
> using re.compile? I suspect so because the Python bytecode generator
> would hardly know about a library function like re.compile, unlike e.g.
> Perl, with builtin REs.
>
> Thanks much for any clarification or advice.
>


Python puts the compiled regular expressions into a cache. The relevant code
is in sre.py:

def match(pattern, string, flags=0):
    return _compile(pattern, flags).match(string)

....

def _compile(*key):
    p = _cache.get(key)
    if p is not None:
        return p
    ....

So not explicitly calling compile() in advance only costs you two function
calls and a dictionary lookup - and maybe some clarity in your code.

Peter

 
Peter Otten
      06-29-2004
Peter Otten wrote:

> Python puts the compiled regular expressions into a cache. The relevant


By the way, re.compile() uses that cache, too:

>>> import re
>>> r1 = re.compile("abc")
>>> r2 = re.compile("abc")
>>> r1 is r2
True

Peter


 
 
Michael Geary
      06-29-2004
Peter Otten wrote:
> Python puts the compiled regular expressions into a cache. The relevant
> code is in sre.py:
>
> def match(pattern, string, flags=0):
>     return _compile(pattern, flags).match(string)
>
> ...
>
> def _compile(*key):
>     p = _cache.get(key)
>     if p is not None:
>         return p
>     ...
>
> So not explicitly calling compile() in advance only costs you two function
> calls and a dictionary lookup - and maybe some clarity in your code.


That cost can be significant. Here's a test case where not precompiling the
regular expression increased the run time by more than 50%:

http://groups.google.com/groups?selm....supernews.com
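
One way to measure this for yourself is sketched below. This is not the linked test case: the sample data, the function names and the repeat count are made up for illustration, and it targets a current Python 3. The ratio you see will depend on the Python version, the pattern and the strings.

```python
import re
import timeit

# Made-up sample data: 1000 strings that match and 1000 that do not.
strings = ["foo%d" % i for i in range(1000)] + ["bar%d" % i for i in range(1000)]

def with_module_function():
    # Goes through re.match() and its cache lookup on every iteration.
    return sum(1 for s in strings if re.match("foo", s))

precompiled = re.compile("foo")

def with_precompiled():
    # Calls the bound method of the already compiled pattern directly.
    match = precompiled.match
    return sum(1 for s in strings if match(s))

t_module = timeit.timeit(with_module_function, number=20)
t_precompiled = timeit.timeit(with_precompiled, number=20)

print("re.match in loop: %.4f s" % t_module)
print("precompiled:      %.4f s" % t_precompiled)
```

Both functions count the same matches, so any difference in the two timings comes from the per-call overhead rather than from the matching itself.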

-Mike


 
 
Jason Smith
      06-30-2004
Thanks much to Peter and Michael for the clarification.

Peter Otten wrote:
> So not explicitly calling compile() in advance only costs you two function
> calls and a dictionary lookup - and maybe some clarity in your code.


The reason I asked is because I felt that re.compile() was less clear:

someRegex = re.compile('searchforme')
while something:
    theString = getTheString()
    if someRegex.search(theString):
        celebrate()

I wanted to remove someRegex since I can shave a line of code and some
confusion, but I was worried about re.search() in a loop.

The answer is that this is handled smartly in Python code, by the re
module's cache, rather than by a bytecode optimization. Great!

--
Jason Smith
Open Enterprise Systems
Bangkok, Thailand
http://oes.co.th

 
Aahz
      07-03-2004
Jason Smith wrote:
>
>The reason I asked is because I felt that re.compile() was less clear:
>
>someRegex = re.compile('searchforme')
>while something:
>    theString = getTheString()
>    if someRegex.search(theString):
>        celebrate()
>
>I wanted to remove someRegex since I can shave a line of code and some
>confusion, but I was worried about re.search() in a loop.


My reasoning is slightly different. I'm always forgetting with
re.search whether the pattern or string goes first; with re.compile, you
can't fail. Yesterday I fixed a couple of bugs where someone else made
the same error....
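
A small illustration of that mix-up, using made-up sample text and a current Python 3. Swapping the arguments raises no error; the call simply looks for the wrong thing, which is what makes the bug easy to miss.

```python
import re

text = "searchforme is in here"   # made-up sample string

# Correct order: pattern first, then the string to search.
print(re.search("searchforme", text) is not None)   # True

# Swapped order: the long text is now treated as the pattern and cannot
# be found inside the short string, so the result is None without any error.
print(re.search(text, "searchforme") is not None)   # False

# With a compiled pattern the method takes only the string,
# so there is no argument order to get wrong.
some_regex = re.compile("searchforme")
print(some_regex.search(text) is not None)          # True
```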
--
Aahz <*> http://www.pythoncraft.com/

"Typing is cheap. Thinking is expensive." --Roy Smith, c.l.py
 