Re: Curious to see alternate approach on a search/replace via regex

 
 
Steven D'Aprano
Guest
Posts: n/a
 
      02-08-2013
Ian Kelly wrote:

> On Thu, Feb 7, 2013 at 4:59 PM, Steven D'Aprano
> <(E-Mail Removed)> wrote:
>> Oh, one last thing... pulling out "re.compile" outside of the function
>> does absolutely nothing. You don't even compile anything. It basically
>> looks up that a compile function exists in the re module, and that's all.

>
> Using Python 2.7:

[...]
> Whatever caching is being done by re.compile, that's still a 24%
> savings by moving the compile calls into the setup.


That may or may not be the case, but rh didn't compile anything. He
moved "re.compile" literally, with no arguments, out of the timing code.
That clearly does nothing except confirm that re.compile exists.
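A minimal sketch of the distinction, for illustration only (this is not rh's actual code, and the name `lookup_only` is made up for the example): a bare `re.compile` is just an attribute lookup, and nothing is compiled until the function is called with a pattern.

```python
import re

# A bare reference: this only looks up the function object in the re module.
# No pattern is compiled and no regex code runs.
lookup_only = re.compile

# Calling the function with a pattern is what builds a compiled pattern object.
nx = re.compile(r'https?://(.+)$')

print(lookup_only is re.compile)                      # same function object
print(nx.search('http://example.com/a').group(1))     # the compiled pattern is usable
```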



--
Steven

 
 
 
 
 
rh
Guest
Posts: n/a
 
      02-08-2013
On Fri, 08 Feb 2013 14:02:14 +1100
Steven D'Aprano <(E-Mail Removed)> wrote:

> Ian Kelly wrote:
>
> > On Thu, Feb 7, 2013 at 4:59 PM, Steven D'Aprano
> > <(E-Mail Removed)> wrote:
> >> Oh, one last thing... pulling out "re.compile" outside of the
> >> function does absolutely nothing. You don't even compile anything.
> >> It basically looks up that a compile function exists in the re
> >> module, and that's all.

> >
> > Using Python 2.7:

> [...]
> > Whatever caching is being done by re.compile, that's still a 24%
> > savings by moving the compile calls into the setup.

>
> That may or may not be the case, but rh didn't compile anything. He
> moved "re.compile" literally, with no arguments, out of the timing
> code. That clearly does nothing except confirm that re.compile exists.


My initial post has the function and in there are two re.compile calls.
I moved those out of the function and see repeatable time efficiency
improvements.
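Roughly what that change looks like, as a sketch rather than the original function: the two patterns are the ones quoted in the timing code elsewhere in this thread, while the constant names `NX` and `UX` and the `u2f(u)` signature are assumptions made for the example.

```python
import re

# Compiled once at import time instead of on every call.
NX = re.compile(r'https?://(.+)$')   # captures everything after the scheme
UX = re.compile(r'([-:./?&=]+)')     # runs of URL separator characters


def u2f(u):
    """Drop the scheme from a URL and replace separator runs with '_'."""
    v = NX.search(u).group(1)
    return UX.sub('_', v)


print(u2f('http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'))
```

With the patterns at module level, each call skips both the `re.compile` function call and the cache lookup behind it.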

FWIW the fastest so far was posted by Peter Otten and didn't
use regex.

As a new learner of python (or any language) I like to know what
habits will serve me well into the future. So the only reason I look
at the time it takes is as a sanity check to make sure I'm not
learning bad habits. In this case someone else pointed out time
comparisons and off the thread went into timings!

I did take note of your previous post using timeit and filed
that away into the gray matter for some other day.

>
>
>
> --
> Steven
>



--


 
 
 
 
 
rh
Guest
Posts: n/a
 
      02-08-2013
On Thu, 7 Feb 2013 18:08:00 -0700
Ian Kelly <(E-Mail Removed)> wrote:

> On Thu, Feb 7, 2013 at 5:55 PM, Ian Kelly <(E-Mail Removed)>
> wrote:
> > Whatever caching is being done by re.compile, that's still a 24%
> > savings by moving the compile calls into the setup.

>
> On the other hand, if you add an re.purge() call to the start of t1 to
> clear the cache:
>
> >>> t3 = Timer("""
> ... re.purge()
> ... nx = re.compile(r'https?://(.+)$')
> ... v = nx.search(u).group(1)
> ... ux = re.compile(r'([-:./?&=]+)')
> ... ux.sub('_', v)""", """
> ... import re
> ... u = 'http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'""")
> >>> min(t3.repeat(number=10000))
> 3.5532990924824617
>
> Which is approximately 30 times slower, so clearly the regular
> expression *is* being cached. I think what we're seeing here is that
> the time needed to look up the compiled regular expression in the
> cache is a significant fraction of the time needed to actually execute
> it.


By "actually execute" you mean to apply the compiled expression
to the search or sub? Or do you mean the time needed to compile
the pattern into a regex obj?

I presumed that compiling the pattern at each iteration was expensive
and that's why I expected moving it out of the function to reduce the
time needed to search/sub.
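One way to separate the three costs being discussed (applying a compiled pattern, looking it up in the cache, and compiling it from scratch) is to time each variant with `timeit`. This is a sketch under assumptions: the repeat and iteration counts are arbitrary, the variable names are made up, and the absolute numbers will differ by machine and Python version.

```python
from timeit import Timer

SETUP = """
import re
u = 'http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'
nx = re.compile(r'https?://(.+)$')
ux = re.compile(r'([-:./?&=]+)')
"""

# 1. Patterns compiled once in setup; the timed code only searches and substitutes.
precompiled = Timer("ux.sub('_', nx.search(u).group(1))", SETUP)

# 2. re.compile is called on every iteration; after the first call it is a cache lookup.
cached = Timer("""
nx2 = re.compile(r'https?://(.+)$')
ux2 = re.compile(r'([-:./?&=]+)')
ux2.sub('_', nx2.search(u).group(1))
""", SETUP)

# 3. The cache is cleared on every iteration, forcing a real compile each time.
purged = Timer("""
re.purge()
nx3 = re.compile(r'https?://(.+)$')
ux3 = re.compile(r'([-:./?&=]+)')
ux3.sub('_', nx3.search(u).group(1))
""", SETUP)

results = {}
for name, timer in [('precompiled', precompiled), ('cached', cached), ('purged', purged)]:
    # Take the minimum of several runs to reduce noise from other processes.
    results[name] = min(timer.repeat(repeat=3, number=1000))
    print('{0:12s} {1:.4f}s'.format(name, results[name]))
```

The gap between the first two variants is the cost of the function call plus the cache lookup; the gap between the last two is the cost of compiling.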

 
 
Dave Angel
Guest
Posts: n/a
 
      02-08-2013
On 02/07/2013 06:13 PM, rh wrote:
> On Fri, 08 Feb 2013 09:45:41 +1100
> Steven D'Aprano <(E-Mail Removed)> wrote:
>
>> <snip>
>>
>> But since you don't demonstrate any actual working code, you could be
>> correct, or you could be timing it wrong. Without seeing your timing
>> code, my guess is that you are doing it wrong. Timing code is tricky,
>> which is why I always show my work. If I get it wrong, someone will
>> hopefully tell me. Otherwise, I might as well be making up the
>> numbers.

>
> re.compile


That statement does explicitly nothing useful. It certainly doesn't
compile anything, or call any regex code.

> starttime = time.time()
> for i in range(numloops):
>     u2f()
>
> msg = '\nElapsed {0:.3f}'.format(time.time() - starttime)
> print(msg)
>



--
DaveA
 
 
Ian Kelly
Guest
Posts: n/a
 
      02-08-2013
On Thu, Feb 7, 2013 at 10:57 PM, rh <(E-Mail Removed)> wrote:
> On Thu, 7 Feb 2013 18:08:00 -0700
> Ian Kelly <(E-Mail Removed)> wrote:
>
>> Which is approximately 30 times slower, so clearly the regular
>> expression *is* being cached. I think what we're seeing here is that
>> the time needed to look up the compiled regular expression in the
>> cache is a significant fraction of the time needed to actually execute
>> it.

>
> By "actually execute" you mean to apply the compiled expression
> to the search or sub? Or do you mean the time needed to compile
> the pattern into a regex obj?


The former. Both are dwarfed by the time needed to compile the pattern.
 
 
Steven D'Aprano
Guest
Posts: n/a
 
      02-08-2013
Ian Kelly wrote:

> On Thu, Feb 7, 2013 at 10:57 PM, rh <(E-Mail Removed)> wrote:
>> On Thu, 7 Feb 2013 18:08:00 -0700
>> Ian Kelly <(E-Mail Removed)> wrote:
>>
>>> Which is approximately 30 times slower, so clearly the regular
>>> expression *is* being cached. I think what we're seeing here is that
>>> the time needed to look up the compiled regular expression in the
>>> cache is a significant fraction of the time needed to actually execute
>>> it.

>>
>> By "actually execute" you mean to apply the compiled expression
>> to the search or sub? Or do you mean the time needed to compile
>> the pattern into a regex obj?

>
> The former. Both are dwarfed by the time needed to compile the pattern.


Surely that depends on the size of the pattern, and the size of the data
being worked on.

Compiling the pattern "s[ai]t" doesn't take that much work, it's only six
characters and very simple. Applying it to:

"sazsid"*1000000 + "sat"

on the other hand may be a tad expensive.

Sweeping generalities about the cost of compiling regexes versus searching
with them are risky.
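A rough way to check this on a given machine is shown below. It is a sketch only: a single `time.perf_counter` measurement is noisy, and the compile time shown is for a pattern not yet in the `re` cache, which is only true in a fresh interpreter.

```python
import re
import time

# Six million characters with no match until the very end.
data = "sazsid" * 1000000 + "sat"

t0 = time.perf_counter()
pattern = re.compile("s[ai]t")
t1 = time.perf_counter()
match = pattern.search(data)
t2 = time.perf_counter()

print('compile: {0:.6f}s'.format(t1 - t0))
print('search:  {0:.6f}s'.format(t2 - t1))
print('match found at offset', match.start())
```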



--
Steven

 
 
Ian Kelly
Guest
Posts: n/a
 
      02-08-2013
On Fri, Feb 8, 2013 at 4:43 AM, Steven D'Aprano
<(E-Mail Removed)> wrote:
> Ian Kelly wrote:
> Surely that depends on the size of the pattern, and the size of the data
> being worked on.


Naturally.

> Compiling the pattern "s[ai]t" doesn't take that much work, it's only six
> characters and very simple. Applying it to:
>
> "sazsid"*1000000 + "sat"
>
> on the other hand may be a tad expensive.
>
> Sweeping generalities about the cost of compiling regexes versus searching
> with them are risky.


I was referring to the specific timing measurements I made earlier in
this thread, not generalizing.
 
 
Serhiy Storchaka
Guest
Posts: n/a
 
      02-15-2013
On 08.02.13 03:08, Ian Kelly wrote:
> I think what we're seeing here is that
> the time needed to look up the compiled regular expression in the
> cache is a significant fraction of the time needed to actually execute
> it.


There is a bug issue for this. See http://bugs.python.org/issue16389 .

 
 
rh
Guest
Posts: n/a
 
      02-26-2013
On Fri, 15 Feb 2013 22:58:30 +0200
Serhiy Storchaka <(E-Mail Removed)> wrote:

> On 08.02.13 03:08, Ian Kelly wrote:
> > I think what we're seeing here is that
> > the time needed to look up the compiled regular expression in the
> > cache is a significant fraction of the time needed to actually
> > execute it.

>
> There is a bug issue for this. See http://bugs.python.org/issue16389 .
>


I can't tell what the problem is. Is it fixed or still in progress?

 
 
 
 