
Demian Brecht 02-06-2013 09:55 PM

Re: Curious to see alternate approach on a search/replace via regex
 
Well, an alternative /could/ be:

from urlparse import urlparse

parts = urlparse('http://alongnameofasite1234567.com/q?sports=run&a=1&b=1')
print '%s%s_%s' % (parts.netloc.replace('.', '_'),
                   parts.path.replace('/', '_'),
                   parts.query.replace('&', '_').replace('=', '_'))


Although with the result of:

alongnameofasite1234567_com_q_sports_run_a_1_b_1
1288 function calls in 0.004 seconds


Compared to regex method:

498 function calls (480 primitive calls) in 0.000 seconds

I'd prefer the regex method myself.
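For reference, figures like "1288 function calls in 0.004 seconds" are the header that Python's profiler prints, so a minimal sketch of how that kind of report can be reproduced for the urlparse version (assuming the snippet above) would be:

import cProfile
from urlparse import urlparse  # Python 2; urllib.parse in Python 3

def mangle(url):
    parts = urlparse(url)
    return '%s%s_%s' % (parts.netloc.replace('.', '_'),
                        parts.path.replace('/', '_'),
                        parts.query.replace('&', '_').replace('=', '_'))

# Prints a report headed "N function calls in S seconds".
cProfile.run("mangle('http://alongnameofasite1234567.com/q?sports=run&a=1&b=1')")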

Demian Brecht
http://demianbrecht.github.com




On 2013-02-06 1:41 PM, "rh" <richard_hubbe11@lavabit.com> wrote:

>http://alongnameofasite1234567.com/q?sports=run&a=1&b=1




Steven D'Aprano 02-07-2013 03:04 AM

Re: Curious to see alternate approach on a search/replace via regex
 
On Wed, 06 Feb 2013 13:55:58 -0800, Demian Brecht wrote:

> Well, an alternative /could/ be:
>
> from urlparse import urlparse
>
> parts = urlparse('http://alongnameofasite1234567.com/q?sports=run&a=1&b=1')
> print '%s%s_%s' % (parts.netloc.replace('.', '_'),
>                    parts.path.replace('/', '_'),
>                    parts.query.replace('&', '_').replace('=', '_') )
>
>
> Although with the result of:
>
> alongnameofasite1234567_com_q_sports_run_a_1_b_1
> 1288 function calls in 0.004 seconds
>
>
> Compared to regex method:
>
> 498 function calls (480 primitive calls) in 0.000 seconds
>
> I'd prefer the regex method myself.


I dispute those results. I think you are mostly measuring the time to
print the result, and I/O is quite slow. My tests show that using urlparse
is 33% faster than using regexes, and far more understandable and
maintainable.


py> from urlparse import urlparse
py> def mangle(url):
...     parts = urlparse(url)
...     return '%s%s_%s' % (parts.netloc.replace('.', '_'),
...                         parts.path.replace('/', '_'),
...                         parts.query.replace('&', '_').replace('=', '_'))
...
py> import re
py> def u2f(u):
...     nx = re.compile(r'https?://(.+)$')
...     u = nx.search(u).group(1)
...     ux = re.compile(r'([-:./?&=]+)')
...     return ux.sub('_', u)
...
py> s = 'http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'
py> assert u2f(s) == mangle(s)
py>
py> from timeit import Timer
py> setup = 'from __main__ import s, u2f, mangle'
py> t1 = Timer('mangle(s)', setup)
py> t2 = Timer('u2f(s)', setup)
py>
py> min(t1.repeat(repeat=7))
7.2962000370025635
py> min(t2.repeat(repeat=7))
10.981598854064941
py>
py> (10.98-7.29)/10.98
0.33606557377049184


(Timings done using Python 2.6 on my laptop -- your speeds may vary. With
timeit's default of one million loops per run, those totals work out to
roughly 7.3 µs versus 11 µs per call.)



--
Steven

rh 02-07-2013 03:31 AM

Re: Curious to see alternate approach on a search/replace via regex
 
On 07 Feb 2013 03:04:39 GMT
Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:

> On Wed, 06 Feb 2013 13:55:58 -0800, Demian Brecht wrote:
>
> > Well, an alternative /could/ be:
> >
> > from urlparse import urlparse
> >
> > parts = urlparse('http://alongnameofasite1234567.com/q?sports=run&a=1&b=1')
> > print '%s%s_%s' % (parts.netloc.replace('.', '_'),
> >                    parts.path.replace('/', '_'),
> >                    parts.query.replace('&', '_').replace('=', '_') )
> >
> >
> > Although with the result of:
> >
> > alongnameofasite1234567_com_q_sports_run_a_1_b_1
> > 1288 function calls in 0.004 seconds
> >
> >
> > Compared to regex method:
> >
> > 498 function calls (480 primitive calls) in 0.000 seconds
> >
> > I'd prefer the regex method myself.

>
> I dispute those results. I think you are mostly measuring the time to
> print the result, and I/O is quite slow. My tests show that using
> urlparse is 33% faster than using regexes, and far more
> understandable and maintainable.
>
>
> py> from urlparse import urlparse
> py> def mangle(url):
> ...     parts = urlparse(url)
> ...     return '%s%s_%s' % (parts.netloc.replace('.', '_'),
> ...                         parts.path.replace('/', '_'),
> ...                         parts.query.replace('&', '_').replace('=', '_'))
> ...
> py> import re
> py> def u2f(u):
> ...     nx = re.compile(r'https?://(.+)$')
> ...     u = nx.search(u).group(1)
> ...     ux = re.compile(r'([-:./?&=]+)')
> ...     return ux.sub('_', u)
> ...
> py> s = 'http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'
> py> assert u2f(s) == mangle(s)
> py>
> py> from timeit import Timer
> py> setup = 'from __main__ import s, u2f, mangle'
> py> t1 = Timer('mangle(s)', setup)
> py> t2 = Timer('u2f(s)', setup)
> py>
> py> min(t1.repeat(repeat=7))
> 7.2962000370025635
> py> min(t2.repeat(repeat=7))
> 10.981598854064941
> py>
> py> (10.98-7.29)/10.98
> 0.33606557377049184
>
>
> (Timings done using Python 2.6 on my laptop -- your speeds may vary.)


I am using 2.7.3 and I put the re.compile outside the function and it
performed faster than urlparse. I don't print out the data.

Fast
^
| compiled regex
| urlparse
| plain regex
| all-at-once search/replace with alternation
Slow
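
Presumably the "compiled regex" variant looks something like this (a sketch
of that arrangement, reusing the same patterns as Steven's u2f; not rh's
verbatim code):

import re

# Compiled once at import time instead of on every call.
NX = re.compile(r'https?://(.+)$')
UX = re.compile(r'([-:./?&=]+)')

def u2f(u):
    return UX.sub('_', NX.search(u).group(1))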

>
>
>
> --
> Steven



jmfauth 02-07-2013 11:08 AM

Re: Curious to see alternate approach on a search/replace via regex
 
On 7 Feb, 04:04, Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> wrote:
> On Wed, 06 Feb 2013 13:55:58 -0800, Demian Brecht wrote:
> > Well, an alternative /could/ be:

>
> ...
> py> s = 'http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'
> py> assert u2f(s) == mangle(s)
> py>
> py> from timeit import Timer
> py> setup = 'from __main__ import s, u2f, mangle'
> py> t1 = Timer('mangle(s)', setup)
> py> t2 = Timer('u2f(s)', setup)
> py>
> py> min(t1.repeat(repeat=7))
> 7.2962000370025635
> py> min(t2.repeat(repeat=7))
> 10.981598854064941
> py>
> py> (10.98-7.29)/10.98
> 0.33606557377049184
>
> (Timings done using Python 2.6 on my laptop -- your speeds may vary.)
>


--------


[OT] Sorry, but I find all these "timeit" benchmarks I see here and there
more and more ridiculous.

Maybe it's the language itself that has become ridiculous.


code:

from timeit import Timer, repeat

r = repeat("('WHERE IN THE WORLD IS CARMEN?'*10).lower()")
print('1:', r)

r = repeat("('WHERE IN THE WORLD IS HÉLÈNE?'*10).lower()")
print('2:', r)

t = Timer("re.sub('CARMEN', 'CARMEN', 'WHERE IN THE WORLD IS CARMEN?'*10)",
          "import re")
r = t.repeat()
print('3:', r)

t = Timer("re.sub('HÉLÈNE', 'HÉLÈNE', 'WHERE IN THE WORLD IS HÉLÈNE?'*10)",
          "import re")
r = t.repeat()
print('4:', r)

result:

>c:\python32\pythonw -u "vitesse3.py"

1: [2.578785478740226, 2.5738459157233833, 2.5739002658825543]
2: [2.57605654937141, 2.5784755252962572, 2.5775366066044896]
3: [11.856728254324088, 11.856321809655501, 11.857456073846905]
4: [12.111787643688231, 12.102743462128771, 12.098514783440208]
>Exit code: 0
>c:\Python33\pythonw -u "vitesse3.py"

1: [0.6063335264470632, 0.6104798922133946, 0.6078580877959869]
2: [4.080205081267272, 4.079303183698418, 4.0786836706522145]
3: [18.093742209318215, 18.079666699618095, 18.07107661757692]
4: [18.852576768615222, 18.841418050790622, 18.840745369110437]
>Exit code: 0


The future is bright for ... ascii users.

jmf



Chris Angelico 02-07-2013 12:44 PM

Re: Curious to see alternate approach on a search/replace via regex
 
On Thu, Feb 7, 2013 at 10:08 PM, jmfauth <wxjmfauth@gmail.com> wrote:
> The future is bright for ... ascii users.
>
> jmf


So you're admitting to being not very bright?

*ducks*

Seriously jmf, please don't hijack threads just to whine about
contrived issues of Unicode performance yet again. That horse is dead.
Go fork Python and reimplement buggy narrow builds if you want to, the
rest of us are happy with a bug-free Python.

ChrisA

Steven D'Aprano 02-07-2013 10:45 PM

Re: Curious to see alternate approach on a search/replace via regex
 
rh wrote:

> I am using 2.7.3 and I put the re.compile outside the function and it
> performed faster than urlparse. I don't print out the data.


I find that hard to believe. re.compile caches its results, so except for
the very first time it is called, it is very fast -- basically a function
call and a dict lookup. I find it implausible that a micro-optimization
such as you describe could be responsible for speeding the code up by over
33%.

But since you don't demonstrate any actual working code, you could be
correct, or you could be timing it wrong. Without seeing your timing code,
my guess is that you are doing it wrong. Timing code is tricky, which is
why I always show my work. If I get it wrong, someone will hopefully tell
me. Otherwise, I might as well be making up the numbers.
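
A quick way to see that cache in action (a CPython implementation detail,
not a language guarantee) is that compiling the same pattern twice hands
back the very same object:

import re

re.purge()  # clear the internal pattern cache first
a = re.compile(r'https?://(.+)$')
b = re.compile(r'https?://(.+)$')
print(a is b)  # True: the second call was a cache hit, not a recompile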



--
Steven


rh 02-07-2013 11:13 PM

Re: Curious to see alternate approach on a search/replace via regex
 
On Fri, 08 Feb 2013 09:45:41 +1100
Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:

> rh wrote:
>
> > I am using 2.7.3 and I put the re.compile outside the function and
> > it performed faster than urlparse. I don't print out the data.

>
> I find that hard to believe. re.compile caches its results, so except
> for the very first time it is called, it is very fast -- basically a
> function call and a dict lookup. I find it implausible that a
> micro-optimization such as you describe could be responsible for
> speeding the code up by over 33%.


Not sure where you came up with that number. Maybe another post?
I never gave any numbers, just comparisons.

>
> But since you don't demonstrate any actual working code, you could be
> correct, or you could be timing it wrong. Without seeing your timing
> code, my guess is that you are doing it wrong. Timing code is tricky,
> which is why I always show my work. If I get it wrong, someone will
> hopefully tell me. Otherwise, I might as well be making up the
> numbers.


re.compile
starttime = time.time()
for i in range(numloops):
    u2f()

msg = '\nElapsed {0:.3f}'.format(time.time() - starttime)
print(msg)

>
>
>
> --
> Steven
>



--



Steven D'Aprano 02-07-2013 11:59 PM

Re: Curious to see alternate approach on a search/replace via regex
 
rh wrote:

> On Fri, 08 Feb 2013 09:45:41 +1100
> Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
>
>> rh wrote:
>>
>> > I am using 2.7.3 and I put the re.compile outside the function and
>> > it performed faster than urlparse. I don't print out the data.

>>
>> I find that hard to believe. re.compile caches its results, so except
>> for the very first time it is called, it is very fast -- basically a
>> function call and a dict lookup. I find it implausible that a
>> micro-optimization such as you describe could be responsible for
>> speeding the code up by over 33%.

>
> Not sure where you came up with that number. Maybe another post?


That number comes from my post, which you replied to.

http://mail.python.org/pipermail/pyt...ry/640056.html

By the way, are you aware that you are setting the X-No-Archive header on
your posts?



> I never gave any numbers, just comparisons.
>
>>
>> But since you don't demonstrate any actual working code, you could be
>> correct, or you could be timing it wrong. Without seeing your timing
>> code, my guess is that you are doing it wrong. Timing code is tricky,
>> which is why I always show my work. If I get it wrong, someone will
>> hopefully tell me. Otherwise, I might as well be making up the
>> numbers.

>
> re.compile
> starttime = time.time()
> for i in range(numloops):
>     u2f()
>
> msg = '\nElapsed {0:.3f}'.format(time.time() - starttime)
> print(msg)



I suggest you go back to my earlier post, the one you responded to, and look
at how I use the timeit module to time small code snippets. Then read the
documentation for it, and the comments in the source code. If you can get
hold of the Python Cookbook, read Tim Peters' comments in that.

http://docs.python.org/2/library/timeit.html
http://docs.python.org/3/library/timeit.html



Oh, one last thing... pulling out "re.compile" outside of the function does
absolutely nothing. You don't even compile anything. It basically looks up
that a compile function exists in the re module, and that's all.
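
That is, there is a big difference between naming the function and calling
it:

import re

re.compile                            # bare expression: looks up the function
                                      # object and discards it; compiles nothing
nx = re.compile(r'https?://(.+)$')    # actual call: this is what compiles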



--
Steven


Ian Kelly 02-08-2013 12:55 AM

Re: Curious to see alternate approach on a search/replace via regex
 
On Thu, Feb 7, 2013 at 4:59 PM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> Oh, one last thing... pulling out "re.compile" outside of the function does
> absolutely nothing. You don't even compile anything. It basically looks up
> that a compile function exists in the re module, and that's all.


Using Python 2.7:

>>> from timeit import Timer
>>> t1 = Timer("""
... nx = re.compile(r'https?://(.+)$')
... v = nx.search(u).group(1)
... ux = re.compile(r'([-:./?&=]+)')
... ux.sub('_', v)""", """
... import re
... u = 'http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'""")
>>> t2 = Timer("""
... v = nx.search(u).group(1)
... ux.sub('_', v)""", """
... import re
... nx = re.compile(r'https?://(.+)$')
... ux = re.compile(r'([-:./?&=]+)')
... u = 'http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'""")
>>> min(t1.repeat())
11.625409933385388
>>> min(t2.repeat())
8.825254885746652

Whatever caching is being done by re.compile, that's still a 24% savings
((11.63 - 8.83) / 11.63 ≈ 0.24) by moving the compile calls into the setup.

Ian Kelly 02-08-2013 01:08 AM

Re: Curious to see alternate approach on a search/replace via regex
 
On Thu, Feb 7, 2013 at 5:55 PM, Ian Kelly <ian.g.kelly@gmail.com> wrote:
> Whatever caching is being done by re.compile, that's still a 24%
> savings by moving the compile calls into the setup.


On the other hand, if you add an re.purge() call to the start of t1 to
clear the cache:

>>> t3 = Timer("""
... re.purge()
... nx = re.compile(r'https?://(.+)$')
... v = nx.search(u).group(1)
... ux = re.compile(r'([-:./?&=]+)')
... ux.sub('_', v)""", """
... import re
... u = 'http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'""")
>>> min(t3.repeat(number=10000))
3.5532990924824617

Which is approximately 30 times slower per call (3.55 seconds for 10,000
iterations versus 11.6 seconds for 1,000,000), so clearly the regular
expression *is* being cached. I think what we're seeing here is that the
time needed to look up the compiled regular expression in the cache is a
significant fraction of the time needed to actually execute it.
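
One way to make that lookup cost visible (a sketch along the same lines) is
to time the module-level re.sub, which goes through the cache on every call,
against a precompiled pattern's .sub method:

from timeit import Timer

setup = """
import re
ux = re.compile(r'([-:./?&=]+)')
u = 'alongnameofasite1234567.com/q?sports=run&a=1&b=1'
"""

# re.sub looks the compiled pattern up in the cache on every call:
print(min(Timer("re.sub(r'([-:./?&=]+)', '_', u)", setup).repeat()))
# the precompiled pattern's method skips that lookup entirely:
print(min(Timer("ux.sub('_', u)", setup).repeat()))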

