Velocity Reviews

Velocity Reviews (http://www.velocityreviews.com/forums/index.php)
-   Python (http://www.velocityreviews.com/forums/f43-python.html)
-   -   substitution (http://www.velocityreviews.com/forums/t712148-substitution.html)

Adi Eyal 01-18-2010 02:17 PM

substitution
 
> From:*superpollo <utente@esempio.net>
> To:
> Date:*Mon, 18 Jan 2010 11:15:37 +0100
> Subject:*substitution
> hi.
>
> what is the most pythonic way to substitute substrings?
>
> eg: i want to apply:
>
> foo --> bar
> baz --> quux
> quuux --> foo
>
> so that:
>
> fooxxxbazyyyquuux --> barxxxquuxyyyfoo
>
> bye



Using regular expressions the answer is short (and sweet)

mapping = {
"foo" : "bar",
"baz" : "quux",
"quuux" : "foo"
}

pattern = "(%s)" % "|".join(mapping.keys())
repl = lambda x : mapping.get(x.group(1), x.group(1))
s = "fooxxxbazyyyquuux"
re.subn(pattern, repl, s)

Iain King 01-18-2010 02:23 PM

Re: substitution
 
On Jan 18, 2:17*pm, Adi Eyal <a...@digitaltrowel.com> wrote:
> > From:*superpollo <ute...@esempio.net>
> > To:
> > Date:*Mon, 18 Jan 2010 11:15:37 +0100
> > Subject:*substitution
> > hi.

>
> > what is the most pythonic way to substitute substrings?

>
> > eg: i want to apply:

>
> > foo --> bar
> > baz --> quux
> > quuux --> foo

>
> > so that:

>
> > fooxxxbazyyyquuux --> barxxxquuxyyyfoo

>
> > bye

>
> Using regular expressions the answer is short (and sweet)
>
> mapping = {
> * * * * "foo" : "bar",
> * * * * "baz" : "quux",
> * * * * "quuux" : "foo"
>
> }
>
> pattern = "(%s)" % "|".join(mapping.keys())
> repl = lambda x : mapping.get(x.group(1), x.group(1))
> s = "fooxxxbazyyyquuux"
> re.subn(pattern, repl, s)


Winner! :)

Iain

Steven D'Aprano 01-18-2010 04:26 PM

Re: substitution
 
On Mon, 18 Jan 2010 06:23:44 -0800, Iain King wrote:

> On Jan 18, 2:17*pm, Adi Eyal <a...@digitaltrowel.com> wrote:

[...]
>> Using regular expressions the answer is short (and sweet)
>>
>> mapping = {
>> * * * * "foo" : "bar",
>> * * * * "baz" : "quux",
>> * * * * "quuux" : "foo"
>>
>> }
>>
>> pattern = "(%s)" % "|".join(mapping.keys())
>> repl = lambda x : mapping.get(x.group(1), x.group(1))
>> s = "fooxxxbazyyyquuux"
>> re.subn(pattern, repl, s)

>
> Winner! :)


What are the rules for being declared "Winner"? For the simple case
given, calling s.replace three times is much faster: more than twice as
fast.

But a bigger problem is that the above "winner" may not work correctly if
there are conflicts between the target strings (e.g. 'a'->'X',
'aa'->'Y'). The problem is that the result you get depends on the order
of the searches, BUT as given, that order is non-deterministic.
dict.keys() returns in an arbitrary order, which means the caller can't
specify the order except by accident. For example:

>>> repl = lambda x : m[x.group(1)]
>>> m = {'aa': 'Y', 'a': 'X'}
>>> pattern = "(%s)" % "|".join(m.keys())
>>> subn(pattern, repl, 'aaa') # expecting 'YX'

('XXX', 3)

The result that you get using this method will be consistent but
arbitrary and unpredictable.




For those who care, here's my timing code:

from timeit import Timer

setup = """
mapping = {"foo" : "bar", "baz" : "quux", "quuux" : "foo"}
pattern = "(%s)" % "|".join(mapping.keys())
repl = lambda x : mapping.get(x.group(1), x.group(1))
repl = lambda x : mapping[x.group(1)]
s = "fooxxxbazyyyquuux"
from re import subn
"""

t1 = Timer("subn(pattern, repl, s)", setup)
t2 = Timer(
"s.replace('foo', 'bar').replace('baz', 'quux').replace('quuux', 'foo')",
"s = 'fooxxxbazyyyquuux'")


And the results on my PC:

>>> min(t1.repeat(number=100000))

1.1273870468139648
>>> min(t2.repeat(number=100000))

0.49491715431213379



--
Steven

Iain King 01-18-2010 04:46 PM

Re: substitution
 
On Jan 18, 4:26*pm, Steven D'Aprano <st...@REMOVE-THIS-
cybersource.com.au> wrote:
> On Mon, 18 Jan 2010 06:23:44 -0800, Iain King wrote:
> > On Jan 18, 2:17*pm, Adi Eyal <a...@digitaltrowel.com> wrote:

> [...]
> >> Using regular expressions the answer is short (and sweet)

>
> >> mapping = {
> >> * * * * "foo" : "bar",
> >> * * * * "baz" : "quux",
> >> * * * * "quuux" : "foo"

>
> >> }

>
> >> pattern = "(%s)" % "|".join(mapping.keys())
> >> repl = lambda x : mapping.get(x.group(1), x.group(1))
> >> s = "fooxxxbazyyyquuux"
> >> re.subn(pattern, repl, s)

>
> > Winner! :)

>
> What are the rules for being declared "Winner"? For the simple case
> given, calling s.replace three times is much faster: more than twice as
> fast.
>
> But a bigger problem is that the above "winner" may not work correctly if
> there are conflicts between the target strings (e.g. 'a'->'X',
> 'aa'->'Y'). The problem is that the result you get depends on the order
> of the searches, BUT as given, that order is non-deterministic.
> dict.keys() returns in an arbitrary order, which means the caller can't
> specify the order except by accident. For example:
>
> >>> repl = lambda x : m[x.group(1)]
> >>> m = {'aa': 'Y', 'a': 'X'}
> >>> pattern = "(%s)" % "|".join(m.keys())
> >>> subn(pattern, repl, 'aaa') *# expecting 'YX'

>
> ('XXX', 3)
>
> The result that you get using this method will be consistent but
> arbitrary and unpredictable.
>
> For those who care, here's my timing code:
>
> from timeit import Timer
>
> setup = """
> mapping = {"foo" : "bar", "baz" : "quux", "quuux" : "foo"}
> pattern = "(%s)" % "|".join(mapping.keys())
> repl = lambda x : mapping.get(x.group(1), x.group(1))
> repl = lambda x : mapping[x.group(1)]
> s = "fooxxxbazyyyquuux"
> from re import subn
> """
>
> t1 = Timer("subn(pattern, repl, s)", setup)
> t2 = Timer(
> "s.replace('foo', 'bar').replace('baz', 'quux').replace('quuux', 'foo')",
> "s = 'fooxxxbazyyyquuux'")
>
> And the results on my PC:
>
> >>> min(t1.repeat(number=100000))

> 1.1273870468139648
> >>> min(t2.repeat(number=100000))

>
> 0.49491715431213379
>
> --
> Steven


Adi elicited that response from me because his solution was vastly
more succinct than everything else that had appeared up til that point
while still meeting the OP's requirements. The OP never cared about
overlap between 2 'find' strings, just between the 'find' and
'replace' strings (though I did take it into account in my second post
for the sake of completeness). His code could have been a little
cleaner, I'd have trimmed it to:

mapping = {"foo": "bar", "baz": "quux", "quuux": "foo"}
pattern = "(%s)" % "|".join(mapping)
repl = lambda x : mapping[x.group(1)]
s = "fooxxxbazyyyquuux"
re.subn(pattern, repl, s)

but apart from that was very pythonic: explicit, succinct, and all the
heavy work is done by the module (i.e. in compiled c code in the
majority case of CPython). It can be 'upgraded' to cover the find-
find overlap if you really want (I *believe* regexps will match the
leftmost option in a group first):

subs = [("foo", "bar"), ("baz", "quux"), ("quuux", "foo")]
pattern = "(%s)" % "|".join((x[0] for x in subs))
mapping = dict(subs)
repl = lambda x : mapping[x.group(1)]
s = "fooxxxbazyyyquuux"
re.subn(pattern, repl, s)

Anyway, there's no prize for winning, but by all means: if you think
someone else's code and not a variation on this should win for most
pythonic, then make your nomination :)

Iain


All times are GMT. The time now is 12:44 PM.

Powered by vBulletin®. Copyright ©2000 - 2014, vBulletin Solutions, Inc.
SEO by vBSEO ©2010, Crawlability, Inc.