Efficient way to sum a product of numbers...

vsoler

 08-31-2009
Hi,

After simplifying my problem, I can say that I want to get the sum of
the product of two columns:

Say
m= [[ 'a', 1], [ 'b', 2],[ 'a', 3]]
r={'a':4, 'b':5, 'c':6}

What I need is the calculation

1*4 + 2*5 + 3*4 = 4 + 10 + 12 = 26

That is, for each row list in variable 'm' look for its first element
in variable 'r' and multiply the value found by the second element in
row 'm'. After that, sum all the products.

What's an efficient way to do it? I have thousands of these
calculations to make on a big data file.

Thank you.

Tim Chase

 08-31-2009
> After simplifying my problem, I can say that I want to get the sum of
> the product of two columns:
>
> Say
> m= [[ 'a', 1], [ 'b', 2],[ 'a', 3]]

assuming you meant ['c', 3] here... ^
> r={'a':4, 'b':5, 'c':6}
>
> What I need is the calculation
>
> 1*4 + 2*5 + 3*4 = 4 + 10 + 12 = 26

and you mean "3*6" here instead of "3*4", which is 18 instead of
12, making the whole sum 4+10+18=32

Then it sounds like you could do something like

result = sum(v * r[k] for k,v in m)

where "m" is any arbitrary iterable of tuples. If the keys (the
letters) aren't guaranteed to be in "r", then you can use
defaults (in this case "0", but could just as likely be "1"
depending on your intent):

result = sum(v * r.get(k,0) for k,v in m)

If the conditions above don't hold, you'll have to introduce me
to your new math.
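To see what the choice of default does, here is a minimal sketch run on the data exactly as posted in the question. The extra row for supplier 'z' and the names m_extra, total_skip and total_unit are made up for illustration; they are not from the original post.

```python
m = [['a', 1], ['b', 2], ['a', 3]]
r = {'a': 4, 'b': 5, 'c': 6}

# Hypothetical extra row: supplier 'z' has no entry in r.
m_extra = m + [['z', 10]]

# Default 0: rows with an unknown key contribute nothing to the sum.
total_skip = sum(v * r.get(k, 0) for k, v in m_extra)

# Default 1: rows with an unknown key contribute their bare value.
total_unit = sum(v * r.get(k, 1) for k, v in m_extra)

print(total_skip)  # 26
print(total_unit)  # 36
```

Either default hides the missing key silently, so it only makes sense when a missing entry really has a natural neutral value.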

-tkc

vsoler

 08-31-2009
Hello Tim,

There is no mistake in my original post, so I really meant [ 'a', 3].

Imagine that m contains the time sheets of my suppliers:

supplier 'a' has worked for you 1 hour
supplier 'b' has worked for you 2 hours
supplier 'a' has worked for you 3 hours

Now

supplier 'a' charges $4 per hour
supplier 'b' charges $5 per hour
supplier 'c' charges $6 per hour

I want to know how much I will be charged this month by my panel of
suppliers.

1*4 + 2*5 + 3*4 = 4 + 10 + 12 = 26

This is what I am after.
I expect all my suppliers to have given me their hourly fee in
advance. If at least one hasn't, I must know that the result is undefined.

Hope this helps

Vicente Soler

Tim Chase

 08-31-2009
vsoler wrote:
> There is no mistake in my original post, so I really meant [ 'a', 3]

Ah...that makes more sense of the data. My answer still holds
then. Use the r[k] version instead of the r.get(...) version,
and it will throw an exception if the rate doesn't exist in your
mapping. (a KeyError if you want to catch it)
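A minimal sketch of that strict version, assuming you want to treat a missing rate as "result undefined" rather than let the exception propagate. The function name total_charge, the choice of None as the "undefined" marker, and the supplier 'd' are illustrative assumptions, not part of the original posts.

```python
def total_charge(timesheet, rates):
    """Return the total charge, or None if any supplier has no rate."""
    try:
        return sum(hours * rates[name] for name, hours in timesheet)
    except KeyError as exc:
        # exc.args[0] is the key that was not found in rates.
        print("no rate for supplier %r; result is undefined" % exc.args[0])
        return None


m = [['a', 1], ['b', 2], ['a', 3]]
r = {'a': 4, 'b': 5, 'c': 6}

print(total_charge(m, r))               # 26
print(total_charge(m + [['d', 7]], r))  # None, after the warning line
```

If the caller should not be able to ignore the problem, simply drop the try/except and let the KeyError surface.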

-tkc

vsoler

 08-31-2009
It works!!!

Thank you

Paul Rubin

 08-31-2009
vsoler <(E-Mail Removed)> writes:
> m= [[ 'a', 1], [ 'b', 2],[ 'a', 3]]
> r={'a':4, 'b':5, 'c':6}
>
> What I need is the calculation
>
> 1*4 + 2*5 + 3*4 = 4 + 10 + 12 = 26

sum(r[k]*w for k,w in m)

Jan Kaliszewski

 08-31-2009
31-08-2009 at 18:19:28 vsoler <(E-Mail Removed)> wrote:

> Say
> m= [[ 'a', 1], [ 'b', 2],[ 'a', 3]]
> r={'a':4, 'b':5, 'c':6}
>
> What I need is the calculation
>
> 1*4 + 2*5 + 3*4 = 4 + 10 + 12 = 26
>
> That is, for each row list in variable 'm' look for its first element
> in variable 'r' and multiply the value found by the second element in
> row 'm'. After that, sum all the products.
>
> What's an efficient way to do it? I have thousands of these
> calculations to make on a big data file.

31-08-2009 at 18:30:27 Tim Chase <(E-Mail Removed)> wrote:

> result = sum(v * r[k] for k,v in m)

You can also check if this isn't more efficient:

from itertools import starmap
from operator import mul

result = sum(starmap(mul, ((r[name], hour) for name, hour in m)))

Or, if you had m in form of two lists:

names = ['a', 'b', 'a']
hours = [1, 2, 3]

...then you could do:

from itertools import imap as map # <- remove if you use Py3.x
from operator import mul

result = sum(map(mul, map(r.__getitem__, names), hours))

Cheers,
*j

PS. I've done a quick test on my computer (Pentium 4, 2.4 GHz, Linux):

>>> setup = "from itertools import starmap, imap ; from operator import
>>> mul; import random, string; names =
>>> [rndom.choice(string.ascii_letters) for x in xrange(10000)]; hours =
>>> [random.randint(1, 12) for x in xrange(1000)]; m = zip(names, hours);
>>> workers = set(names); r = dict(zip(workers, (random.randint(1, 10) for
>>> x in xrange(en(workers)))))"
>>> import timeit
>>> tests = (
... 'sum(v * r[k] for k,v in m)',
... 'sum(starmap(mul, ((r[name], hour) for name, hour in m)))',
... 'sum(imap(mul, imap(r.__getitem__, names), hours))',
... )
>>> for t in tests:
...     print t
...     timeit.repeat(t, setup, number=1000)
...     print
...
sum(v * r[k] for k,v in m)
[6.2493009567260742, 6.1892399787902832, 6.2634339332580566]

sum(starmap(mul, ((r[name], hour) for name, hour in m)))
[9.3293819427490234, 10.280816078186035, 9.2766909599304199]

sum(imap(mul, imap(r.__getitem__, names), hours))
[5.7341709136962891, 5.5898380279541016, 5.7318859100341797]
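The session above is Python 2 (xrange, imap, print statement). For anyone repeating the comparison on Python 3, a rough equivalent might look like the sketch below. The labels, the fixed seed, and the smaller repetition count are my own assumptions, and timings will differ by machine and interpreter version, so no expected numbers are given.

```python
import random
import string
import timeit
from itertools import starmap
from operator import mul

random.seed(0)  # fixed seed only so that runs are repeatable

names = [random.choice(string.ascii_letters) for _ in range(10000)]
hours = [random.randint(1, 12) for _ in range(10000)]
m = list(zip(names, hours))
r = {name: random.randint(1, 10) for name in set(names)}

# The three variants from this thread, ported to Python 3.
candidates = {
    "genexp": lambda: sum(v * r[k] for k, v in m),
    "starmap": lambda: sum(starmap(mul, ((r[name], hour) for name, hour in m))),
    "map": lambda: sum(map(mul, map(r.__getitem__, names), hours)),
}

# Sanity check: all variants must compute the same total.
results = {label: fn() for label, fn in candidates.items()}
assert len(set(results.values())) == 1

for label, fn in candidates.items():
    best = min(timeit.repeat(fn, number=20, repeat=3))
    print("%s: %.4f s for 20 runs" % (label, best))
```

Taking the minimum of the repeats is the usual way to read timeit results, since the slower runs mostly reflect other activity on the machine.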

--
Jan Kaliszewski (zuo) <(E-Mail Removed)>

Jan Kaliszewski

 08-31-2009
31-08-2009 at 22:28:56 Jan Kaliszewski <(E-Mail Removed)> wrote:

> >>> setup = "from itertools import starmap, imap ; from operator

> import mul; import random, string; names = [rndom.choice(string.
> ascii_letters) for x in xrange(10000)]; hours = [random.randint(
> 1, 12) for x in xrange(1000)]; m = zip(names, hours); workers =
> set(names); r = dict(zip(workers, (random.randint(1, 10) for x i
> n xrange(en(workers)))))"

Erratum -- should be:

>>> setup = (
... 'from itertools import starmap, imap;'
... 'from operator import mul;'
... 'import random, string; names'
... ' = [random.choice(string.ascii_letters)'
... ' for x in xrange(10000)];'
... 'hours = [random.randint(1, 12)'
... ' for x in xrange(10000)];'
... 'm = zip(names, hours);'
... 'workers = set(names);'
... 'r = dict(zip(workers, (random.randint(1, 10)'
... ' for x in xrange(len(workers)))))'
... )

--
Jan Kaliszewski (zuo) <(E-Mail Removed)>

John Nagle

 08-31-2009
vsoler wrote:
> Hi,
>
> After simplifying my problem, I can say that I want to get the sum of
> the product of two columns:
>
> Say
> m= [[ 'a', 1], [ 'b', 2],[ 'a', 3]]
> r={'a':4, 'b':5, 'c':6}
>
> What I need is the calculation
>
> 1*4 + 2*5 + 3*4 = 4 + 10 + 12 = 26

You need a matrix package.

Use "numpy", the Python numerics module, if you're trying to do
operations on multidimensional arrays. In NumPy, you can extract
columns, multiply them together, and take the sum.
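As a rough sketch of what that could look like for the data in the question (assuming NumPy is installed; the names hours, rates and total are illustrative): the rate lookup still goes through the dict, and only the multiply-and-sum step is handed to NumPy.

```python
import numpy as np

m = [['a', 1], ['b', 2], ['a', 3]]
r = {'a': 4, 'b': 5, 'c': 6}

# Split the rows into two numeric columns; r[name] raises KeyError
# if a supplier has no rate, as in the plain-Python version.
hours = np.array([h for _, h in m])
rates = np.array([r[name] for name, _ in m])

# Dot product = element-wise multiply, then sum.
total = int(np.dot(rates, hours))
print(total)  # 26
```

Whether this beats the generator expression depends on the data: building the arrays costs time too, so it mainly pays off when the columns are already in array form or are reused across many calculations.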

John Nagle