On 2012-09-20 19:31, (E-Mail Removed) wrote:

> Hi,

> I have this script in Python that I need to apply to very large arrays (arrays coming from satellite images).

> The script works great, but I would like to speed up the process.

> Most of the computation time is spent in the for loop.

> Is there a way to improve that part?

> Would it be better to use a dict() instead of an np.ndarray for saving the results?

> And if so, how can I do the sum in a dict() (like the corresponding matrix[row_c,1] = matrix[row_c,1] + valuesRaster[row,col])?

> If the dict() is the solution, why is it faster?

>

> Thanks

> Giuseppe

>

> import numpy as np

> import sys

> from time import time

>

> # create the arrays

>

> start = time()

> valuesRaster = np.random.random_integers(0, 100, 100).reshape(10, 10)

> valuesCategory = np.random.random_integers(1, 10, 100).reshape(10, 10)

>

> elapsed = (time() - start)

> print(elapsed , "create the data")

>

> start = time()

>

> categories = np.unique(valuesCategory)

> matrix = np.c_[ categories , np.zeros(len(categories))]

>

> elapsed = (time() - start)

> print(elapsed , "create the matrix and append a column of zeros ")

>

> rows = 10

> cols = 10

>

> start = time()

>

> for col in range(0,cols):

>     for row in range(0,rows):

>         for row_c in range(0,len(matrix)) :

>             if valuesCategory[row,col] == matrix[row_c,0] :

>                 matrix[row_c,1] = matrix[row_c,1] + valuesRaster[row,col]

>                 break

> elapsed = (time() - start)

> print(elapsed , "loop in the data ")

>

> print (matrix)

>
If I understand the code correctly, 'matrix' contains the categories in column 0 and the totals in column 1.

What you're doing is performing a linear search through the categories and then adding to the corresponding total.

Linear searches are slow because on average you have to search through half of the list. Using a dict would be much faster, because a dict lookup takes roughly constant time however many categories there are (although you should of course measure it!).
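
As a rough way to do that measuring, here is a minimal sketch using timeit. The function names (sum_linear, sum_dict), the 50x50 test data and the fixed seed are only for illustration and are not part of the original script; the absolute timings will depend on your machine and array sizes.

```python
import timeit

import numpy as np

# Hypothetical test data: a 50x50 raster with categories 1..10.
rng = np.random.RandomState(0)
values_raster = rng.randint(0, 101, size=(50, 50))
values_category = rng.randint(1, 11, size=(50, 50))
categories = np.unique(values_category)


def sum_linear(cats, ras, categories):
    """Total per category, finding each category by linear search."""
    matrix = np.c_[categories, np.zeros(len(categories))]
    for row in range(cats.shape[0]):
        for col in range(cats.shape[1]):
            for row_c in range(len(matrix)):
                if cats[row, col] == matrix[row_c, 0]:
                    matrix[row_c, 1] += ras[row, col]
                    break
    return {int(c): int(t) for c, t in matrix}


def sum_dict(cats, ras, categories):
    """Total per category, finding each category by dict lookup."""
    totals = dict.fromkeys(categories.tolist(), 0)
    for row in range(cats.shape[0]):
        for col in range(cats.shape[1]):
            totals[int(cats[row, col])] += int(ras[row, col])
    return totals


# Both approaches must agree before their timings are worth comparing.
assert sum_linear(values_category, values_raster, categories) == sum_dict(
    values_category, values_raster, categories
)

for func in (sum_linear, sum_dict):
    seconds = timeit.timeit(
        lambda: func(values_category, values_raster, categories), number=3
    )
    print(func.__name__, seconds / 3, "seconds per run")
```

The gap should widen as the number of categories grows, since only the linear search gets slower with more categories.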

Try something like this:

import numpy as np

from time import time

# Create the arrays.

start = time()

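# random_integers is deprecated in newer NumPy; randint's upper bound is

# exclusive, so these still give values in 0..100 and 1..10.

valuesRaster = np.random.randint(0, 101, 100).reshape(10, 10)

valuesCategory = np.random.randint(1, 11, 100).reshape(10, 10)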

elapsed = time() - start

print(elapsed, "Create the data.")

start = time()

categories = np.unique(valuesCategory)

totals = dict.fromkeys(categories, 0)

elapsed = time() - start

print(elapsed, "Create the totals dict.")

rows = 10

cols = 10

start = time()

for col in range(cols):

    for row in range(rows):

        cat = valuesCategory[row, col]

        ras = valuesRaster[row, col]

        totals[cat] += ras

elapsed = time() - start

print(elapsed, "Loop in the data.")

print(totals)
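
For very large arrays, even the dict version still runs a Python-level loop over every cell. A different technique, not the one above, is to let NumPy do the grouping. This is only a sketch: the function name category_totals and the seeded test data are made up for illustration, and it assumes both arrays have the same shape.

```python
import numpy as np


def category_totals(values_category, values_raster):
    """Return {category: total raster value} without a Python-level loop."""
    # For every cell, inverse holds the index of its category in `categories`.
    categories, inverse = np.unique(values_category, return_inverse=True)
    # bincount adds up the weights that share the same index.
    sums = np.bincount(inverse.ravel(), weights=values_raster.ravel())
    return dict(zip(categories.tolist(), sums.tolist()))


# Hypothetical test data with the same value ranges as the script above.
rng = np.random.RandomState(0)
values_raster = rng.randint(0, 101, size=(10, 10))
values_category = rng.randint(1, 11, size=(10, 10))

totals = category_totals(values_category, values_raster)
print(totals)
```

Note that np.bincount with weights returns floating-point sums, so the totals come back as floats even when the raster values are integers. As with the dict, it is worth timing this on your real data before relying on it.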