Velocity Reviews - Computer Hardware Reviews

looping in array vs looping in a dic

 
 
giuseppe.amatulli@gmail.com
      09-20-2012
Hi,
I have this Python script that I need to apply to very large arrays (arrays coming from satellite images).
The script works great, but I would like to speed up the process.
Most of the computation time is spent in the for loop.
Is there a way to improve that part?
Would it be better to use a dict instead of np.ndarray for storing the results?
If so, how can I do the sum in a dict (like the corresponding matrix[row_c,1] = matrix[row_c,1] + valuesRaster[row,col])?
If a dict is the solution, why is it faster?

Thanks
Giuseppe

import numpy as np
from time import time

# create the arrays

start = time()
valuesRaster = np.random.random_integers(0, 100, 100).reshape(10, 10)
valuesCategory = np.random.random_integers(1, 10, 100).reshape(10, 10)

elapsed = (time() - start)
print(elapsed , "create the data")

start = time()

categories = np.unique(valuesCategory)
matrix = np.c_[ categories , np.zeros(len(categories))]

elapsed = (time() - start)
print(elapsed , "create the matrix and append a colum zero ")

rows = 10
cols = 10

start = time()

for col in range(0,cols):
    for row in range(0,rows):
        for row_c in range(0,len(matrix)) :
            if valuesCategory[row,col] == matrix[row_c,0] :
                matrix[row_c,1] = matrix[row_c,1] + valuesRaster[row,col]
                break
elapsed = (time() - start)
print(elapsed , "loop in the data ")

print (matrix)
 
MRAB
      09-20-2012
On 2012-09-20 19:31, (E-Mail Removed) wrote:
> The script works great, but I would like to speed up the process.
> Most of the computation time is spent in the for loop.
> Is there a way to improve that part?
> Would it be better to use a dict instead of np.ndarray for storing the results?
> [...]
> for col in range(0,cols):
>     for row in range(0,rows):
>         for row_c in range(0,len(matrix)) :
>             if valuesCategory[row,col] == matrix[row_c,0] :
>                 matrix[row_c,1] = matrix[row_c,1] + valuesRaster[row,col]
>                 break

If I understand the code correctly, 'matrix' contains the categories in
column 0 and the totals in column 1.

What you're doing is performing a linear search through the categories
and then adding to the corresponding total.

Linear searches are slow because on average you have to search through
half of the list. Using a dict would be much faster (although you
should of course measure it!).

Try something like this:

import numpy as np
from time import time

# Create the arrays.

start = time()

valuesRaster = np.random.random_integers(0, 100, 100).reshape(10, 10)
valuesCategory = np.random.random_integers(1, 10, 100).reshape(10, 10)

elapsed = time() - start
print(elapsed, "Create the data.")

start = time()

categories = np.unique(valuesCategory)
totals = dict.fromkeys(categories, 0)

elapsed = time() - start
print(elapsed, "Create the totals dict.")

rows = 10
cols = 10

start = time()

for col in range(cols):
    for row in range(rows):
        cat = valuesCategory[row, col]
        ras = valuesRaster[row, col]
        totals[cat] += ras

elapsed = time() - start
print(elapsed, "Loop in the data.")

print(totals)
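
On the "measure it" point, here is a minimal timing sketch. It is an illustration only, not part of the original script: the 50 x 50 array size, the fixed seed, and the function names totals_linear_search and totals_dict are all assumptions chosen for the example. It checks that both approaches give the same totals and then times each with timeit.

```python
import timeit

import numpy as np

# Assumed test data: 50 x 50 arrays with a fixed seed (not from the thread).
rng = np.random.RandomState(0)
raster = rng.randint(0, 101, size=(50, 50))
category = rng.randint(1, 11, size=(50, 50))
categories = np.unique(category)


def totals_linear_search():
    # One [category, total] row per category, searched linearly for each pixel.
    matrix = np.c_[categories, np.zeros(len(categories))]
    for row in range(raster.shape[0]):
        for col in range(raster.shape[1]):
            for row_c in range(len(matrix)):
                if category[row, col] == matrix[row_c, 0]:
                    matrix[row_c, 1] += raster[row, col]
                    break
    return {int(c): int(t) for c, t in matrix}


def totals_dict():
    # One hash lookup per pixel instead of a search through the categories.
    totals = dict.fromkeys(categories.tolist(), 0)
    for row in range(raster.shape[0]):
        for col in range(raster.shape[1]):
            totals[int(category[row, col])] += int(raster[row, col])
    return totals


# Both approaches must agree before their timings are worth comparing.
assert totals_linear_search() == totals_dict()

t_linear = timeit.timeit(totals_linear_search, number=3)
t_dict = timeit.timeit(totals_dict, number=3)
print("linear search: %.4f s, dict: %.4f s" % (t_linear, t_dict))
```

The absolute numbers depend on the machine and on how many categories there are, so it is worth rerunning with data shaped like the real rasters.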

 
Ian Kelly
      09-20-2012
On Thu, Sep 20, 2012 at 1:09 PM, MRAB <(E-Mail Removed)> wrote:
> for col in range(cols):
>     for row in range(rows):
>         cat = valuesCategory[row, col]
>         ras = valuesRaster[row, col]
>         totals[cat] += ras


Expanding on what MRAB wrote, since you probably have far fewer
categories than pixels, you may be able to take better advantage of
numpy's vectorized operations (which are pretty much the whole point
of using numpy in the first place) by looping over the categories
instead:

for cat in categories:
    totals[cat] += np.sum(valuesRaster * (valuesCategory == cat))
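
A self-contained sketch of the per-category idea follows, with one possible fully vectorized variant next to it. The variable names, the seed, and the use of np.bincount are assumptions added for illustration and were not suggested in the thread. The bincount variant only applies if the categories are small non-negative integers, as they are in the test data here.

```python
import numpy as np

# Assumed test data, same shape and value ranges as the script above.
rng = np.random.RandomState(0)
values_raster = rng.randint(0, 101, size=(10, 10))
values_category = rng.randint(1, 11, size=(10, 10))
categories = np.unique(values_category)

# Per-category loop: a boolean mask selects the pixels of one category.
totals = {}
for cat in categories.tolist():
    totals[cat] = int(values_raster[values_category == cat].sum())

# Vectorized variant: bincount adds up the weights for each integer label.
sums = np.bincount(values_category.ravel(), weights=values_raster.ravel())
totals_bincount = {cat: int(sums[cat]) for cat in categories.tolist()}

# The two approaches should produce identical totals.
assert totals == totals_bincount
print(totals)
```

The loop makes one pass over the whole array per category, so it pays off when there are few categories. The bincount variant makes a single pass regardless of the number of categories.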
 
giuseppe.amatulli@gmail.com
      09-20-2012
Hi Ian and MRAB,
Thanks to your input I have improved the speed of my code. Reading from a dict is definitely faster. I have one more question.
In the dict I calculate the sum of the values, but I also want to count the number of observations, in order to calculate the average at the end.
Should I create a new dict, or is it possible to do this in the same dict?
Here is the final code.
Thanks, Giuseppe



rows = dsCategory.RasterYSize
cols = dsCategory.RasterXSize

print("Generating output file %s" %(dst_file))

start = time()

unique=dict()

for irows in xrange(rows):
    valuesRaster=dsRaster.GetRasterBand(1).ReadAsArray(0,irows,cols,1)
    valuesCategory=dsCategory.GetRasterBand(1).ReadAsArray(0,irows,cols,1)
    for icols in xrange(cols):
        if ( valuesRaster[0,icols] != no_data_Raster ) and ( valuesCategory[0,icols] != no_data_Category ) :
            row = valuesCategory[0, icols],valuesRaster[0, icols]
            if row[0] in unique :
                unique[row[0]] += row[1]
            else:
                unique[row[0]] = 0+row[1] # the 0 was added because otherwise the first observation was treated as 0

 
MRAB
      09-20-2012
On 2012-09-21 00:35, (E-Mail Removed) wrote:
> Hi Ian and MRAB,
> Thanks to your input I have improved the speed of my code. Reading from a dict is definitely faster. I have one more question.
> In the dict I calculate the sum of the values, but I also want to count the number of observations, in order to calculate the average at the end.
> Should I create a new dict, or is it possible to do this in the same dict?
> Here is the final code.
> Thanks, Giuseppe
>

Keep it simple. Use 2 dicts.

>
>
> rows = dsCategory.RasterYSize
> cols = dsCategory.RasterXSize
>
> print("Generating output file %s" %(dst_file))
>
> start = time()
>
> unique=dict()
>
> for irows in xrange(rows):
>     valuesRaster=dsRaster.GetRasterBand(1).ReadAsArray(0,irows,cols,1)
>     valuesCategory=dsCategory.GetRasterBand(1).ReadAsArray(0,irows,cols,1)
>     for icols in xrange(cols):
>         if ( valuesRaster[0,icols] != no_data_Raster ) and ( valuesCategory[0,icols] != no_data_Category ) :
>             row = valuesCategory[0, icols],valuesRaster[0, icols]
>             if row[0] in unique :
>                 unique[row[0]] += row[1]
>             else:
>                 unique[row[0]] = 0+row[1] # the 0 was added because otherwise the first observation was treated as 0
>

You could use defaultdict instead:

from collections import defaultdict

unique = defaultdict(int)
....
category, raster = valuesCategory[0, icols], valuesRaster[0, icols]
unique[category] += raster
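
Putting the two suggestions together (two dicts, each a defaultdict), a minimal sketch of the sum, count and average could look like this. The pixels list is hypothetical sample data standing in for the (category, raster) pairs that pass the no-data test, and the names sums, counts and averages are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical (category, raster value) pairs standing in for the pixels
# that pass the no-data test in the loop above.
pixels = [(1, 10), (2, 4), (1, 20), (3, 7), (2, 6)]

sums = defaultdict(int)    # category -> running sum of raster values
counts = defaultdict(int)  # category -> number of observations

for category, raster in pixels:
    sums[category] += raster
    counts[category] += 1

# float() keeps the division exact on Python 2 as well as Python 3.
averages = {category: sums[category] / float(counts[category])
            for category in sums}
print(averages)
```

Keeping the sum and the count in separate dicts means the inner loop stays two simple increments, and the averages are only computed once at the end.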

 