Re: Strange array.array performance

 
 
Maxim Khitrov
      02-19-2009
On Thu, Feb 19, 2009 at 2:35 PM, Robert Kern <(E-Mail Removed)> wrote:
> On 2009-02-19 12:52, Maxim Khitrov wrote:
>>
>> Hello all,
>>
>> I'm currently writing a Python<-> MATLAB interface with ctypes and
>> array.array class, using which I'll need to push large amounts of data
>> to MATLAB.

>
> Have you taken a look at mlabwrap?
>
> http://mlabwrap.sourceforge.net/
>
> At the very least, you will probably want to use numpy arrays instead of
> array.array.
>
> http://numpy.scipy.org/


I have, but numpy is not currently available for python 2.6, which is
what I need for some other features, and I'm trying to keep the
dependencies down in any case. Mlabwrap description doesn't mention if
it is thread-safe, and that's another one of my requirements.

The only feature that I'm missing with array.array is the ability to
quickly pre-allocate large chunks of memory. To do that right now I'm
using array('d', (0,) * size). It would be nice if array accepted an
int as the second argument indicating how much memory to allocate and
initialize to 0.

- Max
 
Maxim Khitrov
      02-20-2009
On Thu, Feb 19, 2009 at 7:01 PM, Scott David Daniels
<(E-Mail Removed)> wrote:
> Maxim Khitrov wrote:
>>
>> On Thu, Feb 19, 2009 at 2:35 PM, Robert Kern <(E-Mail Removed)>
>> wrote:
>> I have, but numpy is not currently available for python 2.6, which is
>> what I need for some other features, and I'm trying to keep the
>> dependencies down in any case....
>> The only feature that I'm missing with array.array is the ability to
>> quickly pre-allocate large chunks of memory. To do that right now I'm
>> using array('d', (0,) * size). It would be nice if array accepted an
>> int as the second argument indicating how much memory to allocate and
>> initialize to 0.

>
> In the meantime, you could write a function (to ease the shift to numpy)
> and reduce your interface problem to a very small set of lines:
>     def zeroes_d(n):
>         '''Allocate an n-element vector of 'd' elements'''
>         vector = array.array('d') # fromstring has no performance bug
>         vector.fromstring(n * 8 * '\0')
>         return vector
> Once numpy is up and running on 2.6, this should be easy to convert
> to a call to numpy.zeros.


If I do decide to transition at any point, it will require much
greater modification. For example, to speed-up retrieval of data from
Matlab, which is returned to me as an mxArray structure, I allocate an
array.array for it and then use ctypes.memmove to copy data directly
into the array's buffer (address obtained through buffer_info()).

The same goes for sending data: rather than allocating a separate
mxArray, copying the data, and then sending it, I create an empty
mxArray and set its data pointer to the array's buffer. I'm sure that
there are equivalents in numpy, but the point is that the transition,
which currently would not benefit my code in any significant way, will
not be a quick change.
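
The receiving side described above can be sketched roughly as follows. This is a minimal Python 3 illustration, not the actual interface code: the `src` buffer is a hypothetical stand-in for data owned by an mxArray, and no MATLAB call is involved.

```python
import ctypes
from array import array

# Hypothetical source buffer standing in for the data behind an mxArray.
src = (ctypes.c_double * 4)(1.0, 2.0, 3.0, 4.0)

# Zero-filled destination array with room for the same number of doubles.
itemsize = array('d').itemsize
dst = array('d', bytes(len(src) * itemsize))

# buffer_info() returns (address, element count) for the array's buffer.
addr, count = dst.buffer_info()

# Copy the raw bytes straight into the array's memory.
ctypes.memmove(addr, src, count * dst.itemsize)

print(dst.tolist())
```

The address from buffer_info() is only valid while the array is not resized, so the copy should happen before any append or extend on `dst`.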

On the other hand, I have to thank you for the fromstring example. For
some reason, it never occurred to me that creating a string of nulls
would be much faster than a tuple of zeros. In fact, you can pass the
string to the constructor and it calls fromstring automatically. For
an array of 1 million elements, using a string to initialize is 18x
faster.
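
For readers on Python 3, where fromstring has been replaced by frombytes and the constructor accepts a bytes object, the same idea might look like the sketch below. The helper name `zeros_d` is made up for illustration.

```python
from array import array

def zeros_d(n):
    """Return an n-element array of C doubles, all 0.0 (hypothetical helper)."""
    itemsize = array('d').itemsize  # 8 on common platforms
    # bytes(k) builds k null bytes; the constructor passes them to frombytes().
    return array('d', bytes(n * itemsize))

v = zeros_d(5)
print(len(v), v[0])
```

An all-zero bit pattern is 0.0 for IEEE 754 doubles, which is why null bytes work as the initializer here.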

- Max
 
Maxim Khitrov
      02-20-2009
On Thu, Feb 19, 2009 at 7:01 PM, Scott David Daniels
<(E-Mail Removed)> wrote:
> Maxim Khitrov wrote:
>>
>> On Thu, Feb 19, 2009 at 2:35 PM, Robert Kern <(E-Mail Removed)>
>> wrote:
>> I have, but numpy is not currently available for python 2.6, which is
>> what I need for some other features, and I'm trying to keep the
>> dependencies down in any case....
>> The only feature that I'm missing with array.array is the ability to
>> quickly pre-allocate large chunks of memory. To do that right now I'm
>> using array('d', (0,) * size). It would be nice if array accepted an
>> int as the second argument indicating how much memory to allocate and
>> initialize to 0.

>
> In the meantime, you could write a function (to ease the shift to numpy)
> and reduce your interface problem to a very small set of lines:
>     def zeroes_d(n):
>         '''Allocate an n-element vector of 'd' elements'''
>         vector = array.array('d') # fromstring has no performance bug
>         vector.fromstring(n * 8 * '\0')
>         return vector
> Once numpy is up and running on 2.6, this should be easy to convert
> to a call to numpy.zeros.


Here's the function that I'll be using from now on. It gives me
exactly the behavior I need, with an int initializer being treated as
array size. Still not as efficient as it could be if supported
natively by array (one malloc instead of two + memmove + extra
function call), but very good performance nevertheless:

from array import array as _array

# One item's worth of null bytes for each typecode.
array_null = dict((tc, '\0' * _array(tc).itemsize) for tc in 'cbBuhHiIlLfd')

def array(typecode, init):
    # An int initializer is treated as the number of zeroed items.
    if isinstance(init, int):
        return _array(typecode, array_null[typecode] * init)
    return _array(typecode, init)

- Max
 
John Machin
      02-20-2009
On Feb 20, 6:53 am, Maxim Khitrov <(E-Mail Removed)> wrote:
> On Thu, Feb 19, 2009 at 2:35 PM, Robert Kern <(E-Mail Removed)> wrote:
> > On 2009-02-19 12:52, Maxim Khitrov wrote:

>
> >> Hello all,

>
> >> I'm currently writing a Python<-> MATLAB interface with ctypes and
> >> array.array class, using which I'll need to push large amounts of data
> >> to MATLAB.

>
> > Have you taken a look at mlabwrap?

>
> > http://mlabwrap.sourceforge.net/

>
> > At the very least, you will probably want to use numpy arrays instead of
> > array.array.

>
> > http://numpy.scipy.org/

>
> I have, but numpy is not currently available for python 2.6, which is
> what I need for some other features, and I'm trying to keep the
> dependencies down in any case. Mlabwrap description doesn't mention if
> it is thread-safe, and that's another one of my requirements.
>
> The only feature that I'm missing with array.array is the ability to
> quickly pre-allocate large chunks of memory. To do that right now I'm
> using array('d', (0,) * size).


It would go somewhat faster if you gave it a float instead of an int.

> It would be nice if array accepted an
> int as the second argument indicating how much memory to allocate and
> initialize to 0.


While you're waiting for that to happen, you'll have to use the
fromstring trick, or another gimmick that is faster and is unlikely
to use an extra temporary 8 MB for a 1M-element array, as I presume
fromstring does.

[Python 2.6.1 on Windows XP SP3]
[Processor: x86 Family 15 Model 36 Stepping 2 AuthenticAMD ~1994 Mhz]

C:\junk>\python26\python -mtimeit -s"from array import array" "x=array('d',(0,)*1000000)"
10 loops, best of 3: 199 msec per loop

C:\junk>\python26\python -mtimeit -s"from array import array" "x=array('d',(0.,)*1000000)"
10 loops, best of 3: 158 msec per loop

C:\junk>\python26\python -mtimeit -s"from array import array" "x=array('d');x.fromstring('\0'*8*1000000)"
10 loops, best of 3: 36 msec per loop

C:\junk>\python26\python -mtimeit -s"from array import array" "x=array('d','\0'*8*1000000)"
10 loops, best of 3: 35.7 msec per loop

C:\junk>\python26\python -mtimeit -s"from array import array" "array('d',(0.,))*1000000"
10 loops, best of 3: 19.5 msec per loop
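
To repeat this comparison on a current interpreter, something like the following Python 3 sketch could be used. The labels, the smaller element count, and the run count are arbitrary choices, and the timings will differ from the Python 2.6 figures above.

```python
from array import array
from timeit import timeit

N = 100_000  # smaller than the 1M used above, to keep the run short
ITEMSIZE = array('d').itemsize

candidates = {
    "tuple of ints":   lambda: array('d', (0,) * N),
    "tuple of floats": lambda: array('d', (0.0,) * N),
    "null bytes":      lambda: array('d', bytes(ITEMSIZE * N)),
    "array repeat":    lambda: array('d', (0.0,)) * N,
}

# All candidates must build the same array before timing means anything.
reference = array('d', [0.0] * N)
results_equal = all(make() == reference for make in candidates.values())

for name, make in candidates.items():
    print(f"{name:16s} {timeit(make, number=10):.4f} s for 10 runs")
```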

HTH,
John
 
Maxim Khitrov
      02-20-2009
On Thu, Feb 19, 2009 at 9:15 PM, John Machin <(E-Mail Removed)> wrote:
> On Feb 20, 6:53 am, Maxim Khitrov <(E-Mail Removed)> wrote:
>> On Thu, Feb 19, 2009 at 2:35 PM, Robert Kern <(E-Mail Removed)> wrote:
>> > On 2009-02-19 12:52, Maxim Khitrov wrote:

>>
>> >> Hello all,

>>
>> >> I'm currently writing a Python<-> MATLAB interface with ctypes and
>> >> array.array class, using which I'll need to push large amounts of data
>> >> to MATLAB.

>>
>> > Have you taken a look at mlabwrap?

>>
>> > http://mlabwrap.sourceforge.net/

>>
>> > At the very least, you will probably want to use numpy arrays instead of
>> > array.array.

>>
>> > http://numpy.scipy.org/

>>
>> I have, but numpy is not currently available for python 2.6, which is
>> what I need for some other features, and I'm trying to keep the
>> dependencies down in any case. Mlabwrap description doesn't mention if
>> it is thread-safe, and that's another one of my requirements.
>>
>> The only feature that I'm missing with array.array is the ability to
>> quickly pre-allocate large chunks of memory. To do that right now I'm
>> using array('d', (0,) * size).

>
> It would go somewhat faster if you gave it a float instead of an int.
>
>> It would be nice if array accepted an
>> int as the second argument indicating how much memory to allocate and
>> initialize to 0.

>
> While you're waiting for that to happen, you'll have to use the
> fromstring trick, or another gimmick that is faster and is unlikely
> to use an extra temporary 8 MB for a 1M-element array, as I presume
> fromstring does.
>
> [Python 2.6.1 on Windows XP SP3]
> [Processor: x86 Family 15 Model 36 Stepping 2 AuthenticAMD ~1994 Mhz]
>
> C:\junk>\python26\python -mtimeit -s"from array import array" "x=array('d',(0,)*1000000)"
> 10 loops, best of 3: 199 msec per loop
>
> C:\junk>\python26\python -mtimeit -s"from array import array" "x=array('d',(0.,)*1000000)"
> 10 loops, best of 3: 158 msec per loop
>
> C:\junk>\python26\python -mtimeit -s"from array import array" "x=array('d');x.fromstring('\0'*8*1000000)"
> 10 loops, best of 3: 36 msec per loop
>
> C:\junk>\python26\python -mtimeit -s"from array import array" "x=array('d','\0'*8*1000000)"
> 10 loops, best of 3: 35.7 msec per loop
>
> C:\junk>\python26\python -mtimeit -s"from array import array" "array('d',(0.,))*1000000"
> 10 loops, best of 3: 19.5 msec per loop


Interesting, though I'm not able to replicate that last outcome. The
string method is still the fastest on my machine. Furthermore, it
looks like the order in which you do the multiplication also matters -
(8 * size * '\0') is faster than ('\0' * 8 * size). Here is my test
and outcome:

---
from array import array
from timeit import repeat

print repeat(lambda: array('d', (0,) * 100000), number = 100)
print repeat(lambda: array('d', (0.0,) * 100000), number = 100)
print repeat(lambda: array('d', (0.0,)) * 100000, number = 100)
print repeat(lambda: array('d', '\0' * 100000 * 8), number = 100)
print repeat(lambda: array('d', '\0' * 8 * 100000), number = 100)
print repeat(lambda: array('d', 8 * 100000 * '\0'), number = 100)
---

[0.91048107424534941, 0.88766983642377162, 0.88312824645684618]
[0.72164595848486179, 0.72038338197219343, 0.72346024633711981]
[0.10763947529894136, 0.1047547164728595, 0.10461521722863232]
[0.05856873793382178, 0.058508825334111947, 0.058361838698573365]
[0.057632016342657799, 0.057521392119007864, 0.057227118035289237]
[0.056006643320014149, 0.056331811311153501, 0.056187433215103333]

The array('d', (0.0,)) * 100000 method is a good compromise between
performance and amount of memory used, so maybe I'll use that instead.
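
The memory side of that trade-off can be illustrated with a small Python 3 sketch. It relies on sys.getsizeof, whose numbers are specific to CPython, and the variable names are made up for illustration.

```python
import sys
from array import array

n = 100_000
itemsize = array('d').itemsize

# Temporary object needed by the string (bytes) method: n full items.
temp_bytes = bytes(n * itemsize)

# Temporary object needed by the repeat method: a one-element array.
temp_seed = array('d', (0.0,))

bytes_overhead = sys.getsizeof(temp_bytes)
seed_overhead = sys.getsizeof(temp_seed)
print(bytes_overhead, seed_overhead)

# Both routes end up with the same zero-filled array.
same_result = (temp_seed * n) == array('d', temp_bytes)
```

The bytes temporary is as large as the final array, while the one-element seed stays small no matter how large n gets.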

- Max
 
Aahz
      03-12-2009
In article <(E-Mail Removed)>,
Maxim Khitrov <(E-Mail Removed)> wrote:
>
>Interesting, though I'm not able to replicate that last outcome. The
>string method is still the fastest on my machine. Furthermore, it
>looks like the order in which you do the multiplication also matters -
>(8 * size * '\0') is faster than ('\0' * 8 * size).


That's not surprising -- the latter does two string multiplication
operations, which I would expect to be slower than int multiplication.
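
Because `*` groups left to right, the two spellings perform different operations. The Python 3 sketch below counts sequence repeats with a hypothetical `CountingBytes` subclass written only for this illustration; it says nothing about how large the timing difference is.

```python
class CountingBytes(bytes):
    """bytes subclass that counts sequence-repeat operations (illustration only)."""
    repeats = 0

    def __mul__(self, n):
        CountingBytes.repeats += 1
        return CountingBytes(bytes.__mul__(self, n))

    # int * CountingBytes falls back to the reflected method.
    __rmul__ = __mul__


def count_repeats(expr):
    """Evaluate expr(z, size) and return how many repeats it triggered."""
    CountingBytes.repeats = 0
    expr(CountingBytes(b'\0'), 1000)
    return CountingBytes.repeats


left_first = count_repeats(lambda z, size: z * 8 * size)  # (z * 8) * size
int_first = count_repeats(lambda z, size: 8 * size * z)   # (8 * size) * z
print(left_first, int_first)
```

The first form repeats a sequence twice (the second time on an 8-byte string), while the second form does one int multiplication followed by a single repeat.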
--
Aahz ((E-Mail Removed)) <*> http://www.pythoncraft.com/

"All problems in computer science can be solved by another level of
indirection." --Butler Lampson
 