Velocity Reviews > Java > Algorithm for growing arrays

# Algorithm for growing arrays

Chris

 08-08-2006
This is an ill-defined question, but here goes:

My app uses lots of int arrays. We usually don't know how large the
array will ultimately be when we first start populating it. So we grow
the array as necessary, which involves creating a new, larger array and
copying the data over from the old one.

The question is, when we create a larger array, how large should it be?
Assume that all arrays will eventually grow to a random size, unknowable
in advance. Assume also that we want to optimize for both memory and speed.

At one extreme, we could grow each array an element at a time, that is,
the new size would be the old size + 1. This is memory efficient, but
slow and generates a lot of garbage. At the other extreme, we could
allocate huge arrays at the beginning and double them each time we
needed more space. This would be fast because we would seldom need to
extend the arrays, but very memory inefficient.

Just for reference, java.util.ArrayList uses this formula internally:

int newCapacity = (oldCapacity * 3)/2 + 1;
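Wrapped in a class, that growth policy looks something like this (a minimal sketch; the class and method names are mine, not from any real library):

```java
// Minimal sketch of a growable int array using the ArrayList growth
// formula quoted above. Class and method names are illustrative.
public class IntVector {
    private int[] data = new int[10];
    private int size = 0;

    public void add(int value) {
        if (size == data.length) {
            int newCapacity = (data.length * 3) / 2 + 1;
            int[] bigger = new int[newCapacity];
            System.arraycopy(data, 0, bigger, 0, size);
            data = bigger;
        }
        data[size++] = value;
    }

    public int get(int index) {
        return data[index];
    }

    public int size() {
        return size;
    }
}
```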

Patricia Shanahan

 08-08-2006
Chris wrote:
> This is an ill-defined question, but here goes:
>
> My app uses lots of int arrays. We usually don't know how large the
> array will ultimately be when we first start populating it. So we grow
> the array as necessary, which involves creating a new, larger array and
> copying the data over from the old one.
>
> The question is, when we create a larger array, how large should it be?
> Assume that all arrays will eventually grow to a random size, unknowable
> in advance. Assume also that we want to optimize for both memory and speed.

Do you have any data about the distribution of the sizes? Do you have a
"cost" function that combines memory and speed costs? How many bytes of
memory are you prepared to waste to avoid copying n bytes of existing array?

If you have that sort of data, you may be able to work out an optimum.

If not, I suggest assuming the new size will be a linear function of the
existing size, give yourself some way to change the coefficients, and
experiment in the working program.
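That parameterization might look like this (a sketch; the coefficient values and names are placeholders, meant to be tuned experimentally):

```java
// Sketch of a linear growth policy with tunable coefficients. The
// default values passed in by a caller are arbitrary starting points
// to be adjusted while experimenting with the working program.
public class GrowthPolicy {
    private final double factor;   // multiplicative coefficient
    private final int increment;   // additive coefficient

    public GrowthPolicy(double factor, int increment) {
        this.factor = factor;
        this.increment = increment;
    }

    // newSize = oldSize * factor + increment
    public int newSize(int oldSize) {
        return (int) (oldSize * factor) + increment;
    }
}
```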

Patricia

Steve W. Jackson

 08-08-2006
Chris wrote:

> This is an ill-defined question, but here goes:
>
> My app uses lots of int arrays. We usually don't know how large the
> array will ultimately be when we first start populating it. So we grow
> the array as necessary, which involves creating a new, larger array and
> copying the data over from the old one.
>
> The question is, when we create a larger array, how large should it be?
> Assume that all arrays will eventually grow to a random size, unknowable
> in advance. Assume also that we want to optimize for both memory and speed.
>
> At one extreme, we could grow each array an element at a time, that is,
> the new size would be the old size + 1. This is memory efficient, but
> slow and generates a lot of garbage. At the other extreme, we could
> allocate huge arrays at the beginning and double them each time we
> needed more space. This would be fast because we would seldom need to
> extend the arrays, but very memory inefficient.
>
>
> Just for reference, java.util.ArrayList uses this formula internally:
>
> int newCapacity = (oldCapacity * 3)/2 + 1;

If you've already looked into what ArrayList uses, then I think a rather
obvious question is why not use an ArrayList to begin with?

Using ArrayList<Integer> might not be as flexible as an array of int,
since Integer is immutable. That's easily rectified with a mutable
wrapper class, and you get the automatic ability to grow for free. The
number of allocated but unused entries isn't really relevant, since
they don't contain data until you put something there.
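For comparison, the ArrayList route with autoboxing is only slightly more verbose than a raw int array (a sketch using only the standard java.util classes):

```java
import java.util.ArrayList;
import java.util.List;

// Using ArrayList<Integer> instead of int[]: the list grows on
// demand, and autoboxing hides most of the Integer wrapping.
public class ListDemo {
    public static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {  // auto-unboxing Integer -> int
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> values = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            values.add(i);  // autoboxing int -> Integer
        }
        System.out.println(sum(values));  // 55
    }
}
```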

= Steve =
--
Steve W. Jackson
Montgomery, Alabama

Matt Rose

 08-08-2006
Chris wrote:
> This is an ill-defined question, but here goes:
>
> My app uses lots of int arrays. We usually don't know how large the
> array will ultimately be when we first start populating it. So we grow
> the array as necessary, which involves creating a new, larger array and
> copying the data over from the old one.
>
> The question is, when we create a larger array, how large should it be?
> Assume that all arrays will eventually grow to a random size, unknowable
> in advance. Assume also that we want to optimize for both memory and speed.
>
> At one extreme, we could grow each array an element at a time, that is,
> the new size would be the old size + 1. This is memory efficient, but
> slow and generates a lot of garbage. At the other extreme, we could
> allocate huge arrays at the beginning and double them each time we
> needed more space. This would be fast because we would seldom need to
> extend the arrays, but very memory inefficient.
>
>
> Just for reference, java.util.ArrayList uses this formula internally:
>
> int newCapacity = (oldCapacity * 3)/2 + 1;

caveat: I cannot condone premature optimisation... but it can be fun!

It depends on the usage of these arrays but under some circumstances it
might pay to create a new array each time you run out of space but
don't copy the data across, just keep a reference in some sort of
ordered data structure (possibly another array, as long as you're
willing to cope with running out of space with that one) then when you
need to access all the data just iterate over all your arrays. This
gets more complex if the data might arrive in unpredictably sized
chunks.
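The no-copy scheme described above can be sketched like this (names and the block size are illustrative, not from the post):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a chunked list: when a block fills up, allocate a fresh
// one and record it in a list of blocks, instead of copying the old
// data into a larger array.
public class ChunkedIntList {
    private static final int BLOCK = 1000;
    private final List<int[]> blocks = new ArrayList<>();
    private int size = 0;

    public void add(int value) {
        if (size % BLOCK == 0) {
            blocks.add(new int[BLOCK]);  // grow without copying
        }
        blocks.get(size / BLOCK)[size % BLOCK] = value;
        size++;
    }

    public int get(int index) {
        return blocks.get(index / BLOCK)[index % BLOCK];
    }

    public int size() {
        return size;
    }
}
```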

Of course, at some point you might have to give in and declare that the
extra man-hours needed to maintain your own malloc will cost more than
the hardware required to just use an API List implementation...

Matt

Luc Mercier

 08-08-2006
As long as the growth is exponential, the amortized complexity of adding
a new element --- i.e., the average over many additions of new elements ---
is constant. That means the time required to copy the array into a
bigger one has little influence on the runtime, compared to an array of
the right size.

Luc Mercier

 08-08-2006
Luc Mercier wrote:
> As long as the growth is exponential, the amortized complexity of adding
> a new element --- i.e., the average over many additions of new elements ---
> is constant. That means the time required to copy the array into a
> bigger one has little influence on the runtime, compared to an array of
> the right size.

(Sorry, that was not finished)

You're saying ArrayList uses a factor of 3/2. For performance, at least
from a theoretical point of view, you could just as well choose a factor
of 1.1 or 2, and the runtime performance will be similar.

But if you choose LINEAR growth instead of EXPONENTIAL -- i.e.,
newLength = oldLength + constant -- then the amortized time of adding
an element becomes linear, so it will be MUCH slower.

Look at any introduction to amortized complexity analysis for more details.
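The difference is easy to see by counting element copies directly (a rough simulation; the linear increment of 16 is an arbitrary choice):

```java
// Counts the element copies made while appending n elements, under
// exponential doubling vs. linear growth by a fixed increment.
public class CopyCount {
    public static long copies(int n, boolean exponential) {
        long copied = 0;
        int capacity = 1;
        for (int size = 0; size < n; size++) {
            if (size == capacity) {
                copied += size;  // old array copied into the new one
                capacity = exponential ? capacity * 2 : capacity + 16;
            }
        }
        return copied;
    }

    public static void main(String[] args) {
        System.out.println(copies(100_000, true));   // total grows as O(n)
        System.out.println(copies(100_000, false));  // total grows as O(n^2)
    }
}
```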

Lasse Reichstein Nielsen

 08-08-2006
Luc Mercier writes:

> As long as the growth is exponential, the amortized complexity of
> adding a new element --- i.e., the average over many additions of new
> elements --- is constant. That means the time required to copy the
> array into a bigger one has little influence on the runtime, compared
> to an array of the right size.

It is true that the amortized cost is constant, but if the exponent is
close to 1.0, it can be a big constant, and the copying will have a
significant influence on runtime.

For example, if you double the array each time, the amortized cost
of an insertion is three element copies (paying for its own insertion
and the first copying of itself and one existing element).

If you only grow the array by 10% each time, the amortized cost of an
insertion will be 11 element copies, i.e., almost four times slower
than for doubling (if we only count array stores as complexity, and
not, e.g., time spent in memory allocation).
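Those per-insertion figures can be checked with a quick count (a sketch that tallies each element's own store plus every copy made during resizes; the measured averages come out at or below the worst-case bounds above, depending on where the final size falls relative to a resize):

```java
// Average copies per insertion (own store + resize copies) for a
// given growth factor, to check the amortized estimates empirically.
public class AmortizedCost {
    public static double copiesPerInsertion(int n, double factor) {
        long copied = 0;
        int capacity = 1;
        for (int size = 0; size < n; size++) {
            if (size == capacity) {
                copied += size;  // resize: copy the old contents
                capacity = Math.max(capacity + 1, (int) (capacity * factor));
            }
            copied++;  // the element's own store
        }
        return (double) copied / n;
    }

    public static void main(String[] args) {
        System.out.println(copiesPerInsertion(1_000_000, 2.0));
        System.out.println(copiesPerInsertion(1_000_000, 1.1));
    }
}
```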

/L

bugbear

 08-09-2006
Chris wrote:
> This is an ill-defined question, but here goes:
>
> My app uses lots of int arrays. We usually don't know how large the
> array will ultimately be when we first start populating it. So we grow
> the array as necessary, which involves creating a new, larger array and
> copying the data over from the old one.
>
> The question is, when we create a larger array, how large should it be?
> Assume that all arrays will eventually grow to a random size, unknowable
> in advance. Assume also that we want to optimize for both memory and speed.
>
> At one extreme, we could grow each array an element at a time, that is,
> the new size would be the old size + 1. This is memory efficient, but
> slow and generates a lot of garbage. At the other extreme, we could
> allocate huge arrays at the beginning and double them each time we
> needed more space. This would be fast because we would seldom need to
> extend the arrays, but very memory inefficient.
>

You could implement the List interface over a linked list
of blocks of (say) 1000 elements each.

This structure can be grown without generating garbage or copying cost.

Random access on such a structure is still cheap, and (in this
instance) the per-element space overhead is 1000 times lower than a
pure linked list.

Clearly an iterator would be very efficient.

BugBear

Chris Uppal

 08-09-2006
Chris wrote:

> Assume that all arrays will eventually grow to a random size, unknowable
> in advance. Assume also that we want to optimize for both memory and
> speed.

Under those assumptions the problem is insoluble. Unless you have, at minimum,
some knowledge of the statistical distribution of final sizes, you cannot
optimise for both space and speed simultaneously.

-- chris

Robert Klemme

 08-09-2006
On 09.08.2006 11:36, Chris Uppal wrote:
> Chris wrote:
>
>> Assume that all arrays will eventually grow to a random size, unknowable
>> in advance. Assume also that we want to optimize for both memory and
>> speed.

>
> Under those assumptions the problem is insoluble. Unless you have, at minimum,
> some knowledge of the statistical distribution of final sizes, you cannot
> optimise for both space and speed simultaneously.

Just to throw in another idea, a BitSet could be an efficient solution
as well - but this of course depends on semantics and usage patterns.
Just my 0.02EUR...
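For example, if the growing int arrays are really being used as sets of non-negative ints, java.util.BitSet grows on demand and stores one bit per possible value (a sketch of that usage; whether it applies depends entirely on the semantics of the data):

```java
import java.util.BitSet;

// If the int arrays represent sets of non-negative ints, a BitSet
// grows automatically and is far more compact than an int[].
public class BitSetDemo {
    public static void main(String[] args) {
        BitSet seen = new BitSet();      // starts small
        seen.set(42);
        seen.set(100_000);               // BitSet extends itself here
        System.out.println(seen.get(42));        // true
        System.out.println(seen.cardinality());  // 2
    }
}
```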

Kind regards

robert