Velocity Reviews (http://www.velocityreviews.com/forums/index.php)
-   Java (http://www.velocityreviews.com/forums/f30-java.html)
-   -   Bulk Array Element Allocation, is it faster? (http://www.velocityreviews.com/forums/t754441-bulk-array-element-allocation-is-it-faster.html)

 Jan Burse 09-25-2011 01:17 AM

Bulk Array Element Allocation, is it faster?

Dear All,

I just wonder wether modern JIT do optimize
Code of the following form:

Bla[] bla = new Bla[n];
for (int i=0; i<n; i++) {
bla[i] = new Bla();
}

When I use the above in my code, my application
spends 8800 ms in the benchmark I have.

When I then change the code to:

Bla[] bla = new Bla[n];

...

if (bla[i]==null)
bla[i] = new Bla();

..

So instead of allocating all elements at once,
I will allocate them on demand in the subsequent
flow of my application.

When I use the above in my code, my application
now suddently sends 9600 ms in the benchmark
I have.

So I wonder whether eventually the JIT has
a way to detect the bulk allocate of the first
version and transform it into something more
efficient than my lazy allocation.

Any clues?

Bye

 Eric Sosman 09-25-2011 01:37 AM

Re: Bulk Array Element Allocation, is it faster?

On 9/24/2011 9:17 PM, Jan Burse wrote:
> Dear All,
>
> I just wonder wether modern JIT do optimize
> Code of the following form:
>
> Bla[] bla = new Bla[n];
> for (int i=0; i<n; i++) {
> bla[i] = new Bla();
> }
>
> When I use the above in my code, my application
> spends 8800 ms in the benchmark I have.
>
> When I then change the code to:
>
> Bla[] bla = new Bla[n];
>
> ...
>
> if (bla[i]==null)
> bla[i] = new Bla();
>
> ..
>
> So instead of allocating all elements at once,
> I will allocate them on demand in the subsequent
> flow of my application.

This could be a win if you will actually use only a few of
the array elements, and if constructing a Bla is expensive (for
example, if it involves I/O). If Bla's are cheap to make, or if
you'll wind up using most of them anyhow, there's no obvious
reason to conditionalize.

> When I use the above in my code, my application
> now suddently sends 9600 ms in the benchmark
> I have.

Nine percent longer. But I'm highly suspicious of those nice,
round numbers: It seems an unlikely coincidence that two rather
different chunks of code should both execute in even tenths of
seconds. Are you sure you've measured what you want to measure?
Are you sure that what you want to measure is the important thing?

> So I wonder whether eventually the JIT has
> a way to detect the bulk allocate of the first
> version and transform it into something more
> efficient than my lazy allocation.

Based on your (suspect) timings, "just leave it be" would be
exactly the kind of improvement you ask for. The JIT very likely
does something like that already :-)

--
Eric Sosman
esosman@ieee-dot-org.invalid

 Roedy Green 09-25-2011 02:05 AM

Re: Bulk Array Element Allocation, is it faster?

On Sun, 25 Sep 2011 03:17:19 +0200, Jan Burse <janburse@fastmail.fm>
wrote, quoted or indirectly quoted someone who said :

>Any clues

new does two things:
1. allocates X bytes of space
2. initialises that block of space to a pattern.

A clever jit may notice that some objects are simple, and always
generate the exact same pattern. It can then replace the constructor
code with a block move.

A clever jit may notice that some objects are simple, except for a few
computed bytes. It might be able to compose a fast constructor that
does a block move then call the methods to compute the few computed
bytes.

Moving 0 to a group of bytes is even faster than a block move, since
the source is a register or the instruction itself.

So another idea is the jit could reorder the fields so that the ones
needing the same initialisation byte pattern live side by side, so you
can do it with a REP constant move.

The flat footed way to do a constructor is to zap the entire block to
0, then go field by field initialised to non-null not-0 and plop that
value in on top.

Because constructors are called so often, even a very minor
optimisation would pay off.
--
http://mindprod.com
It should not be considered an error when the user starts something
to browsers, services, editors... It is inexcusable to
punish the user by requiring some elaborate sequence to atone,
e.g. open the task editor, find and kill some processes.

 Lew 09-25-2011 06:22 AM

Re: Bulk Array Element Allocation, is it faster?

Roedy Green wrote:
> The flat footed way to do a constructor is to zap the entire block to
> 0, then go field by field initialised to non-null not-0 and plop that
> value in on top.

This "flat footed [sic] way" is what Java constructors do, and must.

Member fields are set to their default values prior to any constructor assignment, even to the same value as the default.

public class Foo
{
private Bar bar = null;
}

Foo#bar will be cleared to 'null' upon first construction, then initialized to 'null' by the explicit initialization. Both steps happen. Both steps are required to happen, and they're required to happen separately.

> Because constructors are called so often, even a very minor
> optimisation would pay off.

Perhaps, depending on what you mean by "pay off". Some have argued that the optimization achieved by omitting explicit initialization to default values is not worth the supposed reduction in clarity.

Since we all know that member fields are constructed with their default values, I don't think it's any less clear to omit their explicit (and redundant) initialization to default values.

I guess it depends on whether that's your opinion, and whether omitting the redundant initialization is an optimization that would pay off.

--
Lew

 Roedy Green 09-25-2011 07:23 AM

Re: Bulk Array Element Allocation, is it faster?

On Sat, 24 Sep 2011 23:22:45 -0700 (PDT), Lew <lewbloch@gmail.com>
wrote, quoted or indirectly quoted someone who said :

>Foo#bar will be cleared to 'null' upon first construction, then initialized to 'null' by the explicit initialization.
> Both steps happen. Both steps are required to happen, and they're required to happen separately.

I doubt that. The JLS never says exactly how something works, just
what the net effect must be. It is part of what makes it so difficult
to follow. A JLS for programmers rather than implementors might
describe a simple reference implementation, and say, "As a programmer
you can presume it works like this, but actual implementations will be
more sophisticated and may not give exactly the same behaviour as this
simple explanation. For those fine points, see the implementors JLS."

So for example it could describe how a simple GC works.

In most cases the second init to null could be dropped by a clever
optimiser because it would have the same net effect.

Of course I defer to you and Patricia on divining the meaning of the
JLS.
--
http://mindprod.com
It should not be considered an error when the user starts something
to browsers, services, editors... It is inexcusable to
punish the user by requiring some elaborate sequence to atone,
e.g. open the task editor, find and kill some processes.

 Jan Burse 09-25-2011 08:39 AM

Re: Bulk Array Element Allocation, is it faster?

Eric Sosman schrieb:
> This could be a win if you will actually use only a few of
> the array elements, and if constructing a Bla is expensive (for
> example, if it involves I/O). If Bla's are cheap to make, or if
> you'll wind up using most of them anyhow, there's no obvious
> reason to conditionalize.

The bla <init> does nothing. And yes it could
be that a bla is not needed, that is why I
invented the lazy scheme. But I see the slowdown
especially for when all the bla's are needed.

> Nine percent longer. But I'm highly suspicious of those nice,
> round numbers: It seems an unlikely coincidence that two rather
> different chunks of code should both execute in even tenths of
> seconds.

Repeated measurements give different figures,
which lead me to see an accuracy of measurement
of around 100ms, so instead of telling you that
the application needs one time 9538 ms and another
time 9622 ms etc.. I gave only the order of the
time spend rounded to hundreds.

Bye

 Jan Burse 09-25-2011 08:47 AM

Re: Bulk Array Element Allocation, is it faster?

Roedy Green schrieb:
> new does two things:
> 1. allocates X bytes of space
> 2. initialises that block of space to a pattern.

So a dumb translation of the loop would be:

1. for each i between 0 and n-1
1.1 allocates X bytes of space
1.2 call the initializer

Then I was more thinking that the loop does the
following, i.e. is cleverly compiled to:

1. allocates n*X bytes of space
2. for each i between 0 and n-1
2.1 call the initializer for location X*i

At least I would hand assembler it like this.
Saving n-1 calls to the heap allocator.

But maybe positive effect is not based on such
a clever allocation and only on what Patricia said.

compilation techniques found for JIT today, but
did not yet hit any evidence for clever loop
allocation, but it is so obvious I really it
must be done. But maybe it is not done because
of multi-threading or GC, who knows...

Bye

 Robert Klemme 09-25-2011 10:57 AM

Re: Bulk Array Element Allocation, is it faster?

On 09/25/2011 03:41 AM, Patricia Shanahan wrote:
> On 9/24/2011 6:17 PM, Jan Burse wrote:
>> Dear All,
>>
>> I just wonder wether modern JIT do optimize
>> Code of the following form:
>>
>> Bla[] bla = new Bla[n];
>> for (int i=0; i<n; i++) {
>> bla[i] = new Bla();
>> }
>>
>> When I use the above in my code, my application
>> spends 8800 ms in the benchmark I have.
>>
>> When I then change the code to:
>>
>> Bla[] bla = new Bla[n];
>>
>> ...
>>
>> if (bla[i]==null)
>> bla[i] = new Bla();
>>
>> ..
>>
>> So instead of allocating all elements at once,
>> I will allocate them on demand in the subsequent
>> flow of my application.
>>
>> When I use the above in my code, my application
>> now suddently sends 9600 ms in the benchmark
>> I have.
>>
>> So I wonder whether eventually the JIT has
>> a way to detect the bulk allocate of the first
>> version and transform it into something more
>> efficient than my lazy allocation.
>>
>> Any clues?

>
> You also need to consider the general optimization of processors in
> favor of doing efficiently those things they have done in the recent past.
>
> When you do the allocation all at once, the code and data for "new
> Bla()" is in cache on the second and subsequent calls. There may be
> cache conflicts between "new Bla()" and the actual work, leading to
> many more cache misses when you interleave them.
>
> Doing the initialization on demand may be adding an unpredictable
> conditional branch to the subsequent flow.

This and the fact that lazy initialization has concurrency issues when
used in a multi threading context (which e.g. final members do not have)
has made me use this approach less frequently. Also, for short lived
objects in total it might be much more efficient to just allocate the
object even if it is not used because it won't survive new generation
anyway. I think the lazy init idiom only really pays off if
construction is expensive (e.g. because it involves IO or time consuming
calculations). In all other cases it's better to just unconditionally
create and let GC work. And because of improvements in JVM technology
the balance has moved a lot away from the lazy side because allocation
and deallocation overhead became smaller than in earlier Java versions.

Kind regards

robert

 Jan Burse 09-25-2011 11:38 AM

Re: Bulk Array Element Allocation, is it faster?

Robert Klemme schrieb:
> On 09/25/2011 03:41 AM, Patricia Shanahan wrote:
>> On 9/24/2011 6:17 PM, Jan Burse wrote:
>>> Dear All,
>>>
>>> I just wonder wether modern JIT do optimize
>>> Code of the following form:
>>>
>>> Bla[] bla = new Bla[n];
>>> for (int i=0; i<n; i++) {
>>> bla[i] = new Bla();
>>> }
>>>
>>> When I use the above in my code, my application
>>> spends 8800 ms in the benchmark I have.
>>>
>>> When I then change the code to:
>>>
>>> Bla[] bla = new Bla[n];
>>>
>>> ...
>>>
>>> if (bla[i]==null)
>>> bla[i] = new Bla();
>>>
>>> ..
>>>
>>> So instead of allocating all elements at once,
>>> I will allocate them on demand in the subsequent
>>> flow of my application.
>>>
>>> When I use the above in my code, my application
>>> now suddently sends 9600 ms in the benchmark
>>> I have.
>>>
>>> So I wonder whether eventually the JIT has
>>> a way to detect the bulk allocate of the first
>>> version and transform it into something more
>>> efficient than my lazy allocation.
>>>
>>> Any clues?

>>
>> You also need to consider the general optimization of processors in
>> favor of doing efficiently those things they have done in the recent
>> past.
>>
>> When you do the allocation all at once, the code and data for "new
>> Bla()" is in cache on the second and subsequent calls. There may be
>> cache conflicts between "new Bla()" and the actual work, leading to
>> many more cache misses when you interleave them.
>>
>> Doing the initialization on demand may be adding an unpredictable
>> conditional branch to the subsequent flow.

>
> This and the fact that lazy initialization has concurrency issues when
> used in a multi threading context (which e.g. final members do not have)
> has made me use this approach less frequently. Also, for short lived
> objects in total it might be much more efficient to just allocate the
> object even if it is not used because it won't survive new generation
> anyway. I think the lazy init idiom only really pays off if construction
> is expensive (e.g. because it involves IO or time consuming
> calculations). In all other cases it's better to just unconditionally
> create and let GC work. And because of improvements in JVM technology
> the balance has moved a lot away from the lazy side because allocation
> and deallocation overhead became smaller than in earlier Java versions.
>
> Kind regards
>
> robert

Yes, really seems so. Looks that the infinite heap idea
works (notion borrowed from a talk by Cliff Click found
on the net, should indicate that we just do this, allocate
and don't care much).

does not so much good in that it saves allocates. I got the
following figures for the application at hand:

Bulk: 84'393'262 allocates
Lazy: 81'662'843 allocates

So the application does not have many exit points in the flow
following the array creation, except for the last exit point
when anyway all objects were needed.

But still I have some doubts! The array I allocate are only
4 elements long or so? Why is there still such a big
difference between allocating in bulk only 4 elements and
doing them one after the other?

Overhead by the lazy detection itself? I doubt this also
since the array is anyway accessed, and the lazy check
is a small null pointer check.

I am still puzzled.

Bye

 Eric Sosman 09-25-2011 12:07 PM

Re: Bulk Array Element Allocation, is it faster?

On 9/25/2011 4:47 AM, Jan Burse wrote:
> Roedy Green schrieb:
>> new does two things:
>> 1. allocates X bytes of space
>> 2. initialises that block of space to a pattern.

>
> So a dumb translation of the loop would be:
>
> 1. for each i between 0 and n-1
> 1.1 allocates X bytes of space
> 1.2 call the initializer
>
> Then I was more thinking that the loop does the
> following, i.e. is cleverly compiled to:
>
> 1. allocates n*X bytes of space
> 2. for each i between 0 and n-1
> 2.1 call the initializer for location X*i
>
> At least I would hand assembler it like this.
> Saving n-1 calls to the heap allocator.

Since the heap allocator probably consists of one compare-and-
nextFreeAddress pointer"), the savings are unlikely to be large.
This isn't malloc(), you know.

> compilation techniques found for JIT today, but
> did not yet hit any evidence for clever loop
> allocation, but it is so obvious I really it
> must be done. But maybe it is not done because
> of multi-threading or GC, who knows...

Note that the semantics of the bulk and lazy allocations
in your original example are quite different. An "optimization"
that changes the meaning of the code is only a benefit if the
code was wrong to begin with. ;-)

--
Eric Sosman
esosman@ieee-dot-org.invalid

All times are GMT. The time now is 05:58 AM.