Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Java > Bulk Array Element Allocation, is it faster?

Reply
Thread Tools

Bulk Array Element Allocation, is it faster?

 
 
Robert Klemme
Guest
Posts: n/a
 
      09-25-2011
On 09/25/2011 01:38 PM, Jan Burse wrote:
> Robert Klemme schrieb:
>> On 09/25/2011 03:41 AM, Patricia Shanahan wrote:
>>> On 9/24/2011 6:17 PM, Jan Burse wrote:
>>>> Dear All,
>>>>
>>>> I just wonder wether modern JIT do optimize
>>>> Code of the following form:
>>>>
>>>> Bla[] bla = new Bla[n];
>>>> for (int i=0; i<n; i++) {
>>>> bla[i] = new Bla();
>>>> }
>>>>
>>>> When I use the above in my code, my application
>>>> spends 8800 ms in the benchmark I have.
>>>>
>>>> When I then change the code to:
>>>>
>>>> Bla[] bla = new Bla[n];
>>>>
>>>> ...
>>>>
>>>> if (bla[i]==null)
>>>> bla[i] = new Bla();
>>>>
>>>> ..
>>>>
>>>> So instead of allocating all elements at once,
>>>> I will allocate them on demand in the subsequent
>>>> flow of my application.
>>>>
>>>> When I use the above in my code, my application
>>>> now suddently sends 9600 ms in the benchmark
>>>> I have.
>>>>
>>>> So I wonder whether eventually the JIT has
>>>> a way to detect the bulk allocate of the first
>>>> version and transform it into something more
>>>> efficient than my lazy allocation.
>>>>
>>>> Any clues?
>>>
>>> You also need to consider the general optimization of processors in
>>> favor of doing efficiently those things they have done in the recent
>>> past.
>>>
>>> When you do the allocation all at once, the code and data for "new
>>> Bla()" is in cache on the second and subsequent calls. There may be
>>> cache conflicts between "new Bla()" and the actual work, leading to
>>> many more cache misses when you interleave them.
>>>
>>> Doing the initialization on demand may be adding an unpredictable
>>> conditional branch to the subsequent flow.

>>
>> This and the fact that lazy initialization has concurrency issues when
>> used in a multi threading context (which e.g. final members do not have)
>> has made me use this approach less frequently. Also, for short lived
>> objects in total it might be much more efficient to just allocate the
>> object even if it is not used because it won't survive new generation
>> anyway. I think the lazy init idiom only really pays off if construction
>> is expensive (e.g. because it involves IO or time consuming
>> calculations). In all other cases it's better to just unconditionally
>> create and let GC work. And because of improvements in JVM technology
>> the balance has moved a lot away from the lazy side because allocation
>> and deallocation overhead became smaller than in earlier Java versions.
>>
>> Kind regards
>>
>> robert

>
> Yes, really seems so. Looks that the infinite heap idea
> works (notion borrowed from a talk by Cliff Click found
> on the net, should indicate that we just do this, allocate
> and don't care much).
>
> I made some additional benchmarks, in the present case the lazy
> does not so much good in that it saves allocates. I got the
> following figures for the application at hand:
>
> Bulk: 84'393'262 allocates
> Lazy: 81'662'843 allocates
>
> So the application does not have many exit points in the flow
> following the array creation, except for the last exit point
> when anyway all objects were needed.
>
> But still I have some doubts! The array I allocate are only
> 4 elements long or so? Why is there still such a big
> difference between allocating in bulk only 4 elements and
> doing them one after the other?
>
> Overhead by the lazy detection itself? I doubt this also
> since the array is anyway accessed, and the lazy check
> is a small null pointer check.


Yes, but the cost is not in the check but in the branching on processor
level (see what Patricia wrote).

Kind regards

robert
 
Reply With Quote
 
 
 
 
Jan Burse
Guest
Posts: n/a
 
      09-25-2011
Robert Klemme schrieb:
> Yes, but the cost is not in the check but in the branching on processor
> level (see what Patricia wrote).


Depends on the processor and on the branch. If
new is just heap -= size, what some papers suggest,
then it might not be important.

But if new is much more, then sure the branch
interrupts the normal code flow so much that
instruction piplining gets out of sync. And
the speed gain by instruction overlapping
is lost.

But my hypothesis is more that something
algorithmically on a higher level happens than
something on the lower hardware level.

So I also found something about "Lock
coarsening"(*), so if the new needs some lock
this lock could be aquired before the initialization
loop and released after the initialization
loop. So that:

Bla[] bla = new Bla[n];
for (int i=0; i<n; i++) {
bla[i] = new Bla();
}

Would only need 1 locking instruction pair,
where as:

Bla[] bla = new Bla[n];

...

if (bla[i]==null)
bla[i] = new Bla();

..

Would need n locking instruction pairs. But I
would still need some confirmation that JITs
are able to do such an optimization on a higher
level in the present case.

I am also not sure whether the bounded for
counts as a looping control flow, so that "Lock
coarsening" would not be applicable. But
I guess it can be deemed non looping, so that
the technique would be applicable.

Currently I am planning to change the loop
to something like that:

Bla[] bla = new Bla[n];
for (int i=0; i<n; i++) {
bla[i] = createBla();
}

So that the JIT has less information on what
the loop is about, and to do some new
measurements. To be fair I would also use
createBla() in the lazy scenario. Lets see
what happens.

Best Regards

(*)
http://java.sun.com/performance/refe...nce.html#2.1.2
 
Reply With Quote
 
 
 
 
Jan Burse
Guest
Posts: n/a
 
      09-25-2011
Jan Burse schrieb:
> Currently I am planning to change the loop
> to something like that:
>
> Bla[] bla = new Bla[n];
> for (int i=0; i<n; i++) {
> bla[i] = createBla();
> }
>
> So that the JIT has less information on what
> the loop is about, and to do some new
> measurements. To be fair I would also use
> createBla() in the lazy scenario. Lets see
> what happens.


Maybe there is a bug somewhere else in the
application that leads to the performance loss.
There are also points in the application where
some of the bla elements are forcefully set
to null so as to release Bla objects:

...
bla[j] = null;
...

I am not sure whether these releases work
the same in the bulk and the lazy version of
the code. Need to check first.

So measurements have already shown that the
release gives some gain, measurements where
16000 ms vs. 9600 ms. The release condition
is more complicated than the lazy new
condition, but the overhead is compensated
the overall gain.

Lets see.

Best Regards
 
Reply With Quote
 
Eric Sosman
Guest
Posts: n/a
 
      09-25-2011
On 9/25/2011 10:04 AM, Jan Burse wrote:
> Robert Klemme schrieb:
>> Yes, but the cost is not in the check but in the branching on processor
>> level (see what Patricia wrote).

>
> Depends on the processor and on the branch. If
> new is just heap -= size, what some papers suggest,
> then it might not be important.
>
> But if new is much more, then sure the branch
> interrupts the normal code flow so much that
> instruction piplining gets out of sync. And
> the speed gain by instruction overlapping
> is lost.
>
> But my hypothesis is more that something
> algorithmically on a higher level happens than
> something on the lower hardware level.
>
> So I also found something about "Lock
> coarsening"(*), so if the new needs some lock
> this lock could be aquired before the initialization
> loop and released after the initialization
> loop. [...]


I'm speculating almost as wildly as you are, but I strongly
doubt that a lock is acquired. Object creation happens so often
that I'm sure the JVM implementors will use something like compare-
and-swap on any platform that provides it (which I think means "all"
nowadays).

Even the check for "Should I wait for the garbage collector to
finish?" uses no lock, on at least some platforms. Their JVM's
dedicate an entire memory page to serve as a flag, whose state is
not recorded in its content but in its MMU protection bits. To see
if it's safe to allocate, an allocating thread just stores something
in the special page; if the store works allocation is safe. If GC
has raised its STOP sign, the page is write-protected and the store
generates a hardware trap -- very high overhead, to be sure, but
extremely low (not even a conditional branch!) in the common case.

> Would need n locking instruction pairs. But I
> would still need some confirmation that JITs
> are able to do such an optimization on a higher
> level in the present case.


Again, I point out that the bulk and lazy variants do not do
the same thing. Consider, for example

class Bla {
private static int master_seqno = 0;
public final int seqno = ++master_seqno;
}

Observe that the value of bla[42].seqno differs between the two
variants; it would therefore be an error to "optimize" either by
transforming it into the other.

--
Eric Sosman
d
 
Reply With Quote
 
Jan Burse
Guest
Posts: n/a
 
      09-25-2011
Eric Sosman schrieb:
> Again, I point out that the bulk and lazy variants do not do
> the same thing. Consider, for example
>
> class Bla {
> private static int master_seqno = 0;
> public final int seqno = ++master_seqno;
> }
>
> Observe that the value of bla[42].seqno differs between the two
> variants; it would therefore be an error to "optimize" either by
> transforming it into the other.


Not really. You could only see a difference in the order
that Bla() gets a number. Since in the bulk variant
the Bla() will be allocated from 0 .. n-1, but the lazy
might allocate in an arbitrary order, which is the case
in my application.

So if your application does not depend on the order
the seqno's are created there is no functional problem.
Main question I have here is not about relative
correctness of bulk versus lazy. But why bulk is
counterintuitively much faster?

Since the bulk is much faster I am assuming some
optimization is happening in this particular loop:

Bla[] bla = new Bla[n];
for (int i=0; i<n; i++) {
bla[i] = new Bla();
}

Actually this loop is something that shocked me
when I started working with Java many years ago.
What there is no way to make a "new" of an array
with all its elements?

This is not seen if one works with int[] or
double[] arrays. But as soon as you work with some
other array, either a higher dimensional of int[]
or double[] or an array of some class X. You are
responsible for populating the elements.

The advantage is that you can create diagonal
matrices, sparse arrays, polymorphic data in
arrays, etc.. But I have the feeling that a loop
as above is happening in many many applications
exactly because initializing the elements of an
array is left to the application programmer.

So I am really focusing on this array init, and
asking what it does under the hood, and the lazy
is only a reference for the performance. I am
not interessted in a general theory between going
from bulk to lazy and back and forth. Forget
about the lazy and explain the bulk!

Bye

 
Reply With Quote
 
Andreas Leitgeb
Guest
Posts: n/a
 
      09-25-2011
Jan Burse <> wrote:
> Main question I have here is not about relative
> correctness of bulk versus lazy. But why bulk is
> counterintuitively much faster?
>
> Since the bulk is much faster I am assuming ...


I understand perfectly well, that you *were* resorting
to those assumptions when you first noticed the effect,
but after Patricia's and others' answers I wouldn't
expect those assumptions to be still necessary to
explain your observations.

 
Reply With Quote
 
Jan Burse
Guest
Posts: n/a
 
      09-25-2011
Eric Sosman schrieb:
> Note that the semantics of the bulk and lazy allocations
> in your original example are quite different. An "optimization"
> that changes the meaning of the code is only a benefit if the
> code was wrong to begin with.


What I do in the code has nothing to do with my question. My
question circles only around the following code fragment:

Bla[] bla = new Bla[n];
for (int i=0; i<n; i++) {
bla[i] = new Bla();
}

And what the JIT does. And not what I am doing in a comparison
with lazy. That is why I started my post with:

"I just wonder wether modern JIT do optimize
Code of the following form:"

Please focus on that.

Bye
 
Reply With Quote
 
Jan Burse
Guest
Posts: n/a
 
      09-25-2011
Jan Burse schrieb:
> "I just wonder wether modern JIT do optimize
> Code of the following form:"
>
> Please focus on that.


Focus is also on JIT and not on JVM. So more
on compilation techniques than on execution
techniques (where draw the line?). Answer
could be balantly that JIT does nothing, and
then it must be something in the JVM.

Andreas Leitgreb;
> I understand perfectly well, that you *were* resorting
> to those assumptions when you first noticed the effect,
> but after Patricia's and others' answers I wouldn't
> expect those assumptions to be still necessary to
> explain your observations.


But when the JIT would do something, then it
is not necessary that the JVM does something,
and would be a technique that also works on
old CPUs without pipelining super cache
whatever. I did not not yet have time to test
the phaenomen on old CPU.

Probably when I switch to old CPUs and also
old JDKs I will also end up with old JITs.
Anyway would be interested into JIT insights
old and new.

Bye


 
Reply With Quote
 
Lew
Guest
Posts: n/a
 
      09-25-2011
Jan Burse wrote:
> Actually this loop is something that shocked me
> when I started working with Java many years ago.
> What there is no way to make a "new" of an array
> with all its elements?
>
> This is not seen if one works with int[] or
> double[] arrays. But as soon as you work with some
> other array, either a higher dimensional of int[]
> or double[] or an array of some class X. You are
> responsible for populating the elements.


This is equally true of primitive arrays. Why do you claim otherwise?

The JLS, §15.10.1, states, "... if a single DimExpr appears, a single-dimensional array is created of the specified length, and each component of the array is initialized to its default value (§4.12.5)." This is true whether the array is of a primitive or reference type.

In either case, you are responsible for populating any non-default values.

--
Lew
 
Reply With Quote
 
Lew
Guest
Posts: n/a
 
      09-25-2011
Roedy Green wrote:
> Lew wrote, quoted or indirectly quoted someone who said :
>> Foo#bar will be cleared to 'null' upon first construction, then initialized to 'null' by the explicit initialization.
>> Both steps happen. Both steps are required to happen, and they're required to happen separately.

>
> I doubt that. The JLS never says exactly how something works, just


> what the net effect must be. It is part of what makes it so difficult


Let me settle your doubt, then, because the JLS does say exactly that.

JLS §12.5:
http://java.sun.com/docs/books/jls/t...tion.html#12.5
"Whenever a new class instance is created, memory space is allocated for itwith room for all the instance variables declared in the class type and all the instance variables declared in each superclass of the class type, including all the instance variables that may be hidden (§8.3). If there is not sufficient space available to allocate memory for the object, then creation of the class instance completes abruptly with an OutOfMemoryError. Otherwise, all the instance variables in the new object, including those declared in superclasses, are initialized to their default values (§4.12.5).
"Just before a reference to the newly created object is returned as the result, the indicated constructor is processed to initialize the new object using the following procedure:
"...
"4. Execute the instance initializers and instance variable initializers for this class, assigning the values of instance variable initializers to thecorresponding instance variables, in the left-to-right order in which theyappear textually in the source code for the class. If execution of any of these initializers results in an exception, then no further initializers are processed and this procedure completes abruptly with that same exception.Otherwise, continue with step 5. (In some early implementations, the compiler incorrectly omitted the code to initialize a field if the field initializer expression was a constant expression whose value was equal to the default initialization value for its type.)"


> to follow. A JLS for programmers rather than implementors might
> describe a simple reference implementation, and say, "As a programmer
> you can presume it works like this, but actual implementations will be
> more sophisticated and may not give exactly the same behaviour as this
> simple explanation. For those fine points, see the implementors JLS."


The JLS is extremely explicit about the order and result of steps to load, allocate and initialize classes and instances. In this particular, non-hypothetical case, it requires members to get their default values, then separately to process explicit initializers, even (and especially) if the initialization is to the same default value.


> So for example it could describe how a simple GC works.
>
> In most cases the second init to null could be dropped by a clever
> optimiser because it would have the same net effect.


Except that that is, as you see from the cited passage, explicitly forbidden.

> Of course I defer to you and Patricia on divining the meaning of the
> JLS.


You don't need to. Just read the cited passage for yourself.

--
Lew
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
find if an array has any element present in another array Charanya Nagarajan Ruby 6 04-30-2009 11:14 AM
Once again pointer to first element vs pointer to array cast topointer to element Szabolcs Borsanyi C Programming 6 05-23-2008 11:06 AM
how to Update/insert an xml element's text----> (<element>text</element>) HANM XML 2 01-29-2008 03:31 PM
If given 1 element in an array of textboxes... find which element number it is \A_Michigan_User\ Javascript 4 11-16-2007 12:58 PM
How to delete array element and add to previous element Al Cholic Ruby 18 07-28-2007 04:56 PM



Advertisments
 



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57