Bulk Array Element Allocation, is it faster?

 
 
Jan Burse, 09-25-2011
Lew wrote:
> This is equally true of primitive arrays. Why do you claim otherwise?


No, I don't claim otherwise. new int[] and new double[]
would also need treatment when the values 0 and 0.0
are not appropriate.

But you never run into a null-pointer exception with
primitive arrays, so they are a little more ready-made
than non-primitive arrays.
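
(A minimal illustration of that point; Foo here is a hypothetical class, not from the thread:)

    public class DefaultsDemo {
        static class Foo { }   // hypothetical no-args class

        public static void main(String[] args) {
            int[] ints = new int[3];
            System.out.println(ints[0]);        // 0: primitive elements are ready to use

            Foo[] foos = new Foo[3];
            System.out.println(foos[0].toString());   // NullPointerException: slots are null
        }
    }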

Bye
 
Jan Burse, 09-25-2011
Jan Burse wrote:
> Lew wrote:
>> This is equally true of primitive arrays. Why do you claim otherwise?
>
> No, I don't claim otherwise. new int[] and new double[]
> would also need treatment when the values 0 and 0.0
> are not appropriate.
>
> But you never run into a null-pointer exception with
> primitive arrays, so they are a little more ready-made
> than non-primitive arrays.
>
> Bye


Just compare with C's allocation of an array of
fixed-size records. In C you get the records; in Java
you only get the array, with no records in it.

 
Lew, 09-25-2011
On Sunday, September 25, 2011 10:21:42 AM UTC-7, Jan Burse wrote:
> Eric Sosman wrote:
> > Note that the semantics of the bulk and lazy allocations
> > in your original example are quite different. An "optimization"
> > that changes the meaning of the code is only a benefit if the
> > code was wrong to begin with.

>
> What I do in the code has nothing to do with my question. My


It has a great deal to do with your question.

> question circles only around the following code fragment:
>
> Bla[] bla = new Bla[n];


Here the system does allocate, as you plaintively request, "n*X bytes of space", where "X" is the size of a pointer.

> for (int i = 0; i < n; i++) {
>     bla[i] = new Bla();
> }


How would you imagine the JIT would optimize this without taking semantics into account? Each 'Bla' is required to be independently GC-able, you know. The semantics of 'Bla[]' is not a block of instances; it's a block of references.

How would you bulk-allocate a block of references, each pointing to a different object? About the only way is to add a fixed offset to the value pushed into each successive array element, but that's already what you are doing with 'new' anyway, so you aren't "optimizing" by doing the same thing.

If the 'Bla' constructor is heavyweight, the construction of your n 'Bla' instances will far outweigh any slight overhead in calculating that offset increase for each array element.

I just don't see how you expect to bulk-load n different pointers into that array. Could you explain that in more detail, please?

> And what the JIT does. And not what I am doing in a comparison
> with lazy. That is why I started my post with:
>
> "I just wonder wether [sic] modern JIT [sic] do optimize
> Code [sic] of the following form:"
>
> Please focus on that.


Aye, aye, focusing on that, Captain!

The answer is "nothing", because the semantics of the operation are such that the current mechanism is already quite close to optimal. This is what people have been explaining to you, which you dismissed as irrelevant.

--
Lew
 
Joshua Cranmer, 09-25-2011
On 9/25/2011 2:23 AM, Roedy Green wrote:
> On Sat, 24 Sep 2011 23:22:45 -0700 (PDT), Lew
> wrote, quoted or indirectly quoted someone who said:
>
>> Foo#bar will be cleared to 'null' upon first construction, then initialized to 'null' by the explicit initialization.
>> Both steps happen. Both steps are required to happen, and they're required to happen separately.

>
> I doubt that. The JLS never says exactly how something works, just
> what the net effect must be.


The JLS clearly states the full observational effects of any action,
particularly in the realm of object initialization and the memory model.

> It is part of what makes it so difficult to follow.


The JLS is not difficult to follow. ISO C++11 is difficult to follow.
Well, the section on generic type inference could use a few more
examples to be easier to follow, but even that is relatively
straightforward.

> A JLS for programmers rather than implementors might
> describe a simple reference implementation, and say, "As a programmer
> you can presume it works like this, but actual implementations will be
> more sophisticated and may not give exactly the same behaviour as this
> simple explanation. For those fine points, see the implementors JLS."


As I stated earlier, the JLS states full observational effects. If the
underlying behavior is not the same (e.g., using escape analysis to
change heap allocations to stack allocations), it would require advanced
introspective techniques to observe this which are well beyond the
semantics of Java itself.
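
(For illustration, a minimal sketch of the kind of allocation escape analysis can eliminate; the class and method names are made up, not from the thread:)

    public class EscapeDemo {
        static final class Point {
            final int x, y;
            Point(int x, int y) { this.x = x; this.y = y; }
        }

        // 'p' never leaves this method, so a JIT performing escape
        // analysis may scalar-replace it and skip the heap allocation
        // entirely; no Java-level observation can tell the difference.
        static int lengthSquared(int x, int y) {
            Point p = new Point(x, y);
            return p.x * p.x + p.y * p.y;
        }

        public static void main(String[] args) {
            System.out.println(lengthSquared(3, 4));   // 25
        }
    }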

--
Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth
 
Jan Burse, 09-25-2011
Lew wrote:
> How would you imagine the JIT would optimize this
> without taking semantics into account? Each 'Bla'
> is required to be independently GC-able, you know.


In my posts I mentioned two possible optimizations.
Optimization 1: moving the locking outside of the
loop. Optimization 2: allocating at once n*X, where
X is the space for the Bla record, not the space
for the Bla pointer.

Since after allocation we have the assignment
bla[i] = new Bla(), the object cannot escape as
long as bla does not escape, so there should not
be much of a problem with GC, though I am not 100%
sure what the problems with GC would be.

But anyway, let the figures speak, and let's stop
musing too much. I did some testing, and it shows
that the 64-bit VM can do the "speed up" whereas the
32-bit cannot:

OS    JDK  Arch    Bulk    Lazy    Delta %
Win   1.6  64bit    8'771   9'805  11.8%
Win   1.6  32bit   14'587  14'744   1.1%
Win   1.5  32bit   17'139  17'405   1.6%
Mac   1.6  64bit   11'003  12'363  12.4%
Unix  1.6  32bit   26'517  26'858   1.3%
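
(The actual benchmark code was never posted in the thread; the following is only a rough sketch, under assumed names and iteration counts, of the kind of bulk-vs-lazy timing comparison being described:)

    public class AllocBench {
        static class Bla { int x; }
        static final int N = 1000000;

        // Bulk: allocate every element up front.
        static Bla[] bulk() {
            Bla[] bla = new Bla[N];
            for (int i = 0; i < N; i++) {
                bla[i] = new Bla();
            }
            return bla;
        }

        // Lazy: allocate an element only when it is first touched.
        static Bla get(Bla[] bla, int i) {
            if (bla[i] == null) {
                bla[i] = new Bla();
            }
            return bla[i];
        }

        public static void main(String[] args) {
            for (int run = 0; run < 5; run++) {   // repeat so the JIT warms up
                long t0 = System.nanoTime();
                Bla[] a = bulk();

                long t1 = System.nanoTime();
                Bla[] b = new Bla[N];
                for (int i = 0; i < N; i++) {
                    get(b, i);                    // touch every element once
                }
                long t2 = System.nanoTime();

                System.out.println("bulk " + (t1 - t0) + " ns, lazy " + (t2 - t1) + " ns");
            }
        }
    }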

Still this leaves open the question whether the
64-bit JIT is clever, the 64-bit JVM is clever,
or the 64-bit CPU is clever.

It definitely seems that the 32-bit is not clever:
in all cases we see a small overhead for the lazy
variant, which I interpret as its additional checks,
possibly done repeatedly.

> The answer is "nothing", because the semantics
> of the operation are such that the current
> mechanism is already quite close to optimal.
> This is what people have been explaining to you,
> that you dismissed as irrelevant.


Where do you see "nothing" in the above figures?

Bye
 
Eric Sosman, 09-25-2011
On 9/25/2011 11:14 AM, Jan Burse wrote:
> Eric Sosman wrote:
>> Again, I point out that the bulk and lazy variants do not do
>> the same thing. Consider, for example
>>
>> class Bla {
>>     private static int master_seqno = 0;
>>     public final int seqno = ++master_seqno;
>> }
>>
>> Observe that the value of bla[42].seqno differs between the two
>> variants; it would therefore be an error to "optimize" either by
>> transforming it into the other.

>
> Not really. You could only see a difference in the order
> that Bla() gets a number.


You'll also see a difference in *which* number bla[42] gets.
In the bulk initialization, bla[42].seqno is 43, always. With
lazy initialization, it can be anything at all between 1 and n
inclusive. QED, and your "not really" makes no sense to me.

There are lots of other effects Bla() construction might have
that would turn out differently in bulk or lazy initialization.
If the constructor writes a message to System.out, all the messages
will appear together in bulk initialization, but will be scattered
all over the place with lazy initialization. If the constructor
throws an exception -- even a seldom-caught RuntimeException --
the exceptions will appear in a different order w.r.t. other actions
in the program. The two code sequences are irreconcilably different.
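
(A small self-contained sketch of that observable difference, assuming a check-on-access style of lazy initialization, which the thread never spells out:)

    class Bla {
        private static int master_seqno = 0;
        public final int seqno = ++master_seqno;
    }

    public class SeqnoDemo {
        public static void main(String[] args) {
            int n = 100;

            // Bulk: slot 42 always ends up with seqno 43.
            Bla[] bulk = new Bla[n];
            for (int i = 0; i < n; i++) {
                bulk[i] = new Bla();
            }
            System.out.println(bulk[42].seqno);   // prints 43

            // Lazy (check on access): the seqno depends on access order.
            Bla[] lazy = new Bla[n];
            if (lazy[42] == null) {
                lazy[42] = new Bla();             // first lazily created element
            }
            System.out.println(lazy[42].seqno);   // prints 101 here, not 43
        }
    }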

> So if your application does not depend on the order
> the seqno's are created there is no functional problem.


From the application's point of view perhaps not. But Java and
the JVM can't possibly know that; they follow the orders you give.
You say "Assign this seqno (not that one) to the new Bla, and throw
(or don't throw) this exception, write (or don't write) this output,
and do (or don't do) all these things in their proper order w.r.t.
all my other commands." Concrete example: In a sense you do not
care what value Math.random() returns, but Java is contractually
bound to deliver a specific value under specific circumstances, and
cannot violate its contract even if you are in a forgiving mood.

> Main question I have here is not about relative
> correctness of bulk versus lazy. But why bulk is
> counterintuitively much faster?


Nobody knows, because you've shown no code. My first guess
(already hinted at) is that your timing is faulty; it is notoriously
hard to get good timings even from statically-bound languages these
days, and even more difficult with dynamically-bound partially-
interpreted-partially-compiled garbage-collected multi-threaded
languages like Java. I'm sticking with that guess until you show
some evidence to refute it.

> Since the bulk is much faster I am assuming [...]


Yeah.

> Actually this loop is something that shocked me
> when I started working with Java many years ago.
> What, there is no way to make a "new" of an array
> with all its elements?


That's another topic, and one that's been hashed over several
times in other threads. The principal reason, I think, is that
it would so seldom be useful to create an array of N objects *all*
created by the no-args constructor: Who needs N copies of "the
same" thing? And what if the class has no no-args constructor?

Okay, sure, it could happen occasionally. But often enough
to justify a special feature in the language, with special rules
for "What if I want to pass arguments to the constructors?" or
"What if the constructor is inaccessible and I'm supposed to use
a factory method?" and "What if I *don't* want objects created
yet?" and so on, and so on?

Challenge: Look through your own code, right now, and count
the number of times you have actually filled an array with references
to N identically-constructed-with-no-args objects. In other words,
count the number of times your own actual code would have been able
to take advantage of such a feature if Java provided it.

> This is not seen if one works with int[] or
> double[] arrays. But as soon as you work with some
> other array, either a higher-dimensional int[]
> or double[] or an array of some class X, you are
> responsible for populating the elements.


How often do you create an int[] and leave all its elements
alone, still holding their original zeroes? (On purpose, I mean.)

In any case, Java *does* populate the array for you: With
null references. Do you have a prejudice against `null'?

> I am
> not interested in a general theory of going
> from bulk to lazy and back and forth. Forget
> about the lazy and explain the bulk!


Exhibit some actual code, so people can stop guessing about
what you've done.

--
Eric Sosman
 
Jan Burse, 09-25-2011
Jan Burse wrote:
>
> OS    JDK  Arch    Bulk    Lazy    Delta %
> Win   1.6  64bit    8'771   9'805  11.8%
> Win   1.6  32bit   14'587  14'744   1.1%
> Win   1.5  32bit   17'139  17'405   1.6%
> Mac   1.6  64bit   11'003  12'363  12.4%
> Unix  1.6  32bit   26'517  26'858   1.3%


Update:

Win   1.7  64bit    8'159   8'975  10.0%

Same machine on which the 1.6 64-bit testing was
done, just another JDK, the newer 1.7. The advantage
of bulk over lazy is slightly less pronounced.

But the application itself is again 10% faster (sic!).

Bye
 
Jan Burse, 09-25-2011
Eric Sosman wrote:
> Challenge: Look through your own code, right now, and count
> the number of times you have actually filled an array with references
> to N identically-constructed-with-no-args objects.


> Exhibit some actual code, so people can stop guessing about
> what you've done.


I am not asking people to guess what I have done. I
am asking whether the JIT can optimize loops with
new in them. The new need not be an argument-less
constructor; it could also be a constructor with
arguments.

All the optimizations that easily came to my mind
work also with argument-based constructors. For
example, moving the lock outside of the loop should
also work with argument-based constructors in most
cases:

for (int i = 0; i < n; i++) {
    lock heap
    bla[i] = new Bla(i);
    unlock heap
}

Is the same as:

lock heap
for (int i = 0; i < n; i++) {
    bla[i] = new Bla(i);
}
unlock heap

That is, if the <init> of Bla does not do much. The
other optimization, the bulk memory allocation, is
also not much impacted by argument-based constructors.
And it can be combined with the heap locking: we can
move the lock before the loop and will hold the lock
for less time:

for (int i = 0; i < n; i++) {
    lock heap
    p = allocate size X
    bla[i] = init p;
    unlock heap
}

Is the same as:

lock heap
p = allocate size n*X
unlock heap
for (int i = 0; i < n; i++) {
    bla[i] = init p;
    p += X;
}

I hope you now see what I am driving at. But I must
disappoint you: even if a compiler were able to
optimize constructors with arguments, maybe not in
all situations but to a great degree, I would have
no use for it in my application, since my current
problem is argument-less.

But I would of course also be happy to hear something
about the more general problem of constructors with
arguments and what the JIT does for such loops, since
it would then be very easy for me to specialize the
insight to my argument-less problem.

I do not exclude that JITs can also deal with
constructors that have arguments in loops.

Bye
 
Lew, 09-25-2011
On Sunday, September 25, 2011 12:25:18 PM UTC-7, Jan Burse wrote:
> Lew wrote:
> > How would you imagine the JIT would optimize this
> > without taking semantics into account? Each 'Bla'
> > is required to be independently GC-able, you know.

>
> In my posts I mentioned two possible optimizations.
> Optimization 1: moving the locking outside of the
> loop. Optimization 2: allocating at once n*X, where


That's not an optimization, because things might need to happen in between loop cycles, so you can't "lock the heap" that long and call it an "optimization".

Semantically, an array of n references is an array of n separate references. You can't count on the memory allocations being contiguous, shouldn't have to count on that, and you certainly can't count on each referent becoming eligible for GC at the same time. GC in turn moves things around anyway, so any "benefit" of contiguous allocation will be gone quite quickly, if it ever existed in the first place.

And the point you ignore is that the "benefit" of contiguous allocation is questionable, let alone your rather bizarre suggestion to move the "locking of the heap" outside the loop. Each 'new' just bumps up the heap pointer, so it's not much faster to bump it once than n times, in the grand scheme of things. Not when your tiny advantage is quickly eaten up by circumstances immediately afterward anyway.


> X is the space for the Bla record, not the space
> for the Bla pointer.
>


But that's the thing that's semantically different!

You can't "optimize" allocation of n references to also create a block of n instances. The optimizer cannot know that that is the intended use of the array. You can't "optimize" the allocation of n instances consecutively either, because you have to know a whole lot about what those n constructors will try to do, including possibly allocate heap themselves or throw an exception prematurely. For the few cases where block allocation *might* provide negligible speedup, it's not worth the analysis effort to determine that this one time is one of those rare exceptions when allocation might be sped up enough for anyone to notice.

As people keep pointing out in this conversation.
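
(A sketch of the exception point, assuming a constructor Bla(int) and a count n as in the earlier fragments: if any constructor can throw, the array must hold exactly the instances constructed so far, which a pre-reserved block of n records could not honor:)

    Bla[] bla = new Bla[n];
    int i = 0;
    try {
        for (; i < n; i++) {
            bla[i] = new Bla(i);   // may throw at any iteration
        }
    } catch (RuntimeException e) {
        // bla[0..i-1] are fully constructed; bla[i..n-1] are still null.
        // A bulk-allocated block of n instances could not represent this.
    }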

> Since after allocation we have the assignment
> bla[i] = new Bla(), the object cannot escape as
> long as bla does not escape, so there should not
> be much of a problem with GC, though I am not 100%
> sure what the problems with GC would be.


What do you even mean by this? What do you mean by the object "escaping"?

I don't know about "problems" with GC, but you cannot be sure that the individual instances pointed to by the 'bla[i]' pointers will be descoped at the same time. Ergo you cannot know ahead of time when they will become eligible for GC. Instances that survive GC will be moved to contiguous memory blocks, but not their original ones. Whatever negligible benefit you might have gotten, but more likely did not, from contiguous allocation of the 'Bla' instances will be wiped out.

> But anyway, let the figures speak, and let's stop
> musing too much. I did some testing, and it shows
> that the 64-bit VM can do the "speed up" whereas the
> 32-bit cannot:
>
> OS    JDK  Arch    Bulk    Lazy    Delta %
> Win   1.6  64bit    8'771   9'805  11.8%
> Win   1.6  32bit   14'587  14'744   1.1%
> Win   1.5  32bit   17'139  17'405   1.6%
> Mac   1.6  64bit   11'003  12'363  12.4%
> Unix  1.6  32bit   26'517  26'858   1.3%
>
> Still this leaves open the question whether the
> 64-bit JIT is clever, the 64-bit JVM is clever,
> or the 64-bit CPU is clever.
>
> It definitely seems that the 32-bit is not clever:
> in all cases we see a small overhead for the lazy
> variant, which I interpret as its additional checks,
> possibly done repeatedly.
>
>> The answer is "nothing", because the semantics
>> of the operation are such that the current
>> mechanism is already quite close to optimal.
>> This is what people have been explaining to you,
>> that you dismissed as irrelevant.

>
> Where do you see "nothing" in the above figures?


Everywhere. You have some questionable method of timing something that you do not describe, let alone with any precision, using doubtful methodology and suspect reasoning, without reference to experimental error or confidence analysis. Your numbers literally tell me nothing.

--
Lew
 
Jan Burse, 09-25-2011
Lew wrote:
> You can't "optimize" allocation of n references
> to also create a block of n instances. The optimizer
> cannot know that that is the intended use of the array.


The following optimization works independently of
how the array is used. You can even nullify an
array element or set it to a newly created object.
So going from the unoptimized code:

for (int i = 0; i < n; i++) {
    lock heap
    bla[i] = new Bla(i);
    unlock heap
}

To the following:

lock heap
p = allocate size n*X
unlock heap
for (int i = 0; i < n; i++) {
    bla[i] = init p;
    p += X;
}

shouldn't be a problem for a decent compiler.
Various compilers do much more with loops. Why
do you doubt that this optimization is possible?

> For the few cases where block allocation
> *might* provide negligible speedup.


Oh, you don't really doubt that it is possible;
you rather doubt that it is worth it. Well, I can
tell you: in my application, allocating small arrays
and filling them with objects created by the empty,
argument-less constructor is done over and over.

I recently ran VisualVM. This is really most of
what happens in the app while it is solving a
problem.

And it seems that, since it is so dominant, any
change in the way this is done shows up directly
in the timings. Here are the timings again:

OS    JDK  Arch    Bulk    Lazy    Delta %
Win   1.7  64bit    8'159   8'975  10.0%
Win   1.6  64bit    8'771   9'805  11.8%
Win   1.6  32bit   14'587  14'744   1.1%
Win   1.5  32bit   17'139  17'405   1.6%
Mac   1.6  64bit   11'003  12'363  12.4%
Unix  1.6  32bit   26'517  26'858   1.3%

On 64-bit, bulk is 10% faster for the present
application, which makes heavy use of allocating
these objects and arrays.

> As people keep pointing out in this conversation.


I know; logically, if the application did not make
heavy use of allocating these objects and arrays,
then nothing would be seen. But if nothing were
seen at all, I wouldn't be posting here asking
what is going on. Something is seen, the 10%.
It is not "nothing".

So what is going on in the 64bit?

Bye
 