![]() |
Bulk Array Element Allocation, is it faster?
Dear All,
I just wonder wether modern JIT do optimize Code of the following form: Bla[] bla = new Bla[n]; for (int i=0; i<n; i++) { bla[i] = new Bla(); } When I use the above in my code, my application spends 8800 ms in the benchmark I have. When I then change the code to: Bla[] bla = new Bla[n]; ... if (bla[i]==null) bla[i] = new Bla(); .. So instead of allocating all elements at once, I will allocate them on demand in the subsequent flow of my application. When I use the above in my code, my application now suddently sends 9600 ms in the benchmark I have. So I wonder whether eventually the JIT has a way to detect the bulk allocate of the first version and transform it into something more efficient than my lazy allocation. Any clues? Bye |
Re: Bulk Array Element Allocation, is it faster?
On 9/24/2011 9:17 PM, Jan Burse wrote:
> Dear All, > > I just wonder wether modern JIT do optimize > Code of the following form: > > Bla[] bla = new Bla[n]; > for (int i=0; i<n; i++) { > bla[i] = new Bla(); > } > > When I use the above in my code, my application > spends 8800 ms in the benchmark I have. > > When I then change the code to: > > Bla[] bla = new Bla[n]; > > ... > > if (bla[i]==null) > bla[i] = new Bla(); > > .. > > So instead of allocating all elements at once, > I will allocate them on demand in the subsequent > flow of my application. This could be a win if you will actually use only a few of the array elements, and if constructing a Bla is expensive (for example, if it involves I/O). If Bla's are cheap to make, or if you'll wind up using most of them anyhow, there's no obvious reason to conditionalize. > When I use the above in my code, my application > now suddently sends 9600 ms in the benchmark > I have. Nine percent longer. But I'm highly suspicious of those nice, round numbers: It seems an unlikely coincidence that two rather different chunks of code should both execute in even tenths of seconds. Are you sure you've measured what you want to measure? Are you sure that what you want to measure is the important thing? > So I wonder whether eventually the JIT has > a way to detect the bulk allocate of the first > version and transform it into something more > efficient than my lazy allocation. Based on your (suspect) timings, "just leave it be" would be exactly the kind of improvement you ask for. The JIT very likely does something like that already :-) -- Eric Sosman esosman@ieee-dot-org.invalid |
Re: Bulk Array Element Allocation, is it faster?
On Sun, 25 Sep 2011 03:17:19 +0200, Jan Burse <janburse@fastmail.fm>
wrote, quoted or indirectly quoted someone who said : >Any clues new does two things: 1. allocates X bytes of space 2. initialises that block of space to a pattern. A clever jit may notice that some objects are simple, and always generate the exact same pattern. It can then replace the constructor code with a block move. A clever jit may notice that some objects are simple, except for a few computed bytes. It might be able to compose a fast constructor that does a block move then call the methods to compute the few computed bytes. Moving 0 to a group of bytes is even faster than a block move, since the source is a register or the instruction itself. So another idea is the jit could reorder the fields so that the ones needing the same initialisation byte pattern live side by side, so you can do it with a REP constant move. The flat footed way to do a constructor is to zap the entire block to 0, then go field by field initialised to non-null not-0 and plop that value in on top. Because constructors are called so often, even a very minor optimisation would pay off. -- Roedy Green Canadian Mind Products http://mindprod.com It should not be considered an error when the user starts something already started or stops something already stopped. This applies to browsers, services, editors... It is inexcusable to punish the user by requiring some elaborate sequence to atone, e.g. open the task editor, find and kill some processes. |
Re: Bulk Array Element Allocation, is it faster?
Roedy Green wrote:
> The flat footed way to do a constructor is to zap the entire block to > 0, then go field by field initialised to non-null not-0 and plop that > value in on top. This "flat footed [sic] way" is what Java constructors do, and must. Member fields are set to their default values prior to any constructor assignment, even to the same value as the default. public class Foo { private Bar bar = null; } Foo#bar will be cleared to 'null' upon first construction, then initialized to 'null' by the explicit initialization. Both steps happen. Both steps are required to happen, and they're required to happen separately. > Because constructors are called so often, even a very minor > optimisation would pay off. Perhaps, depending on what you mean by "pay off". Some have argued that the optimization achieved by omitting explicit initialization to default values is not worth the supposed reduction in clarity. Since we all know that member fields are constructed with their default values, I don't think it's any less clear to omit their explicit (and redundant) initialization to default values. I guess it depends on whether that's your opinion, and whether omitting the redundant initialization is an optimization that would pay off. -- Lew |
Re: Bulk Array Element Allocation, is it faster?
On Sat, 24 Sep 2011 23:22:45 -0700 (PDT), Lew <lewbloch@gmail.com>
wrote, quoted or indirectly quoted someone who said : >Foo#bar will be cleared to 'null' upon first construction, then initialized to 'null' by the explicit initialization. > Both steps happen. Both steps are required to happen, and they're required to happen separately. I doubt that. The JLS never says exactly how something works, just what the net effect must be. It is part of what makes it so difficult to follow. A JLS for programmers rather than implementors might describe a simple reference implementation, and say, "As a programmer you can presume it works like this, but actual implementations will be more sophisticated and may not give exactly the same behaviour as this simple explanation. For those fine points, see the implementors JLS." So for example it could describe how a simple GC works. In most cases the second init to null could be dropped by a clever optimiser because it would have the same net effect. Of course I defer to you and Patricia on divining the meaning of the JLS. -- Roedy Green Canadian Mind Products http://mindprod.com It should not be considered an error when the user starts something already started or stops something already stopped. This applies to browsers, services, editors... It is inexcusable to punish the user by requiring some elaborate sequence to atone, e.g. open the task editor, find and kill some processes. |
Re: Bulk Array Element Allocation, is it faster?
Eric Sosman schrieb:
> This could be a win if you will actually use only a few of > the array elements, and if constructing a Bla is expensive (for > example, if it involves I/O). If Bla's are cheap to make, or if > you'll wind up using most of them anyhow, there's no obvious > reason to conditionalize. The bla <init> does nothing. And yes it could be that a bla is not needed, that is why I invented the lazy scheme. But I see the slowdown especially for when all the bla's are needed. > Nine percent longer. But I'm highly suspicious of those nice, > round numbers: It seems an unlikely coincidence that two rather > different chunks of code should both execute in even tenths of > seconds. Repeated measurements give different figures, which lead me to see an accuracy of measurement of around 100ms, so instead of telling you that the application needs one time 9538 ms and another time 9622 ms etc.. I gave only the order of the time spend rounded to hundreds. Bye |
Re: Bulk Array Element Allocation, is it faster?
Roedy Green schrieb:
> new does two things: > 1. allocates X bytes of space > 2. initialises that block of space to a pattern. So a dumb translation of the loop would be: 1. for each i between 0 and n-1 1.1 allocates X bytes of space 1.2 call the initializer Then I was more thinking that the loop does the following, i.e. is cleverly compiled to: 1. allocates n*X bytes of space 2. for each i between 0 and n-1 2.1 call the initializer for location X*i At least I would hand assembler it like this. Saving n-1 calls to the heap allocator. But maybe positive effect is not based on such a clever allocation and only on what Patricia said. I was already googling a little bit about the compilation techniques found for JIT today, but did not yet hit any evidence for clever loop allocation, but it is so obvious I really it must be done. But maybe it is not done because of multi-threading or GC, who knows... Bye |
Re: Bulk Array Element Allocation, is it faster?
On 09/25/2011 03:41 AM, Patricia Shanahan wrote:
> On 9/24/2011 6:17 PM, Jan Burse wrote: >> Dear All, >> >> I just wonder wether modern JIT do optimize >> Code of the following form: >> >> Bla[] bla = new Bla[n]; >> for (int i=0; i<n; i++) { >> bla[i] = new Bla(); >> } >> >> When I use the above in my code, my application >> spends 8800 ms in the benchmark I have. >> >> When I then change the code to: >> >> Bla[] bla = new Bla[n]; >> >> ... >> >> if (bla[i]==null) >> bla[i] = new Bla(); >> >> .. >> >> So instead of allocating all elements at once, >> I will allocate them on demand in the subsequent >> flow of my application. >> >> When I use the above in my code, my application >> now suddently sends 9600 ms in the benchmark >> I have. >> >> So I wonder whether eventually the JIT has >> a way to detect the bulk allocate of the first >> version and transform it into something more >> efficient than my lazy allocation. >> >> Any clues? > > You also need to consider the general optimization of processors in > favor of doing efficiently those things they have done in the recent past. > > When you do the allocation all at once, the code and data for "new > Bla()" is in cache on the second and subsequent calls. There may be > cache conflicts between "new Bla()" and the actual work, leading to > many more cache misses when you interleave them. > > Doing the initialization on demand may be adding an unpredictable > conditional branch to the subsequent flow. This and the fact that lazy initialization has concurrency issues when used in a multi threading context (which e.g. final members do not have) has made me use this approach less frequently. Also, for short lived objects in total it might be much more efficient to just allocate the object even if it is not used because it won't survive new generation anyway. I think the lazy init idiom only really pays off if construction is expensive (e.g. because it involves IO or time consuming calculations). In all other cases it's better to just unconditionally create and let GC work. And because of improvements in JVM technology the balance has moved a lot away from the lazy side because allocation and deallocation overhead became smaller than in earlier Java versions. Kind regards robert |
Re: Bulk Array Element Allocation, is it faster?
Robert Klemme schrieb:
> On 09/25/2011 03:41 AM, Patricia Shanahan wrote: >> On 9/24/2011 6:17 PM, Jan Burse wrote: >>> Dear All, >>> >>> I just wonder wether modern JIT do optimize >>> Code of the following form: >>> >>> Bla[] bla = new Bla[n]; >>> for (int i=0; i<n; i++) { >>> bla[i] = new Bla(); >>> } >>> >>> When I use the above in my code, my application >>> spends 8800 ms in the benchmark I have. >>> >>> When I then change the code to: >>> >>> Bla[] bla = new Bla[n]; >>> >>> ... >>> >>> if (bla[i]==null) >>> bla[i] = new Bla(); >>> >>> .. >>> >>> So instead of allocating all elements at once, >>> I will allocate them on demand in the subsequent >>> flow of my application. >>> >>> When I use the above in my code, my application >>> now suddently sends 9600 ms in the benchmark >>> I have. >>> >>> So I wonder whether eventually the JIT has >>> a way to detect the bulk allocate of the first >>> version and transform it into something more >>> efficient than my lazy allocation. >>> >>> Any clues? >> >> You also need to consider the general optimization of processors in >> favor of doing efficiently those things they have done in the recent >> past. >> >> When you do the allocation all at once, the code and data for "new >> Bla()" is in cache on the second and subsequent calls. There may be >> cache conflicts between "new Bla()" and the actual work, leading to >> many more cache misses when you interleave them. >> >> Doing the initialization on demand may be adding an unpredictable >> conditional branch to the subsequent flow. > > This and the fact that lazy initialization has concurrency issues when > used in a multi threading context (which e.g. final members do not have) > has made me use this approach less frequently. Also, for short lived > objects in total it might be much more efficient to just allocate the > object even if it is not used because it won't survive new generation > anyway. I think the lazy init idiom only really pays off if construction > is expensive (e.g. because it involves IO or time consuming > calculations). In all other cases it's better to just unconditionally > create and let GC work. And because of improvements in JVM technology > the balance has moved a lot away from the lazy side because allocation > and deallocation overhead became smaller than in earlier Java versions. > > Kind regards > > robert Yes, really seems so. Looks that the infinite heap idea works (notion borrowed from a talk by Cliff Click found on the net, should indicate that we just do this, allocate and don't care much). I made some additional benchmarks, in the present case the lazy does not so much good in that it saves allocates. I got the following figures for the application at hand: Bulk: 84'393'262 allocates Lazy: 81'662'843 allocates So the application does not have many exit points in the flow following the array creation, except for the last exit point when anyway all objects were needed. But still I have some doubts! The array I allocate are only 4 elements long or so? Why is there still such a big difference between allocating in bulk only 4 elements and doing them one after the other? Overhead by the lazy detection itself? I doubt this also since the array is anyway accessed, and the lazy check is a small null pointer check. I am still puzzled. Bye |
Re: Bulk Array Element Allocation, is it faster?
On 9/25/2011 4:47 AM, Jan Burse wrote:
> Roedy Green schrieb: >> new does two things: >> 1. allocates X bytes of space >> 2. initialises that block of space to a pattern. > > So a dumb translation of the loop would be: > > 1. for each i between 0 and n-1 > 1.1 allocates X bytes of space > 1.2 call the initializer > > Then I was more thinking that the loop does the > following, i.e. is cleverly compiled to: > > 1. allocates n*X bytes of space > 2. for each i between 0 and n-1 > 2.1 call the initializer for location X*i > > At least I would hand assembler it like this. > Saving n-1 calls to the heap allocator. Since the heap allocator probably consists of one compare-and- branch ("Is there enough room?") and one addition ("Advance the nextFreeAddress pointer"), the savings are unlikely to be large. This isn't malloc(), you know. > I was already googling a little bit about the > compilation techniques found for JIT today, but > did not yet hit any evidence for clever loop > allocation, but it is so obvious I really it > must be done. But maybe it is not done because > of multi-threading or GC, who knows... Note that the semantics of the bulk and lazy allocations in your original example are quite different. An "optimization" that changes the meaning of the code is only a benefit if the code was wrong to begin with. ;-) -- Eric Sosman esosman@ieee-dot-org.invalid |
| All times are GMT. The time now is 08:52 PM. |
Powered by vBulletin®. Copyright ©2000 - 2013, vBulletin Solutions, Inc.
SEO by vBSEO ©2010, Crawlability, Inc.