Java ready for number crunching?

 
 
Tom Anderson
 
      05-15-2008
On Thu, 15 May 2008, Monty Hall wrote:

> Lew wrote:
>> Monty Hall wrote:
>>
>>> While I can understand why Sun implemented templates as they did, to
>>> me the problem w/ Java templates is primitives. I don't like
>>> boxing and unboxing primitives as there is a space and speed penalty
>>> (or is there? some optimization?)

>>
>> Study the matter, then make the claim. There is generally little to no
>> "penalty" for boxing and unboxing.

>
> WOW! Now that's a claim! Please let Sun, IBM, NIST, the academy, or
> any authors of java numerical software know of your insight. I know a
> fair number of people who would love a generic java numerics library.
> Little or no penalty casting to/fro primitive numerical types? What
> libraries are you using? Is it open source and where can I get it!


Just for information, looping over an array of ten million ints and adding
them up took me ~30 ms. Doing the same with an array of Integers and using
auto-unboxing took me ~55 ms. So, 25 ms to unbox ten million ints, or 2.5
ns to unbox one.

In relative terms, unboxing takes five-sixths of the time it takes to
increment a loop counter, compare to a loop bound, index into an array,
and add two ints. That's not so surprising, since it's probably a
comparable sequence of operations - starting with the pointer to the
Integer, you've got to displace it by a small constant to get the index of
the value field, then index into the object. I bet the times for both
sequences are dominated by memory accesses; basically, the former makes
one access and the latter two. 2.5 ns per fetch works out to 400
megatransfers per second, which is in the same ballpark as memory speed,
so this seems like a plausible explanation to me.

I also tried with an ArrayList of Integers, and couldn't get below 145 ms.
If we had a 'true' ArrayList<int>, i estimate that it would take 145 - 25
= 120 ms in this test.
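
For concreteness, a minimal sketch of the kind of test being described (the
actual code isn't posted in the thread, so the class name, loop shapes and
heap size are assumptions - with ten million boxed Integers it will probably
want something like -Xmx512m):

    import java.util.ArrayList;
    import java.util.List;

    public class SumBench {
        static final int N = 10000000;

        public static void main(String[] args) {
            int[] ints = new int[N];
            Integer[] boxed = new Integer[N];
            List<Integer> list = new ArrayList<Integer>(N);
            for (int i = 0; i < N; i++) {
                ints[i] = i;
                boxed[i] = Integer.valueOf(i);
                list.add(i);                      // autoboxes on the way in
            }

            long t0 = System.nanoTime();
            long sum0 = 0;
            for (int i = 0; i < N; i++) sum0 += ints[i];     // primitive array
            long t1 = System.nanoTime();
            long sum1 = 0;
            for (int i = 0; i < N; i++) sum1 += boxed[i];    // auto-unboxing from Integer[]
            long t2 = System.nanoTime();
            long sum2 = 0;
            for (int i = 0; i < N; i++) sum2 += list.get(i); // method call plus unboxing
            long t3 = System.nanoTime();

            System.out.println("int[]         " + (t1 - t0) / 1000000 + " ms (" + sum0 + ")");
            System.out.println("Integer[]     " + (t2 - t1) / 1000000 + " ms (" + sum1 + ")");
            System.out.println("List<Integer> " + (t3 - t2) / 1000000 + " ms (" + sum2 + ")");
        }
    }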

145 vs 120 ms doesn't seem like a colossal difference to me - it's a 17%
saving. 145 vs 55, or 120 vs 30, on the other hand, is a big difference -
it's 300-400%. The speed penalty for a List over an array is thus much
bigger than the penalty for objects over primitives. If Sun are going to
focus on optimising anything, it should probably be access to collections
rather than unboxing.

And in the meantime, our inner loops will still be around arrays of
primitives.

tom

--
.... but when you spin it it looks like a dancing foetus!
 
Monty Hall
 
      05-15-2008
Tom Anderson wrote:
> On Thu, 15 May 2008, Monty Hall wrote:
>
>> Lew wrote:
>>> Monty Hall wrote:
>>>
>>>> While I can understand why Sun implemented templates as they did, to
>>>> me the problem w/ Java templates is primitives. I don't like
>>>> boxing and unboxing primitives as there is a space and speed
>>>> penalty (or is there? some optimization?)
>>>
>>> Study the matter, then make the claim. There is generally little to
>>> no "penalty" for boxing and unboxing.

>>
>> WOW! Now that's a claim! Please let Sun, IBM, NIST, the academy, or
>> any authors of java numerical software know of your insight. I know a
>> fair number of people who would love a generic java numerics library.
>> Little or no penalty casting to/fro primitive numerical types? What
>> libraries are you using? Is it open source and where can I get it!

>
> Just for information, looping over an array of ten million ints and
> adding them up took me ~30 ms. Doing the same with an array of Integers
> and using auto-unboxing took me ~55 ms. So, 25 ms to unbox ten million
> ints, or 2.5 ns to unbox one.
>
> In relative terms, unboxing takes five-sixths of the time it takes to
> increment a loop counter, compare to a loop bound, index into an array,
> and add two ints. That's not so surprising, since it's probably a
> comparable sequence of operations - starting with the pointer to the
> Integer, you've got to displace it by a small constant to get the index
> of the value field, then index into the object. I bet the times for both
> sequences are dominated by memory accesses; basically, the former makes
> one access and the latter two. 2.5 ns per fetch works out to 400
> megatransfers per second, which is in the same ballpark as memory speed,
> so this seems like a plausible explanation to me.
>
> I also tried with an ArrayList of Integers, and couldn't get below 145
> ms. If we had a 'true' ArrayList<int>, i estimate that it would take 145
> - 25 = 120 ms in this test.
>
> 145 vs 120 ms doesn't seem like a colossal difference to me - it's a 17%
> saving. 145 vs 55, or 120 vs 30, on the other hand, is a big difference
> - it's 300-400%. The speed penalty for a List over an array is thus much
> bigger than the penalty for objects over primitives. If Sun are going to
> focus on optimising anything, it should probably be access to
> collections rather than unboxing.
>
> And in the meantime, our inner loops will still be around arrays of
> primitives.
>
> tom
>


Primitives crush Objects - especially on doubles. Floating point would
have to be used for factorizations, etc. Server option turned on/off, &
rearranging code - object first, primitive second, and vice versa - to
warm up the JVM made no difference. Not that I've seen a lot of Java
linear algebra packages (I use MTL), but I've yet to see one that works
w/ objects. All backing stores are flat primitive arrays for dense
structures.

10,000,000 Integers (times in ms)
---------------------------------
                 Primitive   Object
a[i] += a[i]         57        118
a[i] *= a[i]         67        168


10,000,000 Doubles (times in ms)
--------------------------------
                 Primitive   Object
a[i] += a[i]        115       2898
a[i] *= a[i]        111       2964
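
A reconstructed sketch of this kind of in-place test (not the code actually
used - the harness details, and the need for a larger heap such as -Xmx512m,
are assumptions):

    public class InPlaceBench {
        static final int N = 10000000;

        public static void main(String[] args) {
            double[] prim = new double[N];
            Double[] boxed = new Double[N];
            for (int i = 0; i < N; i++) {
                prim[i] = i;
                boxed[i] = Double.valueOf(i);
            }

            long t0 = System.nanoTime();
            for (int i = 0; i < N; i++) prim[i] += prim[i];   // pure primitive arithmetic
            long t1 = System.nanoTime();
            for (int i = 0; i < N; i++) boxed[i] += boxed[i]; // unbox, add, box a new Double
            long t2 = System.nanoTime();

            System.out.println("primitive " + (t1 - t0) / 1000000 + " ms");
            System.out.println("object    " + (t2 - t1) / 1000000 + " ms");
        }
    }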
 
Tom Anderson
 
      05-15-2008
On Thu, 15 May 2008, Monty Hall wrote:

> Tom Anderson wrote:
>> On Thu, 15 May 2008, Monty Hall wrote:
>>
>>> Lew wrote:
>>>> Monty Hall wrote:
>>>>
>>>>> While I can understand why Sun implemented templates as they did, to me
>>>>> the problem w/ Java templates is primitives. I don't like boxing
>>>>> and unboxing primitives as there is a space and speed penalty (or is
>>>>> there? some optimization?)
>>>>
>>>> Study the matter, then make the claim. There is generally little to no
>>>> "penalty" for boxing and unboxing.
>>>
>>> WOW! Now that's a claim! Please let Sun, IBM, NIST, the academy, or any
>>> authors of java numerical software know of your insight. I know a fair
>>> number of people who would love a generic java numerics library. Little or
>>> no penalty casting to/fro primitive numerical types? What libraries are
>>> you using? Is it open source and where can I get it!

>>
>> Just for information, looping over an array of ten million ints and adding
>> them up took me ~30 ms. Doing the same with an array of Integers and using
>> auto-unboxing took me ~55 ms. So, 25 ms to unbox ten million ints, or 2.5
>> ns to unbox one.
>>
>> I also tried with an ArrayList of Integers, and couldn't get below 145 ms.
>> If we had a 'true' ArrayList<int>, i estimate that it would take 145 - 25 =
>> 120 ms in this test.

>
> Primitives crush Objects - especially on doubles. Floating point would have
> to be used for factorizations, etc. Server option turned on/off, &


I didn't think to try -server! The results are interesting: the Integer
array and all the list versions take 50 ms, and the int array version
takes 10 ms. So, the difference between lists and arrays is eliminated,
but the difference between primitives and wrappers is enhanced. My
understanding is that -server applies more compiler optimisation upfront;
that suggests that Sun have done a lot of work on optimising the
object-oriented mucking about (method calls and the like) involved in a
list, but haven't (yet) done anything about eliding boxing.

I would imagine that a year from now, the int vs Integer difference will
be a lot smaller.

I'm on 1.5.0_13, on an intel Mac running OS X 10.4.11, FWIW.

> rearranging code - object first, primitive second, and vice versa - to warm
> up the JVM made no difference.


I measured different implementations in separate runs, and just looped the
measurement to warm the JVM up. I didn't notice a significant change in
time due to warming-up, so this probably isn't necessary.
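
Something like the following shape, where the timed section is repeated and
only the later, warmed-up passes are taken seriously (a sketch - the pass
count and reporting are assumptions, not the code actually used):

    // Repeat the measurement so the JIT compiler has kicked in by the later passes.
    int[] a = new int[10000000];
    for (int pass = 0; pass < 10; pass++) {
        long start = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        long ms = (System.nanoTime() - start) / 1000000;
        System.out.println("pass " + pass + ": " + ms + " ms (sum " + sum + ")");
    }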

> 10,000,000 Integers (times in ms)
> ---------------------------------
>                  Primitive   Object
> a[i] += a[i]         57        118
> a[i] *= a[i]         67        168


That's consistent with what i measured - you spend about 60 ms on an unbox
and rebox. For some reason, a lot more when multiplying, which is odd.

> 10,000,000 Doubles (times in ms)
> --------------------------------
>                  Primitive   Object
> a[i] += a[i]        115       2898
> a[i] *= a[i]        111       2964


Wow.

Okay, modifying my test to look at longs and doubles, here are all my
results (times in ms):

Type     -server?   foo[]   Foo[]   List<Foo>   List<Foo> optimised
int         n         30      55      250              145
long        n         60      85      270              170
double      n         45      60      270              150
int         y         10      50       50               50
long        y         25      60       50               50
double      y         20      50       50               50

I don't see the amazing slowdown with doubles that you do. I see the same
pattern with the big types as with int - lists are much slower than
arrays, -server makes them just as fast as arrays, and primitives are
about twice as fast as objects.

Hang on, i'll implement your test ...

Aha! With the same inner loop as you:

Type     -server?   Time (ms)
double      n        55
double      y        55
Double      n        1261, 3659, 25530 (!), 1369, 1028
Double      y        775, 1378, 1350, 612, 1069, 1071, 612, 1069, 1582

It's garbage collection, isn't it? My code was never creating new wrapper
objects, but yours does, and then puts them in an array. It creates huge
amounts of garbage. That's what slows things down. -server does seem to be
altering the way memory is managed, though, since it's not only a lot
faster than the client VM, but avoids having the occasional massive time.
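
To make the allocation explicit: for a Double[] element, the compound
assignment in the inner loop is roughly equivalent to the following
(an illustrative desugaring, not literal compiler output):

    // "a[i] += a[i]" on a Double[]: unbox both operands, add, box the result.
    // Double.valueOf keeps no cache, so every iteration allocates a fresh
    // Double, and the previous element immediately becomes garbage.
    a[i] = Double.valueOf(a[i].doubleValue() + a[i].doubleValue());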

I believe this is still optimisable, at least in this microbenchmark; a
bit of escape analysis would let the compiler rewrite the Double[] as a
double[]. In the more complex real-world case, though, it might not.

tom

--
The real romance is out ahead and yet to come. The computer revolution
hasn't started yet. -- Alan Kay
 
Arne Vajhøj
 
      05-15-2008
Lew wrote:
> (E-Mail Removed) wrote:
>> Also, there is nothing preventing one from running the C pre-processor
>> on a Java source file....

>
> Good sense.


If one needs a C style preprocessor then using the C one would make
perfect sense.
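
For example, something along these lines might work - the file names and
macros here are purely illustrative, and the expanded output would still
need checking by hand:

    /* Vec.java.tmpl - a hypothetical "template" expanded with the C preprocessor.
       Generate a concrete class with, e.g.:  cpp -P Vec.java.tmpl > IntVec.java  */
    #define CLASSNAME IntVec
    #define TYPE int

    public class CLASSNAME {
        private final TYPE[] data;

        public CLASSNAME(int n) { data = new TYPE[n]; }

        public TYPE get(int i) { return data[i]; }

        public void set(int i, TYPE v) { data[i] = v; }
    }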

Arne
 
Roger Lindsjö
 
      05-16-2008
Tom Anderson wrote:
> On Thu, 15 May 2008, Monty Hall wrote:
>
>> Lew wrote:
>>> Monty Hall wrote:
>>>
>>>> While I can understand why Sun implemented templates as they did, to
>>>> me the problem w/ Java templates is primitives. I don't like
>>>> boxing and unboxing primitives as there is a space and speed
>>>> penalty (or is there? some optimization?)
>>>
>>> Study the matter, then make the claim. There is generally little to
>>> no "penalty" for boxing and unboxing.

>>
>> WOW! Now that's a claim! Please let Sun, IBM, NIST, the academy, or
>> any authors of java numerical software know of your insight. I know a
>> fair number of people who would love a generic java numerics library.
>> Little or no penalty casting to/fro primitive numerical types? What
>> libraries are you using? Is it open source and where can I get it!

>
> Just for information, looping over an array of ten million ints and
> adding them up took me ~30 ms. Doing the same with an array of Integers
> and using auto-unboxing took me ~55 ms. So, 25 ms to unbox ten million
> ints, or 2.5 ns to unbox one.


I made a simple test, just looping 10,000,000 times doing an assignment:

1. int = int
2. int = Integer
3. Integer = Integer
4. Integer = int

The results I got were ~0.8 ns per iteration for cases 1, 2 and 3,
while for case 4 the iterations took ~1.2 ns.

However, this was for numbers between -128 and 127. Getting outside this
range made no difference to cases 1, 2 and 3, but a huge difference to
4. Now each iteration took ~80 ns. This is probably due to object
creation and garbage collection.
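
The cache boundary is easy to see directly; a small illustrative example
(not the test code above) - autoboxing goes through Integer.valueOf, which
is only required to cache values in -128..127:

    public class BoxingCacheDemo {
        public static void main(String[] args) {
            Integer a = 127, b = 127;   // inside the cache range
            Integer c = 128, d = 128;   // outside the cache range
            System.out.println(a == b); // true: both refer to the cached instance
            System.out.println(c == d); // normally false: two separate allocations

            // Case 4 above (Integer = int) repeated: cheap inside the cache
            // range, a fresh allocation on every iteration outside it.
            Integer boxed = 0;
            for (int i = 0; i < 10000000; i++) {
                boxed = i;
            }
            System.out.println(boxed);
        }
    }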

--
Roger Lindsjö
 
Mark Thornton
 
      05-16-2008
Tom Anderson wrote:
> It's garbage collection, isn't it? My code was never creating new
> wrapper objects, but yours does, and then puts them in an array. It
> creates huge amounts of garbage. That's what slows things down. -server
> does seem to be altering the way memory is managed, though, since it's
> not only a lot faster than the client VM, but avoids having the
> occasional massive time.
>
> I believe this is still optimisable, at least in this microbenchmark; a
> bit of escape analysis would let the compiler rewrite the Double[] as a
> double[]. In the more complex real-world case, though, it might not.
>


I suspect it will be rather hard to optimise Double[] to double[] in
useful cases. Nulls and equals versus == are a particular problem as
they allow the object behaviour of Double to be seen.
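
An illustrative fragment of the sort of code that keeps the optimisation
honest - if the VM silently stored the elements as bare doubles, the
behaviour below could change:

    Double[] a = new Double[4];
    a[0] = 1.0;                            // boxed via Double.valueOf
    a[1] = null;                           // a double[] has no representation for null
    Double x = a[0];
    System.out.println(x == a[0]);         // reference identity of the stored object is visible
    System.out.println(a[0].equals(1.0));  // equals() and == can disagree for wrappers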

Mark Thornton
 
Kevin McMurtrie
 
      05-17-2008
In article <(E-Mail Removed)>,
Lew <(E-Mail Removed)> wrote:

> Monty Hall wrote:
> > While I can understand why Sun implemented templates as they did, to me
> > the problem w/ Java templates is

>
> Java does not have templates.
>
> > primitives. I don't like boxing and unboxing primitives as there is a
> > space and speed penalty (or is there? some optimization?)

>
> Study the matter, then make the claim. There is generally little to no
> "penalty" for boxing and unboxing.
>
> > I mostly use templates

>
> but not in Java.


Are you nuts? Unboxing is not too bad because conditions are right for
it to be optimized at a low level. Boxing uses Integer.valueOf(int i),
and that's only fast for values -128 to 127. The rest are very slow.

Anyway, I don't care about boxing and unboxing. My number crunching
gripe is the lack of unsigned integer math other than char. Clumsy
workarounds are OK for a few formulas here and there but not hardcore
signal processing of gigabytes of data. Get unsigned integer math in
Java and I'll dump a ton of C codecs that need it.
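
For readers who haven't met them, the usual workarounds widen and mask
(an illustrative fragment, not the codec code being referred to):

    // Reading Java's signed types as unsigned values by widening and masking.
    byte  b = (byte)  0xF0;
    short s = (short) 0xF000;
    int   i = 0xF0000000;
    int  ub = b & 0xFF;           // 240 rather than -16
    int  us = s & 0xFFFF;         // 61440 rather than -4096
    long ui = i & 0xFFFFFFFFL;    // 4026531840 rather than -268435456
    int  shifted = i >>> 4;       // unsigned (logical) right shift does exist as an operator
    // In a tight loop over gigabytes of samples, nearly every operation needs
    // a mask like this, which is the clumsiness being complained about.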

--
Block Google's spam and enjoy Usenet again.
Reply with Google and I won't hear from you.
 
Tom Anderson
 
      05-17-2008
On Fri, 16 May 2008, Mark Thornton wrote:

> Tom Anderson wrote:
>
>> It's garbage collection, isn't it? My code was never creating new wrapper
>> objects, but yours does, and then puts them in an array. It creates huge
>> amounts of garbage. That's what slows things down. -server does seem to be
>> altering the way memory is managed, though, since it's not only a lot
>> faster than the client VM, but avoids having the occasional massive time.
>>
>> I believe this is still optimisable, at least in this microbenchmark; a bit
>> of escape analysis would let the compiler rewrite the Double[] as a
>> double[]. In the more complex real-world case, though, it might not.

>
> I suspect it will be rather hard to optimise Double[] to double[] in
> useful cases. Nulls and equals versus == are a particular problem as
> they allow the object behaviour of Double to be seen.


Yes, absolutely. Basically, you need to do escape analysis to find all the
places where the Doubles can be seen, so you can determine if it's safe to
do the unboxing. Escape analysis is notoriously hard in real programs.

Still, encapsulation means this might not be as bad as you might think. If
i have a private Double[] in my class, and i never pass objects from it to
methods of other classes (and possibly if the class is final), the escape
analysis is pretty trivial. It's only when i start doing things like
having globally visible arrays or passing values out to other bits of code
it goes wrong.
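
A sketch of the easy case, purely for illustration: a final class whose
private Double[] never hands a wrapper to the outside world, so the
analysis only has to look at this one file:

    // No Double reference ever escapes: set() takes a double, get() and sum()
    // return doubles, so in principle the VM could back this with a double[]
    // without the difference being observable.
    public final class Samples {
        private final Double[] values;

        public Samples(int n) {
            values = new Double[n];
            for (int i = 0; i < n; i++) {
                values[i] = 0.0;
            }
        }

        public void set(int i, double v) { values[i] = v; }    // boxes on the way in

        public double get(int i) { return values[i]; }         // unboxes on the way out

        public double sum() {
            double total = 0.0;
            for (int i = 0; i < values.length; i++) {
                total += values[i];
            }
            return total;
        }
    }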

tom

--
I didn't think, "I'm going to change the world." No, I'm just going to
build the best machines I can build that I would want to use in my own
life. -- Woz
 
Mark Thornton
 
      05-17-2008
Tom Anderson wrote:
> Still, encapsulation means this might not be as bad as you might think.
> If i have a private Double[] in my class, and i never pass objects from
> it to methods of other classes (and possibly if the class is final), the
> escape analysis is pretty trivial. It's only when i start doing things
> like having globally visible arrays or passing values out to other bits
> of code it goes wrong.
>


Quite likely in a lot of non-trivial mathematics.

Mark Thornton
 
Arne Vajhøj
 
      05-18-2008
Lew wrote:
> Tom Anderson wrote:
>>> Still, encapsulation means this might not be as bad as you might
>>> think. If i [sic] have a private Double[] in my class, and i [sic]
>>> never pass objects from it to methods of other classes (and possibly
>>> if the class is final), the escape analysis is pretty trivial. It's
>>> only when i [sic] start doing things like having globally visible
>>> arrays or passing values out to other bits of code it goes wrong.

>
> Mark Thornton wrote:
>> Quite likely in a lot of non trivial mathematics

>
> The mathematician should have the intelligence to employ a skilled
> programmer to write the software, if the mathematician wants good
> software. Mathematicians have about as much business writing software as
> investment analysts or warehouse managers.
>
> I won't try to poke holes in the recent claimed proof of Fermat's Last
> Theorem, and the mathematician won't try to write well-engineered software.


There is a long tradition in a lot of sciences for researchers either
doing coding themselves or, more frequently, paying some of their
students to do it.

Physics, chemistry, medicine, economics.

And at least a huge part of it will not fit well with traditional
software development processes as done on a typical business project.

There are no requirements because it is research, and requirements
change as results come in. The logic is very domain specific and
very complex (not the typical CRUD stuff that is a big portion
of most business apps), so deep domain knowledge is needed. A lot
of apps are literally junked as soon as they have run (so they
do not need to think about the 10-20-30 years of maintenance
that business app code has to).

Arne
 