Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Java > Boolean array implementation

Reply
Thread Tools

Boolean array implementation

 
 
aloha.kakuikanu
Guest
Posts: n/a
 
      02-15-2007
JLS describes boolean array

boolean[] x

implemented via array of integers. So if there is a space
consideration, then one may want to pack bits into the integers and
define boolean array as:

class BooleanArray {
private int[] data;
public BooleanArray( int size ) {
data = new int[size/Integer.SIZE+1];
for( int i = 0; i < data.length; i++ )
data[i] = 0;
}
public boolean get( int index ) {
int row = index/Integer.SIZE;
int col = index%Integer.SIZE;
return ((data[row]>>col)&1)==1;
}
public void set( int index, boolean val ) {
int row = index/Integer.SIZE;
int col = index%Integer.SIZE;
int bitmap = data[row];
if( !val ) if( (((bitmap>>col)&1)==1) )
data[row]=bitmap ^ (1<<col);
if( val ) if( (((bitmap>>col)&1)==0) )
data[row]=bitmap | (1<<col);
}
}


Now, I found this implementation to be more than 3 times slower than
native boolean[]. Is this expected behavior, or I'm missing something?
Or, perhaps there is a well known alternative packed boolean array
implementation which doesn't suffer the above space-time tradeoff?

BTW, why boolean is not the same as byte[] (or char[] -- whatever is
smaller?-)

 
Reply With Quote
 
 
 
 
Patricia Shanahan
Guest
Posts: n/a
 
      02-15-2007
aloha.kakuikanu wrote:
> JLS describes boolean array
>
> boolean[] x
>
> implemented via array of integers. So if there is a space
> consideration, then one may want to pack bits into the integers and
> define boolean array as:

....

> Now, I found this implementation to be more than 3 times slower than
> native boolean[]. Is this expected behavior, or I'm missing something?
> Or, perhaps there is a well known alternative packed boolean array
> implementation which doesn't suffer the above space-time tradeoff?


The space/performance tradeoff is probably the reason why boolean[] is
typically implemented with one byte per boolean, rather than one bit.
Processors commonly have efficient hardware support for extracting a
byte from a word and sticking it in a register.

If you want bit packed true/false, why not use java.util.BitSet? It has
methods similar to your get and set.


>
> BTW, why boolean is not the same as byte[] (or char[] -- whatever is
> smaller?-)


At the implementation level, the systems I've measured appear to use one
byte per element. At the language level, a Java boolean is not a number
at all, so it would not make sense for an array of them to be an array
of numbers.

Patricia
 
Reply With Quote
 
 
 
 
Jack Marsh
Guest
Posts: n/a
 
      02-15-2007
Patricia Shanahan wrote:
> aloha.kakuikanu wrote:
>> JLS describes boolean array
>>
>> boolean[] x
>>
>> implemented via array of integers. So if there is a space
>> consideration, then one may want to pack bits into the integers and
>> define boolean array as:

> ...
>
>> Now, I found this implementation to be more than 3 times slower than
>> native boolean[]. Is this expected behavior, or I'm missing something?
>> Or, perhaps there is a well known alternative packed boolean array
>> implementation which doesn't suffer the above space-time tradeoff?

>
> The space/performance tradeoff is probably the reason why boolean[] is
> typically implemented with one byte per boolean, rather than one bit.
> Processors commonly have efficient hardware support for extracting a
> byte from a word and sticking it in a register.
>
> If you want bit packed true/false, why not use java.util.BitSet? It has
> methods similar to your get and set.


java.util.BitSet is actually slower than the OP submitted code. But the
OP code is far from optimal. The following snippit will speed up his
code by about 1/3. But it will still be 1/2 the speed of a boolean []
version. But as Patrica has pointed the speed of boolean [] is at the
cost of memory. boolean [] = new boolean[100000000] (that 100 million )
will take 100 Mbytes of heap. The OP's BooleanArray class and
java.util.BitSet will both take a little over 12 Mbytes. So a boolean []
may be suitable for small sets but not for large sets.

// code to speed up OP version

public void set (int i, boolean b) {
if(b) set(i);
else clear(i);
}
public void set (int i) {
data[i>>5]|=(1<<(i & 0x1f));
}
public void clear (int i) {
data[i>>5] &=~(1<<(i & 0x1f));
}
public boolean get (int i) {
return (data[i>>5] & (1<<(i & 0x1f)))!=0;
}

The code above is prone to ArrayIndexOutofBoundsException, as is a
boolean [] implementation. As an aside java.util.BitSet is an
extensible class that can throw an OutOfMemoryError at any time a
miscalculated index is entered.
>
>
>>
>> BTW, why boolean is not the same as byte[] (or char[] -- whatever is
>> smaller?-)

>
> At the implementation level, the systems I've measured appear to use one
> byte per element. At the language level, a Java boolean is not a number
> at all, so it would not make sense for an array of them to be an array
> of numbers.
>
> Patricia

 
Reply With Quote
 
tlas
Guest
Posts: n/a
 
      02-15-2007
Jack Marsh wrote:
> java.util.BitSet is actually slower than the OP submitted code. But the
> OP code is far from optimal. The following snippit will speed up his
> code by about 1/3. But it will still be 1/2 the speed of a boolean []
> version. But as Patrica has pointed the speed of boolean [] is at the
> cost of memory. boolean [] = new boolean[100000000] (that 100 million )
> will take 100 Mbytes of heap. The OP's BooleanArray class and
> java.util.BitSet will both take a little over 12 Mbytes. So a boolean []
> may be suitable for small sets but not for large sets.
>
> // code to speed up OP version
>
> public void set (int i, boolean b) {
> if(b) set(i);
> else clear(i);
> }
> public void set (int i) {
> data[i>>5]|=(1<<(i & 0x1f));
> }
> public void clear (int i) {
> data[i>>5] &=~(1<<(i & 0x1f));
> }
> public boolean get (int i) {
> return (data[i>>5] & (1<<(i & 0x1f)))!=0;
> }


Actually, any class implementation of an array will be slower than
primitive array. Consider this class

class BooleanBoolArray {
private boolean[] data;

public BooleanBoolArray(int size) {
data = new boolean[size];
}

public boolean get( int index ) {
return (data[index]);
}

public void set( int index, boolean val ) {
data[index] = val;
}
}

and following test

int size = 100000000;
long startTime;
boolean ba[];
BooleanArray cbba;

startTime = System.currentTimeMillis();
ba = new boolean[size];
for (int i = 0; i < size; i++) {
ba[i] = true;
if (!ba[i]) throw new Error("boolean array : index = " + i);
ba[i] = false;
if (ba[i]) throw new Error("boolean array : index = " + i);
}
println("boolean array : " + (System.currentTimeMillis() -
startTime) + " ms.");

ba = null;

startTime = System.currentTimeMillis();
cbba = new BooleanArray(size);
for (int i = 0; i < size; i++) {
cbba.set(i,true);
if (!cbba.get(i)) throw new Error("BooleanArray : index = " + i);
cbba.set(i,false);
if (cbba.get(i)) throw new Error("BooleanArray : index = " + i);
}
println("BooleanBoolArray : " + (System.currentTimeMillis() -
startTime) + " ms.");

On my machine the results are:

boolean array : 1843 ms.
BooleanArray : 4860 ms.

It seems that this kind of "boxing" is almost 3 times slower.

Tomek
 
Reply With Quote
 
tlas
Guest
Posts: n/a
 
      02-15-2007
tlas wrote:
> Actually, any class implementation of an array will be slower than
> primitive array. Consider this class
>
> class BooleanBoolArray {
> private boolean[] data;
>
> public BooleanBoolArray(int size) {
> data = new boolean[size];
> }
>
> public boolean get( int index ) {
> return (data[index]);
> }
>
> public void set( int index, boolean val ) {
> data[index] = val;
> }
> }
>
> and following test
>
> int size = 100000000;
> long startTime;
> boolean ba[];
> BooleanArray cbba;
>
> startTime = System.currentTimeMillis();
> ba = new boolean[size];
> for (int i = 0; i < size; i++) {
> ba[i] = true;
> if (!ba[i]) throw new Error("boolean array : index = " + i);
> ba[i] = false;
> if (ba[i]) throw new Error("boolean array : index = " + i);
> }
> println("boolean array : " + (System.currentTimeMillis() -
> startTime) + " ms.");
>
> ba = null;
>
> startTime = System.currentTimeMillis();
> cbba = new BooleanArray(size);
> for (int i = 0; i < size; i++) {
> cbba.set(i,true);
> if (!cbba.get(i)) throw new Error("BooleanArray : index = " + i);
> cbba.set(i,false);
> if (cbba.get(i)) throw new Error("BooleanArray : index = " + i);
> }
> println("BooleanBoolArray : " + (System.currentTimeMillis() -
> startTime) + " ms.");
>
> On my machine the results are:
>
> boolean array : 1843 ms.
> BooleanArray : 4860 ms.
>
> It seems that this kind of "boxing" is almost 3 times slower.
>
> Tomek



Comparing various implementations I encountered unexpected results.
Consider this class and tests:

class IntTest {
public int i;

public static int j;


public int get(int i) {
return i;
}

public void set(int i) {
this.i = i;
}
}

and test

int size = 100000000;
long startTime;
int test;
IntTest intTest = new IntTest();

startTime = System.currentTimeMillis();
for (int i = 0; i < size; i++) {
intTest.i = i;
test = intTest.i;
}
println("intTest.i : " + (System.currentTimeMillis() - startTime) +
" ms.");

startTime = System.currentTimeMillis();
for (int i = 0; i < size; i++) {
IntTest.j = i;
test = IntTest.j;
}
println("IntTest.j : " + (System.currentTimeMillis() - startTime) +
" ms.");

startTime = System.currentTimeMillis();
for (int i = 0; i < size; i++) {
intTest.set(i);
test = intTest.get(i);
}
println("IntTest.set/get : " + (System.currentTimeMillis() -
startTime) + " ms.");

and the results are

intTest.i : 3593 ms.
IntTest.j : 3657 ms.
IntTest.set/get : 1656 ms.

Direct read/write to/from class field is 2 times slower than using
get/set methods. What a surprise (at least for me)!!! Can anyone explain
this?

Tomek
 
Reply With Quote
 
tlas
Guest
Posts: n/a
 
      02-15-2007
tlas wrote:
[...]

Ooops, my mistake

> public int get(int i) {
> return i;
> }


should be

public int get() {
return i;
}

> startTime = System.currentTimeMillis();
> for (int i = 0; i < size; i++) {
> intTest.set(i);
> test = intTest.get(i);
> }
> println("IntTest.set/get : " + (System.currentTimeMillis() -
>startTime) + " ms.");



should be

startTime = System.currentTimeMillis();
for (int i = 0; i < size; i++) {
intTest.set(i);
test = intTest.get();
}
println("IntTest.set/get : " + (System.currentTimeMillis() -
startTime) + " ms.");

Sorry

Tomek
 
Reply With Quote
 
Daniel Dyer
Guest
Posts: n/a
 
      02-15-2007
On Thu, 15 Feb 2007 13:15:30 -0000, tlas <(E-Mail Removed)>
wrote:

> On my machine the results are:
>
> boolean array : 1843 ms.
> BooleanArray : 4860 ms.
>
> It seems that this kind of "boxing" is almost 3 times slower.
>


Tried your example on my machine and the difference was even more
pronounced:

boolean array : 641 ms.
BooleanBoolArray : 3658 ms.

However, switch to server VM (JDK 1.5.0_10 on WinXP):

boolean array : 172 ms.
BooleanArray : 235 ms.

Most of the time for the first run seems to be down to allocating the
memory, taking the array allocations out of the timings gives these
figures:

Client VM:
boolean array : 594 ms.
BooleanArray : 626 ms.

Server VM:
boolean array : 94 ms.
BooleanArray : 109 ms.


Dan.

--
Daniel Dyer
http://www.uncommons.org
 
Reply With Quote
 
aloha.kakuikanu
Guest
Posts: n/a
 
      02-15-2007
On Feb 14, 8:03 pm, Patricia Shanahan <(E-Mail Removed)> wrote:
> The space/performance tradeoff is probably the reason why boolean[] is
> typically implemented with one byte per boolean, rather than one bit.
> Processors commonly have efficient hardware support for extracting a
> byte from a word and sticking it in a register.
>
> If you want bit packed true/false, why not use java.util.BitSet? It has
> methods similar to your get and set.


I was unaware of BitSet! (I naively expected that google would point
me to all Boolean Array classes if asked for "Boolean Array").

BitSet is almost twice as fast as my homegrown class, although almost
twice as slow as native boolean[]. So the tradeoff is a little bit
slower.

It looks like I'm going to use boolean[] for small arrays and switch
to BitSet for larger ones. Nice application of polymorphism (which I
didn't find useful in my code for years).

 
Reply With Quote
 
Chris Uppal
Guest
Posts: n/a
 
      02-15-2007
tlas wrote:

> Direct read/write to/from class field is 2 times slower than using
> get/set methods. What a surprise (at least for me)!!! Can anyone explain
> this?


It's always difficult to guess (correctly) what the JIT will do. However, your
test would be better if it followed common practise for micro-benchmarks. Move
each test into a separate method. Run all the tests in turn inside a loop (an
infinite loop is as good as any). I'll append a modified version of the test
driver class to this post.

Using it, and using the -client JMV from JDK1.6.0 I see:

intTest.i : 390 ms.
IntTest.j : 271 ms.
IntTest.set/get : 210 ms.
intTest.i : 200 ms.
IntTest.j : 281 ms.
IntTest.set/get : 210 ms.
intTest.i : 200 ms.
IntTest.j : 271 ms.
IntTest.set/get : 210 ms.
intTest.i : 200 ms.
IntTest.j : 271 ms.
IntTest.set/get : 210 ms.
intTest.i : 200 ms.
IntTest.j : 271 ms.
IntTest.set/get : 210 ms.
intTest.i : 200 ms.
IntTest.j : 271 ms.
IntTest.set/get : 210 ms.
intTest.i : 210 ms.
IntTest.j : 271 ms.
IntTest.set/get : 200 ms.
intTest.i : 210 ms.
IntTest.j : 271 ms.
IntTest.set/get : 200 ms.
...

Notice how the first run through the loop is not representative. This
particular test settles down much more quickly than usual, but it does appear
that the JITed (and presumably inlined) code generated for the set/get pair is
more efficient than that generated for the direct field access. At a guess the
optimiser isn't tuned to work as hard on direct access from outside the object
itself (why should it be ? No one writes code like that in reality.)

Now, switching the -server JVM. I see:

intTest.i : 90 ms.
IntTest.j : 90 ms.
IntTest.set/get : 70 ms.
intTest.i : 101 ms.
IntTest.j : 50 ms.
IntTest.set/get : 40 ms.
intTest.i : 40 ms.
IntTest.j : 40 ms.
IntTest.set/get : 20 ms.
intTest.i : 20 ms.
IntTest.j : 20 ms.
IntTest.set/get : 20 ms.
intTest.i : 20 ms.
IntTest.j : 30 ms.
IntTest.set/get : 20 ms.
intTest.i : 20 ms.
IntTest.j : 20 ms.
IntTest.set/get : 20 ms.
...

Again notice how it takes a while to settle down, and that the initial figures
mean nothing. In this case it appears that the -server option has largely
eliminated the inner loop (to be honest I had expected that it would reduce it
to zero -- I'm not sure why it didn't manage that for this example, it usually
does unless you take special care). However, it clearly has done enough
optimisation to invalidate this micro-benchmark completely.

I haven't tried applying a similar benchmarkng approach to your (and the OPs)
original BooleanArray code myself, but unless/until you do try it, you won't
know what the performance of that code actually is.

-- chris

=======================
public class Test
{
static final int LOOPS = 100000000;
public static void
main(String[] args)
{
for (;
{
timeit1();
timeit2();
timeit3();
}
}

static void
timeit1()
{
long startTime;
int test;
IntTest intTest = new IntTest();

startTime = System.currentTimeMillis();
for (int i = 0; i < LOOPS; i++)
{
intTest.i = i;
test = intTest.i;
}
println("intTest.i : " + (System.currentTimeMillis() - startTime) + "
ms.");
}

static void
timeit2()
{
long startTime;
int test;
IntTest intTest = new IntTest();

startTime = System.currentTimeMillis();
for (int i = 0; i < LOOPS; i++)
{
IntTest.j = i;
test = IntTest.j;
}
println("IntTest.j : " + (System.currentTimeMillis() - startTime) + "
ms.");
}

static void
timeit3()
{
long startTime;
int test;
IntTest intTest = new IntTest();

startTime = System.currentTimeMillis();
for (int i = 0; i < LOOPS; i++)
{
intTest.set(i);
test = intTest.get();
}
println("IntTest.set/get : " + (System.currentTimeMillis() - startTime)
+ " ms.");
}

static void
println(String s)
{
System.out.println(s);
}
}=======================




 
Reply With Quote
 
Daniel Pitts
Guest
Posts: n/a
 
      02-15-2007
On Feb 15, 9:18 am, "aloha.kakuikanu" <(E-Mail Removed)>
wrote:
> On Feb 14, 8:03 pm, Patricia Shanahan <(E-Mail Removed)> wrote:
>
> > The space/performance tradeoff is probably the reason why boolean[] is
> > typically implemented with one byte per boolean, rather than one bit.
> > Processors commonly have efficient hardware support for extracting a
> > byte from a word and sticking it in a register.

>
> > If you want bit packed true/false, why not use java.util.BitSet? It has
> > methods similar to your get and set.

>
> I was unaware of BitSet! (I naively expected that google would point
> me to all Boolean Array classes if asked for "Boolean Array").
>
> BitSet is almost twice as fast as my homegrown class, although almost
> twice as slow as native boolean[]. So the tradeoff is a little bit
> slower.
>
> It looks like I'm going to use boolean[] for small arrays and switch
> to BitSet for larger ones. Nice application of polymorphism (which I
> didn't find useful in my code for years).



BitSet should be faster on some operations than a boolean[].

Specifically, bulk operations (eg: AND these two bitsets together)
It is also much more memory effecient (at best 32 times, and worst 8
times)

It would also be faster for nextSetBit and nextClearBit than a naive
boolean[] approach.

 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Subtle difference between boolean value and boolean comparison? Metre Meter Javascript 7 08-06-2010 08:40 PM
difference between 'boolean' and 'java.lang.Boolean' J Leonard Java 4 01-19-2008 02:56 AM
Writing BOOLEAN Array to a file Speed C Programming 2 02-17-2006 04:39 PM
Set boolean array elements to false using STL algorithm? Jason Heyes C++ 11 01-16-2006 10:39 AM
Find if elements of one array are present in the other and return a boolean array Shalini C++ 2 01-09-2004 06:13 PM



Advertisments