Velocity Reviews

Velocity Reviews (http://www.velocityreviews.com/forums/index.php)
-   C++ (http://www.velocityreviews.com/forums/f39-c.html)
-   -   bench: binary trees (http://www.velocityreviews.com/forums/t948664-bench-binary-trees.html)

Melzzzzz 07-28-2012 11:20 PM

bench: binary trees
 
I have played a little tree with benchmark from computer language game
site:
http://shootout.alioth.debian.org/u6...st=binarytrees

Interested in gc vs manual I have been very surprised with Java
performance ;)
Seems that Java gc is blazingly fast in comparison with standard
new/delete.
I think this is because gc deallocates in chunks rather piece by
piece, perhaps allocates from pool. Also there are no destructors
calls.
fastest C++ program on this site uses boost pool which calls
destructors and can deallocate, therefore, is slower then C variant
which uses apache's pool, which just deallocate.

So I have coded simple custom pool that does not keeps track of
allocated objects and doesn't call destructors and result is
that it is slightly faster than C ;)

What is clear is that Java garbage collector cannot be bitten
in this case if manual deletion of objects is required.
(as seen here)
http://shootout.alioth.debian.org/u6...st=binarytrees

Timings for cpp with my custom alloc
bmaxa@maxa:~/shootout/trees$ time ./binarytreescpp 20
stretch tree of depth 21 check: -1
2097152 trees of depth 4 check: -2097152
524288 trees of depth 6 check: -524288
131072 trees of depth 8 check: -131072
32768 trees of depth 10 check: -32768
8192 trees of depth 12 check: -8192
2048 trees of depth 14 check: -2048
512 trees of depth 16 check: -512
128 trees of depth 18 check: -128
32 trees of depth 20 check: -32
long lived tree of depth 20 check: -1

real 0m4.188s
user 0m7.036s
sys 0m0.112s
bmaxa@maxa:~/shootout/trees$ time taskset -c 0 ./binarytreescpp 20
stretch tree of depth 21 check: -1
2097152 trees of depth 4 check: -2097152
524288 trees of depth 6 check: -524288
131072 trees of depth 8 check: -131072
32768 trees of depth 10 check: -32768
8192 trees of depth 12 check: -8192
2048 trees of depth 14 check: -2048
512 trees of depth 16 check: -512
128 trees of depth 18 check: -128
32 trees of depth 20 check: -32
long lived tree of depth 20 check: -1

real 0m7.077s
user 0m6.976s
sys 0m0.084s

Timings for C with apache pool:
bmaxa@maxa:~/shootout/trees$ time ./binarytreesc 20
stretch tree of depth 21 check: -1
2097152 trees of depth 4 check: -2097152
524288 trees of depth 6 check: -524288
131072 trees of depth 8 check: -131072
32768 trees of depth 10 check: -32768
8192 trees of depth 12 check: -8192
2048 trees of depth 14 check: -2048
512 trees of depth 16 check: -512
128 trees of depth 18 check: -128
32 trees of depth 20 check: -32
long lived tree of depth 20 check: -1

real 0m4.625s
user 0m8.085s
sys 0m0.140s
bmaxa@maxa:~/shootout/trees$ time taskset -c 0 ./binarytreesc 20
stretch tree of depth 21 check: -1
2097152 trees of depth 4 check: -2097152
524288 trees of depth 6 check: -524288
131072 trees of depth 8 check: -131072
32768 trees of depth 10 check: -32768
8192 trees of depth 12 check: -8192
2048 trees of depth 14 check: -2048
512 trees of depth 16 check: -512
128 trees of depth 18 check: -128
32 trees of depth 20 check: -32
long lived tree of depth 20 check: -1

real 0m8.009s
user 0m7.888s
sys 0m0.116s

code follows if someone is interested:
/* The Computer Language Benchmarks Game
* http://shootout.alioth.debian.org/
*
* contributed by Jon Harrop
* modified by Alex Mizrahi
* modified by Andreas Schäfer
* very minor omp tweak by The Anh Tran
* custom pool Branimir Maksimovic
*/

#include <iostream>
#include <stdlib.h>
#include <stdio.h>
#include <vector>
#include <new>
#include <pthread.h>
#include <boost/pool/object_pool.hpp>

const size_t LINE_SIZE = 64;

template <class T>
class Pool{
static const unsigned NELEM = 10000*sizeof(T);
public:
Pool()
: pos_(0)
{
pool_.push_back(alloc());
}
Pool(const Pool&)=delete;
Pool& operator=(const Pool&)=delete;
~Pool()
{
for(auto i : pool_)
{
free(i);
}
}
template <class ...T1>
T* allocate(T1... args)
{
if(NELEM <= pos_)
{
pool_.push_back(alloc());
pos_=0;
}
void* p = &pool_.back()[pos_];
pos_+=sizeof(T);
return new(p)T(args...);
}
private:
static char* alloc()
{
Inner* next = (Inner*)pthread_getspecific(k.key);
if(next == nullptr)
{
next = (Inner*)::operator new(NELEM);
next->next = nullptr;
}
char* p = (char*)next;
next = next->next;
pthread_setspecific(k.key,next);
return p;
}
static void free(void* p)
{
Inner* next = (Inner*)pthread_getspecific(k.key);
Inner* tmp = (Inner*)p;
tmp->next = next;
next = tmp;
pthread_setspecific(k.key,next);
}
std::vector<char*> pool_;
unsigned pos_;
struct Inner{
Inner* next;
};
static struct Key{
Key()
{
pthread_key_create(&key,destroy);
}
~Key()
{
pthread_key_delete(key);
}
static void destroy(void* p)
{
Inner* next = (Inner*)p;
while(next)
{
Inner* tmp = next->next;
::operator delete(next);
next = tmp;
}
}
pthread_key_t key;
}k;
};

template <class T>
typename Pool<T>::Key Pool<T>::k;

struct Node
{
Node *l, *r;
int i;

Node(int i2) :l(0),r(0), i(i2)
{}
Node(Node *l2, int i2, Node *r2) : l(l2), r(r2), i(i2)
{}
Node(const Node&)=delete;
Node& operator=(const Node&)=delete;
int check() const
{
if (l)
return l->check() + i - r->check();
else return i;
}
};

typedef boost::object_pool<Node> NodePool;

Node *make(int i, int d, Pool<Node>& p)
{
if (d > 0)
return p.allocate( make(2*i-1, d-1,p),
i,
make(2*i, d-1,p));
return p.allocate(i);
}

Node *make(int i, int d, NodePool& p)
{
if (d > 0)
return p.construct( make(2*i-1, d-1,p),
i,
make(2*i, d-1,p));
return p.construct(i);
}

int main(int argc, char *argv[])
{
int min_depth = 4;
int max_depth = std::max(min_depth+2,
(argc == 2 ? atoi(argv[1]) : 10));
int stretch_depth = max_depth+1;

// Alloc then dealloc stretchdepth tree
{
Pool<Node> c_pool;
//NodePool c_pool;
Node* c (make(0, stretch_depth,c_pool));
std::cout << "stretch tree of depth " << stretch_depth << "\t "
<< "check: " << c->check() << '\n';
}

Pool<Node> long_lived_tree_pool;
//NodePool long_lived_tree_pool;
Node* long_lived_tree(make(0, max_depth,long_lived_tree_pool));

char *outputstr = (char*)malloc(LINE_SIZE * (max_depth +1) * sizeof(char));

#pragma omp parallel for
for (int d = min_depth; d <= max_depth; d += 2)
{
int iterations = 1 << (max_depth - d + min_depth);
int c = 0;

for (int i = 1; i <= iterations; ++i)
{
Pool<Node> pool;
//NodePool pool;
Node *a(make(i, d,pool)), *b(make(-i, d,pool));
c += a->check() + b->check();
}

sprintf(outputstr + LINE_SIZE * d, "%d\t trees of depth %d\t check: %d\n", (2 * iterations), d, c);
}

for (int d = min_depth; d <= max_depth; d += 2)
printf("%s", outputstr + (d * LINE_SIZE) );
free(outputstr);

std::cout << "long lived tree of depth " << max_depth << "\t "
<< "check: " << (long_lived_tree->check()) << '\n';

return 0;
}



W Karas 07-28-2012 11:31 PM

Re: bench: binary trees
 
On Saturday, July 28, 2012 7:20:04 PM UTC-4, Melzzzzz wrote:
> I have played a little tree with benchmark from computer language game
>
> site:
>
> http://shootout.alioth.debian.org/u6...st=binarytrees
>
>
>
> Interested in gc vs manual I have been very surprised with Java
>
> performance ;)
>
> Seems that Java gc is blazingly fast in comparison with standard
>
> new/delete.


Is it possible that the benchmark never causes GC to run? Is so, that would also mean any time it took to run destructors was also ignored in the Java case.

>
> I think this is because gc deallocates in chunks rather piece by
>
> piece, perhaps allocates from pool. Also there are no destructors
>
> calls.
>
> fastest C++ program on this site uses boost pool which calls
>
> destructors and can deallocate, therefore, is slower then C variant
>
> which uses apache's pool, which just deallocate.
>
>
>
> So I have coded simple custom pool that does not keeps track of
>
> allocated objects and doesn't call destructors and result is
>
> that it is slightly faster than C ;)
>
>
>
> What is clear is that Java garbage collector cannot be bitten
>
> in this case if manual deletion of objects is required.
>
> (as seen here)
>
> http://shootout.alioth.debian.org/u6...st=binarytrees
>
>
>
> Timings for cpp with my custom alloc
>
> bmaxa@maxa:~/shootout/trees$ time ./binarytreescpp 20
>
> stretch tree of depth 21 check: -1
>
> 2097152 trees of depth 4 check: -2097152
>
> 524288 trees of depth 6 check: -524288
>
> 131072 trees of depth 8 check: -131072
>
> 32768 trees of depth 10 check: -32768
>
> 8192 trees of depth 12 check: -8192
>
> 2048 trees of depth 14 check: -2048
>
> 512 trees of depth 16 check: -512
>
> 128 trees of depth 18 check: -128
>
> 32 trees of depth 20 check: -32
>
> long lived tree of depth 20 check: -1
>
>
>
> real 0m4.188s
>
> user 0m7.036s
>
> sys 0m0.112s
>
> bmaxa@maxa:~/shootout/trees$ time taskset -c 0 ./binarytreescpp 20
>
> stretch tree of depth 21 check: -1
>
> 2097152 trees of depth 4 check: -2097152
>
> 524288 trees of depth 6 check: -524288
>
> 131072 trees of depth 8 check: -131072
>
> 32768 trees of depth 10 check: -32768
>
> 8192 trees of depth 12 check: -8192
>
> 2048 trees of depth 14 check: -2048
>
> 512 trees of depth 16 check: -512
>
> 128 trees of depth 18 check: -128
>
> 32 trees of depth 20 check: -32
>
> long lived tree of depth 20 check: -1
>
>
>
> real 0m7.077s
>
> user 0m6.976s
>
> sys 0m0.084s
>
>
>
> Timings for C with apache pool:
>
> bmaxa@maxa:~/shootout/trees$ time ./binarytreesc 20
>
> stretch tree of depth 21 check: -1
>
> 2097152 trees of depth 4 check: -2097152
>
> 524288 trees of depth 6 check: -524288
>
> 131072 trees of depth 8 check: -131072
>
> 32768 trees of depth 10 check: -32768
>
> 8192 trees of depth 12 check: -8192
>
> 2048 trees of depth 14 check: -2048
>
> 512 trees of depth 16 check: -512
>
> 128 trees of depth 18 check: -128
>
> 32 trees of depth 20 check: -32
>
> long lived tree of depth 20 check: -1
>
>
>
> real 0m4.625s
>
> user 0m8.085s
>
> sys 0m0.140s
>
> bmaxa@maxa:~/shootout/trees$ time taskset -c 0 ./binarytreesc 20
>
> stretch tree of depth 21 check: -1
>
> 2097152 trees of depth 4 check: -2097152
>
> 524288 trees of depth 6 check: -524288
>
> 131072 trees of depth 8 check: -131072
>
> 32768 trees of depth 10 check: -32768
>
> 8192 trees of depth 12 check: -8192
>
> 2048 trees of depth 14 check: -2048
>
> 512 trees of depth 16 check: -512
>
> 128 trees of depth 18 check: -128
>
> 32 trees of depth 20 check: -32
>
> long lived tree of depth 20 check: -1
>
>
>
> real 0m8.009s
>
> user 0m7.888s
>
> sys 0m0.116s
>
>
>
> code follows if someone is interested:
>
> /* The Computer Language Benchmarks Game
>
> * http://shootout.alioth.debian.org/
>
> *
>
> * contributed by Jon Harrop
>
> * modified by Alex Mizrahi
>
> * modified by Andreas Schäfer
>
> * very minor omp tweak by The Anh Tran
>
> * custom pool Branimir Maksimovic
>
> */
>
>
>
> #include <iostream>
>
> #include <stdlib.h>
>
> #include <stdio.h>
>
> #include <vector>
>
> #include <new>
>
> #include <pthread.h>
>
> #include <boost/pool/object_pool.hpp>
>
>
>
> const size_t LINE_SIZE = 64;
>
>
>
> template <class T>
>
> class Pool{
>
> static const unsigned NELEM = 10000*sizeof(T);
>
> public:
>
> Pool()
>
> : pos_(0)
>
> {
>
> pool_.push_back(alloc());
>
> }
>
> Pool(const Pool&)=delete;
>
> Pool& operator=(const Pool&)=delete;
>
> ~Pool()
>
> {
>
> for(auto i : pool_)
>
> {
>
> free(i);
>
> }
>
> }
>
> template <class ...T1>
>
> T* allocate(T1... args)
>
> {
>
> if(NELEM <= pos_)
>
> {
>
> pool_.push_back(alloc());
>
> pos_=0;
>
> }
>
> void* p = &pool_.back()[pos_];
>
> pos_+=sizeof(T);
>
> return new(p)T(args...);
>
> }
>
> private:
>
> static char* alloc()
>
> {
>
> Inner* next = (Inner*)pthread_getspecific(k.key);
>
> if(next == nullptr)
>
> {
>
> next = (Inner*)::operator new(NELEM);
>
> next->next = nullptr;
>
> }
>
> char* p = (char*)next;
>
> next = next->next;
>
> pthread_setspecific(k.key,next);
>
> return p;
>
> }
>
> static void free(void* p)
>
> {
>
> Inner* next = (Inner*)pthread_getspecific(k.key);
>
> Inner* tmp = (Inner*)p;
>
> tmp->next = next;
>
> next = tmp;
>
> pthread_setspecific(k.key,next);
>
> }
>
> std::vector<char*> pool_;
>
> unsigned pos_;
>
> struct Inner{
>
> Inner* next;
>
> };
>
> static struct Key{
>
> Key()
>
> {
>
> pthread_key_create(&key,destroy);
>
> }
>
> ~Key()
>
> {
>
> pthread_key_delete(key);
>
> }
>
> static void destroy(void* p)
>
> {
>
> Inner* next = (Inner*)p;
>
> while(next)
>
> {
>
> Inner* tmp = next->next;
>
> ::operator delete(next);
>
> next = tmp;
>
> }
>
> }
>
> pthread_key_t key;
>
> }k;
>
> };
>
>
>
> template <class T>
>
> typename Pool<T>::Key Pool<T>::k;
>
>
>
> struct Node
>
> {
>
> Node *l, *r;
>
> int i;
>
>
>
> Node(int i2) :l(0),r(0), i(i2)
>
> {}
>
> Node(Node *l2, int i2, Node *r2) : l(l2), r(r2), i(i2)
>
> {}
>
> Node(const Node&)=delete;
>
> Node& operator=(const Node&)=delete;
>
> int check() const
>
> {
>
> if (l)
>
> return l->check() + i - r->check();
>
> else return i;
>
> }
>
> };
>
>
>
> typedef boost::object_pool<Node> NodePool;
>
>
>
> Node *make(int i, int d, Pool<Node>& p)
>
> {
>
> if (d > 0)
>
> return p.allocate( make(2*i-1, d-1,p),
>
> i,
>
> make(2*i, d-1,p));
>
> return p.allocate(i);
>
> }
>
>
>
> Node *make(int i, int d, NodePool& p)
>
> {
>
> if (d > 0)
>
> return p.construct( make(2*i-1, d-1,p),
>
> i,
>
> make(2*i, d-1,p));
>
> return p.construct(i);
>
> }
>
>
>
> int main(int argc, char *argv[])
>
> {
>
> int min_depth = 4;
>
> int max_depth = std::max(min_depth+2,
>
> (argc == 2 ? atoi(argv[1]) : 10));
>
> int stretch_depth = max_depth+1;
>
>
>
> // Alloc then dealloc stretchdepth tree
>
> {
>
> Pool<Node> c_pool;
>
> //NodePool c_pool;
>
> Node* c (make(0, stretch_depth,c_pool));
>
> std::cout << "stretch tree of depth " << stretch_depth << "\t "
>
> << "check: " << c->check() << '\n';
>
> }
>
>
>
> Pool<Node> long_lived_tree_pool;
>
> //NodePool long_lived_tree_pool;
>
> Node* long_lived_tree(make(0, max_depth,long_lived_tree_pool));
>
>
>
> char *outputstr = (char*)malloc(LINE_SIZE * (max_depth +1) * sizeof(char));
>
>
>
> #pragma omp parallel for
>
> for (int d = min_depth; d <= max_depth; d += 2)
>
> {
>
> int iterations = 1 << (max_depth - d + min_depth);
>
> int c = 0;
>
>
>
> for (int i = 1; i <= iterations; ++i)
>
> {
>
> Pool<Node> pool;
>
> //NodePool pool;
>
> Node *a(make(i, d,pool)), *b(make(-i, d,pool));
>
> c += a->check() + b->check();
>
> }
>
>
>
> sprintf(outputstr + LINE_SIZE * d, "%d\t trees of depth %d\t check: %d\n", (2 * iterations), d, c);
>
> }
>
>
>
> for (int d = min_depth; d <= max_depth; d += 2)
>
> printf("%s", outputstr + (d * LINE_SIZE) );
>
> free(outputstr);
>
>
>
> std::cout << "long lived tree of depth " << max_depth << "\t "
>
> << "check: " << (long_lived_tree->check()) << '\n';
>
>
>
> return 0;
>
> }




On Saturday, July 28, 2012 7:20:04 PM UTC-4, Melzzzzz wrote:
> I have played a little tree with benchmark from computer language game
>
> site:
>
> http://shootout.alioth.debian.org/u6...st=binarytrees
>
>
>
> Interested in gc vs manual I have been very surprised with Java
>
> performance ;)
>
> Seems that Java gc is blazingly fast in comparison with standard
>
> new/delete.
>
> I think this is because gc deallocates in chunks rather piece by
>
> piece, perhaps allocates from pool. Also there are no destructors
>
> calls.
>
> fastest C++ program on this site uses boost pool which calls
>
> destructors and can deallocate, therefore, is slower then C variant
>
> which uses apache's pool, which just deallocate.
>
>
>
> So I have coded simple custom pool that does not keeps track of
>
> allocated objects and doesn't call destructors and result is
>
> that it is slightly faster than C ;)
>
>
>
> What is clear is that Java garbage collector cannot be bitten
>
> in this case if manual deletion of objects is required.
>
> (as seen here)
>
> http://shootout.alioth.debian.org/u6...st=binarytrees
>
>
>
> Timings for cpp with my custom alloc
>
> bmaxa@maxa:~/shootout/trees$ time ./binarytreescpp 20
>
> stretch tree of depth 21 check: -1
>
> 2097152 trees of depth 4 check: -2097152
>
> 524288 trees of depth 6 check: -524288
>
> 131072 trees of depth 8 check: -131072
>
> 32768 trees of depth 10 check: -32768
>
> 8192 trees of depth 12 check: -8192
>
> 2048 trees of depth 14 check: -2048
>
> 512 trees of depth 16 check: -512
>
> 128 trees of depth 18 check: -128
>
> 32 trees of depth 20 check: -32
>
> long lived tree of depth 20 check: -1
>
>
>
> real 0m4.188s
>
> user 0m7.036s
>
> sys 0m0.112s
>
> bmaxa@maxa:~/shootout/trees$ time taskset -c 0 ./binarytreescpp 20
>
> stretch tree of depth 21 check: -1
>
> 2097152 trees of depth 4 check: -2097152
>
> 524288 trees of depth 6 check: -524288
>
> 131072 trees of depth 8 check: -131072
>
> 32768 trees of depth 10 check: -32768
>
> 8192 trees of depth 12 check: -8192
>
> 2048 trees of depth 14 check: -2048
>
> 512 trees of depth 16 check: -512
>
> 128 trees of depth 18 check: -128
>
> 32 trees of depth 20 check: -32
>
> long lived tree of depth 20 check: -1
>
>
>
> real 0m7.077s
>
> user 0m6.976s
>
> sys 0m0.084s
>
>
>
> Timings for C with apache pool:
>
> bmaxa@maxa:~/shootout/trees$ time ./binarytreesc 20
>
> stretch tree of depth 21 check: -1
>
> 2097152 trees of depth 4 check: -2097152
>
> 524288 trees of depth 6 check: -524288
>
> 131072 trees of depth 8 check: -131072
>
> 32768 trees of depth 10 check: -32768
>
> 8192 trees of depth 12 check: -8192
>
> 2048 trees of depth 14 check: -2048
>
> 512 trees of depth 16 check: -512
>
> 128 trees of depth 18 check: -128
>
> 32 trees of depth 20 check: -32
>
> long lived tree of depth 20 check: -1
>
>
>
> real 0m4.625s
>
> user 0m8.085s
>
> sys 0m0.140s
>
> bmaxa@maxa:~/shootout/trees$ time taskset -c 0 ./binarytreesc 20
>
> stretch tree of depth 21 check: -1
>
> 2097152 trees of depth 4 check: -2097152
>
> 524288 trees of depth 6 check: -524288
>
> 131072 trees of depth 8 check: -131072
>
> 32768 trees of depth 10 check: -32768
>
> 8192 trees of depth 12 check: -8192
>
> 2048 trees of depth 14 check: -2048
>
> 512 trees of depth 16 check: -512
>
> 128 trees of depth 18 check: -128
>
> 32 trees of depth 20 check: -32
>
> long lived tree of depth 20 check: -1
>
>
>
> real 0m8.009s
>
> user 0m7.888s
>
> sys 0m0.116s
>
>
>
> code follows if someone is interested:
>
> /* The Computer Language Benchmarks Game
>
> * http://shootout.alioth.debian.org/
>
> *
>
> * contributed by Jon Harrop
>
> * modified by Alex Mizrahi
>
> * modified by Andreas Schäfer
>
> * very minor omp tweak by The Anh Tran
>
> * custom pool Branimir Maksimovic
>
> */
>
>
>
> #include <iostream>
>
> #include <stdlib.h>
>
> #include <stdio.h>
>
> #include <vector>
>
> #include <new>
>
> #include <pthread.h>
>
> #include <boost/pool/object_pool.hpp>
>
>
>
> const size_t LINE_SIZE = 64;
>
>
>
> template <class T>
>
> class Pool{
>
> static const unsigned NELEM = 10000*sizeof(T);
>
> public:
>
> Pool()
>
> : pos_(0)
>
> {
>
> pool_.push_back(alloc());
>
> }
>
> Pool(const Pool&)=delete;
>
> Pool& operator=(const Pool&)=delete;
>
> ~Pool()
>
> {
>
> for(auto i : pool_)
>
> {
>
> free(i);
>
> }
>
> }
>
> template <class ...T1>
>
> T* allocate(T1... args)
>
> {
>
> if(NELEM <= pos_)
>
> {
>
> pool_.push_back(alloc());
>
> pos_=0;
>
> }
>
> void* p = &pool_.back()[pos_];
>
> pos_+=sizeof(T);
>
> return new(p)T(args...);
>
> }
>
> private:
>
> static char* alloc()
>
> {
>
> Inner* next = (Inner*)pthread_getspecific(k.key);
>
> if(next == nullptr)
>
> {
>
> next = (Inner*)::operator new(NELEM);
>
> next->next = nullptr;
>
> }
>
> char* p = (char*)next;
>
> next = next->next;
>
> pthread_setspecific(k.key,next);
>
> return p;
>
> }
>
> static void free(void* p)
>
> {
>
> Inner* next = (Inner*)pthread_getspecific(k.key);
>
> Inner* tmp = (Inner*)p;
>
> tmp->next = next;
>
> next = tmp;
>
> pthread_setspecific(k.key,next);
>
> }
>
> std::vector<char*> pool_;
>
> unsigned pos_;
>
> struct Inner{
>
> Inner* next;
>
> };
>
> static struct Key{
>
> Key()
>
> {
>
> pthread_key_create(&key,destroy);
>
> }
>
> ~Key()
>
> {
>
> pthread_key_delete(key);
>
> }
>
> static void destroy(void* p)
>
> {
>
> Inner* next = (Inner*)p;
>
> while(next)
>
> {
>
> Inner* tmp = next->next;
>
> ::operator delete(next);
>
> next = tmp;
>
> }
>
> }
>
> pthread_key_t key;
>
> }k;
>
> };
>
>
>
> template <class T>
>
> typename Pool<T>::Key Pool<T>::k;
>
>
>
> struct Node
>
> {
>
> Node *l, *r;
>
> int i;
>
>
>
> Node(int i2) :l(0),r(0), i(i2)
>
> {}
>
> Node(Node *l2, int i2, Node *r2) : l(l2), r(r2), i(i2)
>
> {}
>
> Node(const Node&)=delete;
>
> Node& operator=(const Node&)=delete;
>
> int check() const
>
> {
>
> if (l)
>
> return l->check() + i - r->check();
>
> else return i;
>
> }
>
> };
>
>
>
> typedef boost::object_pool<Node> NodePool;
>
>
>
> Node *make(int i, int d, Pool<Node>& p)
>
> {
>
> if (d > 0)
>
> return p.allocate( make(2*i-1, d-1,p),
>
> i,
>
> make(2*i, d-1,p));
>
> return p.allocate(i);
>
> }
>
>
>
> Node *make(int i, int d, NodePool& p)
>
> {
>
> if (d > 0)
>
> return p.construct( make(2*i-1, d-1,p),
>
> i,
>
> make(2*i, d-1,p));
>
> return p.construct(i);
>
> }
>
>
>
> int main(int argc, char *argv[])
>
> {
>
> int min_depth = 4;
>
> int max_depth = std::max(min_depth+2,
>
> (argc == 2 ? atoi(argv[1]) : 10));
>
> int stretch_depth = max_depth+1;
>
>
>
> // Alloc then dealloc stretchdepth tree
>
> {
>
> Pool<Node> c_pool;
>
> //NodePool c_pool;
>
> Node* c (make(0, stretch_depth,c_pool));
>
> std::cout << "stretch tree of depth " << stretch_depth << "\t "
>
> << "check: " << c->check() << '\n';
>
> }
>
>
>
> Pool<Node> long_lived_tree_pool;
>
> //NodePool long_lived_tree_pool;
>
> Node* long_lived_tree(make(0, max_depth,long_lived_tree_pool));
>
>
>
> char *outputstr = (char*)malloc(LINE_SIZE * (max_depth +1) * sizeof(char));
>
>
>
> #pragma omp parallel for
>
> for (int d = min_depth; d <= max_depth; d += 2)
>
> {
>
> int iterations = 1 << (max_depth - d + min_depth);
>
> int c = 0;
>
>
>
> for (int i = 1; i <= iterations; ++i)
>
> {
>
> Pool<Node> pool;
>
> //NodePool pool;
>
> Node *a(make(i, d,pool)), *b(make(-i, d,pool));
>
> c += a->check() + b->check();
>
> }
>
>
>
> sprintf(outputstr + LINE_SIZE * d, "%d\t trees of depth %d\t check: %d\n", (2 * iterations), d, c);
>
> }
>
>
>
> for (int d = min_depth; d <= max_depth; d += 2)
>
> printf("%s", outputstr + (d * LINE_SIZE) );
>
> free(outputstr);
>
>
>
> std::cout << "long lived tree of depth " << max_depth << "\t "
>
> << "check: " << (long_lived_tree->check()) << '\n';
>
>
>
> return 0;
>
> }




On Saturday, July 28, 2012 7:20:04 PM UTC-4, Melzzzzz wrote:
> I have played a little tree with benchmark from computer language game
>
> site:
>
> http://shootout.alioth.debian.org/u6...st=binarytrees
>
>
>
> Interested in gc vs manual I have been very surprised with Java
>
> performance ;)
>
> Seems that Java gc is blazingly fast in comparison with standard
>
> new/delete.
>
> I think this is because gc deallocates in chunks rather piece by
>
> piece, perhaps allocates from pool. Also there are no destructors
>
> calls.
>
> fastest C++ program on this site uses boost pool which calls
>
> destructors and can deallocate, therefore, is slower then C variant
>
> which uses apache's pool, which just deallocate.
>
>
>
> So I have coded simple custom pool that does not keeps track of
>
> allocated objects and doesn't call destructors and result is
>
> that it is slightly faster than C ;)
>
>
>
> What is clear is that Java garbage collector cannot be bitten
>
> in this case if manual deletion of objects is required.
>
> (as seen here)
>
> http://shootout.alioth.debian.org/u6...st=binarytrees
>
>
>
> Timings for cpp with my custom alloc
>
> bmaxa@maxa:~/shootout/trees$ time ./binarytreescpp 20
>
> stretch tree of depth 21 check: -1
>
> 2097152 trees of depth 4 check: -2097152
>
> 524288 trees of depth 6 check: -524288
>
> 131072 trees of depth 8 check: -131072
>
> 32768 trees of depth 10 check: -32768
>
> 8192 trees of depth 12 check: -8192
>
> 2048 trees of depth 14 check: -2048
>
> 512 trees of depth 16 check: -512
>
> 128 trees of depth 18 check: -128
>
> 32 trees of depth 20 check: -32
>
> long lived tree of depth 20 check: -1
>
>
>
> real 0m4.188s
>
> user 0m7.036s
>
> sys 0m0.112s
>
> bmaxa@maxa:~/shootout/trees$ time taskset -c 0 ./binarytreescpp 20
>
> stretch tree of depth 21 check: -1
>
> 2097152 trees of depth 4 check: -2097152
>
> 524288 trees of depth 6 check: -524288
>
> 131072 trees of depth 8 check: -131072
>
> 32768 trees of depth 10 check: -32768
>
> 8192 trees of depth 12 check: -8192
>
> 2048 trees of depth 14 check: -2048
>
> 512 trees of depth 16 check: -512
>
> 128 trees of depth 18 check: -128
>
> 32 trees of depth 20 check: -32
>
> long lived tree of depth 20 check: -1
>
>
>
> real 0m7.077s
>
> user 0m6.976s
>
> sys 0m0.084s
>
>
>
> Timings for C with apache pool:
>
> bmaxa@maxa:~/shootout/trees$ time ./binarytreesc 20
>
> stretch tree of depth 21 check: -1
>
> 2097152 trees of depth 4 check: -2097152
>
> 524288 trees of depth 6 check: -524288
>
> 131072 trees of depth 8 check: -131072
>
> 32768 trees of depth 10 check: -32768
>
> 8192 trees of depth 12 check: -8192
>
> 2048 trees of depth 14 check: -2048
>
> 512 trees of depth 16 check: -512
>
> 128 trees of depth 18 check: -128
>
> 32 trees of depth 20 check: -32
>
> long lived tree of depth 20 check: -1
>
>
>
> real 0m4.625s
>
> user 0m8.085s
>
> sys 0m0.140s
>
> bmaxa@maxa:~/shootout/trees$ time taskset -c 0 ./binarytreesc 20
>
> stretch tree of depth 21 check: -1
>
> 2097152 trees of depth 4 check: -2097152
>
> 524288 trees of depth 6 check: -524288
>
> 131072 trees of depth 8 check: -131072
>
> 32768 trees of depth 10 check: -32768
>
> 8192 trees of depth 12 check: -8192
>
> 2048 trees of depth 14 check: -2048
>
> 512 trees of depth 16 check: -512
>
> 128 trees of depth 18 check: -128
>
> 32 trees of depth 20 check: -32
>
> long lived tree of depth 20 check: -1
>
>
>
> real 0m8.009s
>
> user 0m7.888s
>
> sys 0m0.116s
>
>
>
> code follows if someone is interested:
>
> /* The Computer Language Benchmarks Game
>
> * http://shootout.alioth.debian.org/
>
> *
>
> * contributed by Jon Harrop
>
> * modified by Alex Mizrahi
>
> * modified by Andreas Schäfer
>
> * very minor omp tweak by The Anh Tran
>
> * custom pool Branimir Maksimovic
>
> */
>
>
>
> #include <iostream>
>
> #include <stdlib.h>
>
> #include <stdio.h>
>
> #include <vector>
>
> #include <new>
>
> #include <pthread.h>
>
> #include <boost/pool/object_pool.hpp>
>
>
>
> const size_t LINE_SIZE = 64;
>
>
>
> template <class T>
>
> class Pool{
>
> static const unsigned NELEM = 10000*sizeof(T);
>
> public:
>
> Pool()
>
> : pos_(0)
>
> {
>
> pool_.push_back(alloc());
>
> }
>
> Pool(const Pool&)=delete;
>
> Pool& operator=(const Pool&)=delete;
>
> ~Pool()
>
> {
>
> for(auto i : pool_)
>
> {
>
> free(i);
>
> }
>
> }
>
> template <class ...T1>
>
> T* allocate(T1... args)
>
> {
>
> if(NELEM <= pos_)
>
> {
>
> pool_.push_back(alloc());
>
> pos_=0;
>
> }
>
> void* p = &pool_.back()[pos_];
>
> pos_+=sizeof(T);
>
> return new(p)T(args...);
>
> }
>
> private:
>
> static char* alloc()
>
> {
>
> Inner* next = (Inner*)pthread_getspecific(k.key);
>
> if(next == nullptr)
>
> {
>
> next = (Inner*)::operator new(NELEM);
>
> next->next = nullptr;
>
> }
>
> char* p = (char*)next;
>
> next = next->next;
>
> pthread_setspecific(k.key,next);
>
> return p;
>
> }
>
> static void free(void* p)
>
> {
>
> Inner* next = (Inner*)pthread_getspecific(k.key);
>
> Inner* tmp = (Inner*)p;
>
> tmp->next = next;
>
> next = tmp;
>
> pthread_setspecific(k.key,next);
>
> }
>
> std::vector<char*> pool_;
>
> unsigned pos_;
>
> struct Inner{
>
> Inner* next;
>
> };
>
> static struct Key{
>
> Key()
>
> {
>
> pthread_key_create(&key,destroy);
>
> }
>
> ~Key()
>
> {
>
> pthread_key_delete(key);
>
> }
>
> static void destroy(void* p)
>
> {
>
> Inner* next = (Inner*)p;
>
> while(next)
>
> {
>
> Inner* tmp = next->next;
>
> ::operator delete(next);
>
> next = tmp;
>
> }
>
> }
>
> pthread_key_t key;
>
> }k;
>
> };
>
>
>
> template <class T>
>
> typename Pool<T>::Key Pool<T>::k;
>
>
>
> struct Node
>
> {
>
> Node *l, *r;
>
> int i;
>
>
>
> Node(int i2) :l(0),r(0), i(i2)
>
> {}
>
> Node(Node *l2, int i2, Node *r2) : l(l2), r(r2), i(i2)
>
> {}
>
> Node(const Node&)=delete;
>
> Node& operator=(const Node&)=delete;
>
> int check() const
>
> {
>
> if (l)
>
> return l->check() + i - r->check();
>
> else return i;
>
> }
>
> };
>
>
>
> typedef boost::object_pool<Node> NodePool;
>
>
>
> Node *make(int i, int d, Pool<Node>& p)
>
> {
>
> if (d > 0)
>
> return p.allocate( make(2*i-1, d-1,p),
>
> i,
>
> make(2*i, d-1,p));
>
> return p.allocate(i);
>
> }
>
>
>
> Node *make(int i, int d, NodePool& p)
>
> {
>
> if (d > 0)
>
> return p.construct( make(2*i-1, d-1,p),
>
> i,
>
> make(2*i, d-1,p));
>
> return p.construct(i);
>
> }
>
>
>
> int main(int argc, char *argv[])
>
> {
>
> int min_depth = 4;
>
> int max_depth = std::max(min_depth+2,
>
> (argc == 2 ? atoi(argv[1]) : 10));
>
> int stretch_depth = max_depth+1;
>
>
>
> // Alloc then dealloc stretchdepth tree
>
> {
>
> Pool<Node> c_pool;
>
> //NodePool c_pool;
>
> Node* c (make(0, stretch_depth,c_pool));
>
> std::cout << "stretch tree of depth " << stretch_depth << "\t "
>
> << "check: " << c->check() << '\n';
>
> }
>
>
>
> Pool<Node> long_lived_tree_pool;
>
> //NodePool long_lived_tree_pool;
>
> Node* long_lived_tree(make(0, max_depth,long_lived_tree_pool));
>
>
>
> char *outputstr = (char*)malloc(LINE_SIZE * (max_depth +1) * sizeof(char));
>
>
>
> #pragma omp parallel for
>
> for (int d = min_depth; d <= max_depth; d += 2)
>
> {
>
> int iterations = 1 << (max_depth - d + min_depth);
>
> int c = 0;
>
>
>
> for (int i = 1; i <= iterations; ++i)
>
> {
>
> Pool<Node> pool;
>
> //NodePool pool;
>
> Node *a(make(i, d,pool)), *b(make(-i, d,pool));
>
> c += a->check() + b->check();
>
> }
>
>
>
> sprintf(outputstr + LINE_SIZE * d, "%d\t trees of depth %d\t check: %d\n", (2 * iterations), d, c);
>
> }
>
>
>
> for (int d = min_depth; d <= max_depth; d += 2)
>
> printf("%s", outputstr + (d * LINE_SIZE) );
>
> free(outputstr);
>
>
>
> std::cout << "long lived tree of depth " << max_depth << "\t "
>
> << "check: " << (long_lived_tree->check()) << '\n';
>
>
>
> return 0;
>
> }



Melzzzzz 07-28-2012 11:49 PM

Re: bench: binary trees
 
On Sat, 28 Jul 2012 16:31:01 -0700 (PDT)
W Karas <wkaras@yahoo.com> wrote:

> On Saturday, July 28, 2012 7:20:04 PM UTC-4, Melzzzzz wrote:
> > I have played a little tree with benchmark from computer language
> > game
> >
> > site:
> >
> > http://shootout.alioth.debian.org/u6...st=binarytrees
> >
> >
> >
> > Interested in gc vs manual I have been very surprised with Java
> >
> > performance ;)
> >
> > Seems that Java gc is blazingly fast in comparison with standard
> >
> > new/delete.

>
> Is it possible that the benchmark never causes GC to run? Is so,
> that would also mean any time it took to run destructors was also
> ignored in the Java case.
>

GC runs heavily in this case as memory will be exhausted pretty
quickly if not.
Take for example comparison between C on single CPU and Java
(non threaded version, as threaded version suffers some on single CPU).
C uses fast pool from apache while Java uses GC!
What is also interesting is that with openmp it is actually easier
to parallelize with C/C++ than Java ;)

bmaxa@maxa:~/shootout/trees$ time java binarytrees 20
stretch tree of depth 21 check: -1
2097152 trees of depth 4 check: -2097152
524288 trees of depth 6 check: -524288
131072 trees of depth 8 check: -131072
32768 trees of depth 10 check: -32768
8192 trees of depth 12 check: -8192
2048 trees of depth 14 check: -2048
512 trees of depth 16 check: -512
128 trees of depth 18 check: -128
32 trees of depth 20 check: -32
long lived tree of depth 20 check: -1

real 0m8.537s
user 0m8.949s
sys 0m0.236s

bmaxa@maxa:~/shootout/trees$ time taskset -c 0 ./binarytreesc 20
stretch tree of depth 21 check: -1
2097152 trees of depth 4 check: -2097152
524288 trees of depth 6 check: -524288
131072 trees of depth 8 check: -131072
32768 trees of depth 10 check: -32768
8192 trees of depth 12 check: -8192
2048 trees of depth 14 check: -2048
512 trees of depth 16 check: -512
128 trees of depth 18 check: -128
32 trees of depth 20 check: -32
long lived tree of depth 20 check: -1

real 0m8.054s
user 0m7.936s
sys 0m0.112s


Juha Nieminen 07-29-2012 05:27 AM

Re: bench: binary trees
 
Melzzzzz <mel@zzzzz.com> wrote:
> Seems that Java gc is blazingly fast in comparison with standard
> new/delete.


Yes, new/delete in C++ is very slow (as well as malloc()/free() in C).
That's the reason why a savvy C++ programmer will avoid doing tons of
them if it's possible to avoid them. Depending on the algorith, this is,
in fact, a relatively frequent occurrence (ie. that you can avoid using
tons of news and deletes).

James Kanze 07-29-2012 10:14 AM

Re: bench: binary trees
 
On Sunday, July 29, 2012 12:20:04 AM UTC+1, Melzzzzz wrote:
> I have played a little tree with benchmark from computer language game
> site:


> http://shootout.alioth.debian.org/u6...st=binarytrees


> Interested in gc vs manual I have been very surprised with Java
> performance ;)


Beware of any benchmark you haven't falsified yourself.

> Seems that Java gc is blazingly fast in comparison with standard
> new/delete.


Typically, the classical algorithms for garbage collection will
be faster than manual allocation in some applications, slower
than manual in other---the classical gc algorithms shine when
there are a lot of small, short lived objects, for example. So
people interested in showing garbage collection in its best
light write benchmarks which make use of a lot of small,
short-lived objects: trees and graphs are very popular for this.
If your application makes extensive use of trees or graphs with
a large number of very small nodes, you'd benefit,
performance-wise, by using garbage collection in C++. (The
Boehm collector works very well, for example.)

Of course, the real argument for garbage collection isn't
performance; it's safety. There's no way to get a dangling
pointer (to dynamically allocated memory) with garbage
collection. So when you destruct an object, you can mark it as
destructed, and be sure that it will stay marked until there are
no pointers to it.

> I think this is because gc deallocates in chunks rather piece by
> piece, perhaps allocates from pool. Also there are no destructors
> calls.


> fastest C++ program on this site uses boost pool which calls
> destructors and can deallocate, therefore, is slower then C variant
> which uses apache's pool, which just deallocate.


> So I have coded simple custom pool that does not keeps track of
> allocated objects and doesn't call destructors and result is
> that it is slightly faster than C ;)


> What is clear is that Java garbage collector cannot be bitten
> in this case if manual deletion of objects is required.


I'm not sure what you mean by "manual deletion". If the objects
have a non-trivial destructor even in the absense of memory
management issues, then it must be called. Regardless of the
language. (Not calling it is a frequent error in Java code.)

--
James

Juha Nieminen 07-29-2012 05:48 PM

Re: bench: binary trees
 
James Kanze <james.kanze@gmail.com> wrote:
> Of course, the real argument for garbage collection isn't
> performance; it's safety. There's no way to get a dangling
> pointer (to dynamically allocated memory) with garbage
> collection. So when you destruct an object, you can mark it as
> destructed, and be sure that it will stay marked until there are
> no pointers to it.


Depends on your definition of "safety". Garbage collection makes the
execution of destructors ("finalizers") non-deterministic and, in fact,
they might not be executed *at all* in Java. Thus destructors are
basically useless. AFAIK Java does not offer any alternative (other
than C-style manual resource management, other than memory).

James Kanze 07-30-2012 10:54 PM

Re: bench: binary trees
 
On Sunday, July 29, 2012 6:48:28 PM UTC+1, Juha Nieminen wrote:
> James Kanze <james.kanze@gmail.com> wrote:


> > Of course, the real argument for garbage collection isn't
> > performance; it's safety. There's no way to get a dangling
> > pointer (to dynamically allocated memory) with garbage
> > collection. So when you destruct an object, you can mark it as
> > destructed, and be sure that it will stay marked until there are
> > no pointers to it.


> Depends on your definition of "safety". Garbage collection makes the
> execution of destructors ("finalizers") non-deterministic and, in fact,
> they might not be executed *at all* in Java. Thus destructors are
> basically useless. AFAIK Java does not offer any alternative (other
> than C-style manual resource management, other than memory).


I don't think you understand how garbage collection works.
Finalizers have nothing to do with destructors; destructors work
exactly as they always have, even with garbage collection.
Garbage collection has no impact with regards to destructors.

Note that the safety aspect doesn't mean that you don't "delete"
objects. What it means, above all, is that you don't reuse
"deleted" memory until there are no more pointers to it. So if
you somehow mark the memory as "deleted", you can detect any
attempt to use it---that mark won't be overwritten by a later
allocation. And that if you forget too delete the object before
the last pointer to it disappears, the finalize method can tell
you.

There's also a convenience aspect, that you don't need to
destruct objects with trivial destructors, and that most
destructors do become trivial once they don't have to worry
about memory management. But this is secondary to the safety
issue.

--
James

Juha Nieminen 07-31-2012 05:21 AM

Re: bench: binary trees
 
James Kanze <james.kanze@gmail.com> wrote:
>> Depends on your definition of "safety". Garbage collection makes the
>> execution of destructors ("finalizers") non-deterministic and, in fact,
>> they might not be executed *at all* in Java. Thus destructors are
>> basically useless. AFAIK Java does not offer any alternative (other
>> than C-style manual resource management, other than memory).

>
> I don't think you understand how garbage collection works.
> Finalizers have nothing to do with destructors; destructors work
> exactly as they always have, even with garbage collection.
> Garbage collection has no impact with regards to destructors.


You are right. I don't understand what you are talking about. I don't
even understand what you mean by "finalizer" and "destructor" here.

Java classes have destructors, but due to GC they get called some time
in the future, maybe not at all. That makes them practically useless.
You better not rely on them to free important non-memory resources.
And Java doesn't really provide any alternative either.

James Kanze 08-01-2012 09:31 PM

Re: bench: binary trees
 
On Tuesday, July 31, 2012 6:21:35 AM UTC+1, Juha Nieminen wrote:
> James Kanze <james.kanze@gmail.com> wrote:


> >> Depends on your definition of "safety". Garbage collection makes the
> >> execution of destructors ("finalizers") non-deterministic and, in fact,
> >> they might not be executed *at all* in Java. Thus destructors are
> >> basically useless. AFAIK Java does not offer any alternative (other
> >> than C-style manual resource management, other than memory).


> > I don't think you understand how garbage collection works.
> > Finalizers have nothing to do with destructors; destructors work
> > exactly as they always have, even with garbage collection.
> > Garbage collection has no impact with regards to destructors.


> You are right. I don't understand what you are talking about. I don't
> even understand what you mean by "finalizer" and "destructor" here.


> Java classes have destructors,


Java classes don't have destructors. At least not according to
the Java language specification.

In practice, "destructors" (in the largest sense of the word)
are essential for some classes. There is a relatively
widespread (but far from universal) convention in Java to call
them "dispose": if your class has a "dispose" function, clients
are expected to call it whenever the lifetime of the object ends
(logically, of course). Typically using a try ... finally,
since the language has no mechanism for automating calling it.

> but due to GC they get called some time in the future, maybe
> not at all. That makes them practically useless.


Those aren't destructors, but finalizer methods. And they can
be useful: if the object requires destruction (most don't), you
assert that the destructor has been called.

> You better not rely on them to free important non-memory resources.


Of course not. That's not what they're there for,

> And Java doesn't really provide any alternative either.


The "alternative" is try ... finally. In my opinion, not a
particularly good alternative, since it leaves the
responsibility in the hands of the client, rather than having
the author of the class enforce it.

But the argument try...finally vs. destructors is orthogonaly to
arguments concerning garbage collection.

--
James

Juha Nieminen 08-02-2012 06:54 AM

Re: bench: binary trees
 
James Kanze <james.kanze@gmail.com> wrote:
> The "alternative" is try ... finally. In my opinion, not a
> particularly good alternative, since it leaves the
> responsibility in the hands of the client, rather than having
> the author of the class enforce it.


Besides, that only works for objects that do not outlive the scope where
they are created, nor are shared. (In other words, it works only for
strictly scope-bound, non-shared objects.) Since it's next to impossible
to implement any kind of reference counting in Java (because you can't
overload a reference assignment nor destruction), and it's overall difficult
to know if an object is being currently shared or not, it's likewise very
difficult to implement any kind of automatic "when the last reference to
this object dies, execute this function".


All times are GMT. The time now is 05:04 PM.

Powered by vBulletin®. Copyright ©2000 - 2014, vBulletin Solutions, Inc.
SEO by vBSEO ©2010, Crawlability, Inc.