Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C++ > Why is java consumer/producer so much faster than C++

Reply
Thread Tools

Why is java consumer/producer so much faster than C++

 
 
Melzzzzz
Guest
Posts: n/a
 
      07-22-2012
I have played little bit with new C++11 features and compared,
java performance to c++.
Actually this was meant to be GC vs RAII memory management,
but boiled down to speed of BlockingQueue class in java,
and mine in c++.
It seems that java implementation is so much more efficient
but I don't know why. I even tried c++ without dynamic
memory management (except queue itself) and that is *even slower*.
Must be some quirks with a queue

These are timings:
(java openjdk 1.7)
bmaxa@maxa:~/examples$ time java consprod

real 0m13.411s
user 0m19.853s
sys 0m0.960s

(c++ gcc 4.6.3)
bmaxa@maxa:~/examples$ time ./consprod

real 0m28.726s
user 0m34.518s
sys 0m6.800s

Example programs follow (I think implementations of
blocking queues are similar):
// java
import java.util.concurrent.*;
import java.util.Random;

class Vars{
public final static int nitems = 100000000;
public final static Random rnd = new Random(12345);
public final static int size = 10000;
}

class Producer implements Runnable {
private final BlockingQueue<Integer> queue;
Producer(BlockingQueue<Integer> q) { queue = q; }
public void run() {
try {
int i = Vars.nitems;
while(i-- > 0) { queue.put(produce(i)); }
} catch (InterruptedException ex)
{

}
}
Integer produce(int i) { return new Integer(i); }
}

class Consumer implements Runnable {
private final BlockingQueue<Integer> queue;
Consumer(BlockingQueue<Integer> q)
{
queue = q;
}
public void run() {
try {
Integer[] arr = new Integer[10000];
int i = Vars.nitems;
while(i-- > 0) { consume(queue.take(),arr); }
} catch (InterruptedException ex)
{
}
}
void consume(Integer x, Integer[] arr)
{
arr[Vars.rnd.nextInt(Vars.size)] = x;
}
}

public class consprod {
public static void main(String [] args) {
try{
BlockingQueue<Integer> q = new ArrayBlockingQueue<Integer>(100000);
Producer p = new Producer(q);
Consumer c = new Consumer(q);
new Thread(p).start();
new Thread(c).start();
} catch(Exception e)
{
e.printStackTrace();
}
}
}
//-----------------------------------------
// c++
#include <condition_variable>
#include <mutex>
#include <thread>
#include <deque>
#include <cstdlib>

template <class T>
class BlockingQueue{
public:
BlockingQueue(unsigned cap):capacity_(cap)
{
}
void put(T t)
{
std::unique_lock<std::mutex> lock(m_);
while(queue_.size() >= capacity_)c_full_.wait(lock);
queue_.push_back(std::move(t));
c_empty_.notify_one();
}
T take()
{
std::unique_lock<std::mutex> lock(m_);
while(queue_.empty())c_empty_.wait(lock);
T tmp = std::move(queue_.front());
queue_.pop_front();
c_full_.notify_one();
return std::move(tmp);
}
bool empty()
{
std::unique_lock<std::mutex> lock(m_);
return queue_.empty();
}
private:
std::mutex m_;
std::condition_variable c_empty_,c_full_;
std::deque<T> queue_;
unsigned capacity_;
};

int main()
{
BlockingQueue<std::unique_ptr<int>> produced(100000);
const int nitems = 100000000;
std::srand(12345);


std::function<void()> f_prod = [&]() {
int i = nitems;
while(i-- > 0){
produced.put(std::unique_ptr<int>(new int(i)));
}
};

std::thread producer1(f_prod);

std::function<void()> f_cons = [&]() {
const int size = 10000;
std::unique_ptr<int> arr[size];
int i = nitems;
while(i-- > 0)
{
arr[std::rand()%size] = produced.take();
}
};

std::thread consumer1(f_cons);

producer1.join();
consumer1.join();
}
// --------------------------------------

 
Reply With Quote
 
 
 
 
Joshua Maurice
Guest
Posts: n/a
 
      07-22-2012
On Jul 22, 2:59*pm, Melzzzzz <m...@zzzzz.com> wrote:
> I have played little bit with new C++11 features and compared,
> java performance to c++.
> Actually this was meant to be GC vs RAII memory management,
> but boiled down to speed of BlockingQueue class in java,
> and mine in c++.
> It seems that java implementation is so much more efficient
> but I don't know why. I even tried c++ without dynamic
> memory management (except queue itself) and that is *even slower*.
> Must be some quirks with a queue
>
> These are timings:
> (java openjdk 1.7)
> bmaxa@maxa:~/examples$ time java consprod
>
> real * *0m13.411s
> user * *0m19.853s
> sys * * 0m0.960s
>
> (c++ gcc 4.6.3)
> bmaxa@maxa:~/examples$ time ./consprod
>
> real * *0m28.726s
> user * *0m34.518s
> sys * * 0m6.800s
>
> Example programs follow (I think implementations of
> blocking queues are similar):
> // java
> import java.util.concurrent.*;
> import java.util.Random;
>
> class Vars{
> public final static int nitems = 100000000;
> public final static Random rnd = new Random(12345);
> public final static int size = 10000;
>
> }
>
> class Producer implements Runnable {
> * *private final BlockingQueue<Integer> queue;
> * *Producer(BlockingQueue<Integer> q) { queue = q; }
> * *public void run() {
> * * *try {
> * * * *int i = Vars.nitems;
> * * * *while(i-- > 0) { queue.put(produce(i)); }
> * * *} catch (InterruptedException ex)
> * * *{
>
> * * *}
> * *}
> * *Integer produce(int i) { return new Integer(i); }
> *}
>
> *class Consumer implements Runnable {
> * *private final BlockingQueue<Integer> queue;
> * *Consumer(BlockingQueue<Integer> q)
> * *{
> * * queue = q;
> * *}
> * *public void run() {
> * * *try {
> * * * *Integer[] arr = new Integer[10000];
> * * * *int i = Vars.nitems;
> * * * *while(i-- > 0) { consume(queue.take(),arr); }
> * * *} catch (InterruptedException ex)
> * * *{
> * * *}
> * *}
> * *void consume(Integer x, Integer[] arr)
> * *{
> * * arr[Vars.rnd.nextInt(Vars.size)] = x;
> * *}
> *}
>
> *public class consprod {
> * *public static void main(String [] args) {
> * *try{
> * * *BlockingQueue<Integer> q = new ArrayBlockingQueue<Integer>(100000);
> * * *Producer p = new Producer(q);
> * * *Consumer c = new Consumer(q);
> * * *new Thread(p).start();
> * * *new Thread(c).start();
> * * *} catch(Exception e)
> * * *{
> * * * e.printStackTrace();
> * * *}
> * *}
> *}
> //-----------------------------------------
> // c++
> #include <condition_variable>
> #include <mutex>
> #include <thread>
> #include <deque>
> #include <cstdlib>
>
> template <class T>
> class BlockingQueue{
> public:
> * * BlockingQueue(unsigned cap):capacity_(cap)
> * * {
> * * }
> * * void put(T t)
> * * {
> * * * * std::unique_lock<std::mutex> lock(m_);
> * * * * while(queue_.size() >= capacity_)c_full_.wait(lock);
> * * * * queue_.push_back(std::move(t));
> * * * * c_empty_.notify_one();
> * * }
> * * T take()
> * * {
> * * * * std::unique_lock<std::mutex> lock(m_);
> * * * * while(queue_.empty())c_empty_.wait(lock);
> * * * * T tmp = std::move(queue_.front());
> * * * * queue_.pop_front();
> * * * * c_full_.notify_one();
> * * * * return std::move(tmp);
> * * }
> * * bool empty()
> * * {
> * * * * std::unique_lock<std::mutex> lock(m_);
> * * * * return queue_.empty();
> * * }
> private:
> * * std::mutex m_;
> * * std::condition_variable c_empty_,c_full_;
> * * std::deque<T> queue_;
> * * unsigned capacity_;
>
> };
>
> int main()
> {
> * * BlockingQueue<std::unique_ptr<int>> produced(100000);
> * * const int nitems = 100000000;
> * * std::srand(12345);
>
> * * std::function<void()> f_prod = [&]() {
> * * * * int i = nitems;
> * * * * while(i-- > 0){
> * * * * * * produced.put(std::unique_ptr<int>(new int(i)));
> * * * * }
> * * };
>
> * * std::thread producer1(f_prod);
>
> * * std::function<void()> f_cons = [&]() {
> * * * * const int size = 10000;
> * * * * std::unique_ptr<int> arr[size];
> * * * * int i = nitems;
> * * * * while(i-- > 0)
> * * * * {
> * * * * * * arr[std::rand()%size] = produced.take();
> * * * * }
> * * };
>
> * * std::thread consumer1(f_cons);
>
> * * producer1.join();
> * * consumer1.join();}



What g++ optimization options did you use? If you didn't compile with
proper optimization flags (e.g. at least -O2), then the numbers you
have are meaningless.

Also, you might be testing the difference between std::rand() and
Java's Random.

Also, why did you use dynamic allocation in the C++ code for an int?
If your original goal was to compare the Java way to the C++ way, you
are not doing it right. This is especially troubling after you
apparently went through the effort to make the queue support move-only
types. (Not sure. I'm still new to move-semantics.)

Also,
> std::unique_ptr<int> arr[size];

....
> arr[std::rand()%size] = produced.take();

I wonder if it's possible to optimize that into a simple assignment,
or whether there will be a call to delete() - or at least a branch to
test if the internal member pointer is null.

Just off the top of my head.
 
Reply With Quote
 
 
 
 
Melzzzzz
Guest
Posts: n/a
 
      07-22-2012
On Sun, 22 Jul 2012 15:42:55 -0700 (PDT)
Joshua Maurice <> wrote:

>
>
> What g++ optimization options did you use? If you didn't compile with
> proper optimization flags (e.g. at least -O2), then the numbers you
> have are meaningless.


I have used -O2.

>
> Also, you might be testing the difference between std::rand() and
> Java's Random.


Possibly.

>
> Also, why did you use dynamic allocation in the C++ code for an int?


Actually with non dynamical allocation seems to be slower

> If your original goal was to compare the Java way to the C++ way, you
> are not doing it right. This is especially troubling after you
> apparently went through the effort to make the queue support move-only
> types. (Not sure. I'm still new to move-semantics.)


That was necessary in order to put unique_ptr in queue.

>
> Also,
> > std::unique_ptr<int> arr[size];

> ...
> > arr[std::rand()%size] = produced.take();

> I wonder if it's possible to optimize that into a simple assignment,
> or whether there will be a call to delete() - or at least a branch to
> test if the internal member pointer is null.


I have removed that statement , removed called to rand, left
only take, made int's non dynamic
and I got following time:

bmaxa@maxa:~/examples$ g++ -Wall -O2 -pthread -std=c++0x consprod1.cpp -oconsprod1
consprod1.cpp: In lambda function:
consprod1.cpp:58:19: warning: unused variable ‘size’ [-Wunused-variable]
bmaxa@maxa:~/examples$ time ./consprod1

real 0m23.335s
user 0m22.177s
sys 0m16.417s

with java, I have removed rand too...
bmaxa@maxa:~/examples$ javac consprod.java
bmaxa@maxa:~/examples$ time java consprod

real 0m12.491s
user 0m21.001s
sys 0m0.524s

Still twice as fast as c++.
What it looks like, is that c++ spends much more time in syscalls
(futex) than java and that explains big difference.



 
Reply With Quote
 
Luca Risolia
Guest
Posts: n/a
 
      07-23-2012
On 23/07/2012 01:28, Melzzzzz wrote:
> Still twice as fast as c++.
> What it looks like, is that c++ spends much more time in syscalls
> (futex) than java and that explains big difference.


A lock-free "BlockingQueue" would probably be more efficient.
You can also try to implement your Queue with std::array instead of
std::deque and see how it performs.

In the C++ version change:

while (queue_.size() >= capacity_)c_full_.wait(lock);
with
c_full_.wait(lock, [this]{return queue_.size() < capacity_;});

and
while (queue_.empty())c_empty_.wait(lock);
with
c_empty_.wait(lock, [this]{return !queue_.empty();});

Also use a std::lock_guard in empty() instead of std::unique_lock. The
former is faster.

(On a side note, the C++ version you have given does not have the best
possible interface for a thread-safe queue)

 
Reply With Quote
 
Juha Nieminen
Guest
Posts: n/a
 
      07-23-2012
In comp.lang.c++ Melzzzzz <> wrote:
> I even tried c++ without dynamic
> memory management (except queue itself) and that is *even slower*.


If two C++ programs are otherwise identical, except that one uses
'new int' and the other just uses the ints by value, and the former
is faster than the latter, then there's something horribly wrong in
the way you are measuring, or something else. I don't think it's
physically possible for 'new int' to be faster than using ints by
value under any possible circumstance, even if we assumed a highly
optimized version of 'new' that does magic under the hood to be 10
times faster than the regular 'new'.

> std::unique_lock<std::mutex> lock(m_);


> produced.put(std::unique_ptr<int>(new int(i)));


> arr[std::rand()%size] = produced.take();


I don't know what's the cause of the slowness in your program, but
I quoted above the three possible culprits I would first investigate
(which happen to be in order of likeliness).

Locks are slow. They are probably slower than 'new'.

'new' is slow, especially when compared to java's.

std::rand() is slow, although probably does not account for that much
slowness, but could be a minor contributing factor.
 
Reply With Quote
 
Melzzzzz
Guest
Posts: n/a
 
      07-23-2012
On Mon, 23 Jul 2012 02:03:28 +0200
Luca Risolia <> wrote:

> On 23/07/2012 01:28, Melzzzzz wrote:
> > Still twice as fast as c++.
> > What it looks like, is that c++ spends much more time in syscalls
> > (futex) than java and that explains big difference.

>
> A lock-free "BlockingQueue" would probably be more efficient.
> You can also try to implement your Queue with std::array instead of
> std::deque and see how it performs.


I will try. Seems that java ArrayBlockingQueue is more efficient
because if that.

>
> In the C++ version change:
>
> while (queue_.size() >= capacity_)c_full_.wait(lock);
> with
> c_full_.wait(lock, [this]{return queue_.size() < capacity_;});
>
> and
> while (queue_.empty())c_empty_.wait(lock);
> with
> c_empty_.wait(lock, [this]{return !queue_.empty();});
>
> Also use a std::lock_guard in empty() instead of std::unique_lock.
> The former is faster.


Yes, I got speed up of few seconds, thanks.

>
> (On a side note, the C++ version you have given does not have the
> best possible interface for a thread-safe queue)


I have just copied Java BlockingQueue interface, which is of course
suited for Java. Don't know actually how ideal interface would look
like.


>



 
Reply With Quote
 
Melzzzzz
Guest
Posts: n/a
 
      07-23-2012
On Mon, 23 Jul 2012 06:37:17 +0000 (UTC)
Juha Nieminen <> wrote:

> In comp.lang.c++ Melzzzzz <> wrote:
> > I even tried c++ without dynamic
> > memory management (except queue itself) and that is *even slower*.

>
> If two C++ programs are otherwise identical, except that one uses
> 'new int' and the other just uses the ints by value, and the former
> is faster than the latter, then there's something horribly wrong in
> the way you are measuring, or something else.


I think that pressure on condition variable is greater with non
dynamic allocation, therefore it is slower.
If I reduce queue capacity on eg 1000 (more pressure on condition
variable) than it is much slower than with capacity of 100000.


I don't think it's
> physically possible for 'new int' to be faster than using ints by
> value under any possible circumstance, even if we assumed a highly
> optimized version of 'new' that does magic under the hood to be 10
> times faster than the regular 'new'.


That's because new is very fast and don;t take much time of program
in this example.

>
> > std::unique_lock<std::mutex> lock(m_);

>
> > produced.put(std::unique_ptr<int>(new int(i)));

>
> > arr[std::rand()%size] = produced.take();

>
> I don't know what's the cause of the slowness in your program, but
> I quoted above the three possible culprits I would first investigate
> (which happen to be in order of likeliness).
>
> Locks are slow. They are probably slower than 'new'.


I think that locks are not that slow in comparison to
condition variables. I have feeling that somehow
Java has less pressure on condition variables
in it's queue implementation.

>
> 'new' is slow, especially when compared to java's.


In this example new takes very little time.

>
> std::rand() is slow, although probably does not account for that much
> slowness, but could be a minor contributing factor.


This also takes little time.
Here is difference between new and rand and without.
Both versions are optimized now as per Luca's suggestion.


(with new and rand array assignment)
bmaxa@maxa:~/examples$ time ./consprod

real 0m25.127s
user 0m32.202s
sys 0m5.540s
(without)
bmaxa@maxa:~/examples$ time ./consprod1

real 0m23.883s
user 0m23.009s
sys 0m17.153s

Java:
(without rand and array assignment)
bmaxa@maxa:~/examples$ time java consprod

real 0m12.825s
user 0m21.833s
sys 0m0.592s

(with rand and array assignment)
bmaxa@maxa:~/examples$ time java consprod

real 0m15.571s
user 0m25.634s
sys 0m1.168s

Java pays higher price for random assignments, I think.


 
Reply With Quote
 
Juha Nieminen
Guest
Posts: n/a
 
      07-23-2012
In comp.lang.c++ Melzzzzz <> wrote:
> I think that pressure on condition variable is greater with non
> dynamic allocation, therefore it is slower.


That sentence doesn't make any kind of sense.

> I don't think it's
>> physically possible for 'new int' to be faster than using ints by
>> value under any possible circumstance, even if we assumed a highly
>> optimized version of 'new' that does magic under the hood to be 10
>> times faster than the regular 'new'.

>
> That's because new is very fast and don;t take much time of program
> in this example.


'new' is a very slow operation in C++, and even if it weren't, it just
can't be faster than using an integer by value. It's physically impossible.
(The pointer that points to the allocated int is also an integral value at
the low level, so by allocating something dynamically you are only adding
to the amount of values being handled, and the overall complexity of the
program.)

Can you give any scenario where handling pointers would be faster than
handling ints?

>> 'new' is slow, especially when compared to java's.

>
> In this example new takes very little time.


You are executing tens of thousands of 'new's. That is slow.
 
Reply With Quote
 
Melzzzzz
Guest
Posts: n/a
 
      07-23-2012
On Mon, 23 Jul 2012 11:46:37 +0000 (UTC)
Juha Nieminen <> wrote:

> In comp.lang.c++ Melzzzzz <> wrote:
> > I think that pressure on condition variable is greater with non
> > dynamic allocation, therefore it is slower.

>
> That sentence doesn't make any kind of sense.

I has sense in this context. I will provide example.

>
> > I don't think it's
> >> physically possible for 'new int' to be faster than using ints by
> >> value under any possible circumstance, even if we assumed a highly
> >> optimized version of 'new' that does magic under the hood to be 10
> >> times faster than the regular 'new'.

> >
> > That's because new is very fast and don;t take much time of program
> > in this example.

>
> 'new' is a very slow operation in C++, and even if it weren't, it just
> can't be faster than using an integer by value. It's physically
> impossible. (The pointer that points to the allocated int is also an
> integral value at the low level, so by allocating something
> dynamically you are only adding to the amount of values being
> handled, and the overall complexity of the program.)


Yes, but overall complexity can be masked with something else like in
this case.

>
> Can you give any scenario where handling pointers would be faster than
> handling ints?


This scenario is exactly example. My queue is limited to capacity
nodes. When capacity is reached, producer waits on condition
variable until it is emptied.
In first example queue was 100000 nodes so slow down with plain ints
isn't just that noticabe.
But if I put just 1000 nodes here is what happens:
(dynamic ints , unique_ptr, random function and array assignement)
bmaxa@maxa:~/examples$ time ./consprod

real 0m23.523s
user 0m32.390s
sys 0m10.581s

but look at this:
(no dynamic ints, no random function, no assignment)
:
bmaxa@maxa:~/examples$ time ./consprod1

real 0m35.654s
user 0m30.882s
sys 0m34.202s

it's more than 10 seconds slower! just ints!


>
> >> 'new' is slow, especially when compared to java's.

> >
> > In this example new takes very little time.

>
> You are executing tens of thousands of 'new's. That is slow.


It is slow but waiting on condition is much slower so,
faster producer increases number of times condition waits,
so actually that makes program slower...



 
Reply With Quote
 
Zoltan Juhasz
Guest
Posts: n/a
 
      07-23-2012
On Sunday, 22 July 2012 17:59:20 UTC-4, Melzzzzz wrote:
> I have played little bit with new C++11 features and compared,
> java performance to c++.


My take on this:

I guess the Java BlockingQueue is a lock-free implementation
of the blocking circular-buffer concept. Your implementation
uses a global, single lock around operations, the _m mutex,
which essentially serializes your code (plus the synchronization
overhead). Your code might be executed concurrently, but not in
parallel for sure.

What is wrong with your code? That you compare a naive,
inefficient blocking queue implementation, with an industrial
strength solution.

- look up a lock-free, single producer / single consumer
circular-buffer implementation
- use rvalue semantics to move items around, no need for
ptrs; at very least specialize the container for built-in types
- eliminate noise (i.e. rand)

The most important part: profile your code to find out
where you spend your time, whether you have some kind of
contention etc. Performance optimization is a tricky business,
especially in a multi-threaded environment, naive
implementations usually produce catastrophic results (see above).

-- Zoltan
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Why {} is much faster than Hash.new ? Iñaki Baz Castillo Ruby 5 01-17-2011 02:30 PM
Wow, Python much faster than MatLab Stef Mientki Python 11 01-01-2007 02:05 AM
Is a Java Application much faster than an Applet. Sanny Java 12 12-15-2006 02:47 AM
findcontrol("PlaceHolderPrice") why why why why why why why why why why why Mr. SweatyFinger ASP .Net 2 12-02-2006 03:46 PM
why is perl -e 'unlink(glob("*"))' so much faster than rm ? ewaguespack@gmail.com Perl Misc 13 07-19-2006 10:37 AM



Advertisments
 



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57