Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C++ > Why is java consumer/producer so much faster than C++

Reply
Thread Tools

Why is java consumer/producer so much faster than C++

 
 
Melzzzzz
Guest
Posts: n/a
 
      07-23-2012
On Mon, 23 Jul 2012 06:57:03 -0700 (PDT)
Zoltan Juhasz <(E-Mail Removed)> wrote:

> On Sunday, 22 July 2012 17:59:20 UTC-4, Melzzzzz wrote:
> > I have played little bit with new C++11 features and compared,
> > java performance to c++.

>
> My take on this:
>
> I guess the Java BlockingQueue is a lock-free implementation
> of the blocking circular-buffer concept. Your implementation
> uses a global, single lock around operations, the _m mutex,
> which essentially serializes your code (plus the synchronization
> overhead). Your code might be executed concurrently, but not in
> parallel for sure.
>

This is implementation of put/take methods of Java queue:

public void put(E e) throws InterruptedException {
244: if (e == null) throw new NullPointerException();
245: final E[] items = this.items;
246: final ReentrantLock lock = this.lock;
247: lock.lockInterruptibly();
248: try {
249: try {
250: while (count == items.length)
251: notFull.await();
252: } catch (InterruptedException ie) {
253: notFull.signal(); // propagate to non-interrupted
thread 254: throw ie;
255: }
256: insert(e);
257: } finally {
258: lock.unlock();
259: }
260: }

310: public E take() throws InterruptedException {
311: final ReentrantLock lock = this.lock;
312: lock.lockInterruptibly();
313: try {
314: try {
315: while (count == 0)
316: notEmpty.await();
317: } catch (InterruptedException ie) {
318: notEmpty.signal(); // propagate to non-interrupted thread
319: throw ie;
320: }
321: E x = extract();
322: return x;
323: } finally {
324: lock.unlock();
325: }
326: }
327:

> What is wrong with your code? That you compare a naive,
> inefficient blocking queue implementation, with an industrial
> strength solution.


Heh, I don't think so. This implementation is pretty classic.
It does well for blocking queue, but this test is too
much for it. It gets contented with mutex/condition_variable
calls, so does Java, but for some reason, much less.


>
> - look up a lock-free, single producer / single consumer
> circular-buffer implementation


Java ArrayBlockingQueue use circular buffer but is
not lock free, rather implementation is identical,
as I see (one mutex, two condition variables)

> - use rvalue semantics to move items around, no need for
> ptrs; at very least specialize the container for built-in types
> - eliminate noise (i.e. rand)
>
> The most important part: profile your code to find out
> where you spend your time, whether you have some kind of
> contention etc. Performance optimization is a tricky business,
> especially in a multi-threaded environment, naive
> implementations usually produce catastrophic results (see above).


Agreed. I think that I am close. Somehow (probably because it
is faster? c++ puts much more pressure on queue than Java does).
Perhaps someone can confirm or deny this.



 
Reply With Quote
 
 
 
 
Howard Hinnant
Guest
Posts: n/a
 
      07-23-2012
I made a few changes to your code:

1. I only signal if the capacity becomes non-zero or non-full.

2. If the queue is the queue has got some items in it and the mutex is locked, the put thread yields to the take thread. If the queue has some free space in it and the mutex is locked, the take thread yields to the put thread.

3. Traffic in int instead of unique_ptr<int>.


#include <condition_variable>
#include <mutex>
#include <thread>
#include <deque>
#include <cstdlib>

template <class T>
class BlockingQueue{
public:
BlockingQueue(unsigned cap):capacity_(cap)
{
}
void put(T t)
{
std::unique_lock<std::mutex> lock(m_, std::try_to_lock);
int retry = 0;
while (!lock.owns_lock())
{
if (queue_.size() > capacity_/4 && ++retry < 1000)
{
std::this_thread::yield();
}
else
{
lock.lock();
}
}
while(queue_.size() >= capacity_)c_full_.wait(lock);
queue_.push_back(std::move(t));
if (queue_.size() == 1)
c_empty_.notify_one();
}
T take()
{
std::unique_lock<std::mutex> lock(m_, std::try_to_lock);
int retry = 0;
while (!lock.owns_lock())
{
if (queue_.size() < 3*capacity_/4 && ++retry < 1000)
{
std::this_thread::yield();
}
else
{
lock.lock();
}
}
while(queue_.empty())c_empty_.wait(lock);
T tmp = std::move(queue_.front());
queue_.pop_front();
if (queue_.size() == capacity_-1)
c_full_.notify_one();
return tmp;
}
bool empty()
{
std::unique_lock<std::mutex> lock(m_);
return queue_.empty();
}
private:
std::mutex m_;
std::condition_variable c_empty_,c_full_;
std::deque<T> queue_;
unsigned capacity_;
};

int main()
{
BlockingQueue<int> produced(100000);
const int nitems = 100000000;
std::srand(12345);


std::function<void()> f_prod = [&]() {
int i = nitems;
while(i-- > 0){
produced.put(i);
}
};

std::thread producer1(f_prod);

std::function<void()> f_cons = [&]() {
const int size = 10000;
int arr[size];
int i = nitems;
while(i-- > 0)
{
arr[std::rand()%size] = produced.take();
}
};

std::thread consumer1(f_cons);

producer1.join();
consumer1.join();
}

On my system this sped the C++ solution up considerably (to about 13.7 seconds). I didn't time the Java solution.
 
Reply With Quote
 
 
 
 
Dombo
Guest
Posts: n/a
 
      07-23-2012
Op 22-Jul-12 23:59, Melzzzzz schreef:
> I have played little bit with new C++11 features and compared,
> java performance to c++.
> Actually this was meant to be GC vs RAII memory management,
> but boiled down to speed of BlockingQueue class in java,
> and mine in c++.
> It seems that java implementation is so much more efficient
> but I don't know why. I even tried c++ without dynamic
> memory management (except queue itself) and that is *even slower*.
> Must be some quirks with a queue
>
> These are timings:
> (java openjdk 1.7)
> bmaxa@maxa:~/examples$ time java consprod
>
> real 0m13.411s
> user 0m19.853s
> sys 0m0.960s
>
> (c++ gcc 4.6.3)
> bmaxa@maxa:~/examples$ time ./consprod
>
> real 0m28.726s
> user 0m34.518s
> sys 0m6.800s
>
> Example programs follow (I think implementations of
> blocking queues are similar):
> // java
> import java.util.concurrent.*;
> import java.util.Random;
>
> class Vars{
> public final static int nitems = 100000000;
> public final static Random rnd = new Random(12345);
> public final static int size = 10000;
> }
>
> class Producer implements Runnable {
> private final BlockingQueue<Integer> queue;
> Producer(BlockingQueue<Integer> q) { queue = q; }
> public void run() {
> try {
> int i = Vars.nitems;
> while(i-- > 0) { queue.put(produce(i)); }
> } catch (InterruptedException ex)
> {
>
> }
> }
> Integer produce(int i) { return new Integer(i); }
> }
>
> class Consumer implements Runnable {
> private final BlockingQueue<Integer> queue;
> Consumer(BlockingQueue<Integer> q)
> {
> queue = q;
> }
> public void run() {
> try {
> Integer[] arr = new Integer[10000];
> int i = Vars.nitems;
> while(i-- > 0) { consume(queue.take(),arr); }
> } catch (InterruptedException ex)
> {
> }
> }
> void consume(Integer x, Integer[] arr)
> {
> arr[Vars.rnd.nextInt(Vars.size)] = x;
> }
> }
>
> public class consprod {
> public static void main(String [] args) {
> try{
> BlockingQueue<Integer> q = new ArrayBlockingQueue<Integer>(100000);
> Producer p = new Producer(q);
> Consumer c = new Consumer(q);
> new Thread(p).start();
> new Thread(c).start();
> } catch(Exception e)
> {
> e.printStackTrace();
> }
> }
> }
> //-----------------------------------------
> // c++
> #include <condition_variable>
> #include <mutex>
> #include <thread>
> #include <deque>
> #include <cstdlib>
>
> template <class T>
> class BlockingQueue{
> public:
> BlockingQueue(unsigned cap):capacity_(cap)
> {
> }
> void put(T t)
> {
> std::unique_lock<std::mutex> lock(m_);
> while(queue_.size() >= capacity_)c_full_.wait(lock);
> queue_.push_back(std::move(t));
> c_empty_.notify_one();
> }
> T take()
> {
> std::unique_lock<std::mutex> lock(m_);
> while(queue_.empty())c_empty_.wait(lock);
> T tmp = std::move(queue_.front());
> queue_.pop_front();
> c_full_.notify_one();
> return std::move(tmp);
> }
> bool empty()
> {
> std::unique_lock<std::mutex> lock(m_);
> return queue_.empty();
> }
> private:
> std::mutex m_;
> std::condition_variable c_empty_,c_full_;
> std::deque<T> queue_;
> unsigned capacity_;
> };
>
> int main()
> {
> BlockingQueue<std::unique_ptr<int>> produced(100000);
> const int nitems = 100000000;
> std::srand(12345);
>
>
> std::function<void()> f_prod = [&]() {
> int i = nitems;
> while(i-- > 0){
> produced.put(std::unique_ptr<int>(new int(i)));
> }
> };
>
> std::thread producer1(f_prod);
>
> std::function<void()> f_cons = [&]() {
> const int size = 10000;
> std::unique_ptr<int> arr[size];
> int i = nitems;
> while(i-- > 0)
> {
> arr[std::rand()%size] = produced.take();
> }
> };
>
> std::thread consumer1(f_cons);
>
> producer1.join();
> consumer1.join();
> }
> // --------------------------------------


In the C++ version at the end of the main function it waits for the
threads to complete, in the Java version the threads are not joined at
the end of the main function. I don't know if Java keeps the process
alive until all threads have terminated, but if not it would explain the
difference.

I'm a bit surpirsed that the C++ version compiles without complaints on
your compiler because there is no return statement in the main() function.


 
Reply With Quote
 
Melzzzzz
Guest
Posts: n/a
 
      07-23-2012
On Mon, 23 Jul 2012 13:23:30 -0700 (PDT)
Howard Hinnant <(E-Mail Removed)> wrote:

> I made a few changes to your code:
>
>
> On my system this sped the C++ solution up considerably (to about
> 13.7 seconds). I didn't time the Java solution.


Yes, that's it, on my system:
bmaxa@maxa:~/examples$ time ./consprod2

real 0m9.478s
user 0m10.797s
sys 0m7.196s

which is more than twice speed of original program!
(and faster than java)

 
Reply With Quote
 
Luca Risolia
Guest
Posts: n/a
 
      07-23-2012
On 23/07/2012 12:17, Melzzzzz wrote:
>> (On a side note, the C++ version you have given does not have the
>> best possible interface for a thread-safe queue)

>
> I have just copied Java BlockingQueue interface, which is of course
> suited for Java. Don't know actually how ideal interface would look
> like.


In C++11 the ideal interface for a generic thread-safe and
exception-safe queue should at least provide std::shared_ptr<T> take()
and/or void take(T&). T take() is limited to those types having no-throw
copy constructors or move-constructors. This also means that you should
check the type at compile-time by using type traits. However, in your
test case you used move-semantic everywhere, so that is not problem.
Also note that the term "pop" instead of "take" is more in the spirit of
C++.

 
Reply With Quote
 
Howard Hinnant
Guest
Posts: n/a
 
      07-24-2012
On Monday, July 23, 2012 6:32:10 PM UTC-4, Melzzzzz wrote:
> On Mon, 23 Jul 2012 13:23:30 -0700 (PDT)
> Howard Hinnant wrote:
>
> &gt; I made a few changes to your code:
> &gt;
> &gt;
> &gt; On my system this sped the C++ solution up considerably (to about
> &gt; 13.7 seconds). I didn't time the Java solution.
>
> Yes, that's it, on my system:
> bmaxa@maxa:~/examples$ time ./consprod2
>
> real 0m9.478s
> user 0m10.797s
> sys 0m7.196s
>
> which is more than twice speed of original program!
> (and faster than java)


At second glance I wasn't that fond of my solution and tweaked it as below:


void put(T t)
{
std::unique_lock<std::mutex> lock(m_, std::try_to_lock);
while (!lock.owns_lock())
{
if (queue_.size() > capacity_/4)
{
for (int i = 0; i < 3250; ++i)
std::this_thread::yield();
lock.try_lock();
}
else
{
lock.lock();
break;
}
}
while(queue_.size() >= capacity_)c_full_.wait(lock);
queue_.push_back(std::move(t));
if (queue_.size() == 1)
c_empty_.notify_one();
}
T take()
{
std::unique_lock<std::mutex> lock(m_, std::try_to_lock);
int retry = 0;
while (!lock.owns_lock())
{
if (queue_.size() < 3*capacity_/4)
{
for (int i = 0; i < 3250; ++i)
std::this_thread::yield();
lock.try_lock();
}
else
{
lock.lock();
break;
}
}
while(queue_.empty())c_empty_.wait(lock);
T tmp = std::move(queue_.front());
queue_.pop_front();
if (queue_.size() == capacity_-1)
c_full_.notify_one();
return tmp;
}

I am admittedly coding to the benchmark (which is not a really great idea). But I got the time on my system down from 13.7 seconds to about 9.7 seconds (another 40%).
 
Reply With Quote
 
Joshua Maurice
Guest
Posts: n/a
 
      07-24-2012
On Jul 23, 1:57*pm, Dombo <(E-Mail Removed)> wrote:
> I'm a bit surpirsed that the C++ version compiles without complaints on
> your compiler because there is no return statement in the main() function..


It's an exception in C and C++ for main only. Running off the end of
main is allowed, and is equivalent to return 0, itself equivalent to
exit(0), IIRC.
 
Reply With Quote
 
Melzzzzz
Guest
Posts: n/a
 
      07-24-2012
On Mon, 23 Jul 2012 18:11:58 -0700 (PDT)
Howard Hinnant <(E-Mail Removed)> wrote:

> On Monday, July 23, 2012 6:32:10 PM UTC-4, Melzzzzz wrote:
> > On Mon, 23 Jul 2012 13:23:30 -0700 (PDT)
> > Howard Hinnant wrote:
> >
> > &gt; I made a few changes to your code:
> > &gt;
> > &gt;
> > &gt; On my system this sped the C++ solution up considerably (to
> > about &gt; 13.7 seconds). I didn't time the Java solution.
> >
> > Yes, that's it, on my system:
> > bmaxa@maxa:~/examples$ time ./consprod2
> >
> > real 0m9.478s
> > user 0m10.797s
> > sys 0m7.196s
> >
> > which is more than twice speed of original program!
> > (and faster than java)

>
> At second glance I wasn't that fond of my solution and tweaked it as
> below:
>
>
> void put(T t)
> {
> std::unique_lock<std::mutex> lock(m_, std::try_to_lock);
> while (!lock.owns_lock())
> {
> if (queue_.size() > capacity_/4)
> {
> for (int i = 0; i < 3250; ++i)
> std::this_thread::yield();
> lock.try_lock();
> }
> else
> {
> lock.lock();
> break;
> }
> }
> while(queue_.size() >= capacity_)c_full_.wait(lock);
> queue_.push_back(std::move(t));
> if (queue_.size() == 1)
> c_empty_.notify_one();
> }
> T take()
> {
> std::unique_lock<std::mutex> lock(m_, std::try_to_lock);
> int retry = 0;
> while (!lock.owns_lock())
> {
> if (queue_.size() < 3*capacity_/4)
> {
> for (int i = 0; i < 3250; ++i)
> std::this_thread::yield();
> lock.try_lock();
> }
> else
> {
> lock.lock();
> break;
> }
> }
> while(queue_.empty())c_empty_.wait(lock);
> T tmp = std::move(queue_.front());
> queue_.pop_front();
> if (queue_.size() == capacity_-1)
> c_full_.notify_one();
> return tmp;
> }
>
> I am admittedly coding to the benchmark (which is not a really great
> idea). But I got the time on my system down from 13.7 seconds to
> about 9.7 seconds (another 40%).


On my system speed up is not that big.
(new version)
bmaxa@maxa:~/examples$ time ./consprod3

real 0m9.296s
user 0m10.061s
sys 0m8.069s

(old version)
bmaxa@maxa:~/examples$ time ./consprod2

real 0m9.496s
user 0m10.917s
sys 0m7.292s
bmaxa@maxa:~/examples$

(java with array assignment)
bmaxa@maxa:~/examples$ time java consprod

real 0m16.098s
user 0m25.674s
sys 0m1.932s


 
Reply With Quote
 
none
Guest
Posts: n/a
 
      07-24-2012
In article <(E-Mail Removed)>,
Howard Hinnant <(E-Mail Removed)> wrote:
>I made a few changes to your code:
>
>1. I only signal if the capacity becomes non-zero or non-full.


>2. If the queue is the queue has got some items in it and the mutex is locked, the put thread yields to the take
>thread. If the queue has some free space in it and the mutex is locked, the take thread yields to the put
>thread.


Are you sure about the usefulness of #2?

Doing a few tests myself, adding this in m queue seems to reduce performance by some 20%:
while (!lock.owns_lock())
{
if (queue_.size() > capacity_/4)
{
for (int i = 0; i < 3250; ++i)
std::this_thread::yield();
lock.try_lock();
}
else
{
lock.lock();
break;
}
}


>3. Traffic in int instead of unique_ptr<int>.



Yannick

 
Reply With Quote
 
Melzzzzz
Guest
Posts: n/a
 
      07-24-2012
On Tue, 24 Jul 2012 12:51:20 +0000 (UTC)
yatremblay@bel1lin202.(none) (Yannick Tremblay) wrote:

> In article <(E-Mail Removed)>,
> Howard Hinnant <(E-Mail Removed)> wrote:
> >I made a few changes to your code:
> >
> >1. I only signal if the capacity becomes non-zero or non-full.

>
> >2. If the queue is the queue has got some items in it and the mutex
> >is locked, the put thread yields to the take thread. If the queue
> >has some free space in it and the mutex is locked, the take thread
> >yields to the put thread.

>
> Are you sure about the usefulness of #2?


Problem is that we need to reduce waiting on condition to minimum
as more condition waits program is slower.
Ideally there would be no condition waits (maximum speed).

>
> Doing a few tests myself, adding this in m queue seems to reduce
> performance by some 20%: while (!lock.owns_lock())
> {
> if (queue_.size() > capacity_/4)
> {
> for (int i = 0; i < 3250; ++i)
> std::this_thread::yield();
> lock.try_lock();
> }
> else
> {
> lock.lock();
> break;
> }
> }
>


This is second version, on my system first version seems better
for smaller capacities...


 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Why {} is much faster than Hash.new ? IƱaki Baz Castillo Ruby 5 01-17-2011 02:30 PM
Wow, Python much faster than MatLab Stef Mientki Python 11 01-01-2007 02:05 AM
Is a Java Application much faster than an Applet. Sanny Java 12 12-15-2006 02:47 AM
findcontrol("PlaceHolderPrice") why why why why why why why why why why why Mr. SweatyFinger ASP .Net 2 12-02-2006 03:46 PM
why is perl -e 'unlink(glob("*"))' so much faster than rm ? ewaguespack@gmail.com Perl Misc 13 07-19-2006 10:37 AM



Advertisments