Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C++ > Why is java consumer/producer so much faster than C++

Reply
Thread Tools

Why is java consumer/producer so much faster than C++

 
 
Luca Risolia
Guest
Posts: n/a
 
      07-26-2012
On 26/07/2012 21:56, Melzzzzz wrote:
>> f = ++f % cap_;

>
> compiler warns about undefined behavior here.


Ouch..split that:

++f;
f %= cap_;

>> b = ++b % cap_;


++b;
b %= cap_

I did not turn on all the warnings. My compiler produced the correct
code though.

 
Reply With Quote
 
 
 
 
Melzzzzz
Guest
Posts: n/a
 
      07-26-2012
On Thu, 26 Jul 2012 22:15:33 +0200
Luca Risolia <(E-Mail Removed)> wrote:

> On 26/07/2012 21:56, Melzzzzz wrote:
> >> f = ++f % cap_;

> >
> > compiler warns about undefined behavior here.

>
> Ouch..split that:
>
> ++f;
> f %= cap_;


I corrected that already

I have installed gcc-snapshot compiler
gcc version 4.8.0 20120314 (experimental) [trunk revision 185382]
(Ubuntu/Linaro 20120314-0ubuntu2)

Your timings with array assignment and rand included, are much worse
than gcc 4.6.3. Definitely something is wrong if
other thread does something else besides waiting on
spin lock.




 
Reply With Quote
 
 
 
 
Marc
Guest
Posts: n/a
 
      07-26-2012
Luca Risolia wrote:

> On 26/07/2012 21:56, Melzzzzz wrote:
>>> f = ++f % cap_;

>>
>> compiler warns about undefined behavior here.

>
> Ouch..split that:
>
> ++f;
> f %= cap_;


If you want to write it in one line, you can still do:
++f %= cap_;
 
Reply With Quote
 
Luca Risolia
Guest
Posts: n/a
 
      07-26-2012
On 26/07/2012 23:01, Melzzzzz wrote:
> On Thu, 26 Jul 2012 22:15:33 +0200
> Luca Risolia <(E-Mail Removed)> wrote:
>
>> On 26/07/2012 21:56, Melzzzzz wrote:
>>>> f = ++f % cap_;
>>>
>>> compiler warns about undefined behavior here.

>>
>> Ouch..split that:
>>
>> ++f;
>> f %= cap_;

>
> I corrected that already
>
> I have installed gcc-snapshot compiler
> gcc version 4.8.0 20120314 (experimental) [trunk revision 185382]
> (Ubuntu/Linaro 20120314-0ubuntu2)
>
> Your timings with array assignment and rand included, are much worse
> than gcc 4.6.3. Definitely something is wrong if
> other thread does something else besides waiting on
> spin lock.


Yes, it's worth to strace std::rand() to see what it does exactly.
Out of my curiosity, try to yield() after releasing the lock in both
put() and take():

m_.unlock();
boost::this_thread::yield(); // add this line

do you see any improvements?

 
Reply With Quote
 
Melzzzzz
Guest
Posts: n/a
 
      07-26-2012
On Thu, 26 Jul 2012 23:12:54 +0200
Luca Risolia <(E-Mail Removed)> wrote:

> On 26/07/2012 23:01, Melzzzzz wrote:
> > On Thu, 26 Jul 2012 22:15:33 +0200
> > Luca Risolia <(E-Mail Removed)> wrote:
> >
> >> On 26/07/2012 21:56, Melzzzzz wrote:
> >>>> f = ++f % cap_;
> >>>
> >>> compiler warns about undefined behavior here.
> >>
> >> Ouch..split that:
> >>
> >> ++f;
> >> f %= cap_;

> >
> > I corrected that already
> >
> > I have installed gcc-snapshot compiler
> > gcc version 4.8.0 20120314 (experimental) [trunk revision 185382]
> > (Ubuntu/Linaro 20120314-0ubuntu2)
> >
> > Your timings with array assignment and rand included, are much worse
> > than gcc 4.6.3. Definitely something is wrong if
> > other thread does something else besides waiting on
> > spin lock.

>
> Yes, it's worth to strace std::rand() to see what it does exactly.
> Out of my curiosity, try to yield() after releasing the lock in both
> put() and take():
>
> m_.unlock();
> boost::this_thread::yield(); // add this line
>
> do you see any improvements?
>


No, that make it drastically slower (when not assigning array and using
rand, also not any faster when rand is used).
You could speed up your program by 10-30% I think with just
using my original implementation of
++f;
if(f == cap_)f = 0;
instead of
++f;
f %= cap_;
because division is drastically slower on modern CPUs,
than check and assignment (that's why division by constant
is usually implemented by shift and multiply).
In this case I don;t think compiler can be that clever, it
has to use division and that makes performance *much* worse.

take a look at my time now (with if instead of modulo)
bmaxa@maxa:~/examples$ time ./consprod5

real 0m5.234s
user 0m5.912s
sys 0m3.976s

almost 30% improvement!

 
Reply With Quote
 
Melzzzzz
Guest
Posts: n/a
 
      07-26-2012
On Thu, 26 Jul 2012 23:12:54 +0200
Luca Risolia <(E-Mail Removed)> wrote:

> On 26/07/2012 23:01, Melzzzzz wrote:
> > On Thu, 26 Jul 2012 22:15:33 +0200
> > Luca Risolia <(E-Mail Removed)> wrote:
> >
> >> On 26/07/2012 21:56, Melzzzzz wrote:
> >>>> f = ++f % cap_;
> >>>
> >>> compiler warns about undefined behavior here.
> >>
> >> Ouch..split that:
> >>
> >> ++f;
> >> f %= cap_;

> >
> > I corrected that already
> >
> > I have installed gcc-snapshot compiler
> > gcc version 4.8.0 20120314 (experimental) [trunk revision 185382]
> > (Ubuntu/Linaro 20120314-0ubuntu2)
> >
> > Your timings with array assignment and rand included, are much worse
> > than gcc 4.6.3. Definitely something is wrong if
> > other thread does something else besides waiting on
> > spin lock.

>
> Yes, it's worth to strace std::rand() to see what it does exactly.
> Out of my curiosity, try to yield() after releasing the lock in both
> put() and take():
>
> m_.unlock();
> boost::this_thread::yield(); // add this line
>
> do you see any improvements?
>


I have solved your problem. Seems that your program suffers same problem
as my original version. When applied Howards algorithm to your version
I've got fastest time both for rand array assignment and without:

take a look:
(merged version with rand array assignment)
bmaxa@maxa:~/examples$ time ./consprod5

real 0m4.731s
user 0m5.164s
sys 0m2.804s

(merged version without rand array assignment,
wow this is fast
bmaxa@maxa:~/examples$ time ./consprod5

real 0m2.345s
user 0m2.624s
sys 0m0.820s

code:
#include <thread>
#include <cstdlib>
#include <functional>
#include <atomic>
#include <memory>
#include <type_traits>

class spinlock_mutex {
std::atomic_flag flag = ATOMIC_FLAG_INIT;
public:
void lock() {
while (flag.test_and_set(std::memory_order_acquire)) {
std::this_thread::yield();
};
}
bool try_lock(){
if(flag.test_and_set(std::memory_order_acquire))
return false;
else
return true;
}
void unlock() {
flag.clear(std::memory_order_release);
}
};

template <class T, class A = std::allocator<T >>
class BlockingQueue {
const std::size_t cap_;
A alloc_;
T* queue_;
std::size_t n = 0, b = 0 , f = 0;
spinlock_mutex m_;
public:

BlockingQueue(const std::size_t cap, const A& alloc = A())
: cap_(cap), alloc_(alloc), queue_(alloc_.allocate(cap)) { }

~BlockingQueue() {
alloc_.deallocate(queue_, cap_);
}

static_assert(std::is_nothrow_move_constructible<T >::value,
"BlockingQueue cannot be exception-safe");

void put(T t) {
bool locked = m_.try_lock();
int retry = 0;
while(!locked)
{
if(n > cap_ / 4 && ++retry < 1000)
{
std::this_thread::yield();
}
else
{
locked = m_.try_lock();
}
}
alloc_.construct(&queue_[f], std::move(t));
++f;
if(f == cap_)f = 0;
++n;
m_.unlock();
}

T take() {
bool locked = m_.try_lock();
int retry = 0;
while(!locked)
{
if(n < 3 * cap_ / 4 && ++retry < 1000)
{
std::this_thread::yield();
}
else
{
locked = m_.try_lock();
}
}
T t = std::move(queue_[b]);
alloc_.destroy(&queue_[b]);
++b;
if(b == cap_) b = 0;
--n;
m_.unlock();
return t;
}
};

int main() {
BlockingQueue<int> produced(1000);
const int nitems = 100000000;
std::srand(12345);


std::function<void() > f_prod = [&](){
int i = nitems;
while (i-- > 0) {
produced.put(i);
}
};

std::thread producer1(f_prod);

std::function<void() > f_cons = [&](){
const int size = 10000;
int arr[size];
int i = nitems;
while (i-- > 0) {
arr[std::rand() % size] =
produced.take();
}
};

std::thread consumer1(f_cons);

producer1.join();
consumer1.join();
}

 
Reply With Quote
 
Melzzzzz
Guest
Posts: n/a
 
      07-27-2012
On Fri, 27 Jul 2012 00:48:59 +0200
Melzzzzz <(E-Mail Removed)> wrote:

> On Thu, 26 Jul 2012 23:12:54 +0200
> Luca Risolia <(E-Mail Removed)> wrote:
>
> > On 26/07/2012 23:01, Melzzzzz wrote:
> > > On Thu, 26 Jul 2012 22:15:33 +0200
> > > Luca Risolia <(E-Mail Removed)> wrote:
> > >
> > >> On 26/07/2012 21:56, Melzzzzz wrote:
> > >>>> f = ++f % cap_;
> > >>>
> > >>> compiler warns about undefined behavior here.
> > >>
> > >> Ouch..split that:
> > >>
> > >> ++f;
> > >> f %= cap_;
> > >
> > > I corrected that already
> > >
> > > I have installed gcc-snapshot compiler
> > > gcc version 4.8.0 20120314 (experimental) [trunk revision 185382]
> > > (Ubuntu/Linaro 20120314-0ubuntu2)
> > >
> > > Your timings with array assignment and rand included, are much
> > > worse than gcc 4.6.3. Definitely something is wrong if
> > > other thread does something else besides waiting on
> > > spin lock.

> >
> > Yes, it's worth to strace std::rand() to see what it does exactly.
> > Out of my curiosity, try to yield() after releasing the lock in
> > both put() and take():
> >
> > m_.unlock();
> > boost::this_thread::yield(); // add this line
> >
> > do you see any improvements?
> >

>
> I have solved your problem.


Nah, implementation is completely wrong (it was too late),disregard this
post, problem remains... ;(


 
Reply With Quote
 
Melzzzzz
Guest
Posts: n/a
 
      07-27-2012
On Thu, 26 Jul 2012 23:12:54 +0200
Luca Risolia <(E-Mail Removed)> wrote:

> On 26/07/2012 23:01, Melzzzzz wrote:
> > On Thu, 26 Jul 2012 22:15:33 +0200
> > Luca Risolia <(E-Mail Removed)> wrote:
> >
> >> On 26/07/2012 21:56, Melzzzzz wrote:
> >>>> f = ++f % cap_;
> >>>
> >>> compiler warns about undefined behavior here.
> >>
> >> Ouch..split that:
> >>
> >> ++f;
> >> f %= cap_;

> >
> > I corrected that already
> >
> > I have installed gcc-snapshot compiler
> > gcc version 4.8.0 20120314 (experimental) [trunk revision 185382]
> > (Ubuntu/Linaro 20120314-0ubuntu2)
> >
> > Your timings with array assignment and rand included, are much worse
> > than gcc 4.6.3. Definitely something is wrong if
> > other thread does something else besides waiting on
> > spin lock.

>
> Yes, it's worth to strace std::rand() to see what it does exactly.
> Out of my curiosity, try to yield() after releasing the lock in both
> put() and take():
>
> m_.unlock();
> boost::this_thread::yield(); // add this line
>
> do you see any improvements?
>


Finally got it
Problem was that producer thread constantly spins, but consumer does
some work. Queue was constantly full and that slowed it down
significantly.
I have used Howards trick, and this time correctly

now times are fastest (queue size was 1000):
(with rand)
bmaxa@maxa:~/examples$ time ./consprod5

real 0m6.215s
user 0m6.440s
sys 0m5.108s

(without rand)
bmaxa@maxa:~/examples$ time ./consprod5

real 0m3.232s
user 0m3.024s
sys 0m2.968s

code:
void put(T t) {
for(;
{
m_.lock();
if(n < cap_)
{
int retry = 0;
while(n > 3 * cap_ / 4 && ++retry < 3000)
{
m_.unlock();
std::this_thread::yield();
m_.lock();
}
break;
}
m_.unlock();
}
alloc_.construct(&queue_[f], std::move(t));
++f;
if(f == cap_)f = 0;
++n;
m_.unlock();
}

T take() {
for(;
{
m_.lock();
if(n > 0)
{
int retry = 0;
while(n < cap_ / 4 && ++retry < 3000)
{
m_.unlock();
std::this_thread::yield();
m_.lock();
}
break;
}
m_.unlock();
}
T t = std::move(queue_[b]);
alloc_.destroy(&queue_[b]);
++b;
if(b == cap_) b = 0;
--n;
m_.unlock();
return t;
}

 
Reply With Quote
 
James Kanze
Guest
Posts: n/a
 
      07-29-2012
On Sunday, July 22, 2012 10:59:20 PM UTC+1, Melzzzzz wrote:
> I have played little bit with new C++11 features and compared,


> java performance to c++.


> Actually this was meant to be GC vs RAII memory management,
> but boiled down to speed of BlockingQueue class in java,
> and mine in c++.


> It seems that java implementation is so much more efficient
> but I don't know why. I even tried c++ without dynamic
> memory management (except queue itself) and that is *even slower*.
> Must be some quirks with a queue


Just a guess (I don't know anything about the Java
BlockingQueue), but your C++ implementation uses an extensible
queue; the restriction on the queue size is external. From the
code, I'd guess that the Java blocking queue is a fixed length
(established once in the constructor). This means that once the
queue has been constructed, there are no more allocations in
Java; in C++, I think you'll find that std::deque does allocate
and free new memory as it expands and contracts.

You might try replacing std::deque with a simple, std::vector
based circular buffer. Construct the std::vector to its fixed
size once at the beginning, and just keep two iterators into it:
one for writing, and one for reading. Wrap the iterators each
time they reach the end. (Since the limit conditions for full
and empty are sometimes difficult to implement, you'd be better
off searching an existing implementation on the net. I'd be
surprised if Boost didn't have one, for example.)

--
James
 
Reply With Quote
 
Melzzzzz
Guest
Posts: n/a
 
      07-29-2012
On Sun, 29 Jul 2012 03:27:18 -0700 (PDT)
James Kanze <(E-Mail Removed)> wrote:

>
> You might try replacing std::deque with a simple, std::vector
> based circular buffer.


I already did that. If you follow thread, Howard solved problem.




 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Why {} is much faster than Hash.new ? IƱaki Baz Castillo Ruby 5 01-17-2011 02:30 PM
Wow, Python much faster than MatLab Stef Mientki Python 11 01-01-2007 02:05 AM
Is a Java Application much faster than an Applet. Sanny Java 12 12-15-2006 02:47 AM
findcontrol("PlaceHolderPrice") why why why why why why why why why why why Mr. SweatyFinger ASP .Net 2 12-02-2006 03:46 PM
why is perl -e 'unlink(glob("*"))' so much faster than rm ? ewaguespack@gmail.com Perl Misc 13 07-19-2006 10:37 AM



Advertisments