Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C++ > Concurrent code performance

Reply
Thread Tools

Concurrent code performance

 
 
Saeed Amrollahi
Guest
Posts: n/a
 
      10-09-2011
Hi All C++ developers

I learned (from theory and practice) the most important benefit of
qualified multi-threaded programming is better performance.
Here it is a (not well-written) concurrent (two worker threads)
and sequential version
of Summation of [0, 2000000[ interval.
Of course my concurrent code is not good. It's for exposition only.

// concurrent code
#include <thread>
#include <iostream>

using namespace std;
long long s1 = 0, s2 = 0;
void Sum(int first, int last, long long& res) // calculate the sum in
[first, last[ interval
{
while (first < last)
res += first++;
}

int main()
{
long long r1 = 0, r2 = 0;
thread t1(Sum, 0, 1000000, std::ref(r1));
thread t2(Sum, 1000000, 2000000, std::ref(r2));
t1.join(); t2.join();
cout << r1 + r2 << '\n';

return 0;
}

// sequential code
#include <iostream>

using namespace std;
void Sum(int first, int last, long long& res) // calculate the sum in
[first, last[ interval
{
while (first < last)
res += first++;
}

int main()
{
long long r = 0;
Sum(0, 2000000, r);

cout << r<< '\n';

return 0;
}

I compiled and ran two codes using the following commands:
$ g++ -std=c++0x -pthread -pedantic -o2 concurrent_sum.c++ -o
concurrent_sum
$ g++ -std=c++0x -pedantic -o2 sequential_sum.c++ -o sequential_sum
$ time ./concurrent_sum
1999999999000
real 0m0.014s
user 0m0.016s
sys 0m0.004s
$ time ./sequential_sum
1999999999000
real 0m0.021s
user 0m0.020s
sys 0m0.000s
Of course the time command differs in several
execution, but honestly, I didn't see so much difference between two
execution.
In addition the generated code for sequential_sum is about 6 KB, but
the size
of concurrent_code is about 45 KB.
Is there any problem in my measurement? Why the concurrent object code
is 7+ times bigger than sequential version? How do you explain it?

FYI, I use GCC 4.6.0 under Ubuntu on a Dell single processor notebook.
Please shed some light.

-- Saeed Amrollahi
 
Reply With Quote
 
 
 
 
Marc
Guest
Posts: n/a
 
      10-09-2011
Saeed Amrollahi wrote:

> Hi All C++ developers
>
> I learned (from theory and practice) the most important benefit of
> qualified multi-threaded programming is better performance.
> Here it is a (not well-written) concurrent (two worker threads)
> and sequential version
> of Summation of [0, 2000000[ interval.
> Of course my concurrent code is not good. It's for exposition only.
>
> // concurrent code
> #include <thread>
> #include <iostream>
>
> using namespace std;
> long long s1 = 0, s2 = 0;
> void Sum(int first, int last, long long& res) // calculate the sum in
> [first, last[ interval
> {
> while (first < last)
> res += first++;
> }
>
> int main()
> {
> long long r1 = 0, r2 = 0;
> thread t1(Sum, 0, 1000000, std::ref(r1));
> thread t2(Sum, 1000000, 2000000, std::ref(r2));


I would make Sum return a long long instead of updating a reference.
If you are unlucky, r1 and r2 end up on the same cache line, you have
false sharing and the concurrent version ends up slower than the
sequential one.

> t1.join(); t2.join();
> cout << r1 + r2 << '\n';
>
> return 0;
> }


> In addition the generated code for sequential_sum is about 6 KB, but
> the size
> of concurrent_code is about 45 KB.
> Is there any problem in my measurement? Why the concurrent object code
> is 7+ times bigger than sequential version? How do you explain it?


Your basic code is very simple: a loop, a couple additions, almost
nothing. It is not really surprising that code meant to handle
multiple threads is several times larger. Now it might be possible for
libstdc++ to factor more of that code into the shared library, but
that's a different, more complicated question.

 
Reply With Quote
 
 
 
 
Saeed Amrollahi
Guest
Posts: n/a
 
      10-09-2011
On Oct 9, 4:29*pm, Marc <(E-Mail Removed)> wrote:
> Saeed Amrollahi *wrote:
> > Hi All C++ developers

>
> > I learned (from theory and practice) the most important benefit of
> > qualified multi-threaded programming is better performance.
> > Here it is a (not well-written) concurrent (two worker threads)
> > and sequential version
> > of Summation of [0, 2000000[ interval.
> > Of course my concurrent code is not good. It's for exposition only.

>
> > // concurrent code
> > #include <thread>
> > #include <iostream>

>
> > using namespace std;
> > long long s1 = 0, s2 = 0;
> > void Sum(int first, int last, long long& res) // calculate the sum in
> > [first, last[ interval
> > {
> > * while (first < last)
> > * * res += first++;
> > }

>
> > int main()
> > {
> > * long long r1 = 0, r2 = 0;
> > * thread t1(Sum, 0, 1000000, std::ref(r1));
> > * thread t2(Sum, 1000000, 2000000, std::ref(r2));

>
> I would make Sum return a long long instead of updating a reference.
> If you are unlucky, r1 and r2 end up on the same cache line, you have
> false sharing and the concurrent version ends up slower than the
> sequential one.


I know. As I noted, my code is not well-written. but I tried
to write two equivalent code.
Can you explain your answer in more detail? the cache line thing
and false sharing ...
>
> > * t1.join(); t2.join();
> > * cout << r1 + r2 << '\n';

>
> > * return 0;
> > }
> > In addition the generated code for sequential_sum is about 6 KB, but
> > the size
> > of concurrent_code is about 45 KB.
> > Is there any problem in my measurement? Why the concurrent object code
> > is 7+ times bigger than sequential version? How do you explain it?

>
> Your basic code is very simple: a loop, a couple additions, almost
> nothing. It is not really surprising that code meant to handle
> multiple threads is several times larger. Now it might be possible for
> libstdc++ to factor more of that code into the shared library, but
> that's a different, more complicated question.


Why it's not surprising? is the size of my threads big?
Would you explain more.

TIA,
-- Saeed Amrollahi

 
Reply With Quote
 
Marc
Guest
Posts: n/a
 
      10-09-2011
Saeed Amrollahi wrote:

> Can you explain your answer in more detail? the cache line thing
> and false sharing ...


There's a wikipedia article on this...

>> Your basic code is very simple: a loop, a couple additions, almost
>> nothing. It is not really surprising that code meant to handle
>> multiple threads is several times larger. Now it might be possible for
>> libstdc++ to factor more of that code into the shared library, but
>> that's a different, more complicated question.

>
> Why it's not surprising? is the size of my threads big?
> Would you explain more.


Size of thread: basically how much you call operator new.
Size of object file: basically length of (used) source code.

The basic code is self contained and fits in less than 10 lines. The
threaded code includes the <thread> header, which is rather long...
 
Reply With Quote
 
Pavel
Guest
Posts: n/a
 
      10-10-2011
Saeed Amrollahi wrote:
> Hi All C++ developers

....

> I compiled and ran two codes using the following commands:
> $ g++ -std=c++0x -pthread -pedantic -o2 concurrent_sum.c++ -o
> concurrent_sum
> $ g++ -std=c++0x -pedantic -o2 sequential_sum.c++ -o sequential_sum

....

> Is there any problem in my measurement?

Not that I can see. You had a decent speed-up in real-time of 50% (or 33%,
depending on how you count). Speed-up factor would almost never be exactly 2 for
2 threads as there are lots of resources in your system that can become
bottlenecks (you may consider reading "What Every Programmer Should Know About
Memory" by Ulrich Drepper, available, for example, at
www.akkadia.org/drepper/cpumemory.pdf); also, context switches would take some
time. Probably your single-CPU notebook has hyper-threading turned on (try 'cat
/proc/cpuinfo' to say for sure); else I can't explain that you saw the reduction
in real time at all.

> Why the concurrent object code
> is 7+ times bigger than sequential version? How do you explain it?

.....
The concurrent code has thread library code linked in whereas the sequential one
doesn't. As you use GCC, you can run 'objdump -d' on your concurrent binary and
see what exactly functions were added as compared to the sequential version.

>
> -- Saeed Amrollahi



Hope this helps,
-Pavel
 
Reply With Quote
 
Goran
Guest
Posts: n/a
 
      10-10-2011
On Oct 9, 2:42*pm, Saeed Amrollahi <(E-Mail Removed)> wrote:
> Hi All C++ developers
>
> I learned (from theory and practice) the most important benefit of
> qualified multi-threaded programming is better performance.
> Here it is a (not well-written) concurrent (two worker threads)
> and sequential version
> of Summation of [0, 2000000[ interval.
> Of course my concurrent code is not good. It's for exposition only.
>
> // concurrent code
> #include <thread>
> #include <iostream>
>
> using namespace std;
> long long s1 = 0, s2 = 0;
> void Sum(int first, int last, long long& res) // calculate the sum in
> [first, last[ interval
> {
> * while (first < last)
> * * res += first++;
>
> }
>
> int main()
> {
> * long long r1 = 0, r2 = 0;
> * thread t1(Sum, 0, 1000000, std::ref(r1));
> * thread t2(Sum, 1000000, 2000000, std::ref(r2));
> * t1.join(); t2.join();
> * cout << r1 + r2 << '\n';
>
> * return 0;
>
> }
>
> // sequential code
> #include <iostream>
>
> using namespace std;
> void Sum(int first, int last, long long& res) // calculate the sum in
> [first, last[ interval
> {
> * while (first < last)
> * *res += first++;
>
> }
>
> int main()
> {
> * long long r = 0;
> * Sum(0, 2000000, r);
>
> * cout << r<< '\n';
>
> * return 0;
>
> }
>
> I compiled and ran two codes using the following commands:
> $ g++ -std=c++0x -pthread -pedantic -o2 concurrent_sum.c++ -o
> concurrent_sum
> $ g++ -std=c++0x *-pedantic -o2 sequential_sum.c++ -o sequential_sum
> $ time ./concurrent_sum
> 1999999999000
> real * 0m0.014s
> user *0m0.016s
> sys * 0m0.004s
> $ time ./sequential_sum
> 1999999999000
> real * 0m0.021s
> user *0m0.020s
> sys * 0m0.000s
> Of course the time command differs in several
> execution, but honestly, I didn't see so much difference between two
> execution.
> In addition the generated code for sequential_sum is about 6 KB, but
> the size
> of concurrent_code is about 45 KB.
> Is there any problem in my measurement? Why the concurrent object code
> is 7+ times bigger than sequential version? How do you explain it?


I don't see a problem. On a single-core machine, I am surprised you
saw even that much of an improvement. I guess, just like Pavel, that
hyper-threading is working. Your code should give better performance
on a multi-core machines. Given what it does, it should hardly touch
memory, if at all, so memory should not be a bottleneck.

Your executable is 7 times bigger because you linked in (part of)
pthreads in it.

Goran.
 
Reply With Quote
 
Saeed Amrollahi
Guest
Posts: n/a
 
      10-10-2011
On Oct 9, 6:41*pm, Marc <(E-Mail Removed)> wrote:
> Saeed Amrollahi *wrote:
> > Can you explain your answer in more detail? the cache line thing
> > and false sharing ...

>
> There's a wikipedia article on this...
>
> >> Your basic code is very simple: a loop, a couple additions, almost
> >> nothing. It is not really surprising that code meant to handle
> >> multiple threads is several times larger. Now it might be possible for
> >> libstdc++ to factor more of that code into the shared library, but
> >> that's a different, more complicated question.

>
> > Why it's not surprising? is the size of my threads big?
> > Would you explain more.

>
> Size of thread: basically how much you call operator new.
> Size of object file: basically length of (used) source code.
>
> The basic code is self contained and fits in less than 10 lines. The
> threaded code includes the <thread> header, which is rather long...


Thank you for your answers Marc.
-- Saeed
 
Reply With Quote
 
Saeed Amrollahi
Guest
Posts: n/a
 
      10-10-2011
On Oct 10, 5:57*am, Pavel
<(E-Mail Removed)> wrote:
> Saeed Amrollahi wrote:
> > Hi All C++ developers

>
> ...
>
> > I compiled and ran two codes using the following commands:
> > $ g++ -std=c++0x -pthread -pedantic -o2 concurrent_sum.c++ -o
> > concurrent_sum
> > $ g++ -std=c++0x *-pedantic -o2 sequential_sum.c++ -o sequential_sum

>
> ...
>
> > Is there any problem in my measurement?

>
> Not that I can see. You had a decent speed-up in real-time of 50% (or 33%,
> depending on how you count). Speed-up factor would almost never be exactly 2 for
> 2 threads as there are lots of resources in your system that can become
> bottlenecks

<Nodded>
(you may consider reading "What Every Programmer Should Know About
> Memory" by Ulrich Drepper, available, for example, atwww.akkadia.org/drepper/cpumemory.pdf);

I know about the paper, but I haven't to read it yet.
also, context switches would take some
> time. Probably your single-CPU notebook has hyper-threading turned on (try 'cat
> /proc/cpuinfo' to say for sure); else I can't explain that you saw the reduction
> in real time at all.
>
> > Why the concurrent object code
> > is 7+ times bigger than sequential version? How do you explain it?

>
> ....
> The concurrent code has thread library code linked in whereas the sequential one
> doesn't. As you use GCC, you can run 'objdump -d' on your concurrent binary and
> see what exactly functions were added as compared to the sequential version.
>
>
>
> > * *-- Saeed Amrollahi

>
> Hope this helps,
> -Pavel


Thank you very much. Besides concurrency concepts, I learned
a few UNIX-related commands.
-- Saeed
 
Reply With Quote
 
Saeed Amrollahi
Guest
Posts: n/a
 
      10-10-2011
On Oct 10, 10:06*am, Goran <(E-Mail Removed)> wrote:
> On Oct 9, 2:42*pm, Saeed Amrollahi <(E-Mail Removed)> wrote:
>
>
>
>
>
>
>
>
>
> > Hi All C++ developers

>
> > I learned (from theory and practice) the most important benefit of
> > qualified multi-threaded programming is better performance.
> > Here it is a (not well-written) concurrent (two worker threads)
> > and sequential version
> > of Summation of [0, 2000000[ interval.
> > Of course my concurrent code is not good. It's for exposition only.

>
> > // concurrent code
> > #include <thread>
> > #include <iostream>

>
> > using namespace std;
> > long long s1 = 0, s2 = 0;
> > void Sum(int first, int last, long long& res) // calculate the sum in
> > [first, last[ interval
> > {
> > * while (first < last)
> > * * res += first++;

>
> > }

>
> > int main()
> > {
> > * long long r1 = 0, r2 = 0;
> > * thread t1(Sum, 0, 1000000, std::ref(r1));
> > * thread t2(Sum, 1000000, 2000000, std::ref(r2));
> > * t1.join(); t2.join();
> > * cout << r1 + r2 << '\n';

>
> > * return 0;

>
> > }

>
> > // sequential code
> > #include <iostream>

>
> > using namespace std;
> > void Sum(int first, int last, long long& res) // calculate the sum in
> > [first, last[ interval
> > {
> > * while (first < last)
> > * *res += first++;

>
> > }

>
> > int main()
> > {
> > * long long r = 0;
> > * Sum(0, 2000000, r);

>
> > * cout << r<< '\n';

>
> > * return 0;

>
> > }

>
> > I compiled and ran two codes using the following commands:
> > $ g++ -std=c++0x -pthread -pedantic -o2 concurrent_sum.c++ -o
> > concurrent_sum
> > $ g++ -std=c++0x *-pedantic -o2 sequential_sum.c++ -o sequential_sum
> > $ time ./concurrent_sum
> > 1999999999000
> > real * 0m0.014s
> > user *0m0.016s
> > sys * 0m0.004s
> > $ time ./sequential_sum
> > 1999999999000
> > real * 0m0.021s
> > user *0m0.020s
> > sys * 0m0.000s
> > Of course the time command differs in several
> > execution, but honestly, I didn't see so much difference between two
> > execution.
> > In addition the generated code for sequential_sum is about 6 KB, but
> > the size
> > of concurrent_code is about 45 KB.
> > Is there any problem in my measurement? Why the concurrent object code
> > is 7+ times bigger than sequential version? How do you explain it?

>
> I don't see a problem. On a single-core machine, I am surprised you
> saw even that much of an improvement. I guess, just like Pavel, that
> hyper-threading is working. Your code should give better performance
> on a multi-core machines. Given what it does, it should hardly touch
> memory, if at all, so memory should not be a bottleneck.
>
> Your executable is 7 times bigger because you linked in (part of)
> pthreads in it.
>
> Goran.


Thank you Goran. I just found devising appropriate Multi-threaded
algorithms is much more difficult than just creating and using several
threads.
-- Saeed
 
Reply With Quote
 
Jorgen Grahn
Guest
Posts: n/a
 
      10-10-2011
On Mon, 2011-10-10, Pavel wrote:
....
> As you use GCC, you can run 'objdump -d' on your concurrent binary


Nitpick: gcc has nothing to do with it. If the binary is on a format
which objdump understands, objdump will understand it. (The object
files generated by gcc happen to be in that category.)

'nm -nC foo.o' might be a simpler tool for this, by the way.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Performance Tutorials Services - Boosting Performance by DisablingUnnecessary Services on Windows XP Home Edition Software Engineer Javascript 0 06-10-2011 02:18 AM
Concurrent Write to a single file (in multi thread code)? Donkey Hot Java 3 04-27-2008 03:33 PM
Issues with concurrent pieces of code rich@zyloid.com C++ 1 04-28-2007 06:07 AM
problems locating the concurrent EDU.oswego.cs.dl.util.concurrent package Pep Java 6 08-16-2005 07:26 AM
Web Form Performance Versus Single File Performance jm ASP .Net 1 12-12-2003 11:14 PM



Advertisments