Velocity Reviews (http://www.velocityreviews.com/forums/index.php)
-   C++ (http://www.velocityreviews.com/forums/f39-c.html)
-   -   Concurrent code performance (http://www.velocityreviews.com/forums/t754721-concurrent-code-performance.html)

 Saeed Amrollahi 10-09-2011 12:42 PM

Concurrent code performance

Hi All C++ developers

I learned (from theory and practice) the most important benefit of
qualified multi-threaded programming is better performance.
Here it is a (not well-written) concurrent (two worker threads)
and sequential version
of Summation of [0, 2000000[ interval.
Of course my concurrent code is not good. It's for exposition only.

// concurrent code
#include <iostream>

using namespace std;
long long s1 = 0, s2 = 0;
void Sum(int first, int last, long long& res) // calculate the sum in
[first, last[ interval
{
while (first < last)
res += first++;
}

int main()
{
long long r1 = 0, r2 = 0;
t1.join(); t2.join();
cout << r1 + r2 << '\n';

return 0;
}

// sequential code
#include <iostream>

using namespace std;
void Sum(int first, int last, long long& res) // calculate the sum in
[first, last[ interval
{
while (first < last)
res += first++;
}

int main()
{
long long r = 0;
Sum(0, 2000000, r);

cout << r<< '\n';

return 0;
}

I compiled and ran two codes using the following commands:
\$ g++ -std=c++0x -pthread -pedantic -o2 concurrent_sum.c++ -o
concurrent_sum
\$ g++ -std=c++0x -pedantic -o2 sequential_sum.c++ -o sequential_sum
\$ time ./concurrent_sum
1999999999000
real 0m0.014s
user 0m0.016s
sys 0m0.004s
\$ time ./sequential_sum
1999999999000
real 0m0.021s
user 0m0.020s
sys 0m0.000s
Of course the time command differs in several
execution, but honestly, I didn't see so much difference between two
execution.
In addition the generated code for sequential_sum is about 6 KB, but
the size
of concurrent_code is about 45 KB.
Is there any problem in my measurement? Why the concurrent object code
is 7+ times bigger than sequential version? How do you explain it?

FYI, I use GCC 4.6.0 under Ubuntu on a Dell single processor notebook.

-- Saeed Amrollahi

 Marc 10-09-2011 01:29 PM

Re: Concurrent code performance

Saeed Amrollahi wrote:

> Hi All C++ developers
>
> I learned (from theory and practice) the most important benefit of
> qualified multi-threaded programming is better performance.
> Here it is a (not well-written) concurrent (two worker threads)
> and sequential version
> of Summation of [0, 2000000[ interval.
> Of course my concurrent code is not good. It's for exposition only.
>
> // concurrent code
> #include <iostream>
>
> using namespace std;
> long long s1 = 0, s2 = 0;
> void Sum(int first, int last, long long& res) // calculate the sum in
> [first, last[ interval
> {
> while (first < last)
> res += first++;
> }
>
> int main()
> {
> long long r1 = 0, r2 = 0;
> thread t1(Sum, 0, 1000000, std::ref(r1));
> thread t2(Sum, 1000000, 2000000, std::ref(r2));

I would make Sum return a long long instead of updating a reference.
If you are unlucky, r1 and r2 end up on the same cache line, you have
false sharing and the concurrent version ends up slower than the
sequential one.

> t1.join(); t2.join();
> cout << r1 + r2 << '\n';
>
> return 0;
> }

> In addition the generated code for sequential_sum is about 6 KB, but
> the size
> of concurrent_code is about 45 KB.
> Is there any problem in my measurement? Why the concurrent object code
> is 7+ times bigger than sequential version? How do you explain it?

Your basic code is very simple: a loop, a couple additions, almost
nothing. It is not really surprising that code meant to handle
multiple threads is several times larger. Now it might be possible for
libstdc++ to factor more of that code into the shared library, but
that's a different, more complicated question.

 Saeed Amrollahi 10-09-2011 01:58 PM

Re: Concurrent code performance

On Oct 9, 4:29*pm, Marc <marc.gli...@gmail.com> wrote:
> Saeed Amrollahi *wrote:
> > Hi All C++ developers

>
> > I learned (from theory and practice) the most important benefit of
> > qualified multi-threaded programming is better performance.
> > Here it is a (not well-written) concurrent (two worker threads)
> > and sequential version
> > of Summation of [0, 2000000[ interval.
> > Of course my concurrent code is not good. It's for exposition only.

>
> > // concurrent code
> > #include <iostream>

>
> > using namespace std;
> > long long s1 = 0, s2 = 0;
> > void Sum(int first, int last, long long& res) // calculate the sum in
> > [first, last[ interval
> > {
> > * while (first < last)
> > * * res += first++;
> > }

>
> > int main()
> > {
> > * long long r1 = 0, r2 = 0;
> > * thread t1(Sum, 0, 1000000, std::ref(r1));
> > * thread t2(Sum, 1000000, 2000000, std::ref(r2));

>
> I would make Sum return a long long instead of updating a reference.
> If you are unlucky, r1 and r2 end up on the same cache line, you have
> false sharing and the concurrent version ends up slower than the
> sequential one.

I know. As I noted, my code is not well-written. but I tried
to write two equivalent code.
Can you explain your answer in more detail? the cache line thing
and false sharing ...
>
> > * t1.join(); t2.join();
> > * cout << r1 + r2 << '\n';

>
> > * return 0;
> > }
> > In addition the generated code for sequential_sum is about 6 KB, but
> > the size
> > of concurrent_code is about 45 KB.
> > Is there any problem in my measurement? Why the concurrent object code
> > is 7+ times bigger than sequential version? How do you explain it?

>
> Your basic code is very simple: a loop, a couple additions, almost
> nothing. It is not really surprising that code meant to handle
> multiple threads is several times larger. Now it might be possible for
> libstdc++ to factor more of that code into the shared library, but
> that's a different, more complicated question.

Why it's not surprising? is the size of my threads big?
Would you explain more.

TIA,
-- Saeed Amrollahi

 Marc 10-09-2011 03:41 PM

Re: Concurrent code performance

Saeed Amrollahi wrote:

> Can you explain your answer in more detail? the cache line thing
> and false sharing ...

There's a wikipedia article on this...

>> Your basic code is very simple: a loop, a couple additions, almost
>> nothing. It is not really surprising that code meant to handle
>> multiple threads is several times larger. Now it might be possible for
>> libstdc++ to factor more of that code into the shared library, but
>> that's a different, more complicated question.

>
> Why it's not surprising? is the size of my threads big?
> Would you explain more.

Size of thread: basically how much you call operator new.
Size of object file: basically length of (used) source code.

The basic code is self contained and fits in less than 10 lines. The

 Pavel 10-10-2011 02:57 AM

Re: Concurrent code performance

Saeed Amrollahi wrote:
> Hi All C++ developers

....

> I compiled and ran two codes using the following commands:
> \$ g++ -std=c++0x -pthread -pedantic -o2 concurrent_sum.c++ -o
> concurrent_sum
> \$ g++ -std=c++0x -pedantic -o2 sequential_sum.c++ -o sequential_sum

....

> Is there any problem in my measurement?

Not that I can see. You had a decent speed-up in real-time of 50% (or 33%,
depending on how you count). Speed-up factor would almost never be exactly 2 for
2 threads as there are lots of resources in your system that can become
Memory" by Ulrich Drepper, available, for example, at
www.akkadia.org/drepper/cpumemory.pdf); also, context switches would take some
/proc/cpuinfo' to say for sure); else I can't explain that you saw the reduction
in real time at all.

> Why the concurrent object code
> is 7+ times bigger than sequential version? How do you explain it?

.....
The concurrent code has thread library code linked in whereas the sequential one
doesn't. As you use GCC, you can run 'objdump -d' on your concurrent binary and
see what exactly functions were added as compared to the sequential version.

>
> -- Saeed Amrollahi

Hope this helps,
-Pavel

 Goran 10-10-2011 07:06 AM

Re: Concurrent code performance

On Oct 9, 2:42*pm, Saeed Amrollahi <amrollahi.sa...@gmail.com> wrote:
> Hi All C++ developers
>
> I learned (from theory and practice) the most important benefit of
> qualified multi-threaded programming is better performance.
> Here it is a (not well-written) concurrent (two worker threads)
> and sequential version
> of Summation of [0, 2000000[ interval.
> Of course my concurrent code is not good. It's for exposition only.
>
> // concurrent code
> #include <iostream>
>
> using namespace std;
> long long s1 = 0, s2 = 0;
> void Sum(int first, int last, long long& res) // calculate the sum in
> [first, last[ interval
> {
> * while (first < last)
> * * res += first++;
>
> }
>
> int main()
> {
> * long long r1 = 0, r2 = 0;
> * thread t1(Sum, 0, 1000000, std::ref(r1));
> * thread t2(Sum, 1000000, 2000000, std::ref(r2));
> * t1.join(); t2.join();
> * cout << r1 + r2 << '\n';
>
> * return 0;
>
> }
>
> // sequential code
> #include <iostream>
>
> using namespace std;
> void Sum(int first, int last, long long& res) // calculate the sum in
> [first, last[ interval
> {
> * while (first < last)
> * *res += first++;
>
> }
>
> int main()
> {
> * long long r = 0;
> * Sum(0, 2000000, r);
>
> * cout << r<< '\n';
>
> * return 0;
>
> }
>
> I compiled and ran two codes using the following commands:
> \$ g++ -std=c++0x -pthread -pedantic -o2 concurrent_sum.c++ -o
> concurrent_sum
> \$ g++ -std=c++0x *-pedantic -o2 sequential_sum.c++ -o sequential_sum
> \$ time ./concurrent_sum
> 1999999999000
> real * 0m0.014s
> user *0m0.016s
> sys * 0m0.004s
> \$ time ./sequential_sum
> 1999999999000
> real * 0m0.021s
> user *0m0.020s
> sys * 0m0.000s
> Of course the time command differs in several
> execution, but honestly, I didn't see so much difference between two
> execution.
> In addition the generated code for sequential_sum is about 6 KB, but
> the size
> of concurrent_code is about 45 KB.
> Is there any problem in my measurement? Why the concurrent object code
> is 7+ times bigger than sequential version? How do you explain it?

I don't see a problem. On a single-core machine, I am surprised you
saw even that much of an improvement. I guess, just like Pavel, that
on a multi-core machines. Given what it does, it should hardly touch
memory, if at all, so memory should not be a bottleneck.

Your executable is 7 times bigger because you linked in (part of)

Goran.

 Saeed Amrollahi 10-10-2011 07:34 AM

Re: Concurrent code performance

On Oct 9, 6:41*pm, Marc <marc.gli...@gmail.com> wrote:
> Saeed Amrollahi *wrote:
> > Can you explain your answer in more detail? the cache line thing
> > and false sharing ...

>
> There's a wikipedia article on this...
>
> >> Your basic code is very simple: a loop, a couple additions, almost
> >> nothing. It is not really surprising that code meant to handle
> >> multiple threads is several times larger. Now it might be possible for
> >> libstdc++ to factor more of that code into the shared library, but
> >> that's a different, more complicated question.

>
> > Why it's not surprising? is the size of my threads big?
> > Would you explain more.

>
> Size of thread: basically how much you call operator new.
> Size of object file: basically length of (used) source code.
>
> The basic code is self contained and fits in less than 10 lines. The

-- Saeed

 Saeed Amrollahi 10-10-2011 07:40 AM

Re: Concurrent code performance

On Oct 10, 5:57*am, Pavel
<pauldontspamt...@removeyourself.dontspam.yahoo> wrote:
> Saeed Amrollahi wrote:
> > Hi All C++ developers

>
> ...
>
> > I compiled and ran two codes using the following commands:
> > \$ g++ -std=c++0x -pthread -pedantic -o2 concurrent_sum.c++ -o
> > concurrent_sum
> > \$ g++ -std=c++0x *-pedantic -o2 sequential_sum.c++ -o sequential_sum

>
> ...
>
> > Is there any problem in my measurement?

>
> Not that I can see. You had a decent speed-up in real-time of 50% (or 33%,
> depending on how you count). Speed-up factor would almost never be exactly 2 for
> 2 threads as there are lots of resources in your system that can become
> bottlenecks

<Nodded>
> Memory" by Ulrich Drepper, available, for example, atwww.akkadia.org/drepper/cpumemory.pdf);

I know about the paper, but I haven't to read it yet.
also, context switches would take some
> time. Probably your single-CPU notebook has hyper-threading turned on (try 'cat
> /proc/cpuinfo' to say for sure); else I can't explain that you saw the reduction
> in real time at all.
>
> > Why the concurrent object code
> > is 7+ times bigger than sequential version? How do you explain it?

>
> ....
> The concurrent code has thread library code linked in whereas the sequential one
> doesn't. As you use GCC, you can run 'objdump -d' on your concurrent binary and
> see what exactly functions were added as compared to the sequential version.
>
>
>
> > * *-- Saeed Amrollahi

>
> Hope this helps,
> -Pavel

Thank you very much. Besides concurrency concepts, I learned
a few UNIX-related commands.
-- Saeed

 Saeed Amrollahi 10-10-2011 07:43 AM

Re: Concurrent code performance

On Oct 10, 10:06*am, Goran <goran.pu...@gmail.com> wrote:
> On Oct 9, 2:42*pm, Saeed Amrollahi <amrollahi.sa...@gmail.com> wrote:
>
>
>
>
>
>
>
>
>
> > Hi All C++ developers

>
> > I learned (from theory and practice) the most important benefit of
> > qualified multi-threaded programming is better performance.
> > Here it is a (not well-written) concurrent (two worker threads)
> > and sequential version
> > of Summation of [0, 2000000[ interval.
> > Of course my concurrent code is not good. It's for exposition only.

>
> > // concurrent code
> > #include <iostream>

>
> > using namespace std;
> > long long s1 = 0, s2 = 0;
> > void Sum(int first, int last, long long& res) // calculate the sum in
> > [first, last[ interval
> > {
> > * while (first < last)
> > * * res += first++;

>
> > }

>
> > int main()
> > {
> > * long long r1 = 0, r2 = 0;
> > * thread t1(Sum, 0, 1000000, std::ref(r1));
> > * thread t2(Sum, 1000000, 2000000, std::ref(r2));
> > * t1.join(); t2.join();
> > * cout << r1 + r2 << '\n';

>
> > * return 0;

>
> > }

>
> > // sequential code
> > #include <iostream>

>
> > using namespace std;
> > void Sum(int first, int last, long long& res) // calculate the sum in
> > [first, last[ interval
> > {
> > * while (first < last)
> > * *res += first++;

>
> > }

>
> > int main()
> > {
> > * long long r = 0;
> > * Sum(0, 2000000, r);

>
> > * cout << r<< '\n';

>
> > * return 0;

>
> > }

>
> > I compiled and ran two codes using the following commands:
> > \$ g++ -std=c++0x -pthread -pedantic -o2 concurrent_sum.c++ -o
> > concurrent_sum
> > \$ g++ -std=c++0x *-pedantic -o2 sequential_sum.c++ -o sequential_sum
> > \$ time ./concurrent_sum
> > 1999999999000
> > real * 0m0.014s
> > user *0m0.016s
> > sys * 0m0.004s
> > \$ time ./sequential_sum
> > 1999999999000
> > real * 0m0.021s
> > user *0m0.020s
> > sys * 0m0.000s
> > Of course the time command differs in several
> > execution, but honestly, I didn't see so much difference between two
> > execution.
> > In addition the generated code for sequential_sum is about 6 KB, but
> > the size
> > of concurrent_code is about 45 KB.
> > Is there any problem in my measurement? Why the concurrent object code
> > is 7+ times bigger than sequential version? How do you explain it?

>
> I don't see a problem. On a single-core machine, I am surprised you
> saw even that much of an improvement. I guess, just like Pavel, that
> on a multi-core machines. Given what it does, it should hardly touch
> memory, if at all, so memory should not be a bottleneck.
>
> Your executable is 7 times bigger because you linked in (part of)
>
> Goran.

Thank you Goran. I just found devising appropriate Multi-threaded
algorithms is much more difficult than just creating and using several
-- Saeed

 Jorgen Grahn 10-10-2011 01:46 PM

Re: Concurrent code performance

On Mon, 2011-10-10, Pavel wrote:
....
> As you use GCC, you can run 'objdump -d' on your concurrent binary

Nitpick: gcc has nothing to do with it. If the binary is on a format
which objdump understands, objdump will understand it. (The object
files generated by gcc happen to be in that category.)

'nm -nC foo.o' might be a simpler tool for this, by the way.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

All times are GMT. The time now is 06:37 AM.