wrote:
[]
> And here is what I got from my workstation ( Dual AMD Opteron machine,
> RHEL 3)
>
> [nan@eudyptula test]$ for t in 1 2; do g++ -DTHRD_NUM=$t
> mt_map_test.cpp -lpthread ; time ./a.out; done
>
> real 0m1.390s
> user 0m1.280s
> sys 0m0.120s
>
> real 0m3.450s
> user 0m5.320s
> sys 0m1.170s
>
>
> I expected the two times to be roughly equal, but I clearly see a
> significant slowdown with 2 threads. The same also happened on a
> dual Intel Xeon machine. I suspect the internal STL map
> implementation is at fault.
>
> I've spent hours googling without any answer. I really need advice from
> a C++ expert. Thanks a lot.
The problem may be in the libstdc++ caching allocator. See
http://gcc.gnu.org/onlinedocs/libstd...allocator.html
I ran your code on a Dual Xeon 2.8 box with caching disabled and
enabled. Here are my results:
my@devel:~/src/exp> cat /etc/issue
Welcome to SuSE Linux 9.2 (i586) - Kernel \r (\l).
my@devel:~/src/exp> uname -a
Linux devel 2.6.11.4-20a-smp #1 SMP Wed Mar 23 21:52:37 UTC 2005 i686
i686 i386 GNU/Linux
my@devel:~/src/exp> g++ --version
g++ (GCC) 3.3.4 (pre 3.3.5 20040809)
Copyright (C) 2003 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
my@devel:~/src/exp> for t in 1 2; do g++ -O3 -DTHRD_NUM=$t exp.cpp
-pthread ; GLIBCPP_FORCE_NEW=1 GLIBCXX_FORCE_NEW=1 time -p ./a.out;
done
real 1.16
user 1.10
sys 0.05
real 1.37
user 2.55
sys 0.14
my@devel:~/src/exp> for t in 1 2; do g++ -O3 -DTHRD_NUM=$t exp.cpp
-pthread ; time -p ./a.out; done
real 1.12
user 1.06
sys 0.05
real 3.48
user 5.28
sys 1.40
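In the former case, with caching disabled via GLIBCPP_FORCE_NEW /
GLIBCXX_FORCE_NEW, the real time numbers show that the task scales well
across the two processors: 1.37 s with two threads versus 1.16 s with
one. With caching enabled, the two-thread run takes 3.48 s.
A small further boost is gained by using the Hoard allocator: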
my@devel:~/src/exp> for t in 1 2; do g++ -O3 -DTHRD_NUM=$t exp.cpp
-pthread ; LD_PRELOAD=/usr/local/lib/libhoard.so:/usr/lib/libdl.so
GLIBCPP_FORCE_NEW=1 GLIBCXX_FORCE_NEW=1 time -p ./a.out; done
real 1.15
user 1.09
sys 0.05
real 1.31
user 2.46
sys 0.13
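
For anyone who wants to reproduce this without the original source, here is a rough sketch of the kind of test I assume is being timed: every thread fills its own private std::map, so the threads share no data and only meet inside the allocator. This is not the original mt_map_test.cpp; the names fill_private_map and run_workers are made up for illustration, and it uses std::thread (C++11) rather than raw pthreads.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>
#include <thread>
#include <vector>

// Hypothetical worker: builds a map that no other thread ever touches.
// Each insertion allocates one tree node, so the allocator is the only
// resource the threads can contend on.
std::size_t fill_private_map(std::size_t n) {
    std::map<std::size_t, std::size_t> m;
    for (std::size_t i = 0; i < n; ++i) {
        m[i] = i;
    }
    return m.size();
}

// Starts `threads` workers that each insert `n` keys into their own map.
// Returns the elapsed wall-clock time in seconds and stores the total
// number of inserted elements in `total`.
double run_workers(unsigned threads, std::size_t n, std::size_t& total) {
    assert(threads > 0);
    std::vector<std::size_t> sizes(threads, 0);
    std::vector<std::thread> pool;
    pool.reserve(threads);

    const auto start = std::chrono::steady_clock::now();
    for (unsigned t = 0; t < threads; ++t) {
        // Each worker writes only to its own slot of `sizes`.
        pool.emplace_back([&sizes, t, n] { sizes[t] = fill_private_map(n); });
    }
    for (auto& th : pool) {
        th.join();
    }
    const auto stop = std::chrono::steady_clock::now();

    total = 0;
    for (std::size_t s : sizes) {
        total += s;
    }
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling run_workers(1, n, total) and run_workers(2, n, total) from main with the same n and comparing the returned times gives the same kind of comparison as the THRD_NUM runs above. If the allocator scales, the two times should be close on a two-processor box; a large gap points at contention in the allocator rather than in std::map itself.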