In article <edee09a7-fbc2-41fd-84b4->, says...
[ ... using a memory barrier ]
> In practice, it's
> generally not worth it, since the additional assembler generally
> does more or less what the outer mutex (which you're trying to
> avoid) does, and costs about the same in run time.
I have to disagree with both of these points. First, a memory barrier
is quite different from a mutex. Consider a store fence, for example.
It simply says that stores from all previous instructions must
complete (become visible to other processors) before any stores from
subsequent instructions do; a read barrier does the same, but for
reads. It's basically equivalent to a sequence point, but for real
hardware instead of a conceptual model.
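Standard C++ only gained portable fences with C++11, so the following is a minimal sketch assuming a C++11 compiler; `payload`, `ready`, `producer`, `consumer` and `publish_and_consume` are hypothetical names. It publishes ordinary data through a flag using a release fence on the writing side and an acquire fence on the reading side. Note that a C++11 release fence is somewhat stronger than the pure store fence described above: it also keeps earlier loads from moving past later stores.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, for illustration only.
int payload = 0;                 // plain, non-atomic data
std::atomic<bool> ready{false};  // flag used to publish the payload

void producer() {
    payload = 42;  // ordinary store
    // Release fence: memory operations before it cannot be reordered
    // with the store to `ready` that follows.
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);
}

int consumer() {
    while (!ready.load(std::memory_order_relaxed)) {
        std::this_thread::yield();
    }
    // Acquire fence: pairs with the release fence above, so the read of
    // `payload` below must see the value written before `ready` was set.
    std::atomic_thread_fence(std::memory_order_acquire);
    return payload;
}

// Runs one producer and one consumer and returns what the consumer saw.
int publish_and_consume() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;
    std::thread reader([&seen] { seen = consumer(); });
    std::thread writer(producer);
    writer.join();
    reader.join();
    return seen;
}
```

No lock is taken anywhere: the fences only constrain ordering, which is the difference from a mutex being argued here.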
As far as cost goes: a mutex normally uses kernel data, so virtually
every operation requires a switch from user mode to kernel mode and
back. The cost for that will (of course) vary between systems, but is
almost always fairly high (figure a few thousand CPU cycles as a
reasonable minimum).
A memory barrier will typically just prevent combining a subsequent
write with a previous one. As long as there's room in the write queue
for both pieces of data, there's no cost at all. In the (normally
rare) case that the CPU's write queue is full, a subsequent write has
to wait for a previous write to complete to create an empty spot in
the write queue. Even in this worst case, it's generally going to be
around an order of magnitude faster than a switch to kernel mode and
back.
> > The problem is that C++ (up through the 2003 standard) simply
> > lacks memory barriers. Double-checked locking is one example
> > of code that _needs_ a memory barrier to work correctly -- but
> > it's only one example of many.
>
> It can be made to work with thread local storage as well,
> without memory barriers.
Well, yes -- poorly stated on my part. It requires _some_ sort of
explicit support for threading that's missing from the current and
previous versions of C++, but memory barriers aren't the only
possible one.
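Both approaches can be sketched in C++11, which added `std::atomic` and `thread_local` after the 2003 standard discussed here (earlier code had to use compiler- or OS-specific thread-local storage). This is a minimal sketch under that assumption; `Widget`, `instance_dclp`, `instance_tls` and the globals are hypothetical names. In the first version the acquire load and release store supply the barriers double-checked locking needs. In the second, each thread takes the mutex once on its first call and afterwards reads only its own thread-local copy, so the only ordering comes from the mutex itself.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical singleton type used only for illustration.
struct Widget {
    int value = 7;
};

// --- Variant 1: double-checked locking with acquire/release ordering ---
std::atomic<Widget*> g_instance{nullptr};
std::mutex g_mutex;

Widget* instance_dclp() {
    // First check, without the lock. The acquire load pairs with the
    // release store below, so a non-null pointer means the Widget it
    // points to is fully constructed.
    Widget* p = g_instance.load(std::memory_order_acquire);
    if (p == nullptr) {
        std::lock_guard<std::mutex> lock(g_mutex);
        // Second check: another thread may have created it while we waited.
        p = g_instance.load(std::memory_order_relaxed);
        if (p == nullptr) {
            p = new Widget;  // intentionally never deleted in this sketch
            g_instance.store(p, std::memory_order_release);
        }
    }
    return p;
}

// --- Variant 2: per-thread cache, no explicit barriers in user code ---
Widget* g_shared = nullptr;  // protected by g_tls_mutex
std::mutex g_tls_mutex;

Widget* instance_tls() {
    // Each thread caches the pointer after its own first, locked lookup.
    thread_local Widget* cached = nullptr;
    if (cached == nullptr) {
        std::lock_guard<std::mutex> lock(g_tls_mutex);
        if (g_shared == nullptr) {
            g_shared = new Widget;  // intentionally never deleted
        }
        cached = g_shared;
    }
    return cached;
}
```

The trade-off: the thread-local version pays one mutex acquisition per thread rather than none, but needs nothing beyond a mutex and thread-local storage.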
[ ... ]
> Yes. The "problem" with DCLP is in fact just a symptom of a
> larger problem, of people not understanding what is and is not
> guaranteed (and to a lesser degree, of people not really
> understanding the costs---acquiring a non-contested mutex is
> really very, very cheap, and usually not worth trying to avoid).
At least under Windows, this does not fit my experience. Of course,
Windows has its own cure (sort of) for the problem -- rather than
using a mutex (with its switch to/from kernel mode) you'd usually use
a critical section instead. Entering a critical section that's not in
use really is very fast.
Then again, a critical section basically is itself just a
double-checked lock (including the necessary memory barriers).
Critical sections have two big limitations: first, unlike a normal
mutex, they only work between threads in a single process. Second,
they can be quite slow when there's a great deal of contention for
the critical section.
--
Later,
Jerry.