Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C++ > Please disprove this Double-Checked Locking "fix"

Reply
Thread Tools

Please disprove this Double-Checked Locking "fix"

 
 
Pavel
Guest
Posts: n/a
 
      05-02-2011
Leigh Johnston wrote:
> On 01/05/2011 22:44, Leigh Johnston wrote:
>> On 01/05/2011 22:26, Pavel wrote:
>>> Leigh Johnston wrote:
>>>> On 30/04/2011 23:54, James Kanze wrote:
>>>>> On Apr 26, 5:58 pm, jl_p...@hotmail.com wrote:
>>>>>
>>>>>> Recently I've been reading up on "Double-Checked Locking" in
>>>>>> C++ and how it's often implemented imperfectly.
>>>>>
>>>>> You mean, how it is impossible to implement in standard C++
>>>>> (03).
>>>>>
>>>>>> The Article "C++ and the Perils of Double-Checked Locking" by
>>>>>> Scott Meyers and Andrei Alexandrescu
>>>>>> (http://www.aristeia.com/Papers/DDJ_J...04_revised.pdf)
>>>>>> provides a good overview of how it's usually done and why it's
>>>>>> often inadequate.
>>>>>
>>>>> The standard solution for implementing a singleton works just
>>>>> fine:
>>>>>
>>>>> Singleton* Singleton::instance()
>>>>> {
>>>>> ScopedLock lock(mutex);
>>>>> if ( pInstance == NULL )
>>>>> pInstance = new Singleton;
>>>>> return pInstance;
>>>>> }
>>>>>
>>>>
>>>> How does this prevent CPU reordering of stores to the pInstance pointer
>>>> and the object pInstance points to?
>>> Any system-dependent mutex has to behave like a double-side memory
>>> barrier (it's a follow-up from the mutual exclusion requirement).

>>
>> I didn't realize this was guaranteed for all systems...
>>

>
> The ellipsis was an indication that I find your claim that all mutex
> implementations include memory barriers slightly dubious. From
> Wikipedia:
>
> "These primitives (mutexes) are *usually* implemented with the memory
> barriers required to provide the expected memory visibility semantics.
> In such environments explicit use of memory barriers is not *generally*
> necessary."
>
> I guess a mutex implementation that didn't include memory barriers on a
> system where memory barriers are needed would not be very useful.

Yes, that's what I meant. Wikipedia is a great resource to find an explanation
of a concept but it does not have to tie all ends together with the strictness
of a specification. In our case, the article on Mutual Exclusion says:

"Mutual exclusion .. algorithms are used in concurrent programming to avoid the
simultaneous use of a common resource, such as a global variable, by pieces of
computer code called critical sections. A critical section is a piece of code in
which a process or thread accesses a common resource".

In our case, we want mutual exclusion for threads and our "common resource" is
memory containing pInstance. A mutex that in fact does *not* allow a thread to
access the common resource (e.g. due to not executing a memory barrier) does not
seem to satisfy to the above definition.

Then of course another article (on memory barriers) written by other people does
not agree.. I do not see an issue with that. Probably they saw something called
"mutex" that required the mutex client code to additionally exercise memory
barrier.. in my and the Wikipedia first article's view it would not be a
complete mutex, anyway. I am fully content with being in this sort of
disagreement with one or more articles at Wikipedia.

>
> /Leigh


-Pavel
 
Reply With Quote
 
 
 
 
James Kanze
Guest
Posts: n/a
 
      05-02-2011
On May 2, 12:23 am, Pavel
<pauldontspamt...@removeyourself.dontspam.yahoo> wrote:
> James Kanze wrote:


[...]
> In fact, I have never defended a Singleton pattern. IMHO it
> is the most-overused, the least-understood, the
> vaguest-defined of all GoF patterns; and few people seem to
> have read the end of the "Consequences" sub-section of
> "Singleton" section of their book (actuating Consequences #4,5
> there may invalidate the whole discussion on DCL for Singleton
> irrelevant).


I don't think that the presentation in the GoF was actually that
vague. The problem is that as presented in the GoF, it is a
form of contract enforcement: the design says that there may
only be one instance, so we design the class to enforce the
design. Which per se is probably overkill: the semantics of
most singletons are such that no reasonable programmer would
create more than one instance anyway, so it's not worth the
extra effort to forbid it. In the case of C++, however, the
singleton idiom is most often used primarily to solve order of
initialization issues; once you've created the code necessary to
solve those, making it a singleton (enforcing only one instance)
is merely a question of where you put the "public:".

--
James Kanze
 
Reply With Quote
 
 
 
 
Joshua Maurice
Guest
Posts: n/a
 
      05-02-2011
On May 1, 2:44*pm, Leigh Johnston <le...@i42.co.uk> wrote:
> On 01/05/2011 22:26, Pavel wrote:
> > Leigh Johnston wrote:
> >> How does this prevent CPU reordering of stores to the pInstance pointer
> >> and the object pInstance points to?

> > Any system-dependent mutex has to behave like a double-side memory
> > barrier (it's a follow-up from the mutual exclusion requirement).

>
> I didn't realize this was guaranteed for all systems...


Anything that calls itself a mutex or lock in the C++0x or Java memory
model, or any sane memory model for that matter, must guarantee that.
That's word the terms "mutex", "lock", and "critical section" mean. It
guarantees that properly synchronized code will execute according to
the guarantees of the language, and if some hardware needs a "double-
sided memory barrier", then it needs a "double-sided memory barrier".
 
Reply With Quote
 
Joshua Maurice
Guest
Posts: n/a
 
      05-02-2011
On May 1, 5:14*pm, Scott Meyers <NeverR...@aristeia.com> wrote:
> On 4/26/2011 10:50 AM, Pete Becker wrote:
>
> > pinstance points to. The C++0x solution is:

>
> > #include <atomic>
> > std::atomic<Singleton*> pinstance;

>
> > if (pinstance == 0) {
> > Lock lock;
> > if (pinstance == 0)
> > pinstance = new Singleton;
> > }
> > return pinstance;

>
> > Making pinstance an atomic variable ensures that all writes that "happen
> > before" the assignment to pinstance in one thread are seen before the
> > result of that assignment is seen in any other thread.

>
> True, but that's not enough to make this code work. *You also need to
> know that assignment to pinstance can't take place until the Singleton
> constructor has run to completion. *You have that guarantee in C++0x,
> too, so the code is correct, but it's not just the use of atomics that
> gives you that correctness. *You also need the additional guarantees
> that C++0x offers regarding the behavior of new expressions (compared to
> the weaker guarantees that C++03 gives you).


I don't think I'd describe it that way.

Consider:
pinstance = new Singleton;

For a simple pointer in C++03, there are no threading guarantees
given, so of course it can implement it however it wants, which is
apparently frequently enough something like the following pseudo-code
pinstance = malloc(sizeof(Singleton));
new(pinstance) Singleton();
Even under POSIX, to detect the difference between the above and the
following pseudo-code
temp = malloc(sizeof(Singleton));
new(temp) Singleton();
pinstance = temp;
would require code that has a race condition, which means undefined
behavior, which means that the compiler can do whatever it wants.

In C++0x, the situation is unchanged for a simple pointer. There are
no guarantees specific to "new expressions" which change this. The
compiler is allowed to implement it in exactly the same way. If you
have some code that could detect this, then you still have a race
condition, and thus undefined behavior, and thus all bets are off, so
the compiler free to do as it wants.

What does change is when we throw std::atomic into the mix. When
pinstance is of type std::atomic, the following:
pinstance = new Singleton;
is equivalent to the following ala operator overloading pseudo-code
(forgive me for not knowing the specific function name offhand):
pinstance.set(new Singleton);
And C++0x does have very specific guarantees concerning std::atomic.
It doesn't matter that the argument is a new expression. It could be
any sort of expression or function call. There is no guarantee
specific to new expressions. There are specific guarantees relating to
std::atomic::set and std::atomic::get, and there are guarantees that
describe all expressions, but not new expressions in particular.

Perhaps you were talking about the guarantees specific to function
local static variables? C++0x does guarantee that their initialization
is properly synchronized so that the first access will initialize it,
and all other threads will wait until it's done, and all other threads
will properly see the initialized function local static variable.
However, Pete Becker's code does not use a function local static, and
even then the guarantees about function local statics are not specific
to new expressions. You could initialize one with a normal function
call or some other expression which isn't a new expression, and the
same guarantees apply.
 
Reply With Quote
 
Chris M. Thomasson
Guest
Posts: n/a
 
      05-04-2011
"Leigh Johnston" <> wrote in message
news:aLSdnS3SA-...
[...]
>> On 01/05/2011 22:26, Pavel wrote:

[...]
>>> Any system-dependent mutex has to behave like a double-side memory
>>> barrier (it's a follow-up from the mutual exclusion requirement).

>>
>> I didn't realize this was guaranteed for all systems...
>>

>
> The ellipsis was an indication that I find your claim that all mutex
> implementations include memory barriers slightly dubious. From
> Wikipedia:
>
> "These primitives (mutexes) are *usually* implemented with the memory
> barriers required to provide the expected memory visibility semantics. In
> such environments explicit use of memory barriers is not *generally*
> necessary."
>
> I guess a mutex implementation that didn't include memory barriers on a
> system where memory barriers are needed would not be very useful.


FWIW, there are mutex implementations that can "skip" explicit memory
barrier operations. You basically need to think in term of "Asymmetric
Synchronization". Here is some more information on the subject:

http://blogs.sun.com/dave/resource/A...ronization.txt

http://www.trl.ibm.com/people/kawati...chiya05phd.pdf
(chapter 6)



 
Reply With Quote
 
Pavel
Guest
Posts: n/a
 
      05-06-2011
James Kanze wrote:
> On May 2, 12:23 am, Pavel
> <pauldontspamt...@removeyourself.dontspam.yahoo> wrote:
>> James Kanze wrote:

>
> [...]
>> In fact, I have never defended a Singleton pattern. IMHO it
>> is the most-overused, the least-understood, the
>> vaguest-defined of all GoF patterns; and few people seem to
>> have read the end of the "Consequences" sub-section of
>> "Singleton" section of their book (actuating Consequences #4,5
>> there may invalidate the whole discussion on DCL for Singleton
>> irrelevant).

>
> I don't think that the presentation in the GoF was actually that
> vague. The problem is that as presented in the GoF, it is a
> form of contract enforcement: the design says that there may
> only be one instance,

That's where the vagueness started (in definition of the problem, not the
analysis and especially consequences which most people don't read to the end).

"Only one" -- but in what context?
One per a process?
a thread?
the whole world?
a CPU?
one computer?
corporate network?
a region of the corporate network?

all of the above make sense and are being actively used. 50% of singletons that
were defined in single-threaded programs as one-per-process (and, hence,
per-thread), should be taken to TLS and made "one-per-thread" (and hence,
"many-per-process) when converting to multi-threaded program. The other 50%
should not (but of these 50%, some would gain from changing the requirement to
"only one per machine" and becoming servers and yet others could perfectly
become a pool with many-to-many relationship to the available threads or CPUs --
or cores or who knows what). The techniques for enforcing the "singleness"
contracts above are meaningfully different.

The other vagueness of "only one" is illustrated by the question "during which
period of time"? Should it be the only one intance per life of a process or a
thread? Or it should outlive some processes or threads? Or there should be
exactly one (or no more than one?) instance "at at time" but it can be
re-created several times during the life of a process (or a thread)? All of the
above are useful in particular circumstances and enforcement techniques of
differ significantly on the actual specific requirements (Andrei Alexandrescu
illustrated the latter requirement with his "Phoenix singleton" composed of a
set of different generic policies. Still he did not cover many of the above
variances and I can't imagine a useful implementation that does. Even his
singleton is more proof-of-concept than production-ready code despite relatively
high quality (and of course it does not even try to address the performance
issue of thread-safe singleton that DCL tried to solve. Phoenix singleton does
not work with DCL -- or fixed-with-atomic-DCL -- too well)).

so we design the class to enforce the
> design. Which per se is probably overkill: the semantics of
> most singletons are such that no reasonable programmer would
> create more than one instance anyway, so it's not worth the
> extra effort to forbid it. In the case of C++, however, the
> singleton idiom is most often used primarily to solve order of
> initialization issues; once you've created the code necessary to
> solve those, making it a singleton (enforcing only one instance)
> is merely a question of where you put the "public:".

Yeah, that's another bunch of good questions (and don't even let me start on
order-of-destruction issues). I prefer explicitly give one object to the
constructor of the another when one depends on another or create them within a
manager object that ensures the sequence. Then you end up with one manager for
all interdependent "singletons" for a particular program. Make next step and
create that manager in "main()" (or, if you are a library writer, make user
create it in his "main()", or, if your user is a library writer, too, make him
include your manager in his own manager that he has to make his user to create
in main etc recursively) and there is no need for a singleton. BUT.. this only
works if no one in the chain of library writers decided to put singleness inside
his or her class (that is, s/he did not give his or her users a choice to create
an object of class defined in his/her library at will). I am trying not to be
such a nuisance; that's why I forgot when there was the last time I decided to
encapsulate singleness in my class.

>
> --
> James Kanze


-Pavel
 
Reply With Quote
 
Pavel
Guest
Posts: n/a
 
      05-06-2011
Chris M. Thomasson wrote:
> "Leigh Johnston"<> wrote in message
> news:aLSdnS3SA-...
> [...]
>>> On 01/05/2011 22:26, Pavel wrote:

> [...]
>>>> Any system-dependent mutex has to behave like a double-side memory
>>>> barrier (it's a follow-up from the mutual exclusion requirement).
>>>
>>> I didn't realize this was guaranteed for all systems...
>>>

>>
>> The ellipsis was an indication that I find your claim that all mutex
>> implementations include memory barriers slightly dubious. From
>> Wikipedia:
>>
>> "These primitives (mutexes) are *usually* implemented with the memory
>> barriers required to provide the expected memory visibility semantics. In
>> such environments explicit use of memory barriers is not *generally*
>> necessary."
>>
>> I guess a mutex implementation that didn't include memory barriers on a
>> system where memory barriers are needed would not be very useful.

>
> FWIW, there are mutex implementations that can "skip" explicit memory
> barrier operations. You basically need to think in term of "Asymmetric
> Synchronization". Here is some more information on the subject:
>
> http://blogs.sun.com/dave/resource/A...ronization.txt
>
> http://www.trl.ibm.com/people/kawati...chiya05phd.pdf
> (chapter 6)

Thanks Chris for interesting resources! The techniques described there do show
how to avoid executing memory barriers in both threads and still achieve mutual
exclusion. My statement above still holds: these mutices "behave like a
double-side memory barrier" (quoting myself) without actually executing them.
They ensure the other threads sees the changes in the correct order which
answers the original Leigh's question ("How does this prevent CPU reordering of
stores to the pInstance pointer and the object pInstance points to?").

All that damned C++ "as-if" legalese.. I have never imagined myself using such
language by accident .. but that exactly what happened.

-Pavel
 
Reply With Quote
 
Joshua Maurice
Guest
Posts: n/a
 
      05-12-2011
On Apr 26, 11:16*am, Joshua Maurice <joshuamaur...@gmail.com> wrote:
> Here's how you can break your code. Note again the general problem
> that there are no guarantees that could even make it work, so I want
> you to focus on the above basic problem, and do not spend too much
> time on this, but I present it completeness. For example:
>
> * * Singleton* Singleton::instance() {
> * * * if (pInstance == 0) {
> * * * * Lock lock;
> * * * * if (pInstance == 0) {
> * * * * * Singleton* temp = new Singleton; *// initialize to temp
> * * * * * secondLock.lock();
> * * * * * pInstance = temp; *// assign temp to pInstance
> * * * * * secondLock.unlock();
> * * * * }
> * * * }
> * * * return pInstance;
> * * }
>
> A sufficiently smart compiler is allowed to move things from before a
> lock to after the lock, and from after an unlock to before a lock. It
> can transform the above to:
>
> * * Singleton* Singleton::instance() {
> * * * if (pInstance == 0) {
> * * * * Lock lock;
> * * * * if (pInstance == 0) {
> * * * * * secondLock.lock();
> * * * * * Singleton* temp = new Singleton; *// initialize to temp
> * * * * * pInstance = temp; *// assign temp to pInstance
> * * * * * secondLock.unlock();
> * * * * }
> * * * }
> * * * return pInstance;
> * * }
>
> And once we get that, it's trivial to change it to:
>
> * * Singleton* Singleton::instance() {
> * * * if (pInstance == 0) {
> * * * * Lock lock;
> * * * * if (pInstance == 0) {
> * * * * * secondLock.lock();
> * * * * * pInstance = new Singleton;
> * * * * * secondLock.unlock();
> * * * * }
> * * * }
> * * * return pInstance;
> * * }
>
> Which means we're back to screwed for the reasons known to you. I
> didn't even need to resort to the wonderful DEC Alpha and its split
> cache, but I could have.
>
> You need to reread the paper which you cited. Here's the link again
> for your benefit.http://www.aristeia.com/Papers/DDJ_J...04_revised.pdf
> And pay attention this time.


I was looking over this while searching for interview questions, and I
realized I made a mistake. You cannot make the first reordering so
easily due to the lock nonsense inside the constructor. However, a
sufficiently smart compiler could notice your clever ruse, optimize
away the assert as always true, see a lock and unlock pair guarding
nothing, optimize that away, and then move the assignment to temp past
the mutex acquire, as demonstrated above.

Of course, as I tried to emphasize, and as I will emphasize again now,
that's not the only kind of thing which can break it. Cache coherency
problems can lead to quite bizarre situations, especially with a
processor with a weak memory model. Of course, the most important
detail is that you are not given guarantees that it will work. Thus
implementers, hardware and compiler and linker, and anything new which
hasn't been invented yet, are free to break your code, because it has
undefined behavior, because your program breaks the rules.
 
Reply With Quote
 
Gerhard Fiedler
Guest
Posts: n/a
 
      05-13-2011
Joshua Maurice wrote:

> However, a sufficiently smart compiler could notice your clever ruse,
> optimize away the assert as always true, see a lock and unlock pair
> guarding nothing, optimize that away, and then move the assignment to
> temp past the mutex acquire, as demonstrated above.


Regarding the compiler optimizing away a lock/unlock pair guarding
"nothing": AIUI both lock and unlock need to provide certain fences.
Therefore, again AIUI, they can't be optimized away by the compiler even
if there's nothing in between, because that would remove the fences and
alter the behavior.

Am I missing something here?

Gerhard
 
Reply With Quote
 
Joshua Maurice
Guest
Posts: n/a
 
      05-13-2011
On May 13, 3:56 pm, Gerhard Fiedler <geli...@gmail.com> wrote:
> Joshua Maurice wrote:
> > However, a sufficiently smart compiler could notice your clever ruse,
> > optimize away the assert as always true, see a lock and unlock pair
> > guarding nothing, optimize that away, and then move the assignment to
> > temp past the mutex acquire, as demonstrated above.

>
> Regarding the compiler optimizing away a lock/unlock pair guarding
> "nothing": AIUI both lock and unlock need to provide certain fences.
> Therefore, again AIUI, they can't be optimized away by the compiler even
> if there's nothing in between, because that would remove the fences and
> alter the behavior.
>
> Am I missing something here?


That may be how they're commonly implemented, but that's not the
guaranteed semantics. Two different mutexes may as a matter of fact on
a given implementation give "happens-before" effects between the two
different mutexes, but there's nothing guaranteed about it.

I was ambiguously describing the situation, possibly to the extent of
being wrong. Allow me to correct myself.

Remember the code:

Singleton* Singleton::instance() {
if (pInstance == 0) {
Lock lock;
if (pInstance == 0) {
Singleton* temp = new Singleton; // initialize to temp
secondLock.lock();
pInstance = temp; // assign temp to pInstance
secondLock.unlock();
}
}
return pInstance;
}

Singleton::Singleton() {
secondLock.lock();
assert( pInstance == 0 );
secondLock.unlock();
}

The compiler can expand inline as follows (pseudo-code):

Singleton* Singleton::instance() {
if (pInstance == 0) {
Lock lock;
if (pInstance == 0) {
Singleton* temp = (Singleton*) :perator
new(sizeof(Singleton));

secondLock.lock();
assert( pInstance == 0 );
secondLock.unlock();

secondLock.lock();
pInstance = temp; // assign temp to pInstance
secondLock.unlock();
}
}
return pInstance;
}

With that, it sees:
someMutex.lock();
<blah1>
someMutex.unlock();
<blah2>
someMutex.lock();
<blah3>
someMutex.unlock();

The compiler sees a lock, unlock, lock, unlock, in straightline code,
without branching (or exceptions, or volatile (to keep signal handlers
correct)). The compiler is totally free to replace that with:
someMutex.lock();
<blah1>
<blah2>
<blah3>
someMutex.unlock();

It may be bad from a QoI to do that. It depends. It depends heavily on
the situation. I see it as quite reasonable that the compiler could
employ some heuristics to deduce when it's a savings to combine
critical sections. In this case, <blah1> and <blah2> are actually
empty, no-ops, so it's always a savings to combine the two adjacent
critical sections as such.

So, it can transform it to:

Singleton* Singleton::instance() {
if (pInstance == 0) {
Lock lock;
if (pInstance == 0) {
Singleton* temp = (Singleton*) :perator
new(sizeof(Singleton));

secondLock.lock();
assert( pInstance == 0 );
pInstance = temp; // assign temp to pInstance
secondLock.unlock();
}
}
return pInstance;
}

With some simple data flow analysis, and allowed motions past locks,
it can transform it to:
Singleton* Singleton::instance() {
if (pInstance == 0) {
Lock lock;
if (pInstance == 0) {
secondLock.lock();
pInstance = (Singleton*) :perator new(sizeof(Singleton));
secondLock.unlock();
}
}
return pInstance;
}

----

Now, back to my original much more controversial statement - can a
compiler simply remove a lock unlock pair? Ex:
mutex.lock();
mutex.unlock();
Maybe. I mentioned "clever ruse" with whole program optimization in
mind. (However, upon thinking about it, I just showed that you don't
even need whole program optimization.) Without whole program
optimization, I think no. Could someone please more educated weigh
in?

Thus far, after 10 minutes of attempts just now to write a conforming
race-free program where you could tell the difference if a compiler
simply removed an "empty" mutex acquire release pair, the only
programs I can find are ones that would deadlock before the change,
and not deadlock after the change. A deadlock is observable behavior,
so a compiler cannot remove it for that reason.
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Locking locking resolution Frontpage raiderhawk General Computer Support 0 01-08-2008 01:42 AM
increase in cpu usage on locking and locking the system sowmya.rangineni@gmail.com Computer Support 0 06-15-2007 12:06 PM
Application locking to support optimisitc locking ? Timasmith Java 4 11-01-2006 12:42 AM
Locking Error while Updating an Image---- Urgent!!! Please advice Stephen ASP .Net 1 11-30-2004 07:45 PM
please help... ...me learn C++ please please please :) KK C++ 2 10-14-2003 02:08 PM



Advertisments
 



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57