Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C++ > Subtle difference between C++0x MM and other MMs

Reply
Thread Tools

Subtle difference between C++0x MM and other MMs

 
 
Dmitriy V'jukov
Guest
Posts: n/a
 
      08-24-2008
Consider following Peterson's algorithm implementation from Wikipedia:
http://en.wikipedia.org/wiki/Peterson%27s_algorithm

flag[0] = 0
flag[1] = 0
turn = 0

P0: flag[0] = 1
turn = 1
memory_barrier()
while( flag[1] && turn == 1 );
// do nothing
// critical section
...
// end of critical section
flag[0] = 0

P1: flag[1] = 1
turn = 0
memory_barrier()
while( flag[0] && turn == 0 );
// do nothing
// critical section
...
// end of critical section
flag[1] = 0

We can implement this in Java using volatile variables, and needed
memory_barrier() will be emitted automatically by compiler.
We can implement this in C# using volatile variables, and
Thread.MemoryBarrier() as memory_barrier().
We can implement this in x86 MM using plain loads and stores, and
mfence instruction as memory_barrier().
We can implement this in C++0x using std::atomic<> and issuing loads
with memory_order_acquire, stores with memory_order_release, and
atomic_thread_fence(memory_order_seq_cst) as memory_barrier(). This is
the most straightforward translation of Java/C#/x86 implementations.

The only problem is that C++0x implementation will not work.
Personally for me, it's quite counter-intuitive. And following
question arise. What is the most simple way to translate some existing
Java/C#/x86 algorithm implementation to C++0x? It seems that it's not
so easy...

Dmitriy V'jukov
 
Reply With Quote
 
 
 
 
Peter Dimov
Guest
Posts: n/a
 
      08-24-2008
On Aug 24, 7:46*pm, "Dmitriy V'jukov" <(E-Mail Removed)> wrote:
> Consider following Peterson's algorithm implementation from Wikipedia:http://en.wikipedia.org/wiki/Peterson%27s_algorithm
>
> *flag[0] * = 0
> *flag[1] * = 0
> *turn * * *= 0
>
> *P0: flag[0] = 1
> * * turn = 1
> * * memory_barrier()
> * * while( flag[1] && turn == 1 );
> * * * * * * // do nothing
> * * // critical section
> * * ...
> * * // end of critical section
> * * flag[0] = 0
>
> P1: flag[1] = 1
> * * turn = 0
> * * memory_barrier()
> * * while( flag[0] && turn == 0 );
> * * * * * * // do nothing
> * * // critical section
> * * ...
> * * // end of critical section
> * * flag[1] = 0
>
> We can implement this in Java using volatile variables, and needed
> memory_barrier() will be emitted automatically by compiler.


And the C++ equivalent is to use seq_cst load and stores, which are
equivalent to Java volatiles.

> We can implement this in C# using volatile variables, and
> Thread.MemoryBarrier() as memory_barrier().
> We can implement this in x86 MM using plain loads and stores, and
> mfence instruction as memory_barrier().
> We can implement this in C++0x using std::atomic<> and issuing loads
> with memory_order_acquire, stores with memory_order_release, and
> atomic_thread_fence(memory_order_seq_cst) as memory_barrier(). This is
> the most straightforward translation of Java/C#/x86 implementations.
>
> The only problem is that C++0x implementation will not work.


Why will it not work?
 
Reply With Quote
 
 
 
 
Dmitriy V'jukov
Guest
Posts: n/a
 
      08-24-2008
On 24 , 21:52, Peter Dimov <(E-Mail Removed)> wrote:
> On Aug 24, 7:46 pm, "Dmitriy V'jukov" <(E-Mail Removed)> wrote:
>
>
>
> > Consider following Peterson's algorithm implementation from Wikipedia:http://en.wikipedia.org/wiki/Peterson%27s_algorithm

>
> > flag[0] = 0
> > flag[1] = 0
> > turn = 0

>
> > P0: flag[0] = 1
> > turn = 1
> > memory_barrier()
> > while( flag[1] && turn == 1 );
> > // do nothing
> > // critical section
> > ...
> > // end of critical section
> > flag[0] = 0

>
> > P1: flag[1] = 1
> > turn = 0
> > memory_barrier()
> > while( flag[0] && turn == 0 );
> > // do nothing
> > // critical section
> > ...
> > // end of critical section
> > flag[1] = 0

>
> > We can implement this in Java using volatile variables, and needed
> > memory_barrier() will be emitted automatically by compiler.

>
> And the C++ equivalent is to use seq_cst load and stores, which are
> equivalent to Java volatiles.



Yes, it's possible to implement any algorithm that relying on
sequentially consistent memory model in C++0x using seq_cst atomic
operations. But! Seq_cst atomic operations, especially stores, can be
quite expensive. So, one has general desire to use weaker operations,
like store-release and load-acquire. And in Java/C#/x86 it's possible
to implement Peterson's algorithm using weak operations + 1 strong
fence. In C++0x - NOT.


> > We can implement this in C# using volatile variables, and
> > Thread.MemoryBarrier() as memory_barrier().
> > We can implement this in x86 MM using plain loads and stores, and
> > mfence instruction as memory_barrier().
> > We can implement this in C++0x using std::atomic<> and issuing loads
> > with memory_order_acquire, stores with memory_order_release, and
> > atomic_thread_fence(memory_order_seq_cst) as memory_barrier(). This is
> > the most straightforward translation of Java/C#/x86 implementations.

>
> > The only problem is that C++0x implementation will not work.

>
> Why will it not work?



I mean not every C++0x implementation of Peterson's algorithm, but
particular implementation which uses store-release, load-acquire + 1
seq_cst fence.


Dmitriy V'jukov
 
Reply With Quote
 
Peter Dimov
Guest
Posts: n/a
 
      08-24-2008
On Aug 24, 9:44*pm, "Dmitriy V'jukov" <(E-Mail Removed)> wrote:
> And in Java/C#/x86 it's possible
> to implement Peterson's algorithm using weak operations + 1 strong
> fence. In C++0x - NOT.


How would you implement Peterson's algorithm in Java using weak
operations and a fence? Java doesn't have weak operations or fences.
Its volatile loads and stores are equivalent to C++MM's seq_cst loads
and stores. Both promise sequential consistency (no more and no less).

> I mean not every C++0x implementation of Peterson's algorithm, but
> particular implementation which uses store-release, load-acquire + 1
> seq_cst fence.


Why do you think that this implementation doesn't work?
 
Reply With Quote
 
Peter Dimov
Guest
Posts: n/a
 
      08-24-2008
On Aug 24, 11:53*pm, Peter Dimov <(E-Mail Removed)> wrote:
> On Aug 24, 9:44*pm, "Dmitriy V'jukov" <(E-Mail Removed)> wrote:


> > I mean not every C++0x implementation of Peterson's algorithm, but
> > particular implementation which uses store-release, load-acquire + 1
> > seq_cst fence.

>
> Why do you think that this implementation doesn't work?


I think I see your point. Getting back to

P0: flag[0] = 1
turn = 1
memory_barrier()
while( flag[1] && turn == 1 );
// do nothing
// critical section
...
// end of critical section
flag[0] = 0

P1: flag[1] = 1
turn = 0
memory_barrier()
while( flag[0] && turn == 0 );
// do nothing
// critical section
...
// end of critical section
flag[1] = 0

It's easy to show that P0 and P1 can't block each other forever;
eventually they will agree on the value of 'turn' and one of them will
proceed.

The case where P0 sees flag[1] == 0 and P1 sees flag[0] == 0 is a
classic SC violation example and every reasonable definition of
memory_barrier rules it out.

The interesting case you must have had in mind is the sequence

P1:flag[1] = 1
P1:turn = 0
P0:flag[0] = 1
P0:turn = 1
P0:memory_barrier

Can P0 now see flag[1] == 0? (P1 will later see turn == 1 and enter
the critical section.)

I wonder whether the formal CLR memory model (or even the current
formal x86 memory model) disallows this. (XCHG for turn instead of a
fence should work.)

I think that the C++MM does, if the condition is while( turn == 1 &&
flag[1] ). P0 seeing its own turn=1 doesn't break the release sequence
started by P1:turn=0 because turn=1 is executed by the same thread
(first bullet in 1.10/6). So P1:turn=0 synchronizes-with the read from
'turn' in P0 and ensures that P1:flag[1]=1 is seen.
 
Reply With Quote
 
Dmitriy V'jukov
Guest
Posts: n/a
 
      08-25-2008
On 25 , 02:47, Peter Dimov <(E-Mail Removed)> wrote:
> On Aug 24, 11:53 pm, Peter Dimov <(E-Mail Removed)> wrote:
>
> > On Aug 24, 9:44 pm, "Dmitriy V'jukov" <(E-Mail Removed)> wrote:
> > > I mean not every C++0x implementation of Peterson's algorithm, but
> > > particular implementation which uses store-release, load-acquire + 1
> > > seq_cst fence.

>
> > Why do you think that this implementation doesn't work?

>
> I think I see your point. Getting back to
>
> P0: flag[0] = 1
> turn = 1
> memory_barrier()
> while( flag[1] && turn == 1 );
> // do nothing
> // critical section
> ...
> // end of critical section
> flag[0] = 0
>
> P1: flag[1] = 1
> turn = 0
> memory_barrier()
> while( flag[0] && turn == 0 );
> // do nothing
> // critical section
> ...
> // end of critical section
> flag[1] = 0
>
> It's easy to show that P0 and P1 can't block each other forever;
> eventually they will agree on the value of 'turn' and one of them will
> proceed.
>
> The case where P0 sees flag[1] == 0 and P1 sees flag[0] == 0 is a
> classic SC violation example and every reasonable definition of
> memory_barrier rules it out.
>
> The interesting case you must have had in mind is the sequence
>
> P1:flag[1] = 1
> P1:turn = 0
> P0:flag[0] = 1
> P0:turn = 1
> P0:memory_barrier
>
> Can P0 now see flag[1] == 0? (P1 will later see turn == 1 and enter
> the critical section.)


Exactly! And this behavior is very counter-intuitive for me!

> I wonder whether the formal CLR memory model (or even the current
> formal x86 memory model) disallows this. (XCHG for turn instead of a
> fence should work.)


As for x86 MM, I think that Yes, such behavior is disallowed. x86 MM
defines possible reorderings in one thread. And then intended reading
is that you must try to construct interleaving of threads (taking into
account reorderings in threads). If it's possible to construct
interleaving, then behavior is allowed. If it's impossible - then
disallowed.

For Peterson's algorithm it's impossible to construct interleaving,
which will break algorithm.

For for CLR, it's very informal. But I think intended reading is the
same as for x86... just because Microsoft targets mainly to x86

But for C++0x the fact that it's impossible to construct interleaving
doesn't matter...


> I think that the C++MM does, if the condition is while( turn == 1 &&
> flag[1] ). P0 seeing its own turn=1 doesn't break the release sequence
> started by P1:turn=0 because turn=1 is executed by the same thread
> (first bullet in 1.10/6). So P1:turn=0 synchronizes-with the read from
> 'turn' in P0 and ensures that P1:flag[1]=1 is seen.



"by the same thread" which execute release, not "by the same thread"
which execute acquire.
So this won't work too.


Dmitriy V'jukov
 
Reply With Quote
 
Dmitriy V'jukov
Guest
Posts: n/a
 
      08-25-2008
On 25 , 00:53, Peter Dimov <(E-Mail Removed)> wrote:

> How would you implement Peterson's algorithm in Java using weak
> operations and a fence? Java doesn't have weak operations or fences.
> Its volatile loads and stores are equivalent to C++MM's seq_cst loads
> and stores. Both promise sequential consistency (no more and no less).



Java volatiles promise more. volatile store is release, and volatile
load is acquire. for x86 this means that plain stores and loads will
be used. Well, yes, sometimes heavy membar will be emitted.
C++0x's seq_cst atomic store on x86 will be locked instruction.
So, in my opinion, translating Java volatiles to C++0x's seq_cst
atomics is not fair.
IMVHO more fair way is to translate volatile store to store-release,
volatile load to load-acquire, and manually emit seq_cst fence. At
least this is what initially comes to mind.

Dmitriy V'jukov
 
Reply With Quote
 
Peter Dimov
Guest
Posts: n/a
 
      08-25-2008
On Aug 25, 9:27*am, "Dmitriy V'jukov" <(E-Mail Removed)> wrote:

> "by the same thread" which execute release, not "by the same thread"
> which execute acquire.
> So this won't work too.


Yes, you are right.
 
Reply With Quote
 
Peter Dimov
Guest
Posts: n/a
 
      08-25-2008
On Aug 25, 9:37*am, "Dmitriy V'jukov" <(E-Mail Removed)> wrote:
> On 25 , 00:53, Peter Dimov <(E-Mail Removed)> wrote:
>
> > How would you implement Peterson's algorithm in Java using weak
> > operations and a fence? Java doesn't have weak operations or fences.
> > Its volatile loads and stores are equivalent to C++MM's seq_cst loads
> > and stores. Both promise sequential consistency (no more and no less).

>
> Java volatiles promise more. volatile store is release, and volatile
> load is acquire.


Exactly the same as seq_cst.

> for x86 this means that plain stores and loads will
> be used. Well, yes, sometimes heavy membar will be emitted.
> C++0x's seq_cst atomic store on x86 will be locked instruction.


Java VMs will also (be changed to) emit XCHG for volatile stores,
because plain stores do not guarantee SC, no matter how many MFENCEs
one uses. x86 is now officially not TSO.

T0: x = 1
T1: y = 1
T2: r1 = x; r2 = y; // 1 0 allowed
T3: r3 = y; r4 = x; // 1 0 allowed

 
Reply With Quote
 
Dmitriy V'jukov
Guest
Posts: n/a
 
      08-25-2008
On 25 авг, 11:12, Peter Dimov <(E-Mail Removed)> wrote:
> On Aug 25, 9:37*am, "Dmitriy V'jukov" <(E-Mail Removed)> wrote:
>
> > On 25 Á×Ç, 00:53, Peter Dimov <(E-Mail Removed)> wrote:

>
> > > How would you implement Peterson's algorithm in Java using weak
> > > operations and a fence? Java doesn't have weak operations or fences.
> > > Its volatile loads and stores are equivalent to C++MM's seq_cst loads
> > > and stores. Both promise sequential consistency (no more and no less)..

>
> > Java volatiles promise more. volatile store is release, and volatile
> > load is acquire.

>
> Exactly the same as seq_cst.
>
> > for x86 this means that plain stores and loads will
> > be used. Well, yes, sometimes heavy membar will be emitted.
> > C++0x's seq_cst atomic store on x86 will be locked instruction.

>
> Java VMs will also (be changed to) emit XCHG for volatile stores,
> because plain stores do not guarantee SC, no matter how many MFENCEs
> one uses. x86 is now officially not TSO.
>
> T0: x = 1
> T1: y = 1
> T2: r1 = x; r2 = y; // 1 0 allowed
> T3: r3 = y; r4 = x; // 1 0 allowed



My example is not about total order, it's about ordering just between
2 threads.

Dmitriy V'jukov
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Subtle difference between boolean value and boolean comparison? Metre Meter Javascript 7 08-06-2010 08:40 PM
How to use python-mms to send mms guptha Python 0 02-02-2010 06:05 AM
How to best explain a "subtle" difference between Python and Perl ? Palindrom Python 7 08-13-2008 04:39 PM
Difference between bin and obj directories and difference between project references and dll references jakk ASP .Net 4 03-22-2005 09:23 PM
Subtle difference??? KeithS Firefox 2 03-02-2005 03:18 PM



Advertisments