C++0x memory model and atomics, some questions

 
 
Joshua Maurice
      09-01-2010
So, I'm trying to learn more about threading. Which papers are the
most current? Which are the most readable breakdowns which I might
comprehend without delving too deep into standardese?

I've thus far found these links to be of great use:

http://www.hpl.hp.com/techreports/2008/HPL-2008-56.pdf
http://www.open-std.org/jtc1/sc22/wg...2008/n2664.htm
http://www.open-std.org/jtc1/sc22/wg...007/n2338.html

Here are some questions.

1-
So, we now have the stringent requirement that an implementation
cannot "introduce races", including specifically that an
implementation cannot introduce a write to a memory location on a code
path where there was none. This has big implications. An
implementation is no longer allowed to store an 8-bit unit by reading
the current value of a whole 32- or 64-bit word, doing some bit
shifting and masking, and then writing back the whole word to modify
the 8-bit memory location. This means that writing to certain smaller
types might be a lot slower now (though it's my understanding that
most sane architectures provide byte-sized stores anyway).

It's also my understanding that the "int" type was meant at some point
to be the "fastest" integer type for the implementation. However, is
this still the case nowadays, when "int" on a lot of 64-bit systems is
32 bits for "compatibility reasons"? Also, specifically for atomic
compare-and-exchange, how can I know which ones will be implemented
via a hardware compare-and-swap, and which ones need to be emulated?
Offhand, I would figure out on each system which is the "efficient"
base integer type, provide a typedef for it, and use that typedef in
portable code.

2-
Related: Is anyone going to use std::atomic for anything but integer
types? Some of the implementation notes from
http://gcc.gnu.org/wiki/Atomic/GCCMM
give me pause. How efficient would a std::atomic<some_complex_type> be
vs an explicit lock guarding the relevant pieces of code? What's the
expectation of use of std::atomic on complex user types and the
expected implementation? A (static) lock per type? A lock hidden in
each atomic object which is big enough or complex enough?

3-
http://gcc.gnu.org/wiki/Atomic/GCCMM/ImplemSumm
> Optimizer effects


> It is worth noting that the impact on the optimizer is slightly different with the different models. Although the existence of an atomic load or store operation will act as an optimization barrier, some optimizations can be performed on the atomic operations themselves if they are not in sequentially consistent mode.


> For instance, dead store removal can be performed:


> x.store (a, memory_order_release);
> x.store (b, memory_order_release);


> It is perfectly safe to note that storing 'a' into 'x' is in fact a dead store, and remove it. It's also possible to move statements around, subject to the various dependencies imposed by the memory model mode.


This is incorrect, right? Specifically,
x.store (a);
x.store (b);
aka
x.store (a, memory_order_seq_cst);
x.store (b, memory_order_seq_cst);
can be optimized to
x.store (b, memory_order_seq_cst);
right? An implementation does not need to faithfully do every single
atomic load and store with memory_order_seq_cst, right?

4-
Let's consider cooperative thread canceling. Suppose each thread has a
stop flag, a std::atomic<bool>. When you want to cancel a thread, you
set that thread's stop flag. The thread periodically checks its stop
flag's value, and if it's set, it starts returning error codes (or
throwing exceptions) to break out and finish.

As I understand it, with the default memory_order_seq_cst, there is a
total order across all memory_order_seq_cst operations on all memory
locations. So, if the writes and reads to the stop flag all used
memory_order_seq_cst, then the thread will notice the stop request
quite quickly, basically as soon as the scheduler schedules it.
However, with any of the other memory_orders, the update may take
arbitrarily long to become visible (though QoI should make it visible
"soon"). As I don't need this stop flag to give any other additional
guarantees, I only need to decide between memory_order_relaxed and
memory_order_seq_cst, right? Should I favor one over the other? Why?

5-
I remember this one thread from a while ago on comp.lang.c++.moderated.
http://groups.google.com.ng/group/co...9e0dc41f977db7

It was a large discussion over the usefulness and portability of
volatile as a threading primitive. I recall that I very strongly said
not to sprinkle your code with it, though it's perhaps acceptable to
use volatile to implement a more standard primitive on whatever
machines actually guarantee useful threading semantics (like some
Windows ones).

I specifically remembered that post by Andy Venikov as a potentially
legitimate use of volatile. I replied to it saying that as long as the
compiler didn't do anything silly like speculatively load some things,
and reloaded all things, then it would work. I also said that it
seemed reasonable, but I'm not sure any machine actually guarantees
any of that. I felt that something was wrong, but I was not and am not
the most knowledgeable about assembly so I couldn't put my finger on
it. Now I think I can. He has a data dependency.

http://www.open-std.org/jtc1/sc22/wg...2008/n2664.htm

What Andy Venikov was trying to do seems to be exactly the use case
for std::memory_order_consume. Right? Without that, his volatile hack
is just that, a hack.

6-
I definitely do not get all of the nuances involved with
std::memory_order_consume. The paper
http://www.open-std.org/jtc1/sc22/wg...2008/n2664.htm
goes into some details, but it's not quite formal enough for my
liking, and/or the standardese is too standardesy for my liking. Any
better sources on this?
 
Joshua Maurice
      09-03-2010
On Sep 1, 2:38 pm, Joshua Maurice wrote:
> 3-
> http://gcc.gnu.org/wiki/Atomic/GCCMM/ImplemSumm
>
> > Optimizer effects
> > It is worth noting that the impact on the optimizer is slightly different with the different models. Although the existence of an atomic load or store operation will act as an optimization barrier, some optimizations can be performed on the atomic operations themselves if they are not in sequentially consistent mode.
> > For instance, dead store removal can be performed:
> > x.store (a, memory_order_release);
> > x.store (b, memory_order_release);
> > It is perfectly safe to note that storing 'a' into 'x' is in fact a dead store, and remove it. It's also possible to move statements around, subject to the various dependencies imposed by the memory model mode.

>
> This is incorrect, right? Specifically,
> x.store (a);
> x.store (b);
> aka
> x.store (a, memory_order_seq_cst);
> x.store (b, memory_order_seq_cst);
> can be optimized to
> x.store (b, memory_order_seq_cst);
> right? An implementation does not need to faithfully do every single
> atomic load and store with memory_order_seq_cst, right?


My reading of the draft standard
http://www.open-std.org/jtc1/sc22/wg...2010/n3035.pdf
confirms this, though it strongly suggests that an implementation
should not do this in an unbounded loop.

> 4-
> Let's consider cooperative thread canceling. Suppose each thread has a
> stop flag, a std::atomic<bool>. When you want to cancel a thread, you
> set that thread's stop flag. The thread periodically checks its stop
> flag's value, and if it's set, it starts returning error codes (or
> throwing exceptions) to break out and finish.
>
> As I understand it, with the default memory_order_seq_cst, there is a
> total order across all memory_order_seq_cst operations on all memory
> locations. So, if the write and reads to the stop flag all used
> memory_order_seq_cst, then the thread will notice the stop request
> quite quickly, basically as soon as the scheduler schedules it.
> However, with any of the other memory_order's, the update may take
> arbitrary long to become visible (though QoI should make it visible
> "soon"). As I don't need this stop flag to give any other additional
> guarantees, I only need to decide between memory_order_relaxed and
> memory_order_seq_cst, right? Should I favor one over the other? Why?


Again, the draft standard
http://www.open-std.org/jtc1/sc22/wg...2010/n3035.pdf
says that an implementation "should" make atomic stores visible to
other threads in a "reasonable" amount of time. If the implementation
follows this note, then I guess memory_order_relaxed is the way to go.
I'm just concerned that the implementation might not.
 
Alexander Terekhov
      09-03-2010

Joshua Maurice wrote:
>
> So, I'm trying to learn more about threading. Which papers are the
> most current?


Try

http://www.cl.cam.ac.uk/~pes20/cpp/tech.pdf

regards,
alexander.
 
Joshua Maurice
      09-03-2010
On Sep 3, 4:50 am, Alexander Terekhov wrote:
> Joshua Maurice wrote:
>
> > So, I'm trying to learn more about threading. Which papers are the
> > most current?

>
> Try
>
> http://www.cl.cam.ac.uk/~pes20/cpp/tech.pdf
>
> regards,
> alexander.


Thank you very much. This is definitely one of the things I was
looking for.
 
Joshua Maurice
      09-04-2010
One other question.

7-
// Initial conditions
(not atomic) int x = 0, y = 0

// Thread 1
if (x == 1)
y = 1;

//Thread 2
if (y == 1)
x = 1;

// ----
So, given those initial conditions, for those two threads started
concurrently, can you end up with x == 1 and y == 1? Does this
(pseudo) C++ program have formally undefined behavior?

Is there real hardware which can end with x == 1 and y == 1 for a
simplistic naive translation to assembly? Or is there hypothesized
useful hardware which can do that? My naive understanding says no, but
I'm learning to question such simplistic sequentially consistent
reasoning.
 
Alexander Terekhov
      09-04-2010

Joshua Maurice wrote:
>
> One other question.
>
> 7-
> // Initial conditions
> (not atomic) int x = 0, y = 0
>
> // Thread 1
> if (x == 1)
> y = 1;
>
> //Thread 2
> if (y == 1)
> x = 1;
>
> // ----
> So, given those initial conditions, for those two threads started
> concurrently, can you end up with x == 1 and y == 1? Does this
> (pseudo) C++ program have formally undefined behavior?


No and no.

>
> Is there real hardware which can end with x == 1 and y == 1 for a
> simplistic naive translation to assembly?


No.

> Or is there hypothesized
> useful hardware which can do that? My naive understanding says no, but
> I'm learning to question such simplistic sequentially consistent
> reasoning.


Hypothesized hardware could speculate that x/y is 1 and proceed with
an uncommitted modification (preserving the old value until commit),
only to roll it back (undo the modification) a bit later when the
speculation turns out to be false. If such hypothesized hardware made
the uncommitted modification visible to other threads, then there
would be a problem.

regards,
alexander.
 