Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C++ > Memory access vs variable access

Memory access vs variable access

 
 
Gerhard Fiedler
 
      06-24-2008
Hello,

I'm not sure whether this is a problem or not, or how to determine whether
it is one.

Say memory access (read and write) happens in 64-bit chunks, and I'm
looking at 32-bit variables. This would mean that either some other
variable is also written when writing a 32-bit variable (which means that
all access to 32-bit variables is of the read-modify-write type, affecting
some other variable also), or that all 32-bit variables are stored in their
own 64-bit chunk.

With single-threaded applications, that's a mere performance question. But
with multi-threaded applications, there's no way I can imagine that would
avoid the read-modify-write problems the first alternative would create, as
it is nowhere defined what the other variable is that is also written -- so
it can't be protected by a lock. Without it being protected by a lock,
there's nothing that prevents a thread from altering it while it is in the
middle of the read-modify-write cycle, which means that the end of it will
overwrite the altered value with the old value.

However, there must be a way to deal with this, otherwise multi-threaded
applications in C++ wouldn't be possible.

What am I missing?

Thanks,
Gerhard
 
 
 
 
 
gpderetta
 
      06-24-2008
On Jun 24, 3:59 pm, Victor Bazarov <(E-Mail Removed)> wrote:
> Gerhard Fiedler wrote:
> > I'm not sure whether this is a problem or not, or how to determine whether
> > it is one.

>
> > Say memory access (read and write) happens in 64-bit chunks, and I'm
> > looking at 32-bit variables. This would mean that either some other
> > variable is also written when writing a 32-bit variable (which means that
> > all access to 32-bit variables is of the read-modify-write type, affecting
> > some other variable also), or that all 32-bit variables are stored in their
> > own 64-bit chunk.

>
> > With single-threaded applications, that's a mere performance question. But
> > with multi-threaded applications, there's no way I can imagine that would
> > avoid the read-modify-write problems the first alternative would create, as
> > it is nowhere defined what the other variable is that is also written -- so
> > it can't be protected by a lock. Without it being protected by a lock,
> > there's nothing that prevents a thread from altering it while it is in the
> > middle of the read-modify-write cycle, which means that the end of it will
> > overwrite the altered value with the old value.

>
> > However, there must be a way to deal with this, otherwise multi-threaded
> > applications in C++ wouldn't be possible.

>
> > What am I missing?

>
> The fact that C++ does not specify any of that, maybe.
>


But C++0x will. IIRC, according to the draft standard, an
implementation is prohibited from performing many kinds of speculative
writes (with the exception of bit-fields) to locations that wouldn't be
written unconditionally anyway (or something like that).

If a specific architecture didn't allow 32-bit loads/stores to 32-bit
objects, the implementation would have to pad every object up to the
smallest load/store granularity the hardware does support. Pretty much
all common architectures allow access to memory at at least 8/16/32-bit
granularity (except for DSPs, I guess), so it is not a problem.

Current compilers do not implement the rule above, but thread-aware
compilers approximate it well enough that, as long as you use correct
locks, things work correctly *most of the time* (some compilers have
been known to miscompile code which used trylocks, for example).

> Try 'comp.programming.threads' as your starting point since it's the
> multi-threading that you're concerned about. *The problem does not seem
> to be language-specific, and as such does not belong to a language
> newsgroup.
>


Actually, discussing whether the next C++ standard prohibits
speculative writes, is language specific and definitely on topic.

--
gpd
 
 
 
 
 
Gerhard Fiedler
 
      06-24-2008
On 2008-06-24 11:50:26, gpderetta wrote:

> On Jun 24, 3:59 pm, Victor Bazarov <(E-Mail Removed)> wrote:
>> Gerhard Fiedler wrote:
>>> I'm not sure whether this is a problem or not, or how to determine
>>> whether it is one.
>>>
>>> Say memory access (read and write) happens in 64-bit chunks, and I'm
>>> looking at 32-bit variables. This would mean that either some other
>>> variable is also written when writing a 32-bit variable (which means
>>> that all access to 32-bit variables is of the read-modify-write type,
>>> affecting some other variable also), or that all 32-bit variables are
>>> stored in their own 64-bit chunk.
>>>
>>> With single-threaded applications, that's a mere performance question.
>>> But with multi-threaded applications, there's no way I can imagine
>>> that would avoid the read-modify-write problems the first alternative
>>> would create, as it is nowhere defined what the other variable is that
>>> is also written -- so it can't be protected by a lock. Without it
>>> being protected by a lock, there's nothing that prevents a thread from
>>> altering it while it is in the middle of the read-modify-write cycle,
>>> which means that the end of it will overwrite the altered value with
>>> the old value.
>>>
>>> However, there must be a way to deal with this, otherwise
>>> multi-threaded applications in C++ wouldn't be possible.
>>>
>>> What am I missing?

>>
>> The fact that C++ does not specify any of that, maybe.


Just for the record: I didn't really miss that. I just thought that how a
very common problem, present in a sizable part of C++ applications, is
handled across compilers and platforms is actually on topic in a group
about the C++ language.

> But C++0x will. IIRC, according to the draft standard, an implementation
> is prohibited from performing many kinds of speculative writes (with the
> exception of bit-fields) to locations that wouldn't be written
> unconditionally anyway (or something like that).
>
> If a specific architecture didn't allow 32-bit loads/stores to 32-bit
> objects, the implementation would have to pad every object up to the
> smallest load/store granularity the hardware does support. Pretty much
> all common architectures allow access to memory at at least 8/16/32-bit
> granularity (except for DSPs, I guess), so it is not a problem.


Ah, I didn't know that. So on common hardware (maybe x86, x64, AMD, AMD64,
IA-64, PowerPC, ARM, Alpha, PA-RISC, MIPS, SPARC), memory access is
possible in byte granularity? Which then means that no common compiler
would write to locations that are not the actual purpose of the write
access?

> Current compilers do not implement the rule above, but thread aware
> compilers approximate it well enough that, as long as you use correct
> locks, things work correctly *most of the time* (some compilers have
> been known to miscompile code which used trylocks for example).


Do you have any links about which compilers specifically don't create code
that works correctly? One objective of mine is to be able to separate this
"most of the time" into two clearly defined subsets, one of which works
"all of the time".

> Actually, discussing whether the next C++ standard prohibits
> speculative writes, is language specific and definitely on topic.


Is "speculative writes" the technical term for the situation I described?

Thanks,
Gerhard
 
 
gpderetta
 
      06-24-2008
On Jun 24, 5:51 pm, Gerhard Fiedler <(E-Mail Removed)> wrote:
> On 2008-06-24 11:50:26, gpderetta wrote:
>
> > If a specific architecture didn't allow 32-bit loads/stores to 32-bit
> > objects, the implementation would have to pad every object up to the
> > smallest load/store granularity the hardware does support. Pretty much
> > all common architectures allow access to memory at at least 8/16/32-bit
> > granularity (except for DSPs, I guess), so it is not a problem.

>
> Ah, I didn't know that. So on common hardware (maybe x86, x64, AMD, AMD64,
> IA-64, PowerPC, ARM, Alpha, PA-RISC, MIPS, SPARC), memory access is
> possible in byte granularity? Which then means that no common compiler
> would write to locations that are not the actual purpose of the write
> access?


All x86 derivatives allow 8/16/32/64-bit accesses at any offset. I think
both PowerPC and ARM allow access at any granularity as long as the access
is properly aligned. IIRC, very old Alphas only allowed accessing aligned
32/64 bits (no byte access), but that got fixed because it was extremely
inconvenient. I do not know about IA-64, MIPS, SPARC and PA-RISC, but
I would be extremely surprised if they didn't.

>
> > Current compilers do not implement the rule above, but thread aware
> > compilers approximate it well enough that, as long as you use correct
> > locks, things work correctly *most of the time* (some compilers have
> > been known to miscompile code which used trylocks for example).

>
> Do you have any links about which compilers specifically don't create code
> that works correctly? One objective of mine is to be able to separate this
> "most of the time" into two clearly defined subsets, one of which works
> "all of the time"
>


Many do, in corner cases. Usually these are considered bugs and are
fixed when they are encountered.
See for example http://www.airs.com/blog/archives/79

> > Actually, discussing whether the next C++ standard prohibits
> > speculative writes, is language specific and definitely on topic.

>
> Is "speculative writes" the technical term for the situation I described?
>


I'm not sure if it applies to this example. I think that "speculative
store" is defined as the motion of a store outside of its position in
program order (usually sinking it outside of loops or branches). It
doesn't take much to generalize the concept to that of the *addition*
of a store not present in the original program (i.e. adjacent fields
overwrites).

For details see "Concurrency memory model compiler consequences" by
Hans Boehm:

http://www.open-std.org/jtc1/sc22/wg...007/n2338.html

HTH,

--
gpd
 
 
acehreli@gmail.com
 
      06-24-2008
On Jun 24, 7:50 am, gpderetta <(E-Mail Removed)> wrote:
> On Jun 24, 3:59 pm, Victor Bazarov <(E-Mail Removed)> wrote:


> > The fact that C++ does not specify any of that, maybe.

>
> But C++0x will.


A search on "hans boehm c++ memory model" should bring further
information on that. Including videos of Hans Boehm's presentations on
the topic.

Here is a start:

http://www.hpl.hp.com/personal/Hans_Boehm/c++mm/

Ali
 
 
James Kanze
 
      06-24-2008
On Jun 24, 3:48 pm, Gerhard Fiedler <(E-Mail Removed)> wrote:

> I'm not sure whether this is a problem or not, or how to
> determine whether it is one.


It's potentially one.

> Say memory access (read and write) happens in 64-bit chunks,
> and I'm looking at 32-bit variables. This would mean that
> either some other variable is also written when writing a
> 32-bit variable (which means that all access to 32-bit
> variables is of the read-modify-write type, affecting some
> other variable also), or that all 32-bit variables are stored
> in their own 64-bit chunk.


> With single-threaded applications, that's a mere performance
> question. But with multi-threaded applications, there's no way
> I can imagine that would avoid the read-modify-write problems
> the first alternative would create, as it is nowhere defined
> what the other variable is that is also written -- so it can't
> be protected by a lock. Without it being protected by a lock,
> there's nothing that prevents a thread from altering it while
> it is in the middle of the read-modify-write cycle, which
> means that the end of it will overwrite the altered value with
> the old value.


> However, there must be a way to deal with this, otherwise
> multi-threaded applications in C++ wouldn't be possible.


Most hardware provides for single byte writes (even when the
read is always 64 bits), and takes care that it works correctly.
From what I understand, this wasn't the case on some early DEC
Alphas, and it certainly wasn't the case on many older
platforms, where when you wrote a byte, the hardware would read
a word, and rewrite it.

The upcoming version of the standard will address this problem;
if nothing changes, it will require that *most* accesses to a
single "object" work. (The major exception is bit fields. If
you access an object that is declared as a bit field, and any
other thread may modify any object in the containing class, you
need to explicitly synchronize.) Implementations for processors
where the hardware doesn't support this have their work cut out
for them (but better them than us), and byte accesses on such
implementations are likely to be very slow.

--
James Kanze (GABI Software) email:(E-Mail Removed)
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
 
 
Gerhard Fiedler
 
      06-24-2008
On 2008-06-24 18:17:52, James Kanze wrote:

>> Say memory access (read and write) happens in 64-bit chunks, and I'm
>> looking at 32-bit variables. This would mean that either some other
>> variable is also written when writing a 32-bit variable (which means
>> that all access to 32-bit variables is of the read-modify-write type,
>> affecting some other variable also), or that all 32-bit variables are
>> stored in their own 64-bit chunk.
>>
>> With single-threaded applications, that's a mere performance question.
>> But with multi-threaded applications, there's no way I can imagine that
>> would avoid the read-modify-write problems the first alternative would
>> create, as it is nowhere defined what the other variable is that is
>> also written -- so it can't be protected by a lock. Without it being
>> protected by a lock, there's nothing that prevents a thread from
>> altering it while it is in the middle of the read-modify-write cycle,
>> which means that the end of it will overwrite the altered value with
>> the old value.
>>
>> However, there must be a way to deal with this, otherwise
>> multi-threaded applications in C++ wouldn't be possible.

>
> Most hardware provides for single byte writes (even when the read is
> always 64 bits), and takes care that it works correctly.


What I find a bit disconcerting is that it seems so difficult to find out
whether a given hardware actually does this. Reality seems to confirm that
it actually is "most" (or otherwise "most" programs would probably crash a
lot more than they do), but I haven't found any documentation about any
specific guarantees of specific compilers on specific platforms. (I'm
mainly interested in VC++ and gcc.) Does somebody have any pointers for me?

Thanks,
Gerhard
 
 
Jerry Coffin
 
      06-24-2008
In article <1om696gj5nba5$(E-Mail Removed)>, (E-Mail Removed)
says...

[ ... ]

> What I find a bit disconcerting is that it seems so difficult to find out
> whether a given hardware actually does this. Reality seems to confirm that
> it actually is "most" (or otherwise "most" programs would probably crash a
> lot more than they do), but I haven't found any documentation about any
> specific guarantees of specific compilers on specific platforms. (I'm
> mainly interested in VC++ and gcc.) Does somebody have any pointers for me?


There are a number of problems with that. The first is that when you get
to exotic multiprocessors, a lot of ideas have been tried, and even
though only a few have really gained much popularity, there are still
some that bend almost any rule you'd like to make.

Another problem is that even on a given piece of hardware, the behavior
can be less predictable than you'd generally like. For example, recent
versions of the Intel x86 processors all have Memory Type and Range
Registers (MTRRs). Using an MTRR, one can adjust the behavior of memory
writes individually for ranges of memory. You can get write-back
caching, write-through caching, write combining, or no caching at all --
all on the same machine at the same time for different ranges of memory.

Also keep in mind that most modern computers use caching. In a typical
case, any read from or write to main memory happens an entire cache line
at a time. Bookkeeping is also done on the basis of entire cache lines,
so the processor doesn't care how many bits in a cache line have been
modified -- from its viewpoint, the cache line as a whole is either
modified or not. If, for example, another processor attempts to read
memory that falls in that cache line, the entire line is written to
memory before the other processor can read it. Even if the two are
entirely disjoint, if they fall in the same cache line, the processor
treats them as a unit.

--
Later,
Jerry.

The universe is a figment of its own imagination.
 
 
James Kanze
 
      06-25-2008
On Jun 25, 12:53 am, Jerry Coffin <(E-Mail Removed)> wrote:
> In article <1om696gj5nba5$(E-Mail Removed)>, (E-Mail Removed)
> says...


> [ ... ]


> > What I find a bit disconcerting is that it seems so
> > difficult to find out whether a given hardware actually does
> > this. Reality seems to confirm that it actually is "most"
> > (or otherwise "most" programs would probably crash a lot
> > more than they do), but I haven't found any documentation
> > about any specific guarantees of specific compilers on
> > specific platforms. (I'm mainly interested in VC++ and gcc.)
> > Does somebody have any pointers for me?


It depends mostly on the hardware architecture, not the
compiler. The compiler will generate byte, half-word, etc. load
and store machine instructions (assuming they exist, of course);
the problem is what the hardware does with them.

For Sparc architecture, see
http://www.sparc.org/specificationsDocuments.html. I presume
that other architecture providers (e.g. Intel, AMD, etc.) have
similar pages.

[...]
> Also keep in mind that most modern computers use caching. In a
> typical case, any read from or write to main memory happens an
> entire cache line at a time. Bookkeeping is also done on the
> basis of entire cache lines, so the processor doesn't care how
> many bits in a cache line have been modified -- from its
> viewpoint, the cache line as a whole is either modified or
> not. If, for example, another processor attempts to read
> memory that falls in that cache line, the entire line is
> written to memory before the other processor can read it. Even
> if the two are entirely disjoint, if they fall in the same
> cache line, the processor treats them as a unit.


That's true to a point. Most modern architectures also ensure
cache coherence at the hardware level: if one thread writes to
the first byte in a cache line, and a different thread (on a
different core) writes to the second byte, the hardware will
ensure that both writes eventually end up in main memory; that
the write back of the cache line from one core won't overwrite
the changes made by the other core.

This issue was discussed in detail by the committee; in the end,
it was decided that given something like:

struct S { char a; char b; } ;
or
char a[2] ;

one thread could modify S::a or a[0], and the other S::b or
a[1], without any explicit synchronization, and the compiler had
to make it work. This was accepted because in fact, just
emitting store byte instructions is sufficient for all of the
current architectures.

--
James Kanze (GABI Software) email:(E-Mail Removed)
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
 
 
Gerhard Fiedler
 
      06-25-2008
On 2008-06-25 04:58:41, James Kanze wrote:

>>> What I find a bit disconcerting is that it seems so difficult to find
>>> out whether a given hardware actually does this. Reality seems to
>>> confirm that it actually is "most" (or otherwise "most" programs would
>>> probably crash a lot more than they do), but I haven't found any
>>> documentation about any specific guarantees of specific compilers on
>>> specific platforms. (I'm mainly interested in VC++ and gcc.) Does
>>> somebody have any pointers for me?

>
> It depends mostly on the hardware architecture, not the compiler. The
> compiler will generate byte, half-word, etc. load and store machine
> instructions (assuming they exist, of course); the problem is what the
> hardware does with them.
>
> For Sparc architecture, see http://www.sparc.org/specificationsDocuments.html.
> I presume that other architecture providers (e.g. Intel, AMD, etc.)
> have similar pages.


Thanks. I thought that it would also depend on how the compiler generates
the code, but I guess you're right in assuming that any (halfway decent)
compiler will generate 8-bit writes for 8-bit variables if that is possible.


>> Also keep in mind that most modern computers use caching. In a typical
>> case, any read from or write to main memory happens an entire cache
>> line at a time. Bookkeeping is also done on the basis of entire cache
>> lines, so the processor doesn't care how many bits in a cache line have
>> been modified -- from its viewpoint, the cache line as a whole is
>> either modified or not. If, for example, another processor attempts to
>> read memory that falls in that cache line, the entire line is written
>> to memory before the other processor can read it. Even if the two are
>> entirely disjoint, if they fall in the same cache line, the processor
>> treats them as a unit.

>
> That's true to a point. Most modern architectures also ensure cache
> coherence at the hardware level: if one thread writes to the first byte
> in a cache line, and a different thread (on a different core) writes to
> the second byte, the hardware will ensure that both writes eventually
> end up in main memory; that the write back of the cache line from one
> core won't overwrite the changes made by the other core.


Taken all this together, it seems that on "most modern architectures" cache
coherency is mostly guaranteed by the hardware, and for example it is not
necessary to use memory barriers or locks for access to volatile boolean
variables that are only read or written (never using a read-modify-write
cycle). Is this correct? What is all this talk about different threads
seeing values out of order about, if the cache coherency is maintained by
the hardware in this way?

Gerhard
 
 
 
 