Documentation/memory-barriers.txt: Add needed ACCESS_ONCE() calls to memory-barriers.txt

The Documentation/memory-barriers.txt file was written before the need
for ACCESS_ONCE() was fully appreciated.  It therefore contains no
ACCESS_ONCE() calls, which can be a problem when people lift examples
from it.  This commit therefore adds ACCESS_ONCE() calls.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <linux-arch@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1386799151-2219-1-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
parent 962d9c5757
commit 2ecf810121
@@ -194,18 +194,22 @@ There are some minimal guarantees that may be expected of a CPU:
  (*) On any given CPU, dependent memory accesses will be issued in order, with
      respect to itself.  This means that for:

-	Q = P; D = *Q;
+	ACCESS_ONCE(Q) = P; smp_read_barrier_depends(); D = ACCESS_ONCE(*Q);

      the CPU will issue the following memory operations:

 	Q = LOAD P, D = LOAD *Q

-     and always in that order.
+     and always in that order.  On most systems, smp_read_barrier_depends()
+     does nothing, but it is required for DEC Alpha.  The ACCESS_ONCE()
+     is required to prevent compiler mischief.  Please note that you
+     should normally use something like rcu_dereference() instead of
+     open-coding smp_read_barrier_depends().

  (*) Overlapping loads and stores within a particular CPU will appear to be
      ordered within that CPU.  This means that for:

-	a = *X; *X = b;
+	a = ACCESS_ONCE(*X); ACCESS_ONCE(*X) = b;

      the CPU will only issue the following sequence of memory operations:
@@ -213,7 +217,7 @@ There are some minimal guarantees that may be expected of a CPU:

      And for:

-	*X = c; d = *X;
+	ACCESS_ONCE(*X) = c; d = ACCESS_ONCE(*X);

      the CPU will only issue:

@@ -224,6 +228,41 @@ There are some minimal guarantees that may be expected of a CPU:

 And there are a number of things that _must_ or _must_not_ be assumed:

+ (*) It _must_not_ be assumed that the compiler will do what you want with
+     memory references that are not protected by ACCESS_ONCE().  Without
+     ACCESS_ONCE(), the compiler is within its rights to do all sorts
+     of "creative" transformations:
+
+     (-) Repeat the load, possibly getting a different value on the second
+	 and subsequent loads.  This is especially prone to happen when
+	 register pressure is high.
+
+     (-) Merge adjacent loads and stores to the same location.  The most
+	 familiar example is the transformation from:
+
+		while (a)
+			do_something();
+
+	 to something like:
+
+		if (a)
+			for (;;)
+				do_something();
+
+	 Using ACCESS_ONCE() as follows prevents this sort of optimization:
+
+		while (ACCESS_ONCE(a))
+			do_something();
+
+     (-) "Store tearing", where a single store in the source code is split
+	 into smaller stores in the object code.  Note that gcc really
+	 will do this on some architectures when storing certain constants.
+	 It can be cheaper to do a series of immediate stores than to
+	 form the constant in a register and then to store that register.
+
+     (-) "Load tearing", which splits loads in a manner analogous to
+	 store tearing.
+
  (*) It _must_not_ be assumed that independent loads and stores will be issued
      in the order given.  This means that for:

@@ -455,8 +494,8 @@ following sequence of events:

 	{ A == 1, B == 2, C = 3, P == &A, Q == &C }
 	B = 4;
 	<write barrier>
-	P = &B
-			Q = P;
+	ACCESS_ONCE(P) = &B
+			Q = ACCESS_ONCE(P);
 			D = *Q;

 There's a clear data dependency here, and it would seem that by the end of the
@@ -482,8 +521,8 @@ between the address load and the data load:

 	{ A == 1, B == 2, C = 3, P == &A, Q == &C }
 	B = 4;
 	<write barrier>
-	P = &B
-			Q = P;
+	ACCESS_ONCE(P) = &B
+			Q = ACCESS_ONCE(P);
 			<data dependency barrier>
 			D = *Q;

@@ -509,16 +548,17 @@ access:

 	{ M[0] == 1, M[1] == 2, M[3] = 3, P == 0, Q == 3 }
 	M[1] = 4;
 	<write barrier>
-	P = 1
-			Q = P;
+	ACCESS_ONCE(P) = 1
+			Q = ACCESS_ONCE(P);
 			<data dependency barrier>
 			D = M[Q];


-The data dependency barrier is very important to the RCU system, for example.
-See rcu_dereference() in include/linux/rcupdate.h.  This permits the current
-target of an RCU'd pointer to be replaced with a new modified target, without
-the replacement target appearing to be incompletely initialised.
+The data dependency barrier is very important to the RCU system,
+for example.  See rcu_assign_pointer() and rcu_dereference() in
+include/linux/rcupdate.h.  This permits the current target of an RCU'd
+pointer to be replaced with a new modified target, without the replacement
+target appearing to be incompletely initialised.

 See also the subsection on "Cache Coherency" for a more thorough example.

@@ -530,22 +570,23 @@ A control dependency requires a full read memory barrier, not simply a data
 dependency barrier to make it work correctly.  Consider the following bit of
 code:

-	q = &a;
+	q = ACCESS_ONCE(a);
 	if (p) {
 		<data dependency barrier>
-		q = &b;
+		q = ACCESS_ONCE(b);
 	}
 	x = *q;

 This will not have the desired effect because there is no actual data
-dependency, but rather a control dependency that the CPU may short-circuit by
-attempting to predict the outcome in advance.  In such a case what's actually
-required is:
+dependency, but rather a control dependency that the CPU may short-circuit
+by attempting to predict the outcome in advance, so that other CPUs see
+the load from b as having happened before the load from a.  In such a
+case what's actually required is:

-	q = &a;
+	q = ACCESS_ONCE(a);
 	if (p) {
 		<read barrier>
-		q = &b;
+		q = ACCESS_ONCE(b);
 	}
 	x = *q;

@@ -563,11 +604,11 @@ write barrier, though, again, a general barrier is viable:

 	CPU 1			CPU 2
 	===============		===============
-	a = 1;
+	ACCESS_ONCE(a) = 1;
 	<write barrier>
-	b = 2;			x = b;
+	ACCESS_ONCE(b) = 2;	x = ACCESS_ONCE(b);
 				<read barrier>
-				y = a;
+				y = ACCESS_ONCE(a);

 Or:

@@ -575,7 +616,7 @@ Or:
 	===============		===============================
 	a = 1;
 	<write barrier>
-	b = &a;			x = b;
+	ACCESS_ONCE(b) = &a;	x = ACCESS_ONCE(b);
 				<data dependency barrier>
 				y = *x;

@@ -587,12 +628,12 @@ match the loads after the read barrier or the data dependency barrier, and vice
 versa:

-	CPU 1			CPU 2
-	===============		===============
-	a = 1;		}----	--->{	v = c
-	b = 2;		}    \	    {	w = d
-	<write barrier>	      \		<read barrier>
-	c = 3;		}     /	    {	x = a;
-	d = 4;		}----	--->{	y = b;
+	CPU 1				CPU 2
+	===================		===================
+	ACCESS_ONCE(a) = 1;  }----	--->{  v = ACCESS_ONCE(c);
+	ACCESS_ONCE(b) = 2;  }    \	    {  w = ACCESS_ONCE(d);
+	<write barrier>		   \	       <read barrier>
+	ACCESS_ONCE(c) = 3;  }    /	    {  x = ACCESS_ONCE(a);
+	ACCESS_ONCE(d) = 4;  }----	--->{  y = ACCESS_ONCE(b);


 EXAMPLES OF MEMORY BARRIER SEQUENCES
@@ -1435,12 +1476,12 @@ three CPUs; then should the following sequence of events occur:

 	CPU 1				CPU 2
 	===============================	===============================
-	*A = a;				*E = e;
+	ACCESS_ONCE(*A) = a;		ACCESS_ONCE(*E) = e;
 	LOCK M				LOCK Q
-	*B = b;				*F = f;
-	*C = c;				*G = g;
+	ACCESS_ONCE(*B) = b;		ACCESS_ONCE(*F) = f;
+	ACCESS_ONCE(*C) = c;		ACCESS_ONCE(*G) = g;
 	UNLOCK M			UNLOCK Q
-	*D = d;				*H = h;
+	ACCESS_ONCE(*D) = d;		ACCESS_ONCE(*H) = h;

 Then there is no guarantee as to what order CPU 3 will see the accesses to *A
 through *H occur in, other than the constraints imposed by the separate locks
@@ -1460,17 +1501,17 @@ However, if the following occurs:

 	CPU 1				CPU 2
 	===============================	===============================
-	*A = a;
+	ACCESS_ONCE(*A) = a;
 	LOCK M	     [1]
-	*B = b;
-	*C = c;
+	ACCESS_ONCE(*B) = b;
+	ACCESS_ONCE(*C) = c;
 	UNLOCK M     [1]
-	*D = d;				*E = e;
+	ACCESS_ONCE(*D) = d;		ACCESS_ONCE(*E) = e;
 					LOCK M	     [2]
-					*F = f;
-					*G = g;
+					ACCESS_ONCE(*F) = f;
+					ACCESS_ONCE(*G) = g;
 					UNLOCK M     [2]
-					*H = h;
+					ACCESS_ONCE(*H) = h;

 CPU 3 might see:

@@ -2177,11 +2218,11 @@ A programmer might take it for granted that the CPU will perform memory
 operations in exactly the order specified, so that if the CPU is, for example,
 given the following piece of code to execute:

-	a = *A;
-	*B = b;
-	c = *C;
-	d = *D;
-	*E = e;
+	a = ACCESS_ONCE(*A);
+	ACCESS_ONCE(*B) = b;
+	c = ACCESS_ONCE(*C);
+	d = ACCESS_ONCE(*D);
+	ACCESS_ONCE(*E) = e;

 they would then expect that the CPU will complete the memory operation for each
 instruction before moving on to the next one, leading to a definite sequence of
@@ -2228,12 +2269,12 @@ However, it is guaranteed that a CPU will be self-consistent: it will see its
 _own_ accesses appear to be correctly ordered, without the need for a memory
 barrier.  For instance with the following code:

-	U = *A;
-	*A = V;
-	*A = W;
-	X = *A;
-	*A = Y;
-	Z = *A;
+	U = ACCESS_ONCE(*A);
+	ACCESS_ONCE(*A) = V;
+	ACCESS_ONCE(*A) = W;
+	X = ACCESS_ONCE(*A);
+	ACCESS_ONCE(*A) = Y;
+	Z = ACCESS_ONCE(*A);

 and assuming no intervention by an external influence, it can be assumed that
 the final result will appear to be:
@@ -2250,7 +2291,12 @@ accesses:

 in that order, but, without intervention, the sequence may have almost any
 combination of elements combined or discarded, provided the program's view of
-the world remains consistent.
+the world remains consistent.  Note that ACCESS_ONCE() is -not- optional
+in the above example, as there are architectures where a given CPU might
+interchange successive loads to the same location.  On such architectures,
+ACCESS_ONCE() does whatever is necessary to prevent this, for example, on
+Itanium the volatile casts used by ACCESS_ONCE() cause GCC to emit the
+special ld.acq and st.rel instructions that prevent such reordering.

 The compiler may also combine, discard or defer elements of the sequence before
 the CPU even sees them.
@@ -2264,13 +2310,13 @@ may be reduced to:

 	*A = W;

-since, without a write barrier, it can be assumed that the effect of the
-storage of V to *A is lost.  Similarly:
+since, without either a write barrier or an ACCESS_ONCE(), it can be
+assumed that the effect of the storage of V to *A is lost.  Similarly:

 	*A = Y;
 	Z = *A;

-may, without a memory barrier, be reduced to:
+may, without a memory barrier or an ACCESS_ONCE(), be reduced to:

 	*A = Y;
 	Z = Y;
