docs/memory-barriers.txt: Fixup long lines
Substitution of "data dependency barrier" with "address-dependency
barrier" left quite a lot of lines exceeding 80 columns. Reflow those
lines as well as a few short ones not related to the substitution.
No changes in documentation text.

Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Will Deacon <will@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Daniel Lustig <dlustig@nvidia.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
parent 203185f6b1
commit f556082dd7
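
The commit message describes reflowing lines that exceed 80 columns. As a rough illustration of how such lines might be located, here is a minimal C sketch. The function name `count_long_lines` is hypothetical and not part of the kernel tree, and the sketch assumes ASCII text with tab stops every 8 columns.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (an illustration, not a kernel tool): count the
 * lines in `text` whose display width exceeds `limit` columns.
 * Assumptions: one column per byte (ASCII), tab stops every 8 columns.
 */
size_t count_long_lines(const char *text, size_t limit)
{
	size_t col = 0, long_lines = 0;

	assert(text != NULL);

	for (const char *p = text; ; p++) {
		if (*p == '\n' || *p == '\0') {
			/* End of a line: record it if it was too wide. */
			if (col > limit)
				long_lines++;
			col = 0;
			if (*p == '\0')
				break;
		} else if (*p == '\t') {
			/* Advance to the next multiple of 8 columns. */
			col = (col / 8 + 1) * 8;
		} else {
			col++;
		}
	}
	return long_lines;
}
```

Calling `count_long_lines(buffer, 80)` on the contents of a file loaded into `buffer` would give the number of lines a reflow still needs to address, under the assumptions stated above.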
@@ -187,9 +187,9 @@ As a further example, consider this sequence of events:
 B = 4; Q = P;
 P = &B; D = *Q;
 
-There is an obvious address dependency here, as the value loaded into D depends on
-the address retrieved from P by CPU 2. At the end of the sequence, any of the
-following results are possible:
+There is an obvious address dependency here, as the value loaded into D depends
+on the address retrieved from P by CPU 2. At the end of the sequence, any of
+the following results are possible:
 
 (Q == &A) and (D == 1)
 (Q == &B) and (D == 2)
@@ -397,25 +397,25 @@ Memory barriers come in four basic varieties:
 
 (2) Address-dependency barriers (historical).
 
-An address-dependency barrier is a weaker form of read barrier. In the case
-where two loads are performed such that the second depends on the result
-of the first (eg: the first load retrieves the address to which the second
-load will be directed), an address-dependency barrier would be required to
-make sure that the target of the second load is updated after the address
-obtained by the first load is accessed.
+An address-dependency barrier is a weaker form of read barrier. In the
+case where two loads are performed such that the second depends on the
+result of the first (eg: the first load retrieves the address to which
+the second load will be directed), an address-dependency barrier would
+be required to make sure that the target of the second load is updated
+after the address obtained by the first load is accessed.
 
-An address-dependency barrier is a partial ordering on interdependent loads
-only; it is not required to have any effect on stores, independent loads
-or overlapping loads.
+An address-dependency barrier is a partial ordering on interdependent
+loads only; it is not required to have any effect on stores, independent
+loads or overlapping loads.
 
 As mentioned in (1), the other CPUs in the system can be viewed as
 committing sequences of stores to the memory system that the CPU being
-considered can then perceive. An address-dependency barrier issued by the CPU
-under consideration guarantees that for any load preceding it, if that
-load touches one of a sequence of stores from another CPU, then by the
-time the barrier completes, the effects of all the stores prior to that
-touched by the load will be perceptible to any loads issued after the address-
-dependency barrier.
+considered can then perceive. An address-dependency barrier issued by
+the CPU under consideration guarantees that for any load preceding it,
+if that load touches one of a sequence of stores from another CPU, then
+by the time the barrier completes, the effects of all the stores prior to
+that touched by the load will be perceptible to any loads issued after
+the address-dependency barrier.
 
 See the "Examples of memory barrier sequences" subsection for diagrams
 showing the ordering constraints.
@@ -437,16 +437,16 @@ Memory barriers come in four basic varieties:
 
 (3) Read (or load) memory barriers.
 
-A read barrier is an address-dependency barrier plus a guarantee that all the
-LOAD operations specified before the barrier will appear to happen before
-all the LOAD operations specified after the barrier with respect to the
-other components of the system.
+A read barrier is an address-dependency barrier plus a guarantee that all
+the LOAD operations specified before the barrier will appear to happen
+before all the LOAD operations specified after the barrier with respect to
+the other components of the system.
 
 A read barrier is a partial ordering on loads only; it is not required to
 have any effect on stores.
 
-Read memory barriers imply address-dependency barriers, and so can substitute
-for them.
+Read memory barriers imply address-dependency barriers, and so can
+substitute for them.
 
 [!] Note that read barriers should normally be paired with write barriers;
 see the "SMP barrier pairing" subsection.
@@ -584,8 +584,8 @@ following sequence of events:
 [!] READ_ONCE_OLD() corresponds to READ_ONCE() of pre-4.15 kernel, which
 doesn't imply an address-dependency barrier.
 
-There's a clear address dependency here, and it would seem that by the end of the
-sequence, Q must be either &A or &B, and that:
+There's a clear address dependency here, and it would seem that by the end of
+the sequence, Q must be either &A or &B, and that:
 
 (Q == &A) implies (D == 1)
 (Q == &B) implies (D == 4)
@@ -599,8 +599,8 @@ While this may seem like a failure of coherency or causality maintenance, it
 isn't, and this behaviour can be observed on certain real CPUs (such as the DEC
 Alpha).
 
-To deal with this, READ_ONCE() provides an implicit address-dependency
-barrier since kernel release v4.15:
+To deal with this, READ_ONCE() provides an implicit address-dependency barrier
+since kernel release v4.15:
 
 CPU 1 CPU 2
 =============== ===============
@@ -627,12 +627,12 @@ but the old value of the variable B (2).
 
 
 An address-dependency barrier is not required to order dependent writes
-because the CPUs that the Linux kernel supports don't do writes
-until they are certain (1) that the write will actually happen, (2)
-of the location of the write, and (3) of the value to be written.
+because the CPUs that the Linux kernel supports don't do writes until they
+are certain (1) that the write will actually happen, (2) of the location of
+the write, and (3) of the value to be written.
 But please carefully read the "CONTROL DEPENDENCIES" section and the
-Documentation/RCU/rcu_dereference.rst file: The compiler can and does
-break dependencies in a great many highly creative ways.
+Documentation/RCU/rcu_dereference.rst file: The compiler can and does break
+dependencies in a great many highly creative ways.
 
 CPU 1 CPU 2
 =============== ===============
@@ -678,8 +678,8 @@ not understand them. The purpose of this section is to help you prevent
 the compiler's ignorance from breaking your code.
 
 A load-load control dependency requires a full read memory barrier, not
-simply an (implicit) address-dependency barrier to make it work correctly. Consider the
-following bit of code:
+simply an (implicit) address-dependency barrier to make it work correctly.
+Consider the following bit of code:
 
 q = READ_ONCE(a);
 <implicit address-dependency barrier>
@@ -691,8 +691,8 @@ following bit of code:
 This will not have the desired effect because there is no actual address
 dependency, but rather a control dependency that the CPU may short-circuit
 by attempting to predict the outcome in advance, so that other CPUs see
-the load from b as having happened before the load from a. In such a
-case what's actually required is:
+the load from b as having happened before the load from a. In such a case
+what's actually required is:
 
 q = READ_ONCE(a);
 if (q) {
@@ -980,8 +980,8 @@ Basically, the read barrier always has to be there, even though it can be of
 the "weaker" type.
 
 [!] Note that the stores before the write barrier would normally be expected to
-match the loads after the read barrier or the address-dependency barrier, and vice
-versa:
+match the loads after the read barrier or the address-dependency barrier, and
+vice versa:
 
 CPU 1 CPU 2
 =================== ===================
@@ -1033,8 +1033,8 @@ STORE B, STORE C } all occurring before the unordered set of { STORE D, STORE E
 V
 
 
-Secondly, address-dependency barriers act as partial orderings on address-dependent
-loads. Consider the following sequence of events:
+Secondly, address-dependency barriers act as partial orderings on address-
+dependent loads. Consider the following sequence of events:
 
 CPU 1 CPU 2
 ======================= =======================
@@ -1079,8 +1079,8 @@ effectively random order, despite the write barrier issued by CPU 1:
 In the above example, CPU 2 perceives that B is 7, despite the load of *C
 (which would be B) coming after the LOAD of C.
 
-If, however, an address-dependency barrier were to be placed between the load of C
-and the load of *C (ie: B) on CPU 2:
+If, however, an address-dependency barrier were to be placed between the load
+of C and the load of *C (ie: B) on CPU 2:
 
 CPU 1 CPU 2
 ======================= =======================
@@ -2761,7 +2761,8 @@ is discarded from the CPU's cache and reloaded. To deal with this, the
 appropriate part of the kernel must invalidate the overlapping bits of the
 cache on each CPU.
 
-See Documentation/core-api/cachetlb.rst for more information on cache management.
+See Documentation/core-api/cachetlb.rst for more information on cache
+management.
 
 
 CACHE COHERENCY VS MMIO
@@ -2901,8 +2902,8 @@ AND THEN THERE'S THE ALPHA
 The DEC Alpha CPU is one of the most relaxed CPUs there is. Not only that,
 some versions of the Alpha CPU have a split data cache, permitting them to have
 two semantically-related cache lines updated at separate times. This is where
-the address-dependency barrier really becomes necessary as this synchronises both
-caches with the memory coherence system, thus making it seem like pointer
+the address-dependency barrier really becomes necessary as this synchronises
+both caches with the memory coherence system, thus making it seem like pointer
 changes vs new data occur in the right order.
 
 The Alpha defines the Linux kernel's memory model, although as of v4.15
|
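The reflowed text keeps returning to one pattern: a writer updates B and then publishes its address through P, while a reader loads P and then loads through it. As a loose user-space analogue only, the pattern might be sketched in C11 as below. All names are hypothetical, the initial values (A == 1, B == 2) mirror those implied by the hunks, and `memory_order_acquire` is used as a stand-in that is stronger than the address-dependency ordering the document describes. This is not the kernel's READ_ONCE() or its barrier primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical variables mirroring the A, B and P of the documentation. */
static int A = 1;
static int B = 2;
static _Atomic(int *) P = &A;

/* Writer side ("CPU 1"): update B, then publish its address. */
void publish_b(void)
{
	B = 4;
	/* Release ordering plays the role of the write barrier before P = &B. */
	atomic_store_explicit(&P, &B, memory_order_release);
}

/* Reader side ("CPU 2"): load the pointer, then the address-dependent data. */
int read_through_p(void)
{
	/*
	 * Acquire is an assumption made for portability: it orders the
	 * dependent load below at least as strongly as an
	 * address-dependency barrier would.
	 */
	int *q = atomic_load_explicit(&P, memory_order_acquire);

	assert(q != NULL);
	return *q;
}
```

With this pairing, a reader that observes `&B` in `P` also observes the new value of B (4), which corresponds to the "(Q == &B) implies (D == 4)" outcome quoted in the diff; a reader that still observes `&A` reads 1.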