docs/memory-barriers.txt: Update I/O section to be clearer about CPU vs thread

The revised I/O ordering section of memory-barriers.txt introduced in 4614bbdee3 ("docs/memory-barriers.txt: Rewrite "KERNEL I/O BARRIER EFFECTS" section") loosely refers to "the CPU", whereas the ordering guarantees generally apply within a thread of execution that can migrate between cores, with the scheduler providing the relevant barrier semantics.

Reword the section to refer to "CPU thread" and call out ordering of MMIO writes separately from ordering of writes to memory. Ben also spotted that the string accessors are native-endian, so fix that up too.

Link: https://lkml.kernel.org/r/080d1ec73e3e29d6ffeeeb50b39b613da28afb37.camel@kernel.crashing.org
Fixes: 4614bbdee3 ("docs/memory-barriers.txt: Rewrite "KERNEL I/O BARRIER EFFECTS" section")
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-04-12 13:42:18 +01:00 · 2019-04-12 13:42:18 +01:00 · 9726840d9c
commit 9726840d9c
parent 0cde62a46e
1 changed file with 37 additions and 24 deletions
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -2523,27 +2523,37 @@ guarantees:
 	ioremap()), the ordering guarantees are as follows:
 	1. All readX() and writeX() accesses to the same peripheral are ordered
-	   with respect to each other. This ensures that MMIO register writes by
+	   with respect to each other. This ensures that MMIO register accesses
-	   the CPU to a particular device will arrive in program order.
+	   by the same CPU thread to a particular device will arrive in program
 	   order.
-	2. A writeX() by the CPU to the peripheral will first wait for the
+	2. A writeX() issued by a CPU thread holding a spinlock is ordered
-	   completion of all prior CPU writes to memory. This ensures that
+	   before a writeX() to the same peripheral from another CPU thread
-	   writes by the CPU to an outbound DMA buffer allocated by
+	   issued after a later acquisition of the same spinlock. This ensures
-	   dma_alloc_coherent() will be visible to a DMA engine when the CPU
+	   that MMIO register writes to a particular device issued while holding
-	   writes to its MMIO control register to trigger the transfer.
+	   a spinlock will arrive in an order consistent with acquisitions of
 	   the lock.
-	3. A readX() by the CPU from the peripheral will complete before any
+	3. A writeX() by a CPU thread to the peripheral will first wait for the
-	   subsequent CPU reads from memory can begin. This ensures that reads
+	   completion of all prior writes to memory either issued by, or
-	   by the CPU from an incoming DMA buffer allocated by
+	   propagated to, the same thread. This ensures that writes by the CPU
-	   dma_alloc_coherent() will not see stale data after reading from the
+	   to an outbound DMA buffer allocated by dma_alloc_coherent() will be
-	   DMA engine's MMIO status register to establish that the DMA transfer
+	   visible to a DMA engine when the CPU writes to its MMIO control
-	   has completed.
+	   register to trigger the transfer.
-	4. A readX() by the CPU from the peripheral will complete before any
+	4. A readX() by a CPU thread from the peripheral will complete before
-	   subsequent delay() loop can begin execution. This ensures that two
+	   any subsequent reads from memory by the same thread can begin. This
-	   MMIO register writes by the CPU to a peripheral will arrive at least
+	   ensures that reads by the CPU from an incoming DMA buffer allocated
-	   1us apart if the first write is immediately read back with readX()
+	   by dma_alloc_coherent() will not see stale data after reading from
-	   and udelay(1) is called prior to the second writeX():
+	   the DMA engine's MMIO status register to establish that the DMA
 	   transfer has completed.
 	5. A readX() by a CPU thread from the peripheral will complete before
 	   any subsequent delay() loop can begin execution on the same thread.
 	   This ensures that two MMIO register writes by the CPU to a peripheral
 	   will arrive at least 1us apart if the first write is immediately read
 	   back with readX() and udelay(1) is called prior to the second
 	   writeX():
 		writel(42, DEVICE_REGISTER_0); // Arrives at the device...
 		readl(DEVICE_REGISTER_0);
@@ -2559,10 +2569,11 @@ guarantees:
 	These are similar to readX() and writeX(), but provide weaker memory
 	ordering guarantees. Specifically, they do not guarantee ordering with
-	respect to normal memory accesses or delay() loops (i.e. bullets 2-4
+	respect to locking, normal memory accesses or delay() loops (i.e.
-	above) but they are still guaranteed to be ordered with respect to other
+	bullets 2-5 above) but they are still guaranteed to be ordered with
-	accesses to the same peripheral when operating on __iomem pointers
+	respect to other accesses from the same CPU thread to the same
-	mapped with the default I/O attributes.
+	peripheral when operating on __iomem pointers mapped with the default
 	I/O attributes.
 (*) readsX(), writesX():
@@ -2600,8 +2611,10 @@ guarantees:
 	These will perform appropriately for the type of access they're actually
 	doing, be it inX()/outX() or readX()/writeX().
-All of these accessors assume that the underlying peripheral is little-endian,
+With the exception of the string accessors (insX(), outsX(), readsX() and
-and will therefore perform byte-swapping operations on big-endian architectures.
+writesX()), all of the above assume that the underlying peripheral is
 little-endian and will therefore perform byte-swapping operations on big-endian
 architectures.
 ========================================
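
The endianness note in the last hunk can be illustrated with a rough userspace sketch. This is not kernel code and not part of the patch: mock_readl(), mock_readsl() and endian_demo() are hypothetical names invented here, and the "FIFO" is assumed to be a plain byte array rather than a repeatedly read MMIO address. The sketch only shows the difference between decoding a value as little-endian (as the patched text describes for readX()) and copying bytes through unswapped (as it describes for the string accessors).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for readl(): treat the four register bytes as a
 * little-endian value and decode them into a native integer. On a
 * big-endian host this amounts to a byte swap; on a little-endian host
 * it is a plain load.
 */
static uint32_t mock_readl(const uint8_t *reg)
{
	return (uint32_t)reg[0] |
	       ((uint32_t)reg[1] << 8) |
	       ((uint32_t)reg[2] << 16) |
	       ((uint32_t)reg[3] << 24);
}

/*
 * Hypothetical stand-in for readsl(): move 'count' 32-bit words into the
 * buffer without any byte swapping, so the bytes land in memory in the
 * same order the device produced them.
 */
static void mock_readsl(const uint8_t *fifo, void *buf, size_t count)
{
	memcpy(buf, fifo, count * sizeof(uint32_t));
}

/* Exercise both mocks; returns 0 if the assertions hold. */
int endian_demo(void)
{
	const uint8_t reg[4] = { 0x78, 0x56, 0x34, 0x12 };
	const uint8_t fifo[8] = { 0xde, 0xad, 0xbe, 0xef,
				  0x01, 0x02, 0x03, 0x04 };
	uint8_t buf[8];

	/* The decoded register value is the same on any host. */
	assert(mock_readl(reg) == 0x12345678u);

	/* The string-style copy preserves the device's byte order. */
	mock_readsl(fifo, buf, 2);
	assert(memcmp(buf, fifo, sizeof(fifo)) == 0);

	return 0;
}
```

Calling endian_demo() from a small main() is enough to run the checks. The design point is that a byte stream pulled from a data FIFO usually should not be reinterpreted word by word, whereas a register value should decode to the same number regardless of host endianness.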