linux/arch
Will Deacon 22ec71615d arm64: io: Relax implicit barriers in default I/O accessors
The arm64 implementation of the default I/O accessors requires barrier
instructions to satisfy the memory ordering requirements documented in
memory-barriers.txt [1], which are largely derived from the behaviour of
I/O accesses on x86.

Of particular interest are the requirements that a write to a device
must be ordered against prior writes to memory, and a read from a device
must be ordered against subsequent reads from memory. We satisfy these
requirements using various flavours of DSB: the most expensive barrier
we have, since it implies completion of prior accesses. This was deemed
necessary when we first implemented the accessors, since accesses to
different endpoints could propagate independently and therefore the only
way to enforce order is to rely on completion guarantees [2].

Since then, the Armv8 memory model has been retrospectively strengthened
to require "other-multi-copy atomicity", a property that requires memory
accesses from an observer to become visible to all other observers
simultaneously [3]. In other words, propagation of accesses is limited
to transitioning from locally observed to globally observed. It recently
became apparent that this change also has a subtle impact on our I/O
accessors for shared peripherals, allowing us to use the cheaper DMB
instruction instead.

As a concrete example, consider the following:

	memcpy(dma_buffer, data, bufsz);
	writel(DMA_START, dev->ctrl_reg);

A DMB ST instruction between the final write to the DMA buffer and the
write to the control register will ensure that the writes to the DMA
buffer are observed before the write to the control register by all
observers. Put another way, if an observer can see the write to the
control register, it can also see the writes to memory. This has always
been the case and is not sufficient to provide the ordering required by
Linux, since there is no guarantee that the master interface of the
DMA-capable device has observed either of the accesses. However, in an
other-multi-copy atomic world, we can infer two things:

  1. A write arriving at an endpoint shared between multiple CPUs is
     visible to all CPUs

  2. A write that is visible to all CPUs is also visible to all other
     observers in the shareability domain

Pieced together, this allows us to use DMB OSHST for our default I/O
write accessors and DMB OSHLD for our default I/O read accessors (the
outer-shareability is for handling non-cacheable mappings) for shared
devices. Memory-mapped, DMA-capable peripherals that are private to a
CPU (i.e. inaccessible to other CPUs) still require the DSB, however
these are few and far between and typically require special treatment
anyway which is outside of the scope of the portable driver API (e.g.
GIC, page-table walker, SPE profiler).

Note that our mandatory barriers remain as DSBs, since there are cases
where they are used to flush the store buffer of the CPU, e.g. when
publishing page table updates to the SMMU.

[1] https://git.kernel.org/linus/4614bbdee357
[2] https://www.youtube.com/watch?v=i6DayghhA8Q
[3] https://www.cl.cam.ac.uk/~pes20/armv8-mca/

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-08-05 12:35:23 +01:00
..
alpha Merge branch 'akpm' (patches from Andrew) 2019-07-17 08:58:04 -07:00
arc Merge branch 'akpm' (patches from Andrew) 2019-07-17 08:58:04 -07:00
arm add swiotlb support to arm 2019-08-02 08:44:33 -07:00
arm64 arm64: io: Relax implicit barriers in default I/O accessors 2019-08-05 12:35:23 +01:00
c6x Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu 2019-07-10 21:42:03 -07:00
csky treewide: add "WITH Linux-syscall-note" to SPDX tag of uapi headers 2019-07-25 11:05:10 +02:00
h8300 h8300 update for 5.3 2019-07-17 09:36:38 -07:00
hexagon hexagon: switch to generic version of pte allocation 2019-07-21 09:53:00 -07:00
ia64 Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-07-19 10:42:02 -07:00
m68k arch: mark syscall number 435 reserved for clone3 2019-07-15 00:39:33 +02:00
microblaze clone3-v5.3 2019-07-11 10:09:44 -07:00
mips page flags: prioritize kasan bits over last-cpuid 2019-08-03 07:02:01 -07:00
nds32 treewide: add "WITH Linux-syscall-note" to SPDX tag of uapi headers 2019-07-25 11:05:10 +02:00
nios2 nios2 update for v5.3-rc1 2019-07-12 15:38:05 -07:00
openrisc dma-mapping updates for Linux 5.3 2019-07-12 15:13:55 -07:00
parisc parisc: Add archclean Makefile target 2019-08-01 14:20:55 +02:00
powerpc powerpc/kasan: fix early boot failure on PPC32 2019-07-31 22:02:52 +10:00
riscv riscv: defconfig: align RV64 defconfig to the output of "make savedefconfig" 2019-07-31 12:26:10 -07:00
s390 s390/mm: add fallthrough annotations 2019-07-29 18:05:03 +02:00
sh treewide: add "WITH Linux-syscall-note" to SPDX tag of uapi headers 2019-07-25 11:05:10 +02:00
sparc treewide: add "WITH Linux-syscall-note" to SPDX tag of uapi headers 2019-07-25 11:05:10 +02:00
um This pull request contains the following changes for UML: 2019-07-14 17:17:34 -07:00
unicore32 Kconfig updates for v5.3 2019-07-12 16:06:27 -07:00
x86 x86/vdso/32: Use 32bit syscall fallback 2019-07-31 00:09:10 +02:00
xtensa xtensa: fix build for cores with coprocessors 2019-07-24 17:44:42 -07:00
.gitignore
Kconfig Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-07-20 10:33:44 -07:00