linux/Documentation
Jérôme Glisse 0f10851ea4 mm/mmu_notifier: avoid double notification when it is useless
This patch only affects users of mmu_notifier->invalidate_range callback
which are device drivers related to ATS/PASID, CAPI, IOMMUv2, SVM ...
and it is an optimization for those users.  Everyone else is unaffected
by it.

When clearing a pte/pmd we are given a choice to notify the event under
the page table lock (notify version of *_clear_flush helpers do call the
mmu_notifier_invalidate_range).  But that notification is not necessary
in all cases.

This patch removes almost all cases where it is useless to have a call
to mmu_notifier_invalidate_range before
mmu_notifier_invalidate_range_end.  It also adds documentation in all
those cases explaining why.

Below is a more in depth analysis of why this is fine to do this:

For secondary TLB (non CPU TLB) like IOMMU TLB or device TLB (when
device use thing like ATS/PASID to get the IOMMU to walk the CPU page
table to access a process virtual address space).  There is only 2 cases
when you need to notify those secondary TLB while holding page table
lock when clearing a pte/pmd:

  A) page backing address is free before mmu_notifier_invalidate_range_end
  B) a page table entry is updated to point to a new page (COW, write fault
     on zero page, __replace_page(), ...)

Case A is obvious you do not want to take the risk for the device to write
to a page that might now be used by something completely different.

Case B is more subtle. For correctness it requires the following sequence
to happen:
  - take page table lock
  - clear page table entry and notify (pmd/pte_huge_clear_flush_notify())
  - set page table entry to point to new page

If clearing the page table entry is not followed by a notify before setting
the new pte/pmd value then you can break memory model like C11 or C++11 for
the device.

Consider the following scenario (device use a feature similar to ATS/
PASID):

Two address addrA and addrB such that |addrA - addrB| >= PAGE_SIZE we
assume they are write protected for COW (other case of B apply too).

[Time N] -----------------------------------------------------------------
CPU-thread-0  {try to write to addrA}
CPU-thread-1  {try to write to addrB}
CPU-thread-2  {}
CPU-thread-3  {}
DEV-thread-0  {read addrA and populate device TLB}
DEV-thread-2  {read addrB and populate device TLB}
[Time N+1] ---------------------------------------------------------------
CPU-thread-0  {COW_step0: {mmu_notifier_invalidate_range_start(addrA)}}
CPU-thread-1  {COW_step0: {mmu_notifier_invalidate_range_start(addrB)}}
CPU-thread-2  {}
CPU-thread-3  {}
DEV-thread-0  {}
DEV-thread-2  {}
[Time N+2] ---------------------------------------------------------------
CPU-thread-0  {COW_step1: {update page table point to new page for addrA}}
CPU-thread-1  {COW_step1: {update page table point to new page for addrB}}
CPU-thread-2  {}
CPU-thread-3  {}
DEV-thread-0  {}
DEV-thread-2  {}
[Time N+3] ---------------------------------------------------------------
CPU-thread-0  {preempted}
CPU-thread-1  {preempted}
CPU-thread-2  {write to addrA which is a write to new page}
CPU-thread-3  {}
DEV-thread-0  {}
DEV-thread-2  {}
[Time N+3] ---------------------------------------------------------------
CPU-thread-0  {preempted}
CPU-thread-1  {preempted}
CPU-thread-2  {}
CPU-thread-3  {write to addrB which is a write to new page}
DEV-thread-0  {}
DEV-thread-2  {}
[Time N+4] ---------------------------------------------------------------
CPU-thread-0  {preempted}
CPU-thread-1  {COW_step3: {mmu_notifier_invalidate_range_end(addrB)}}
CPU-thread-2  {}
CPU-thread-3  {}
DEV-thread-0  {}
DEV-thread-2  {}
[Time N+5] ---------------------------------------------------------------
CPU-thread-0  {preempted}
CPU-thread-1  {}
CPU-thread-2  {}
CPU-thread-3  {}
DEV-thread-0  {read addrA from old page}
DEV-thread-2  {read addrB from new page}

So here because at time N+2 the clear page table entry was not pair with a
notification to invalidate the secondary TLB, the device see the new value
for addrB before seing the new value for addrA.  This break total memory
ordering for the device.

When changing a pte to write protect or to point to a new write protected
page with same content (KSM) it is ok to delay invalidate_range callback
to mmu_notifier_invalidate_range_end() outside the page table lock.  This
is true even if the thread doing page table update is preempted right
after releasing page table lock before calling
mmu_notifier_invalidate_range_end

Thanks to Andrea for thinking of a problematic scenario for COW.

[jglisse@redhat.com: v2]
  Link: http://lkml.kernel.org/r/20171017031003.7481-2-jglisse@redhat.com
Link: http://lkml.kernel.org/r/20170901173011.10745-1-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Alistair Popple <alistair@popple.id.au>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:03 -08:00
..
ABI This is the bulk of GPIO changes for the v4.15 kernel cycle: 2017-11-14 17:23:44 -08:00
accounting
acpi ACPI / LPIT: Add Low Power Idle Table (LPIT) support 2017-10-11 15:38:10 +02:00
admin-guide Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-11-13 17:33:11 -08:00
aoe
arm Documentation: arm: Replace use of virt_to_phys with __pa_symbol 2017-07-17 13:43:58 -06:00
arm64 arm64 updates for 4.15 2017-11-15 10:56:56 -08:00
auxdisplay
backlight
blackfin
block block: remove __bio_kmap_atomic 2017-11-10 19:53:25 -07:00
blockdev SCSI misc on 20170907 2017-09-07 21:11:05 -07:00
bus-devices
cdrom documentation: Update ide-cd documentation to reflect CONFIG_BLK_DEV_HD_IDE removal 2017-10-12 11:25:31 -06:00
cgroup-v1 mm, vmpressure: pass-through notification support 2017-07-10 16:32:31 -07:00
cma
connector
console
core-api A relatively calm cycle for the docs tree again. 2017-11-13 08:25:06 -08:00
cpu-freq cpufreq: stats: Handle the case when trans_table goes beyond PAGE_SIZE 2017-11-08 23:41:25 +01:00
cpuidle
cris
crypto crypto: doc - adapt api sample to use async. op wait 2017-11-03 22:11:23 +08:00
dev-tools docs: dev-tools: correct Coccinelle version number 2017-10-19 12:55:24 -06:00
device-mapper dm raid: fix incorrect status output at the end of a "recover" process 2017-10-05 16:21:30 -04:00
devicetree arm64 updates for 4.15 2017-11-15 10:56:56 -08:00
doc-guide Documentation: fix ref to sphinx/kerneldoc.py 2017-10-19 12:57:10 -06:00
driver-api SCSI misc on 20171114 2017-11-14 16:23:44 -08:00
driver-model driver core: remove DRIVER_ATTR 2017-09-19 09:20:33 +02:00
early-userspace
EDID
extcon extcon: Remove porting compatibility of swich class 2017-04-06 10:55:24 +09:00
fault-injection cpu/hotplug: Get rid of CPU hotplug notifier leftovers 2017-11-13 10:03:53 +01:00
fb documentation: fb: update list of available compiled-in fonts 2017-11-08 03:39:52 -07:00
features Documentation/features/KASAN: mark KASAN as supported only on 64-bit on x86 2017-10-03 14:40:22 -06:00
filesystems Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2017-11-14 14:13:11 -08:00
firmware_class
fmc
fpga
frv
gpio gpio: Move irq_valid_mask into struct gpio_irq_chip 2017-11-08 14:10:18 +01:00
gpu Merge tag 'drm-intel-next-2017-08-18' of git://anongit.freedesktop.org/git/drm-intel into drm-next 2017-08-22 10:03:07 +10:00
hid Documentation: fix input related doc refs 2017-10-12 11:14:06 -06:00
hwmon pmbus: Add driver for Maxim MAX31785 Intelligent Fan Controller 2017-11-05 06:06:33 -08:00
i2c i2c: i801: Add support for Intel Cedar Fork 2017-10-05 14:44:56 +02:00
ia64
ide
iio iio: adc: New driver for Cirrus Logic EP93xx ADC 2017-07-25 19:56:23 +01:00
infiniband Documentation: Hardware tag matching 2017-08-29 08:30:21 -04:00
input Documentation: fix input related doc refs 2017-10-12 11:14:06 -06:00
ioctl scsi: cxlflash: Introduce host ioctl support 2017-06-26 15:01:11 -04:00
isdn
kbuild DeviceTree for 4.15: 2017-11-14 18:25:40 -08:00
kdump kexec/kdump: minor Documentation updates for arm64 and Image 2017-07-12 16:26:00 -07:00
kernel-hacking There has been a fair amount of activity in the docs tree this time 2017-07-03 21:13:25 -07:00
laptops Documentation: fix admin-guide doc refs 2017-10-12 11:13:28 -06:00
leds Documentation: leds: Update 00-INDEX file 2017-10-23 20:17:03 +02:00
lightnvm lightnvm: physical block device (pblk) target 2017-04-16 10:06:33 -06:00
livepatch livepatch: add (un)patch callbacks 2017-10-19 10:08:56 +02:00
locking Documentation: fix locking rt-mutex doc refs 2017-10-19 12:56:44 -06:00
m68k
md md-cluster: update document for raid10 2017-11-01 21:32:25 -07:00
media Documentation: fix media related doc refs 2017-10-12 11:15:06 -06:00
memory-devices
metag
mic
mips
misc-devices Documentation: misc-devices: Add Documentation for pci-endpoint-test driver 2017-04-28 10:23:19 -05:00
mmc MMC core: 2017-05-02 17:34:32 -07:00
mn10300
mtd
namespaces
netlabel
networking A relatively calm cycle for the docs tree again. 2017-11-13 08:25:06 -08:00
nfc
nios2
nvdimm
nvmem NVMEM documentation fix: A minor typo 2017-08-24 13:31:58 -06:00
openrisc Documentation: openrisc: Updates to README 2017-10-30 21:37:53 +09:00
parisc
PCI docs: update old references for DocBook from the documentation 2017-05-16 08:44:19 -03:00
pcmcia
perf Documentation: perf: hisi: Documentation for HiSilicon SoC PMU driver 2017-10-19 17:06:34 +01:00
phy
platform
power Power management updates for v4.15-rc1 2017-11-13 19:43:50 -08:00
powerpc powerpc updates for 4.13 2017-07-07 13:55:45 -07:00
pps drivers/pps: aesthetic tweaks to PPS-related content 2017-09-08 18:26:51 -07:00
process A relatively calm cycle for the docs tree again. 2017-11-13 08:25:06 -08:00
pti
ptp
rapidio
RCU Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-11-13 12:18:10 -08:00
s390
scheduler sched/deadline: Add documentation about GRUB reclaiming 2017-06-08 10:31:56 +02:00
scsi scsi: smartpqi: correct spelling error in documentation 2017-10-23 04:15:17 -04:00
security Documentation: fix security related doc refs 2017-10-12 11:14:40 -06:00
serial tty: n_gsm: do not send/receive in ldisc close path 2017-06-03 18:48:52 +09:00
sh docs-rst: convert sh book to ReST 2017-05-16 08:44:18 -03:00
sound sound updates for 4.15-rc1 2017-11-14 18:01:46 -08:00
sparc
sphinx Documentation/sphinx: fix kernel-doc decode for non-utf-8 locale 2017-08-31 13:36:28 -06:00
sphinx-static docs RTD theme: code-block with line nos - lines and line numbers don't line up. 2017-07-17 13:48:45 -06:00
spi spi: Document SPI slave controller support 2017-05-26 13:11:00 +01:00
sysctl mm/page-writeback.c: print a warning if the vm dirtiness settings are illogical 2017-11-15 18:21:03 -08:00
target Documentation/target: add an example script to configure an iSCSI target 2017-05-01 22:21:35 -07:00
thermal Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux 2017-05-12 11:58:45 -07:00
timers docs: highres: fix broken urls 2017-09-26 14:58:23 -06:00
trace ftrace/docs: Add documentation on how to use ftrace from within the kernel 2017-11-05 09:55:29 -07:00
translations locking/barriers: Kill lockless_dereference() 2017-10-24 13:17:33 +02:00
usb Documentation: fix usb related doc refs 2017-10-12 11:15:48 -06:00
userspace-api seccomp: Implement SECCOMP_RET_KILL_PROCESS action 2017-08-14 13:46:50 -07:00
virtual KVM/ARM Changes for v4.14 2017-09-07 18:22:04 +02:00
vm mm/mmu_notifier: avoid double notification when it is useless 2017-11-15 18:21:03 -08:00
w1
watchdog Documentation: fix selftests related file refs 2017-10-19 12:58:21 -06:00
wimax
x86 Merge branch 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-11-13 19:05:19 -08:00
xtensa of: update ePAPR references to point to Devicetree Specification 2017-06-22 11:22:06 -05:00
.gitignore
00-INDEX linux-kselftest-4.13-rc1-update 2017-07-07 14:04:47 -07:00
atomic_bitops.txt Documentation/locking/atomic: Add documents for new atomic_t APIs 2017-08-10 12:29:00 +02:00
atomic_t.txt Documentation/locking/atomic: Finish the document... 2017-08-25 11:06:33 +02:00
bcache.txt bcache.txt: standardize document format 2017-07-14 13:51:27 -06:00
bt8xxgpio.txt bt8xxgpio.txt: standardize document format 2017-07-14 13:51:27 -06:00
btmrvl.txt btmrvl.txt: standardize document format 2017-07-14 13:51:27 -06:00
bus-virt-phys-mapping.txt bus-virt-phys-mapping.txt: standardize document format 2017-07-14 13:51:28 -06:00
cachetlb.txt cachetlb.txt: standardize document format 2017-07-14 13:51:28 -06:00
cgroup-v2.txt cgroup: add cgroup.stat interface with basic hierarchy stats 2017-08-02 12:05:20 -07:00
Changes
circular-buffers.txt circular-buffers.txt: standardize document format 2017-07-14 13:51:29 -06:00
clk.txt clk.txt: standardize document format 2017-07-14 13:51:29 -06:00
CodingStyle
conf.py docs-rst: don't require adjustbox anymore 2017-09-08 10:02:55 -06:00
cpu-load.txt cpu-load: standardize document format 2017-07-14 13:51:30 -06:00
cputopology.txt cputopology.txt: standardize document format 2017-07-14 13:51:30 -06:00
crc32.txt crc32.txt: standardize document format 2017-07-14 13:51:30 -06:00
dcdbas.txt dcdbas.txt: standardize document format 2017-07-14 13:51:31 -06:00
debugging-modules.txt
debugging-via-ohci1394.txt debugging-via-ohci1394.txt: standardize document format 2017-07-14 13:51:34 -06:00
dell_rbu.txt dell_rbu.txt: standardize document format 2017-07-14 13:58:12 -06:00
digsig.txt digsig.txt: standardize document format 2017-07-14 13:51:31 -06:00
DMA-API-HOWTO.txt DMA-API-HOWTO.txt: standardize document format 2017-07-14 13:51:32 -06:00
DMA-API.txt dma-coherent: remove the DMA_MEMORY_MAP and DMA_MEMORY_IO flags 2017-09-01 11:59:17 +02:00
DMA-attributes.txt DMA-attributes.txt: standardize document format 2017-07-14 13:51:33 -06:00
DMA-ISA-LPC.txt DMA-ISA-LPC.txt: standardize document format 2017-07-14 13:51:33 -06:00
docutils.conf
dontdiff Remove gperf usage from toolchain 2017-08-19 11:02:53 -07:00
efi-stub.txt efi-stub.txt: standardize document format 2017-07-14 13:51:34 -06:00
eisa.txt eisa.txt: standardize document format 2017-07-14 13:51:34 -06:00
errseq.rst Documentation: add some docs for errseq_t 2017-07-29 09:01:02 -04:00
flexible-arrays.txt flexible-arrays.txt: standardize document format 2017-07-14 13:51:35 -06:00
futex-requeue-pi.txt futex-requeue-pi.txt: standardize document format 2017-07-14 13:51:35 -06:00
gcc-plugins.txt gcc-plugins.txt: standardize document format 2017-07-14 13:51:36 -06:00
highuid.txt highuid.txt: standardize document format 2017-07-14 13:51:36 -06:00
hw_random.txt hw_random.txt: standardize document format 2017-07-14 13:51:37 -06:00
hwspinlock.txt hwspinlock.txt: standardize document format 2017-07-14 13:51:37 -06:00
index.rst Make the main documentation title less Geocities 2017-06-23 14:02:27 -06:00
intel_txt.txt intel_txt.txt: standardize document format 2017-07-14 13:51:38 -06:00
Intel-IOMMU.txt Intel-IOMMU.txt: standardize document format 2017-07-14 13:51:38 -06:00
io_ordering.txt io_ordering.txt: standardize document format 2017-07-14 13:51:39 -06:00
io-mapping.txt io-mapping.txt: standardize document format 2017-07-14 13:51:38 -06:00
iostats.txt iostats.txt: update it to cover recent Kernels 2017-07-14 13:51:40 -06:00
IPMI.txt IPMI.txt: standardize document format 2017-07-14 13:51:40 -06:00
IRQ-affinity.txt IRQ-affinity.txt: standardize document format 2017-07-14 13:51:41 -06:00
IRQ-domain.txt IRQ-domain.txt: standardize document format 2017-07-14 13:51:41 -06:00
IRQ.txt IRQ.txt: add a markup for its title 2017-07-14 13:51:42 -06:00
irqflags-tracing.txt irqflags-tracing.txt: standardize document format 2017-07-14 13:51:42 -06:00
isa.txt isa.txt: standardize document format 2017-07-14 13:51:43 -06:00
isapnp.txt isapnp.txt: promote title level 2017-07-14 13:51:43 -06:00
kernel-doc-nano-HOWTO.txt docs: update old references for DocBook from the documentation 2017-05-16 08:44:19 -03:00
kernel-per-CPU-kthreads.txt kernel-per-CPU-kthreads.txt: standardize document format 2017-07-14 13:51:43 -06:00
kobject.txt kobject.txt: standardize document format 2017-07-14 13:51:44 -06:00
kprobes.txt kprobes/docs: Remove jprobes related documents 2017-10-20 11:02:55 +02:00
kref.txt kref.txt: standardize document format 2017-07-14 13:51:45 -06:00
ldm.txt ldm.txt: standardize document format 2017-07-14 13:51:45 -06:00
lockup-watchdogs.txt lockup-watchdogs.txt: standardize document format 2017-07-14 13:51:46 -06:00
logo.gif
logo.txt
lsm.txt docs-rst: convert lsm from DocBook to ReST 2017-05-16 08:44:19 -03:00
lzo.txt lzo.txt: standardize document format 2017-07-14 13:51:46 -06:00
mailbox.txt mailbox.txt: standardize document format 2017-07-14 13:51:47 -06:00
Makefile Documentation: add script and build target to check for broken file references 2017-10-12 11:07:42 -06:00
memory-barriers.txt Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-11-13 12:38:26 -08:00
memory-hotplug.txt memory-hotplug.txt: standardize document format 2017-07-14 13:57:53 -06:00
men-chameleon-bus.txt men-chameleon-bus.txt: standardize document format 2017-07-14 13:57:54 -06:00
nommu-mmap.txt nommu-mmap.txt: don't use all upper case on titles 2017-07-14 13:57:55 -06:00
ntb.txt This series converts a number of top-level documents to the RST format 2017-07-15 12:58:58 -07:00
numastat.txt numastat.txt: standardize document format 2017-07-14 13:57:56 -06:00
padata.txt padata.txt: standardize document format 2017-07-14 13:57:56 -06:00
parport-lowlevel.txt parport-lowlevel.txt: standardize document format 2017-07-14 13:57:57 -06:00
percpu-rw-semaphore.txt percpu-rw-semaphore.txt: standardize document format 2017-07-14 13:57:58 -06:00
phy.txt phy.txt: standardize document format 2017-07-14 13:57:58 -06:00
pi-futex.txt Documentation: fix locking rt-mutex doc refs 2017-10-19 12:56:44 -06:00
pnp.txt pnp.txt: standardize document format 2017-07-14 13:57:59 -06:00
preempt-locking.txt preempt-locking.txt: standardize document format 2017-07-14 13:58:00 -06:00
printk-formats.txt printk-formats.txt: Add examples for %pF and %pS usage 2017-08-24 18:49:52 +02:00
pwm.txt pwm: Standardize document format 2017-07-06 08:23:30 +02:00
rbtree.txt rbtree: cache leftmost node internally 2017-09-08 18:26:48 -07:00
remoteproc.txt remoteproc.txt: standardize document format 2017-07-14 13:58:02 -06:00
rfkill.txt rfkill.txt: standardize document format 2017-07-14 13:58:02 -06:00
robust-futex-ABI.txt robust-futex-ABI.txt: standardize document format 2017-07-14 13:58:03 -06:00
robust-futexes.txt robust-futexes.txt: standardize document format 2017-07-14 13:58:03 -06:00
rpmsg.txt rpmsg.txt: standardize document format 2017-07-14 13:58:04 -06:00
rtc.txt rtc: add generic nvmem support 2017-07-07 13:14:14 +02:00
SAK.txt SAK.txt: standardize document format 2017-07-14 13:58:04 -06:00
sgi-ioc4.txt sgi-ioc4.txt: standardize document format 2017-07-14 13:58:05 -06:00
siphash.txt siphash.txt: standardize document format 2017-07-14 13:58:06 -06:00
SM501.txt SM501.txt: standardize document format 2017-07-14 13:58:06 -06:00
smsc_ece1099.txt smsc_ece1099.txt: standardize document format 2017-07-14 13:58:07 -06:00
static-keys.txt jump_label: Provide hotplug context variants 2017-08-10 12:28:59 +02:00
SubmittingPatches
svga.txt svga.txt: standardize document format 2017-07-14 13:58:08 -06:00
switchtec.txt switchtec: Add IOCTLs to the Switchtec driver 2017-04-12 12:23:37 -05:00
sync_file.txt sync_file.txt: standardize document format 2017-05-24 13:01:27 -03:00
tee.txt tee.txt: standardize document format 2017-07-14 13:58:14 -06:00
this_cpu_ops.txt this_cpu_ops.txt: standardize document format 2017-07-14 13:58:08 -06:00
unaligned-memory-access.txt unaligned-memory-access.txt: standardize document format 2017-07-14 13:58:09 -06:00
vfio-mediated-device.txt vfio-mediated-device.txt: standardize document format 2017-07-14 13:58:10 -06:00
vfio.txt vfio.txt: standardize document format 2017-07-14 13:58:10 -06:00
video-output.txt
xillybus.txt xillybus.txt: standardize document format 2017-07-14 13:58:11 -06:00
xz.txt xz.txt: standardize document format 2017-07-14 13:58:11 -06:00
zorro.txt zorro.txt: standardize document format 2017-07-14 13:58:12 -06:00