linux/fs
Jérôme Glisse 0f10851ea4 mm/mmu_notifier: avoid double notification when it is useless
This patch only affects users of mmu_notifier->invalidate_range callback
which are device drivers related to ATS/PASID, CAPI, IOMMUv2, SVM ...
and it is an optimization for those users.  Everyone else is unaffected
by it.

When clearing a pte/pmd we are given a choice to notify the event under
the page table lock (notify version of *_clear_flush helpers do call the
mmu_notifier_invalidate_range).  But that notification is not necessary
in all cases.

This patch removes almost all cases where it is useless to have a call
to mmu_notifier_invalidate_range before
mmu_notifier_invalidate_range_end.  It also adds documentation in all
those cases explaining why.

Below is a more in depth analysis of why this is fine to do this:

For secondary TLB (non CPU TLB) like IOMMU TLB or device TLB (when
device use thing like ATS/PASID to get the IOMMU to walk the CPU page
table to access a process virtual address space).  There is only 2 cases
when you need to notify those secondary TLB while holding page table
lock when clearing a pte/pmd:

  A) page backing address is free before mmu_notifier_invalidate_range_end
  B) a page table entry is updated to point to a new page (COW, write fault
     on zero page, __replace_page(), ...)

Case A is obvious you do not want to take the risk for the device to write
to a page that might now be used by something completely different.

Case B is more subtle. For correctness it requires the following sequence
to happen:
  - take page table lock
  - clear page table entry and notify (pmd/pte_huge_clear_flush_notify())
  - set page table entry to point to new page

If clearing the page table entry is not followed by a notify before setting
the new pte/pmd value then you can break memory model like C11 or C++11 for
the device.

Consider the following scenario (device use a feature similar to ATS/
PASID):

Two address addrA and addrB such that |addrA - addrB| >= PAGE_SIZE we
assume they are write protected for COW (other case of B apply too).

[Time N] -----------------------------------------------------------------
CPU-thread-0  {try to write to addrA}
CPU-thread-1  {try to write to addrB}
CPU-thread-2  {}
CPU-thread-3  {}
DEV-thread-0  {read addrA and populate device TLB}
DEV-thread-2  {read addrB and populate device TLB}
[Time N+1] ---------------------------------------------------------------
CPU-thread-0  {COW_step0: {mmu_notifier_invalidate_range_start(addrA)}}
CPU-thread-1  {COW_step0: {mmu_notifier_invalidate_range_start(addrB)}}
CPU-thread-2  {}
CPU-thread-3  {}
DEV-thread-0  {}
DEV-thread-2  {}
[Time N+2] ---------------------------------------------------------------
CPU-thread-0  {COW_step1: {update page table point to new page for addrA}}
CPU-thread-1  {COW_step1: {update page table point to new page for addrB}}
CPU-thread-2  {}
CPU-thread-3  {}
DEV-thread-0  {}
DEV-thread-2  {}
[Time N+3] ---------------------------------------------------------------
CPU-thread-0  {preempted}
CPU-thread-1  {preempted}
CPU-thread-2  {write to addrA which is a write to new page}
CPU-thread-3  {}
DEV-thread-0  {}
DEV-thread-2  {}
[Time N+3] ---------------------------------------------------------------
CPU-thread-0  {preempted}
CPU-thread-1  {preempted}
CPU-thread-2  {}
CPU-thread-3  {write to addrB which is a write to new page}
DEV-thread-0  {}
DEV-thread-2  {}
[Time N+4] ---------------------------------------------------------------
CPU-thread-0  {preempted}
CPU-thread-1  {COW_step3: {mmu_notifier_invalidate_range_end(addrB)}}
CPU-thread-2  {}
CPU-thread-3  {}
DEV-thread-0  {}
DEV-thread-2  {}
[Time N+5] ---------------------------------------------------------------
CPU-thread-0  {preempted}
CPU-thread-1  {}
CPU-thread-2  {}
CPU-thread-3  {}
DEV-thread-0  {read addrA from old page}
DEV-thread-2  {read addrB from new page}

So here because at time N+2 the clear page table entry was not pair with a
notification to invalidate the secondary TLB, the device see the new value
for addrB before seing the new value for addrA.  This break total memory
ordering for the device.

When changing a pte to write protect or to point to a new write protected
page with same content (KSM) it is ok to delay invalidate_range callback
to mmu_notifier_invalidate_range_end() outside the page table lock.  This
is true even if the thread doing page table update is preempted right
after releasing page table lock before calling
mmu_notifier_invalidate_range_end

Thanks to Andrea for thinking of a problematic scenario for COW.

[jglisse@redhat.com: v2]
  Link: http://lkml.kernel.org/r/20171017031003.7481-2-jglisse@redhat.com
Link: http://lkml.kernel.org/r/20170901173011.10745-1-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Alistair Popple <alistair@popple.id.au>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:03 -08:00
..
9p License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
adfs License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
affs License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
afs License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
autofs4 Merge branch 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-09-14 18:13:32 -07:00
befs License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
bfs License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
btrfs Merge branch 'for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux 2017-11-14 13:35:29 -08:00
cachefiles License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
ceph License cleanup: add SPDX license identifiers to some files 2017-11-02 10:04:46 -07:00
cifs Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2017-11-14 10:52:09 -08:00
coda License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
configfs configfs: make ci_type field, some pointers and function arguments const 2017-10-19 16:15:16 +02:00
cramfs License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
crypto fscrypt: lots of cleanups, mostly courtesy by Eric Biggers 2017-11-14 11:35:15 -08:00
debugfs Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-15 12:00:42 -07:00
devpts pty: Repair TIOCGPTPEER 2017-08-24 13:23:03 -07:00
dlm A couple of configfs cleanups: 2017-11-14 14:44:04 -08:00
ecryptfs slab, slub, slob: add slab_flags_t 2017-11-15 18:21:01 -08:00
efivarfs VFS: Kill off s_options and helpers 2017-07-11 06:09:21 -04:00
efs License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
exofs Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block 2017-09-07 11:59:42 -07:00
exportfs
ext2 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2017-11-14 14:13:11 -08:00
ext4 Add support for online resizing of file systems with bigalloc. Fix a 2017-11-14 12:59:42 -08:00
f2fs fscrypt: lots of cleanups, mostly courtesy by Eric Biggers 2017-11-14 11:35:15 -08:00
fat License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
freevxfs
fscache License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
fuse Merge branch 'linus' into locking/core, to resolve conflicts 2017-11-07 10:32:44 +01:00
gfs2 We've got a total of 17 GFS2 patches for this merge window. The 2017-11-14 13:55:51 -08:00
hfs License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
hfsplus License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
hostfs License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
hpfs License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
hugetlbfs fs/hugetlbfs/inode.c: remove redundant -ENIVAL return from hugetlbfs_setattr() 2017-11-15 18:21:03 -08:00
isofs Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2017-11-14 14:13:11 -08:00
jbd2 jbd2: convert timers to use timer_setup() 2017-10-18 12:40:28 -04:00
jffs2 License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
jfs A couple small fixes for jfs 2017-11-14 13:53:18 -08:00
kernfs Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block 2017-09-07 11:59:42 -07:00
lockd License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
minix License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
ncpfs Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-11-13 17:56:58 -08:00
nfs Merge branch 'linus' into locking/core, to resolve conflicts 2017-11-07 10:32:44 +01:00
nfs_common
nfsd Add support for online resizing of file systems with bigalloc. Fix a 2017-11-14 12:59:42 -08:00
nilfs2 License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
nls License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
notify Merge branch 'for-linus' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2017-11-15 10:14:11 -08:00
ntfs Merge branch 'for-4.15/block' of git://git.kernel.dk/linux-block 2017-11-14 15:32:19 -08:00
ocfs2 ocfs2: remove unneeded goto in ocfs2_reserve_cluster_bitmap_bits() 2017-11-15 18:21:01 -08:00
omfs License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
openpromfs
orangefs License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
overlayfs Merge branch 'linus' into locking/core, to resolve conflicts 2017-11-07 10:32:44 +01:00
proc TTY/Serial patches for 4.15-rc1 2017-11-13 21:05:31 -08:00
pstore Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-11-13 17:56:58 -08:00
qnx4 License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
qnx6 License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
quota Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2017-11-14 14:13:11 -08:00
ramfs mm: make pagevec_lookup() update index 2017-09-06 17:27:26 -07:00
reiserfs License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
romfs License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
squashfs License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
sysfs
sysv License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
tracefs
ubifs fscrypt: lots of cleanups, mostly courtesy by Eric Biggers 2017-11-14 11:35:15 -08:00
udf Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2017-11-14 14:13:11 -08:00
ufs License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
xfs slab, slub, slob: add slab_flags_t 2017-11-15 18:21:01 -08:00
aio.c locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE() 2017-10-25 11:01:08 +02:00
anon_inodes.c
attr.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
bad_inode.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
binfmt_aout.c fs: fix kernel_read prototype 2017-09-04 19:05:15 -04:00
binfmt_elf_fdpic.c Merge branch 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-09-14 18:13:32 -07:00
binfmt_elf.c regset: Add support for dynamically sized regsets 2017-11-03 15:24:11 +00:00
binfmt_em86.c
binfmt_flat.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
binfmt_misc.c fs/binfmt_misc.c: node could be NULL when evicting inode 2017-10-13 16:18:33 -07:00
binfmt_script.c exec: load_script: kill the onstack interp[BINPRM_BUF_SIZE] array 2017-10-03 17:54:25 -07:00
block_dev.c Merge branch 'for-4.15/block' of git://git.kernel.dk/linux-block 2017-11-14 15:32:19 -08:00
buffer.c Merge branch 'for-4.15/block' of git://git.kernel.dk/linux-block 2017-11-14 15:32:19 -08:00
char_dev.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
compat_binfmt_elf.c
compat_ioctl.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
compat.c
coredump.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
dax.c mm/mmu_notifier: avoid double notification when it is useless 2017-11-15 18:21:03 -08:00
dcache.c locking/atomics, fs/dcache: Convert ACCESS_ONCE() to READ_ONCE()/WRITE_ONCE() 2017-10-25 11:00:57 +02:00
dcookies.c
direct-io.c Merge branch 'for-4.15/block' of git://git.kernel.dk/linux-block 2017-11-14 15:32:19 -08:00
drop_caches.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
eventfd.c
eventpoll.c fs/epoll: use faster rb_first_cached() 2017-09-08 18:26:49 -07:00
exec.c locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE() 2017-10-25 11:01:08 +02:00
fcntl.c Merge branch 'linus' into locking/core, to resolve conflicts 2017-11-07 10:32:44 +01:00
fhandle.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
file_table.c ima: call ima_file_free() prior to calling fasync 2017-11-08 15:16:36 -05:00
file.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
filesystems.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
fs_pin.c Merge branch 'linus' into locking/core, to resolve conflicts 2017-11-07 10:32:44 +01:00
fs_struct.c
fs-writeback.c writeback: merge try_to_writeback_inodes_sb_nr() into caller 2017-10-10 08:14:37 -06:00
inode.c locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE() 2017-10-25 11:01:08 +02:00
internal.h Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs 2017-09-13 09:11:44 -07:00
ioctl.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
iomap.c Merge branch 'for-4.15/block' of git://git.kernel.dk/linux-block 2017-11-14 15:32:19 -08:00
Kconfig fs/Kconfig: kill CONFIG_PERCPU_RWSEM some more 2017-07-12 16:26:00 -07:00
Kconfig.binfmt
libfs.c fs: convert __generic_file_fsync to use errseq_t based reporting 2017-07-06 07:02:29 -04:00
locks.c locks: restore a warn for leaked locks on close 2017-07-21 13:57:31 -04:00
Makefile License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mbcache.c
mount.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mpage.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
namei.c Merge branch 'linus' into locking/core, to resolve conflicts 2017-11-07 10:32:44 +01:00
namespace.c vfs: fix mounting a filesystem with i_version 2017-11-08 15:16:36 -05:00
no-block.c
nsfs.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
open.c ovl: don't allow writing ioctl on lower layer 2017-09-05 12:53:12 +02:00
pipe.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
pnode.c
pnode.h
posix_acl.c
proc_namespace.c Merge branch 'linus' into locking/core, to resolve conflicts 2017-11-07 10:32:44 +01:00
read_write.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
readdir.c Merge branch 'linus' into locking/core, to resolve conflicts 2017-11-07 10:32:44 +01:00
select.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
seq_file.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
signalfd.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
splice.c locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE() 2017-10-25 11:01:08 +02:00
stack.c
stat.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
statfs.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
super.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
sync.c Merge branch 'for-4.15/block' of git://git.kernel.dk/linux-block 2017-11-14 15:32:19 -08:00
timerfd.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
userfaultfd.c locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE() 2017-10-25 11:01:08 +02:00
utimes.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
xattr.c lsm: fix smack_inode_removexattr and xattr_getsecurity memleak 2017-10-04 18:03:15 +11:00