linux/mm
Rik van Riel 2084140594 mm: fix TLB flush race between migration, and change_protection_range
There are a few subtle races, between change_protection_range (used by
mprotect and change_prot_numa) on one side, and NUMA page migration and
compaction on the other side.

The basic race is that there is a time window between when the PTE gets
made non-present (PROT_NONE or NUMA), and the TLB is flushed.

During that time, a CPU may continue writing to the page.

This is fine most of the time, however compaction or the NUMA migration
code may come in, and migrate the page away.

When that happens, the CPU may continue writing, through the cached
translation, to what is no longer the current memory location of the
process.

This only affects x86, which has a somewhat optimistic pte_accessible.
All other architectures appear to be safe, and will either always flush,
or flush whenever there is a valid mapping, even with no permissions
(SPARC).

The basic race looks like this:

CPU A			CPU B			CPU C

						load TLB entry
make entry PTE/PMD_NUMA
			fault on entry
						read/write old page
			start migrating page
			change PTE/PMD to new page
						read/write old page [*]
flush TLB
						reload TLB from new entry
						read/write new page
						lose data

[*] the old page may belong to a new user at this point!

The obvious fix is to flush remote TLB entries, by making sure that
pte_accessible aware of the fact that PROT_NONE and PROT_NUMA memory may
still be accessible if there is a TLB flush pending for the mm.

This should fix both NUMA migration and compaction.

[mgorman@suse.de: fix build]
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-18 19:04:51 -08:00
..
backing-dev.c mm/backing-dev.c: check user buffer length before copying data to the related user buffer 2013-09-11 15:58:03 -07:00
balloon_compaction.c mm: introduce a common interface for balloon pages mobility 2012-12-11 17:22:26 -08:00
bootmem.c mm/bootmem.c: remove unused local `map' 2013-11-13 12:09:09 +09:00
bounce.c mm/bounce.c: fix a regression where MS_SNAP_STABLE (stable pages snapshotting) was ignored 2013-09-30 14:31:02 -07:00
cleancache.c mm: cleancache: clean up cleancache_enabled 2013-04-30 17:04:01 -07:00
compaction.c mm/compaction.c: update comment about zone lock in isolate_freepages_block 2013-11-13 12:09:03 +09:00
debug-pagealloc.c
dmapool.c dmapool: make DMAPOOL_DEBUG detect corruption of free marker 2012-12-11 17:22:24 -08:00
fadvise.c teach SYSCALL_DEFINE<n> how to deal with long long/unsigned long long 2013-03-03 22:46:22 -05:00
failslab.c
filemap_xip.c seqcount: Add lockdep functionality to seqcount/seqlock structures 2013-11-06 12:40:26 +01:00
filemap.c mm: drop actor argument of do_generic_file_read() 2013-11-15 09:32:13 +09:00
fremap.c mm: save soft-dirty bits on file pages 2013-08-13 17:57:48 -07:00
frontswap.c frontswap: fix incorrect zeroing and allocation size for frontswap_map 2013-06-12 16:29:46 -07:00
highmem.c Some nice cleanups, and even a patch my wife did as a "live" demo for 2012-12-20 08:37:05 -08:00
huge_memory.c mm: fix TLB flush race between migration, and change_protection_range 2013-12-18 19:04:51 -08:00
hugetlb_cgroup.c cgroup: pass around cgroup_subsys_state instead of cgroup in file methods 2013-08-08 20:11:24 -04:00
hugetlb.c mm: hugetlbfs: fix hugetlbfs optimization 2013-11-21 16:42:27 -08:00
hwpoison-inject.c mm/hwpoison: fix the lack of one reference count against poisoned page 2013-09-30 14:31:03 -07:00
init-mm.c
internal.h mm: vmscan: fix do_try_to_free_pages() livelock 2013-09-11 15:58:01 -07:00
interval_tree.c mm: add CONFIG_DEBUG_VM_RB build option 2012-10-09 16:22:42 +09:00
Kconfig Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2013-11-15 16:47:22 -08:00
Kconfig.debug
kmemcheck.c
kmemleak-test.c
kmemleak.c mm: kmemleak: avoid false negatives on vmalloc'ed objects 2013-11-13 12:09:07 +09:00
ksm.c ksm: remove redundant __GFP_ZERO from kcalloc 2013-11-13 12:09:02 +09:00
list_lru.c mm: list_lru: fix almost infinite loop causing effective livelock 2013-10-30 12:57:46 -07:00
maccess.c
madvise.c mm/hwpoison: fix traversal of hugetlbfs pages to avoid printk flood 2013-09-30 14:31:02 -07:00
Makefile list: add a new LRU list type 2013-09-10 18:56:30 -04:00
memblock.c mm/memblock.c: introduce bottom-up allocation mode 2013-11-13 12:09:08 +09:00
memcontrol.c mm: memcg: do not allow task about to OOM kill to bypass the limit 2013-12-12 18:19:26 -08:00
memory_hotplug.c mem-hotplug: introduce movable_node boot option 2013-11-13 12:09:09 +09:00
memory-failure.c kfifo API type safety 2013-11-15 09:32:23 +09:00
memory.c Revert "mm: create a separate slab for page->ptl allocation" 2013-11-20 14:41:47 -08:00
mempolicy.c mm, mempolicy: silence gcc warning 2013-11-21 16:42:27 -08:00
mempool.c mm/mempool.c: convert kmalloc_node(...GFP_ZERO...) to kzalloc_node(...) 2013-09-11 15:58:14 -07:00
migrate.c mm: numa: avoid unnecessary disruption of NUMA hinting during migration 2013-12-18 19:04:51 -08:00
mincore.c swap: make each swap partition have one address_space 2013-02-23 17:50:17 -08:00
mlock.c mm/mlock.c: prevent walking off the end of a pagetable in no-pmd configuration 2013-09-30 14:31:02 -07:00
mm_init.c mm: numa: Change page last {nid,pid} into {cpu,pid} 2013-10-09 14:47:45 +02:00
mmap.c mm: convert mm->nr_ptes to atomic_long_t 2013-11-15 09:32:14 +09:00
mmu_context.c mm: remove old aio use_mm() comment 2013-05-07 18:38:27 -07:00
mmu_notifier.c treewide: relase -> release 2013-06-28 14:34:33 +02:00
mmzone.c mm: numa: Change page last {nid,pid} into {cpu,pid} 2013-10-09 14:47:45 +02:00
mprotect.c mm: fix TLB flush race between migration, and change_protection_range 2013-12-18 19:04:51 -08:00
mremap.c mm: revert mremap pud_free anti-fix 2013-10-16 21:35:53 -07:00
msync.c
nobootmem.c mm/nobootmem.c: have __free_pages_memory() free in larger chunks. 2013-11-13 12:09:04 +09:00
nommu.c Merge branch 'akpm' (patches from Andrew Morton) 2013-11-13 15:45:43 +09:00
oom_kill.c mm: convert mm->nr_ptes to atomic_long_t 2013-11-15 09:32:14 +09:00
page_alloc.c mm/page_alloc.c: fix comment in zlc_setup() 2013-11-13 12:09:11 +09:00
page_cgroup.c memcontrol: use N_MEMORY instead N_HIGH_MEMORY 2012-12-12 17:38:32 -08:00
page_io.c aio: Kill aio_rw_vect_retry() 2013-07-30 11:53:12 -04:00
page_isolation.c mm: memory-hotplug: enable memory hotplug to handle hugepage 2013-09-11 15:57:48 -07:00
page-writeback.c writeback: fix negative bdi max pause 2013-10-16 21:35:53 -07:00
pagewalk.c mm/pagewalk.c: fix walk_page_range() access of wrong PTEs 2013-10-30 14:27:03 -07:00
percpu-km.c
percpu-vm.c
percpu.c percpu: fix bootmem error handling in pcpu_page_first_chunk() 2013-09-23 10:51:45 -04:00
pgtable-generic.c mm: fix TLB flush race between migration, and change_protection_range 2013-12-18 19:04:51 -08:00
process_vm_access.c Fix: compat_rw_copy_check_uvector() misuse in aio, readv, writev, and security keys 2013-03-12 11:05:45 -07:00
quicklist.c
readahead.c readahead: fix sequential read cache miss detection 2013-11-13 12:09:09 +09:00
rmap.c mm, hugetlb: convert hugetlbfs to use split pmd lock 2013-11-15 09:32:14 +09:00
shmem.c security: shmem: implement kernel private shmem inodes 2013-12-02 11:24:19 +00:00
slab_common.c memcg, kmem: rename cache_from_memcg to cache_from_memcg_idx 2013-11-13 12:09:10 +09:00
slab.c Merge branch 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux 2013-11-22 08:10:34 -08:00
slab.h memcg, kmem: rename cache_from_memcg to cache_from_memcg_idx 2013-11-13 12:09:10 +09:00
slob.c mm/sl[aou]b: Move kmallocXXX functions to common code 2013-09-04 20:51:33 +03:00
slub.c Merge branch 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux 2013-11-22 08:10:34 -08:00
sparse-vmemmap.c sparse-vmemmap: specify vmemmap population range in bytes 2013-04-29 15:54:35 -07:00
sparse.c mm/sparsemem: fix a bug in free_map_bootmem when CONFIG_SPARSEMEM_VMEMMAP 2013-11-13 12:09:06 +09:00
swap_state.c lib/radix-tree.c: make radix_tree_node_alloc() work correctly within interrupt 2013-09-11 15:59:36 -07:00
swap.c mm: hugetlbfs: fix hugetlbfs optimization 2013-11-21 16:42:27 -08:00
swapfile.c frontswap: enable call to invalidate area on swapoff 2013-11-13 12:09:07 +09:00
truncate.c truncate: drop 'oldsize' truncate_pagecache() parameter 2013-09-12 15:38:02 -07:00
util.c mm: factor commit limit calculation 2013-11-13 12:09:11 +09:00
vmalloc.c mm: kmemleak: avoid false negatives on vmalloc'ed objects 2013-11-13 12:09:07 +09:00
vmpressure.c Merge branch 'for-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2013-09-03 18:25:03 -07:00
vmscan.c mm/vmscan.c: don't forget to free shrinker->nr_deferred 2013-10-16 21:35:52 -07:00
vmstat.c mm: numa: return the number of base pages altered by protection changes 2013-11-13 12:09:11 +09:00
zbud.c mm/zbud: fix some trivial typos in comments 2013-09-11 15:57:35 -07:00
zswap.c mm/zswap: refactor the get/put routines 2013-11-13 12:09:11 +09:00