linux/arch
Andrea Arcangeli 127393fbe5 mm: thp: kvm: fix memory corruption in KVM with THP enabled
After the THP refcounting change, obtaining a compound pages from
get_user_pages() no longer allows us to assume the entire compound page
is immediately mappable from a secondary MMU.

A secondary MMU doesn't want to call get_user_pages() more than once for
each compound page, in order to know if it can map the whole compound
page.  So a secondary MMU needs to know from a single get_user_pages()
invocation when it can map immediately the entire compound page to avoid
a flood of unnecessary secondary MMU faults and spurious
atomic_inc()/atomic_dec() (pages don't have to be pinned by MMU notifier
users).

Ideally instead of the page->_mapcount < 1 check, get_user_pages()
should return the granularity of the "page" mapping in the "mm" passed
to get_user_pages().  However it's non trivial change to pass the "pmd"
status belonging to the "mm" walked by get_user_pages up the stack (up
to the caller of get_user_pages).  So the fix just checks if there is
not a single pte mapping on the page returned by get_user_pages, and in
turn if the caller can assume that the whole compound page is mapped in
the current "mm" (in a pmd_trans_huge()).  In such case the entire
compound page is safe to map into the secondary MMU without additional
get_user_pages() calls on the surrounding tail/head pages.  In addition
of being faster, not having to run other get_user_pages() calls also
reduces the memory footprint of the secondary MMU fault in case the pmd
split happened as result of memory pressure.

Without this fix after a MADV_DONTNEED (like invoked by QEMU during
postcopy live migration or balloning) or after generic swapping (with a
failure in split_huge_page() that would only result in pmd splitting and
not a physical page split), KVM would map the whole compound page into
the shadow pagetables, despite regular faults or userfaults (like
UFFDIO_COPY) may map regular pages into the primary MMU as result of the
pte faults, leading to the guest mode and userland mode going out of
sync and not working on the same memory at all times.

Any other secondary MMU notifier manager (KVM is just one of the many
MMU notifier users) will need the same information if it doesn't want to
run a flood of get_user_pages_fast and it can support multiple
granularity in the secondary MMU mappings, so I think it is justified to
be exposed not just to KVM.

The other option would be to move transparent_hugepage_adjust to
mm/huge_memory.c but that currently has all kind of KVM data structures
in it, so it's definitely not a cut-and-paste work, so I couldn't do a
fix as cleaner as this one for 4.6.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: "Li, Liang Z" <liang.z.li@intel.com>
Cc: Amit Shah <amit.shah@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-05 17:38:53 -07:00
..
alpha alpha/extable: use generic search and sort routines 2016-03-22 15:36:02 -07:00
arc ARC: add support for reserved memory defined by device tree 2016-04-27 17:06:56 +05:30
arm mm: thp: kvm: fix memory corruption in KVM with THP enabled 2016-05-05 17:38:53 -07:00
arm64 Here are the latest bug fixes for ARM SoCs, mostly addressing 2016-04-26 16:17:01 -07:00
avr32 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2016-03-19 10:05:34 -07:00
blackfin arch, ftrace: for KASAN put hard/soft IRQ entries into separate sections 2016-03-25 16:37:42 -07:00
c6x arch, ftrace: for KASAN put hard/soft IRQ entries into separate sections 2016-03-25 16:37:42 -07:00
cris Merge branch 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-03-20 19:08:56 -07:00
frv asm-generic changes for 4.6 2016-03-24 23:13:48 -07:00
h8300 h8300: switch EARLYCON 2016-03-25 01:45:19 +09:00
hexagon Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2016-03-19 10:05:34 -07:00
ia64 [IA64] Enable preadv2 and pwritev2 syscalls for ia64 2016-03-25 14:37:32 -07:00
m32r Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2016-03-19 10:05:34 -07:00
m68k Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu 2016-04-13 08:58:32 -07:00
metag arch, ftrace: for KASAN put hard/soft IRQ entries into separate sections 2016-03-25 16:37:42 -07:00
microblaze arch, ftrace: for KASAN put hard/soft IRQ entries into separate sections 2016-03-25 16:37:42 -07:00
mips MIPS: traps.c: Verify the ISA for microMIPS RDHWR emulation 2016-04-04 15:25:34 +02:00
mn10300 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2016-03-19 10:05:34 -07:00
nios2 nios2: memset: use the right constraint modifier for the %4 output operand 2016-04-27 16:35:55 +08:00
openrisc arch, ftrace: for KASAN put hard/soft IRQ entries into separate sections 2016-03-25 16:37:42 -07:00
parisc Merge branch 'parisc-4.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux 2016-04-15 14:51:45 -07:00
powerpc powerpc fixes for 4.6 #3 2016-04-29 18:50:08 -07:00
s390 s390/mm: fix asce_bits handling with dynamic pagetable levels 2016-04-21 09:50:09 +02:00
score Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2016-03-19 10:05:34 -07:00
sh sh: fix function signature of cpu_coregroup_mask to match pointer type 2016-03-30 00:47:49 +00:00
sparc sparc64: Fix bootup regressions on some Kconfig combinations. 2016-04-27 17:27:37 -04:00
tile Merge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile 2016-03-28 15:04:55 -05:00
um fs/coredump: prevent fsuid=0 dumps into user-controlled directories 2016-03-22 15:36:02 -07:00
unicore32 Merge branch 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-03-20 19:08:56 -07:00
x86 mm: thp: kvm: fix memory corruption in KVM with THP enabled 2016-05-05 17:38:53 -07:00
xtensa Xtensa improvements for 4.6: 2016-03-20 12:22:07 -07:00
.gitignore
Kconfig objtool: Add CONFIG_STACK_VALIDATION option 2016-02-29 08:35:13 +01:00