Commit Graph

5 Commits

Author SHA1 Message Date
Vitaly Kuznetsov
9e52fc2b50 x86/mm: Enable RCU based page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)
There's a subtle bug in how some of the paravirt guest code handles
page table freeing on x86:

On x86 software page table walkers depend on the fact that remote TLB flush
does an IPI: walk is performed lockless but with interrupts disabled and in
case the page table is freed the freeing CPU will get blocked as remote TLB
flush is required. On other architectures which don't require an IPI to do
remote TLB flush we have an RCU-based mechanism (see
include/asm-generic/tlb.h for more details).

In virtualized environments we may want to override the ->flush_tlb_others
callback in pv_mmu_ops and use a hypercall asking the hypervisor to do a
remote TLB flush for us. This breaks the assumption about IPIs. Xen PV has
been doing this for years and the upcoming remote TLB flush for Hyper-V will
do it too.

This is not safe, as software page table walkers may step on an already
freed page.

Fix the bug by enabling the RCU-based page table freeing mechanism,
CONFIG_HAVE_RCU_TABLE_FREE=y.

Testing with kernbench and mmap/munmap microbenchmarks, and neither showed
any noticeable performance impact.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Jork Loeser <Jork.Loeser@microsoft.com>
Cc: KY Srinivasan <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20170828082251.5562-1-vkuznets@redhat.com
[ Rewrote/fixed/clarified the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-31 11:07:07 +02:00
Dave Hansen
1de14c3c5c x86-32: Fix possible incomplete TLB invalidate with PAE pagetables
This patch attempts to fix:

	https://bugzilla.kernel.org/show_bug.cgi?id=56461

The symptom is a crash and messages like this:

	chrome: Corrupted page table at address 34a03000
	*pdpt = 0000000000000000 *pde = 0000000000000000
	Bad pagetable: 000f [#1] PREEMPT SMP

Ingo guesses this got introduced by commit 611ae8e3f5 ("x86/tlb:
enable tlb flush range support for x86") since that code started to free
unused pagetables.

On x86-32 PAE kernels, that new code has the potential to free an entire
PMD page and will clear one of the four page-directory-pointer-table
(aka pgd_t entries).

The hardware aggressively "caches" these top-level entries and invlpg
does not actually affect the CPU's copy.  If we clear one we *HAVE* to
do a full TLB flush, otherwise we might continue using a freed pmd page.
(note, we do this properly on the population side in pud_populate()).

This patch tracks whenever we clear one of these entries in the 'struct
mmu_gather', and ensures that we follow up with a full tlb flush.

BTW, I disassembled and checked that:

	if (tlb->fullmm == 0)
and
	if (!tlb->fullmm && !tlb->need_flush_all)

generate essentially the same code, so there should be zero impact there
to the !PAE case.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Artem S Tashkinov <t.artem@mailcity.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-12 16:56:47 -07:00
Alex Shi
611ae8e3f5 x86/tlb: enable tlb flush range support for x86
Not every tlb_flush execution moment is really need to evacuate all
TLB entries, like in munmap, just few 'invlpg' is better for whole
process performance, since it leaves most of TLB entries for later
accessing.

This patch also rewrite flush_tlb_range for 2 purposes:
1, split it out to get flush_blt_mm_range function.
2, clean up to reduce line breaking, thanks for Borislav's input.

My micro benchmark 'mummap' http://lkml.org/lkml/2012/5/17/59
show that the random memory access on other CPU has 0~50% speed up
on a 2P * 4cores * HT NHM EP while do 'munmap'.

Thanks Yongjie's testing on this patch:
-------------
I used Linux 3.4-RC6 w/ and w/o his patches as Xen dom0 and guest
kernel.
After running two benchmarks in Xen HVM guest, I found his patches
brought about 1%~3% performance gain in 'kernel build' and 'netperf'
testing, though the performance gain was not very stable in 'kernel
build' testing.

Some detailed testing results are below.

Testing Environment:
	Hardware: Romley-EP platform
	Xen version: latest upstream
	Linux kernel: 3.4-RC6
	Guest vCPU number: 8
	NIC: Intel 82599 (10GB bandwidth)

In 'kernel build' testing in guest:
	Command line  |  performance gain
    make -j 4      |    3.81%
    make -j 8      |    0.37%
    make -j 16     |    -0.52%

In 'netperf' testing, we tested TCP_STREAM with default socket size
16384 byte as large packet and 64 byte as small packet.
I used several clients to add networking pressure, then 'netperf' server
automatically generated several threads to response them.
I also used large-size packet and small-size packet in the testing.
	Packet size  |  Thread number | performance gain
	16384 bytes  |      4       |   0.02%
	16384 bytes  |      8       |   2.21%
	16384 bytes  |      16      |   2.04%
	64 bytes     |      4       |   1.07%
	64 bytes     |      8       |   3.31%
	64 bytes     |      16      |   0.71%

Signed-off-by: Alex Shi <alex.shi@intel.com>
Link: http://lkml.kernel.org/r/1340845344-27557-8-git-send-email-alex.shi@intel.com
Tested-by: Ren, Yongjie <yongjie.ren@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-06-27 19:29:11 -07:00
H. Peter Anvin
1965aae3c9 x86: Fix ASM_X86__ header guards
Change header guards named "ASM_X86__*" to "_ASM_X86_*" since:

a. the double underscore is ugly and pointless.
b. no leading underscore violates namespace constraints.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-10-22 22:55:23 -07:00
Al Viro
bb8985586b x86, um: ... and asm-x86 move
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-10-22 22:55:20 -07:00