94b1b03b51
x86's lazy TLB mode used to be fairly weak -- it would switch to
init_mm the first time it tried to flush a lazy TLB. This meant an
unnecessary CR3 write and, if the flush was remote, an unnecessary
IPI.
Rewrite it entirely. When we enter lazy mode, we simply remove the
CPU from mm_cpumask. This means that we need a way to figure out
whether we've missed a flush when we switch back out of lazy mode.
I use the tlb_gen machinery to track whether a context is up to
date.
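To make the bookkeeping concrete, here is a minimal single-threaded user-space model in C. It is a sketch, not the kernel code. It assumes a per-mm generation counter that is bumped on every flush request, a per-CPU one-element array of (ctx_id, tlb_gen) recording how far each CPU has flushed, and a bitmask standing in for mm_cpumask. All names (model_mm, model_switch_to, model_enter_lazy, model_flush_mm) are hypothetical. The model ignores the memory ordering and real IPIs that an actual implementation has to handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-threaded model of the lazy-TLB scheme; not kernel APIs. */
#define MODEL_NR_CPUS 4

struct model_mm {
	uint64_t ctx_id;   /* unique context id for this mm */
	uint64_t tlb_gen;  /* bumped every time a flush is requested */
	uint32_t cpumask;  /* bit n set => CPU n is sent flushes (mm_cpumask stand-in) */
};

/* What a CPU has flushed up to; length 1 for now, as in the patch description. */
struct model_ctx {
	uint64_t ctx_id;
	uint64_t tlb_gen;
};

struct model_cpu {
	struct model_mm *loaded_mm;
	bool lazy;
	struct model_ctx ctxs[1];
	unsigned int flushes;  /* local flushes performed, for illustration */
};

static struct model_cpu cpus[MODEL_NR_CPUS];

/* Switch @cpu to @mm (or back out of lazy mode); flush only if the
 * recorded (ctx_id, tlb_gen) shows this CPU missed something. */
static void model_switch_to(int cpu, struct model_mm *mm)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	struct model_cpu *c = &cpus[cpu];

	/* Rejoin the mask before checking the generation; real code needs a
	 * barrier between these two steps, which this model ignores. */
	mm->cpumask |= 1u << cpu;
	if (c->ctxs[0].ctx_id != mm->ctx_id || c->ctxs[0].tlb_gen < mm->tlb_gen) {
		c->flushes++;  /* stands in for a CR3 write / full flush */
		c->ctxs[0].ctx_id = mm->ctx_id;
		c->ctxs[0].tlb_gen = mm->tlb_gen;
	}
	c->loaded_mm = mm;
	c->lazy = false;
}

/* Enter lazy mode: keep the old mm loaded but stop receiving flushes. */
static void model_enter_lazy(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	struct model_cpu *c = &cpus[cpu];

	c->lazy = true;
	c->loaded_mm->cpumask &= ~(1u << cpu);
}

/* Request a flush of @mm: bump the generation, then flush only the CPUs
 * still in the mask. Returns how many CPUs were asked to flush. */
static unsigned int model_flush_mm(struct model_mm *mm)
{
	unsigned int targets = 0;

	mm->tlb_gen++;
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (!(mm->cpumask & (1u << cpu)))
			continue;
		targets++;
		cpus[cpu].flushes++;
		cpus[cpu].ctxs[0].tlb_gen = mm->tlb_gen;
	}
	return targets;
}
```

For example, with CPUs 0 and 1 running the same mm, putting CPU 1 into lazy mode and then calling model_flush_mm() targets only CPU 0. When CPU 1 later calls model_switch_to() for the same mm, its stale tlb_gen makes it flush once. A second lazy round trip with no intervening flush costs nothing.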
Note to reviewers: this patch, by itself, looks a bit odd. I'm
using an array of length 1 containing (ctx_id, tlb_gen) rather than
just storing tlb_gen, and making it an array isn't necessary yet.
I'm doing this because the next few patches add PCID support, and,
with PCID, we need ctx_id, and the array will end up with a length
greater than 1. Making it an array now means that there will be
less churn and therefore less stress on your eyeballs.
NB: This is dubious but, AFAICT, still correct on Xen and UV.
xen_exit_mmap() uses mm_cpumask() for nefarious purposes and this
patch changes the way that mm_cpumask() works. This should be okay,
since Xen *also* iterates all online CPUs to find all the CPUs it
needs to twiddle.
The UV tlbflush code is rather dated and should be changed.
Here are some benchmark results, done on a Skylake laptop at 2.3 GHz
(turbo off, intel_pstate requesting max performance) under KVM with
the guest using idle=poll (to avoid artifacts when bouncing between
CPUs). I haven't done any real statistics here -- I just ran them
in a loop and picked the fastest results that didn't look like
outliers. Unpatched means commit