linux/arch/x86
Aaron Lu 82ba4faca1 x86/irq: Do not substract irq_tlb_count from irq_call_count
Since commit:

  52aec3308d ("x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR")

the TLB remote shootdown is done through call function vector. That
commit didn't take care of irq_tlb_count, which a later commit:

  fd0f586972 ("x86: Distinguish TLB shootdown interrupts from other functions call interrupts")

... tried to fix.

The fix assumes every increase of irq_tlb_count has a corresponding
increase of irq_call_count. So the irq_call_count is always bigger than
irq_tlb_count and we could substract irq_tlb_count from irq_call_count.

Unfortunately this is not true for the smp_call_function_single() case.
The IPI is only sent if the target CPU's call_single_queue is empty when
adding a csd into it in generic_exec_single. That means if two threads
are both adding flush tlb csds to the same CPU's call_single_queue, only
one IPI is sent. In other words, the irq_call_count is incremented by 1
but irq_tlb_count is incremented by 2. Over time, irq_tlb_count will be
bigger than irq_call_count and the substract will produce a very large
irq_call_count value due to overflow.

Considering that:

  1) it's not worth to send more IPIs for the sake of accurate counting of
     irq_call_count in generic_exec_single();

  2) it's not easy to tell if the call function interrupt is for TLB
     shootdown in __smp_call_function_single_interrupt().

Not to exclude TLB shootdown from call function count seems to be the
simplest fix and this patch just does that.

This bug was found by LKP's cyclic performance regression tracking recently
with the vm-scalability test suite. I have bisected to commit:

  3dec0ba0be ("mm/rmap: share the i_mmap_rwsem")

This commit didn't do anything wrong but revealed the irq_call_count
problem. IIUC, the commit makes rwc->remap_one in rmap_walk_file
concurrent with multiple threads.  When remap_one is try_to_unmap_one(),
then multiple threads could queue flush TLB to the same CPU but only
one IPI will be sent.

Since the commit was added in Linux v3.19, the counting problem only
shows up from v3.19 onwards.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tomoki Sekiyama <tomoki.sekiyama.qu@hitachi.com>
Link: http://lkml.kernel.org/r/20160811074430.GA18163@aaronlu.sh.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-11 11:14:59 +02:00
..
boot Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild 2016-08-02 16:37:12 -04:00
configs arch/defconfig: remove CONFIG_RESOURCE_COUNTERS 2016-05-23 17:04:14 -07:00
crypto x86, crypto: Restore MODULE_LICENSE() to glue_helper.c so it loads 2016-07-20 09:39:50 +02:00
entry x86/build: Reduce the W=1 warnings noise when compiling x86 syscall tables 2016-08-10 16:05:16 +02:00
events Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-08-06 09:10:36 -04:00
ia32 mm: remove more IS_ERR_VALUE abuses 2016-05-27 15:57:31 -07:00
include x86/irq: Do not substract irq_tlb_count from irq_call_count 2016-08-11 11:14:59 +02:00
kernel x86/irq: Do not substract irq_tlb_count from irq_call_count 2016-08-11 11:14:59 +02:00
kvm nvmx: mark ept single context invalidation as supported 2016-08-04 14:21:52 +02:00
lguest lguest: Read offset of device_cap later 2016-06-10 11:39:09 +02:00
lib x86/mm/kaslr: Fix -Wformat-security warning 2016-08-11 10:58:12 +02:00
math-emu
mm x86/mm/KASLR: Increase BRK pages for KASLR memory randomization 2016-08-10 14:45:19 +02:00
net bpf, x86: add support for constant blinding 2016-05-16 13:49:32 -04:00
oprofile x86/cpufeature: Replace cpu_has_apic with boot_cpu_has() usage 2016-04-13 11:37:41 +02:00
pci dma-mapping: use unsigned long for dma_attrs 2016-08-04 08:50:07 -04:00
platform x86/platform/UV: Fix bug with iounmap() of the UV4 EFI System Table causing a crash 2016-08-10 15:55:38 +02:00
power x86/power/64: Do not refer to __PAGE_OFFSET from assembly code 2016-08-03 01:35:38 +02:00
purgatory Add sancov plugin 2016-06-07 22:57:10 +02:00
ras x86/RAS/AMD: Reduce the number of IPIs when prepping error injection 2016-07-08 11:29:26 +02:00
realmode Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild 2016-08-02 16:37:12 -04:00
tools x86/insn: Add AVX-512 support to the instruction decoder 2016-07-21 09:37:11 -03:00
um Merge branch 'for-linus-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml 2016-08-04 19:37:59 -04:00
video
xen kexec: allow kdump with crash_kexec_post_notifiers 2016-08-02 19:35:30 -04:00
.gitignore
Kbuild
Kconfig Implements HARDENED_USERCOPY verification of copy_to_user/copy_from_user 2016-08-08 14:48:14 -07:00
Kconfig.cpu
Kconfig.debug
Makefile kbuild: abort build on bad stack protector flag 2016-07-26 16:19:19 -07:00
Makefile_32.cpu
Makefile.um