linux/arch/x86/kernel
Raghavendra K T d6abfdb202 x86/spinlocks/paravirt: Fix memory corruption on unlock
The paravirt spinlock clears the slowpath flag after doing the unlock.
As explained by Linus, it currently does:

                prev = *lock;
                add_smp(&lock->tickets.head, TICKET_LOCK_INC);

                /* add_smp() is a full mb() */

                if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
                        __ticket_unlock_slowpath(lock, prev);

which is *exactly* the kind of things you cannot do with spinlocks,
because after you've done the "add_smp()" and released the spinlock
for the fast-path, you can't access the spinlock any more.  Exactly
because a fast-path lock might come in, and release the whole data
structure.

Linus suggested that we should not do any writes to the lock after unlock(),
and that we can move the slowpath clearing to the fastpath lock.

So this patch implements the fix with:

 1. Moving slowpath flag to head (Oleg):
    Unlocked locks don't care about the slowpath flag; therefore we can keep
    it set after the last unlock, and clear it again on the first (try)lock.
    This removes the write after unlock. Note that keeping the slowpath flag
    set would result in unnecessary kicks.
    By moving the slowpath flag from the tail to the head ticket we also avoid
    the need to access both the head and tail tickets on unlock.

 2. Use xadd to avoid the read/write after unlock that checks the need for
    unlock_kick (Linus):
    We further avoid the need for a read-after-release by using xadd;
    the prev head value will include the slowpath flag and indicate if we
    need to do PV kicking of suspended spinners. On modern chips xadd
    isn't (much) more expensive than an add + load.
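
The unlock side described in the two points above can be sketched in
userspace C11 as follows. This is an illustration only, not the code from
the patch: the names sketch_lock, sketch_unlock, sketch_kick,
sketch_mark_slowpath, SLOWPATH_FLAG and LOCK_INC are hypothetical stand-ins,
atomic_fetch_add() plays the role of xadd, and the kick is only counted
rather than delivered to a suspended vCPU.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for the kernel's ticket types. */
typedef uint16_t ticket_t;
#define SLOWPATH_FLAG ((ticket_t)1)  /* bit 0 of head: a waiter went to sleep */
#define LOCK_INC      ((ticket_t)2)  /* tickets step by 2 so bit 0 stays free */

struct sketch_lock {
    _Atomic ticket_t head;  /* ticket being served, plus the slowpath flag */
    _Atomic ticket_t tail;  /* next ticket to hand out (unused on unlock) */
};

static int kicks_sent;      /* number of simulated PV kicks */

/* Simulated kick: the lock is used as an address only, never dereferenced. */
static void sketch_kick(struct sketch_lock *lock, ticket_t next)
{
    (void)lock;
    assert((next & SLOWPATH_FLAG) == 0);  /* callers must mask the flag */
    kicks_sent++;
}

/* A waiter that is about to block sets the flag in head. */
static void sketch_mark_slowpath(struct sketch_lock *lock)
{
    atomic_fetch_or(&lock->head, SLOWPATH_FLAG);
}

/*
 * Release the lock with one atomic read-modify-write. The value returned
 * by the fetch-add was sampled before the release took effect, so the
 * lock memory is not read or written again afterwards. The flag is left
 * set in head; clearing it is left to a later lock operation.
 */
static ticket_t sketch_unlock(struct sketch_lock *lock)
{
    ticket_t prev = atomic_fetch_add(&lock->head, LOCK_INC);

    if (prev & SLOWPATH_FLAG) {
        prev &= (ticket_t)~SLOWPATH_FLAG;
        sketch_kick(lock, (ticket_t)(prev + LOCK_INC));
    }
    return prev;
}
```

In this sketch the only access to the lock word during unlock is the
fetch-add itself; the decision whether to kick is made from the returned
copy, which is the property the changelog asks for.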

Result:
 setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
 benchmark  overcommit  %improve
 kernbench  1x          -0.13
 kernbench  2x           0.02
 dbench     1x          -1.77
 dbench     2x          -0.63

[Jeremy: Hinted missing TICKET_LOCK_INC for kick]
[Oleg: Moved slowpath flag to head, ticket_equals idea]
[PeterZ: Added detailed changelog]

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Jones <drjones@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Fernando Luis Vázquez Cao <fernando_b1@lab.ntt.co.jp>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Waiman Long <Waiman.Long@hp.com>
Cc: a.ryabinin@samsung.com
Cc: dave@stgolabs.net
Cc: hpa@zytor.com
Cc: jasowang@redhat.com
Cc: jeremy@goop.org
Cc: paul.gortmaker@windriver.com
Cc: riel@redhat.com
Cc: tglx@linutronix.de
Cc: waiman.long@hp.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20150215173043.GA7471@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18 14:53:49 +01:00
acpi Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-02-09 16:57:56 -08:00
apic x86: Consolidate boot cpu timer setup 2015-01-22 15:10:56 +01:00
cpu Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-02-09 18:22:04 -08:00
kprobes ftrace/jprobes/x86: Fix conflict between jprobes and function graph tracing 2015-01-15 09:39:18 -05:00
.gitignore
alternative.c kprobes, x86: Allow kprobes on text_poke/hw_breakpoint 2014-04-24 10:03:02 +02:00
amd_gart_64.c x86: enable DMA CMA with swiotlb 2014-06-04 16:53:57 -07:00
amd_nb.c x86, amd_nb: Add device IDs to NB tables for F15h M60h 2014-10-20 14:18:45 +02:00
apb_timer.c x86/platform: Remove unused function from apb_timer.c 2014-12-23 10:43:35 +01:00
aperture_64.c x86/gart: Tidy messages and add bridge device info 2014-05-23 10:47:19 -06:00
apm_32.c cpuidle: Invert CPUIDLE_FLAG_TIME_VALID logic 2014-11-12 21:17:27 +01:00
asm-offsets_32.c x86/asm: Guard against building the 32/64-bit versions of the asm-offsets*.c file directly 2014-12-11 11:43:56 +01:00
asm-offsets_64.c Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-12-14 11:51:50 -08:00
asm-offsets.c sched, x86: Provide a per-cpu preempt_count implementation 2013-09-25 14:07:57 +02:00
audit_64.c x86: hook up execveat system call 2014-12-13 12:42:51 -08:00
bootflag.c
check.c x86/mm: memblock: switch to use NUMA_NO_NODE 2014-01-21 16:19:47 -08:00
cpuid.c x86, cpuid: Use PTR_ERR_OR_ZERO 2014-10-17 13:40:51 -07:00
crash_dump_32.c
crash_dump_64.c
crash.c x86, irq: Move IOAPIC related declarations from hw_irq.h into io_apic.h 2014-12-16 14:08:17 +01:00
devicetree.c x86, irq, devicetree: Release IOAPIC pin when PCI device is disabled 2014-06-21 23:05:44 +02:00
doublefault.c x86: Delete non-required instances of include <linux/init.h> 2014-01-06 21:25:18 -08:00
dumpstack_32.c Merge branch 'x86-threadinfo-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-04-01 10:17:18 -07:00
dumpstack_64.c x86_64, traps: Stop using IST for #SS 2014-11-23 13:56:19 -08:00
dumpstack.c kprobes, x86: Use NOKPROBE_SYMBOL() instead of __kprobes annotation 2014-04-24 10:26:38 +02:00
e820.c x86, e820: Clean up sanitize_e820_map() users 2015-01-23 16:14:27 +01:00
early_printk.c Merge branch 'x86-intel-mid-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-12 11:12:22 +09:00
early-quirks.c drm/i915/skl: Add the additional graphics stolen sizes 2014-09-24 14:47:39 +02:00
entry_32.S Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-12-19 14:02:02 -08:00
entry_64.S x86_64, entry: Remove the syscall exit audit and schedule optimizations 2015-02-01 04:03:02 -08:00
espfix_64.c x86, espfix: Remove stale ptemask 2014-11-11 17:57:46 +01:00
ftrace.c module: remove mod arg from module_free, rename module_memfree(). 2015-01-20 11:38:33 +10:30
head32.c asmlinkage, x86: Add explicit __visible to arch/x86/* 2014-05-05 16:07:44 -07:00
head64.c kernel/printk: use symbolic defines for console loglevels 2014-06-04 16:54:17 -07:00
head_32.S x86: fix compile error due to X86_TRAP_NMI use in asm files 2014-03-07 18:58:40 -08:00
head_64.S x86: fix compile error due to X86_TRAP_NMI use in asm files 2014-03-07 18:58:40 -08:00
head.c x86: Make sure we can boot in the case the BDA contains pure garbage 2013-02-27 13:38:57 -08:00
hpet.c Merge branch 'x86/vdso' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next 2014-06-05 08:05:29 -07:00
hw_breakpoint.c perf/x86: Remove get_hbp_len and replace with bp_len 2014-12-03 15:14:30 +01:00
i386_ksyms_32.c sched, x86: Optimize the preempt_schedule() call 2013-09-25 14:23:07 +02:00
i387.c x86, fpu: Fix math_state_restore() race with kernel_fpu_begin() 2015-01-20 13:53:07 +01:00
i8237.c
i8253.c
i8259.c x86/irq: Fix XT-PIC-XT-PIC in /proc/interrupts 2014-10-28 12:01:08 +01:00
io_delay.c
ioport.c x86: get rid of pt_regs argument of iopl(2) 2013-02-03 18:16:24 -05:00
iosf_mbi.c x86/platform/intel/iosf: Add debugfs config option for IOSF 2014-09-19 13:08:43 +02:00
irq_32.c x86: Clean up current_stack_pointer 2015-01-02 10:22:46 -08:00
irq_64.c x86: Replace __get_cpu_var uses 2014-08-26 13:45:49 -04:00
irq_work.c x86: Tell irq work about self IPI support 2014-09-13 18:38:29 +02:00
irq.c x86, irq: Properly tag virtualization entry in /proc/interrupts 2015-01-20 12:37:23 +01:00
irqinit.c x86, irq: Move local APIC related code from io_apic.c into vector.c 2014-12-16 14:08:16 +01:00
jump_label.c x86/jump_label: expect default_nop if static_key gets enabled on boot-up 2013-10-19 19:45:35 -04:00
kdebugfs.c
kexec-bzimage64.c kexec-bzimage64: fix sparse warnings 2014-10-14 02:18:21 +02:00
kgdb.c x86: Delete non-required instances of include <linux/init.h> 2014-01-06 21:25:18 -08:00
ksysfs.c x86: ksysfs.c build fix 2014-01-03 14:37:13 +00:00
kvm.c x86/spinlocks/paravirt: Fix memory corruption on unlock 2015-02-18 14:53:49 +01:00
kvmclock.c x86, kvm: Clear paravirt_enabled on KVM guests for espfix32's benefit 2014-12-10 12:49:39 +01:00
ldt.c Revert "x86-64, modify_ldt: Make support for 16-bit segments a runtime option" 2014-05-21 10:22:59 -07:00
machine_kexec_32.c x86, irq: Move IOAPIC related declarations from hw_irq.h into io_apic.h 2014-12-16 14:08:17 +01:00
machine_kexec_64.c x86, irq: Move IOAPIC related declarations from hw_irq.h into io_apic.h 2014-12-16 14:08:17 +01:00
Makefile x86_64,vsyscall: Make vsyscall emulation configurable 2014-11-03 21:44:57 +01:00
mcount_64.S ftrace/fgraph/x86: Have prepare_ftrace_return() take ip as first parameter 2014-12-01 14:08:58 -05:00
mmconf-fam10h_64.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
module.c x86, kaslr: fix module lock ordering problem 2014-03-24 10:18:26 -07:00
mpparse.c x86, apic: Remove mps_oem_check callback 2014-07-31 08:05:42 -07:00
msr.c x86, msr: Use seek definitions instead of hard-coded values 2014-10-17 13:40:55 -07:00
nmi_selftest.c
nmi.c kprobes, x86: Use NOKPROBE_SYMBOL() instead of __kprobes annotation 2014-04-24 10:26:38 +02:00
paravirt_patch_32.c
paravirt_patch_64.c x86_64/entry/xen: Do not invoke espfix64 on Xen 2014-07-28 15:25:40 -07:00
paravirt-spinlocks.c x86, ticketlock: Add slowpath logic 2013-08-09 07:54:00 -07:00
paravirt.c kprobes, x86: Prohibit probing on native_set_debugreg()/load_idt() 2014-04-24 10:02:58 +02:00
pci-calgary_64.c x86, calgary: Use 8M TCE table size by default 2014-04-10 19:51:32 -07:00
pci-dma.c arch/x86/kernel/pci-dma.c: fix dma_generic_alloc_coherent() when CONFIG_DMA_CMA is enabled 2014-06-04 16:53:57 -07:00
pci-iommu_table.c
pci-nommu.c x86: Delete non-required instances of include <linux/init.h> 2014-01-06 21:25:18 -08:00
pci-swiotlb.c x86: enable DMA CMA with swiotlb 2014-06-04 16:53:57 -07:00
pcspeaker.c
perf_regs.c perf/x86_64: Improve user regs sampling 2015-01-09 11:12:29 +01:00
pmc_atom.c x86: pmc_atom: Expose contents of PSS 2015-01-20 12:50:14 +01:00
probe_roms.c
process_32.c x86: copy_thread: Don't nullify ->ptrace_bps twice 2014-09-02 14:51:17 -07:00
process_64.c x86_64, switch_to(): Load TLS descriptors before switching DS and ES 2014-12-11 11:40:08 +01:00
process.c x86, fpu: Shift "fpu_counter = 0" from copy_thread() to arch_dup_task_struct() 2014-09-02 14:51:16 -07:00
ptrace.c x86, syscall: Fix _TIF_NOHZ handling in syscall_trace_enter_phase1 2014-11-20 23:01:53 +01:00
pvclock.c hung_task: add method to reset detector 2013-11-06 09:49:02 +02:00
quirks.c x86: HPET force enable for e6xx based systems 2014-09-15 17:53:35 -07:00
reboot_fixups_32.c
reboot.c x86, irq: Move IOAPIC related declarations from hw_irq.h into io_apic.h 2014-12-16 14:08:17 +01:00
relocate_kernel_32.S x86, asm, cleanup: Replace open-coded control register values with symbolic 2013-06-25 16:26:06 -07:00
relocate_kernel_64.S x86, reloc: Use xorl instead of xorq in relocate_kernel_64.S 2013-06-20 21:30:04 -07:00
resource.c x86: don't exclude low BIOS area when allocating address space for non-PCI cards 2014-07-16 12:29:36 -06:00
rtc.c x86/rtc: Remove duplicate const specifier 2015-01-23 10:35:51 +01:00
setup_percpu.c x86: Convert a few more per-CPU items to read-mostly ones 2014-11-04 20:13:28 +01:00
setup.c x86, setup: Let early_memremap() handle page alignment 2015-01-23 16:14:26 +01:00
signal.c x86, mce: Get rid of TIF_MCE_NOTIFY and associated mce tricks 2015-01-07 07:47:42 -08:00
smp.c asmlinkage, x86: Add explicit __visible to arch/x86/* 2014-05-05 16:07:44 -07:00
smpboot.c x86: Consolidate boot cpu timer setup 2015-01-22 15:10:56 +01:00
stacktrace.c
step.c ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL 2013-01-22 10:08:00 -08:00
sys_x86_64.c x86 get_unmapped_area: Access mmap_legacy_base through mm_struct member 2013-08-22 10:19:35 -07:00
syscall_32.c x86, asmlinkage: Make syscall tables visible 2013-08-06 14:20:18 -07:00
syscall_64.c x86, asmlinkage: Make syscall tables visible 2013-08-06 14:20:18 -07:00
sysfb_efi.c x86: sysfb: move EFI quirks from efifb to sysfb 2013-08-02 16:17:47 -07:00
sysfb_simplefb.c x86/simplefb: Use PTR_ERR_OR_ZERO 2014-10-17 13:40:52 -07:00
sysfb.c x86/sysfb: Use PTR_ERR_OR_ZERO 2014-10-17 13:40:52 -07:00
tboot.c x86 / tboot / ACPI: Fail extended mode reduced hardware sleep 2013-07-31 14:25:51 +02:00
tce_64.c
test_nx.c
test_rodata.c
time.c x86_64/vdso: Remove jiffies from the vvar page 2014-10-28 11:22:13 +01:00
tls.c x86, tls: Interpret an all-zero struct user_desc as "no segment" 2015-01-22 21:45:07 +01:00
tls.h
topology.c hotplug, powerpc, x86: Remove cpu_hotplug_driver_lock() 2013-09-30 19:55:51 +02:00
trace_clock.c tracing,x86: Add a TSC trace_clock 2012-11-13 15:48:27 -05:00
tracepoint.c x86: Make sure IDT is page aligned 2013-07-16 15:14:48 -07:00
traps.c Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-02-09 18:01:52 -08:00
tsc_msr.c x86: tsc: Add missing Baytrail frequency to the table 2014-02-19 17:12:24 +01:00
tsc_sync.c x86: Delete non-required instances of include <linux/init.h> 2014-01-06 21:25:18 -08:00
tsc.c x86/tsc: Change Fast TSC calibration failed from error to info 2015-01-23 10:53:52 +01:00
uprobes.c x86: Remove arbitrary instruction size limit in instruction decoder 2014-11-18 00:58:52 +01:00
verify_cpu.S
vm86_32.c x86, vm86: fix VM86 syscalls: use SYSCALL_DEFINEx(...) 2013-05-02 20:36:32 -04:00
vmlinux.lds.S x86-64: Use RIP-relative addressing for most per-CPU accesses 2014-11-04 20:43:14 +01:00
vsmp_64.c x86/apic/vsmp: Make is_vsmp_box() static 2014-08-01 15:09:45 -07:00
vsyscall_64.c x86_64/vsyscall: Restore orig_ax after vsyscall seccomp 2014-11-10 10:46:35 +01:00
vsyscall_emu_64.S
vsyscall_gtod.c timekeeping: Create struct tk_read_base and use it in struct timekeeper 2014-07-23 15:01:53 -07:00
vsyscall_trace.h
x86_init.c Revert "PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()" 2014-11-11 15:14:30 -07:00
x8664_ksyms_64.c sched, x86: Optimize the preempt_schedule() call 2013-09-25 14:23:07 +02:00
xsave.c x86: export get_xsave_addr 2014-12-05 13:55:44 +01:00