linux/kernel/locking
Pan Xinhui 5aff60a191 locking/osq: Break out of spin-wait busy waiting loop for a preempted vCPU in osq_lock()
An over-committed guest with more vCPUs than pCPUs has a heavy overload
in osq_lock().

This is because if vCPU-A holds the osq lock and yields out, vCPU-B ends
up waiting for per_cpu node->locked to be set. IOW, vCPU-B waits for
vCPU-A to run and unlock the osq lock.

Use the new vcpu_is_preempted(cpu) interface to detect if a vCPU is
currently running or not, and break out of the spin-loop if so.

test case:

 $ perf record -a perf bench sched messaging -g 400 -p && perf report

 before patch:
 18.09%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
 12.28%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
  5.27%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
  3.89%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task
  3.64%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
  3.41%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner.is
  2.49%  sched-messaging  [kernel.vmlinux]  [k] system_call

 after patch:
 20.68%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner
  8.45%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
  4.12%  sched-messaging  [kernel.vmlinux]  [k] system_call
  3.01%  sched-messaging  [kernel.vmlinux]  [k] system_call_common
  2.83%  sched-messaging  [kernel.vmlinux]  [k] copypage_power7
  2.64%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
  2.00%  sched-messaging  [kernel.vmlinux]  [k] osq_lock

Suggested-by: Boqun Feng <boqun.feng@gmail.com>
Tested-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: benh@kernel.crashing.org
Cc: bsingharora@gmail.com
Cc: dave@stgolabs.net
Cc: kernellwp@gmail.com
Cc: konrad.wilk@oracle.com
Cc: linuxppc-dev@lists.ozlabs.org
Cc: mpe@ellerman.id.au
Cc: paulmck@linux.vnet.ibm.com
Cc: paulus@samba.org
Cc: rkrcmar@redhat.com
Cc: virtualization@lists.linux-foundation.org
Cc: will.deacon@arm.com
Cc: xen-devel-request@lists.xenproject.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1478077718-37424-3-git-send-email-xinhui.pan@linux.vnet.ibm.com
[ Translated to English. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-22 12:48:10 +01:00
..
lockdep_internals.h lockdep: Limit static allocations if PROVE_LOCKING_SMALL is defined 2016-11-18 11:33:19 -08:00
lockdep_proc.c lockdep: Fix lock_chain::base size 2016-04-23 13:53:03 +02:00
lockdep_states.h
lockdep.c locking/lockdep: Remove unused parameter from the add_lock_to_list() function 2016-11-11 08:25:20 +01:00
locktorture.c lcoking/locktorture: Simplify the torture_runnable computation 2016-04-28 10:57:51 +02:00
Makefile locking/lglock: Remove lglock implementation 2016-09-22 15:25:56 +02:00
mcs_spinlock.h locking/core: Remove cpu_relax_lowlatency() users 2016-11-16 10:15:10 +01:00
mutex-debug.c locking/mutex: Rework mutex::owner 2016-10-25 11:31:50 +02:00
mutex-debug.h locking/mutex: Rework mutex::owner 2016-10-25 11:31:50 +02:00
mutex.c sched/wake_q: Rename WAKE_Q to DEFINE_WAKE_Q 2016-11-21 10:29:01 +01:00
mutex.h locking/mutex: Rework mutex::owner 2016-10-25 11:31:50 +02:00
osq_lock.c locking/osq: Break out of spin-wait busy waiting loop for a preempted vCPU in osq_lock() 2016-11-22 12:48:10 +01:00
percpu-rwsem.c locking/percpu-rwsem: Optimize readers and reduce global impact 2016-08-10 14:34:01 +02:00
qrwlock.c locking/core: Remove cpu_relax_lowlatency() users 2016-11-16 10:15:10 +01:00
qspinlock_paravirt.h locking/pv-qspinlock: Use cmpxchg_release() in __pv_queued_spin_unlock() 2016-09-22 15:25:51 +02:00
qspinlock_stat.h locking/pvstat: Separate wait_again and spurious wakeup stats 2016-08-10 14:16:02 +02:00
qspinlock.c locking/qspinlock: Use __this_cpu_dec() instead of full-blown this_cpu_dec() 2016-06-27 11:37:41 +02:00
rtmutex_common.h rtmutex: Delete scriptable tester 2015-07-20 11:45:45 +02:00
rtmutex-debug.c rtmutex: Cleanup deadlock detector debug logic 2014-06-21 22:05:30 +02:00
rtmutex-debug.h rtmutex: Cleanup deadlock detector debug logic 2014-06-21 22:05:30 +02:00
rtmutex.c sched/wake_q: Rename WAKE_Q to DEFINE_WAKE_Q 2016-11-21 10:29:01 +01:00
rtmutex.h rtmutex: Cleanup deadlock detector debug logic 2014-06-21 22:05:30 +02:00
rwsem-spinlock.c locking/rwsem: Introduce basis for down_write_killable() 2016-04-13 10:42:20 +02:00
rwsem-xadd.c sched/wake_q: Rename WAKE_Q to DEFINE_WAKE_Q 2016-11-21 10:29:01 +01:00
rwsem.c locking/rwsem: Add reader-owned state to the owner field 2016-06-08 15:16:59 +02:00
rwsem.h locking/rwsem: Protect all writes to owner by WRITE_ONCE() 2016-06-08 15:16:59 +02:00
semaphore.c locking/semaphore: Resolve some shadow warnings 2014-09-04 07:17:24 +02:00
spinlock_debug.c locking: Move the spinlock code to kernel/locking/ 2013-11-06 07:55:21 +01:00
spinlock.c spinlock: Add spin_lock_bh_nested() 2015-01-03 14:32:57 -05:00