linux/kernel/locking
Waiman Long f5bfdc8e39 locking/osq: Use optimized spinning loop for arm64
Arm64 has a more optimized spinning loop (atomic_cond_read_acquire)
using wfe for spinlock that can boost performance of sibling threads
by putting the current cpu to a wait state that is broken only when
the monitored variable changes or an external event happens.

OSQ has a more complicated spinning loop. Besides the lock value, it
also checks for need_resched() and vcpu_is_preempted(). The check for
need_resched() is not a problem as it is only set by the tick interrupt
handler. That will be detected by the spinning cpu right after iret.

The vcpu_is_preempted() check, however, is a problem as changes to the
preempt state of the previous node will not affect the wait state. For
ARM64, vcpu_is_preempted() is not currently defined and so is a no-op.
Will has indicated that he is planning to para-virtualize wfe instead
of defining vcpu_is_preempted for PV support. So just add a comment in
arch/arm64/include/asm/spinlock.h to indicate that vcpu_is_preempted()
should not be defined as suggested.
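
To illustrate the shape of such a combined wait, here is a minimal
userspace sketch in C11. It is not the kernel code touched by this patch:
struct waiter, relax() and waiter_spin() are hypothetical names, the
need_resched field only models need_resched(), and the arm64 branch uses
a plain "yield" hint rather than the wfe-based wait described above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a queue node plus a scheduler flag. */
struct waiter {
	atomic_int  locked;        /* becomes 1 when the lock is handed over */
	atomic_bool need_resched;  /* models need_resched(); not a kernel API */
};

/* CPU hint for a busy-wait loop; compiles to nothing on other targets. */
static inline void relax(void)
{
#if defined(__aarch64__)
	__asm__ __volatile__("yield" ::: "memory");
#elif defined(__x86_64__) || defined(__i386__)
	__asm__ __volatile__("pause" ::: "memory");
#endif
}

/*
 * Spin until either the lock is handed over or a reschedule is requested.
 * Returns true if the lock was obtained, false if the caller should stop
 * spinning. A wait-for-event style loop would sleep on w->locked instead
 * of polling, which is why a condition that changes without touching the
 * monitored variable (such as a preempted vCPU) would go unnoticed.
 */
bool waiter_spin(struct waiter *w)
{
	assert(w != NULL);

	for (;;) {
		if (atomic_load_explicit(&w->locked, memory_order_acquire))
			return true;
		if (atomic_load_explicit(&w->need_resched, memory_order_relaxed))
			return false;
		relax();
	}
}
```

In this sketch another thread would store 1 to locked to pass the lock on,
or set need_resched to make the waiter give up.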

On a 2-socket 56-core 224-thread ARM64 system, a kernel mutex locking
microbenchmark was run for 10s with and without the patch. The
performance numbers before the patch were:

Running locktest with mutex [runtime = 10s, load = 1]
Threads = 224, Min/Mean/Max = 316/123,143/2,121,269
Threads = 224, Total Rate = 2,757 kop/s; Percpu Rate = 12 kop/s

After the patch, the numbers were:

Running locktest with mutex [runtime = 10s, load = 1]
Threads = 224, Min/Mean/Max = 334/147,836/1,304,787
Threads = 224, Total Rate = 3,311 kop/s; Percpu Rate = 15 kop/s

So there was about a 20% performance improvement.
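
The figure can be checked from the total rates reported above. The helper
below is only an illustrative sketch; percent_gain() is a hypothetical
name and is not part of locktest.

```c
#include <assert.h>

/* Relative change from `before` to `after`, expressed in percent. */
double percent_gain(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}
```

Calling percent_gain(2757, 3311) on the total rates gives roughly 20.1,
and the per-thread means (123,143 and 147,836) give a similar ratio.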

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/r/20200113150735.21956-1-longman@redhat.com
2020-01-17 10:19:30 +01:00
lock_events_list.h locking/rwsem: Adaptive disabling of reader optimistic spinning 2019-06-17 12:28:09 +02:00
lock_events.c locking/lock_events: Don't show pvqspinlock events on bare metal 2019-04-10 10:56:05 +02:00
lock_events.h locking/lock_events: Use raw_cpu_{add,inc}() for stats 2019-06-03 12:32:56 +02:00
lockdep_internals.h locking/lockdep: Report more stack trace statistics 2019-07-25 15:43:28 +02:00
lockdep_proc.c locking/lockdep: Fix lockdep_stats indentation problem 2020-01-17 10:19:30 +01:00
lockdep_states.h
lockdep.c locking/lockdep: Update the comment for __lock_release() 2019-11-13 11:07:48 +01:00
locktorture.c locking: locktorture: Do not include rwlock.h directly 2019-10-05 11:50:24 -07:00
Makefile locking/rwsem: Merge rwsem.h and rwsem-xadd.c into rwsem.c 2019-06-17 12:27:57 +02:00
mcs_spinlock.h locking/mcs: Use smp_cond_load_acquire() in MCS spin loop 2018-04-27 09:48:49 +02:00
mutex-debug.c locking/mutex: Replace spin_is_locked() with lockdep 2018-11-12 09:06:22 -08:00
mutex-debug.h
mutex.c Revert "locking/mutex: Complain upon mutex API misuse in IRQ contexts" 2019-12-11 00:27:43 +01:00
mutex.h mutex: Fix up mutex_waiter usage 2019-08-08 09:09:25 +02:00
osq_lock.c locking/osq: Use optimized spinning loop for arm64 2020-01-17 10:19:30 +01:00
percpu-rwsem.c Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu 2019-06-28 19:46:47 +02:00
qrwlock.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 157 2019-05-30 11:26:37 -07:00
qspinlock_paravirt.h Revert "locking/pvqspinlock: Don't wait if vCPU is preempted" 2019-09-25 10:22:37 +02:00
qspinlock_stat.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 157 2019-05-30 11:26:37 -07:00
qspinlock.c locking/qspinlock: Fix inaccessible URL of MCS lock paper 2020-01-17 10:19:30 +01:00
rtmutex_common.h
rtmutex-debug.c
rtmutex-debug.h
rtmutex.c locking/lockdep: Remove unused @nested argument from lock_release() 2019-10-09 12:46:10 +02:00
rtmutex.h
rwsem.c locking/lockdep: Remove unused @nested argument from lock_release() 2019-10-09 12:46:10 +02:00
rwsem.h locking/rwsem: Merge rwsem.h and rwsem-xadd.c into rwsem.c 2019-06-17 12:27:57 +02:00
semaphore.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 436 2019-06-05 17:37:17 +02:00
spinlock_debug.c locking/spinlock/debug: Fix various data races 2019-11-29 08:03:27 +01:00
spinlock.c asm-generic/mmiowb: Add generic implementation of mmiowb() tracking 2019-04-08 11:59:39 +01:00
test-ww_mutex.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 9 2019-05-21 11:28:40 +02:00