5cfd92e12e
Reader optimistic spinning is helpful when the reader critical section
is short and there aren't that many readers around. It makes readers
relatively more preferred than writers. When a writer times out spinning
on a reader-owned lock and sets the nonspinnable bits, there are two main
reasons for that.

 1) The reader critical section is long, perhaps the task sleeps after
    acquiring the read lock.
 2) There are just too many readers contending the lock, causing it to
    take a while to service all of them.

In the former case, a long reader critical section will impede the
progress of writers, which is usually more important for system
performance. In the latter case, reader optimistic spinning tends to
shrink the groups of readers that acquire the lock together, which leads
to more such groups. That may hurt performance in some cases. In other
words, the setting of the nonspinnable bits indicates that reader
optimistic spinning may not be helpful for the workloads that cause it.

Therefore, any writers that have observed the setting of the writer
nonspinnable bit for a given rwsem after they fail to acquire the lock
via optimistic spinning will set the reader nonspinnable bit once they
acquire the write lock. Similarly, readers that observe the setting of
the reader nonspinnable bit at slowpath entry will also set the reader
nonspinnable bit when they acquire the read lock via the wakeup path.

Once the reader nonspinnable bit is on, it will only be reset when
a writer is able to acquire the rwsem in the fast path or somehow a
reader or writer in the slowpath doesn't observe the nonspinnable bit.

This is to discourage reader optimistic spinning on that particular
rwsem and make writers more preferred. This adaptive disabling of reader
optimistic spinning will alleviate some of the negative side effects of
this feature.

In addition, this patch tries to make readers in the spinning queue
follow the phase-fair principle after quitting optimistic spinning
by checking if another reader has somehow acquired a read lock after
this reader enters the optimistic spinning queue. If so and the rwsem
is still reader-owned, this reader is in the right read-phase and can
attempt to acquire the lock.

On a 2-socket 40-core 80-thread Skylake system, the page_fault1 test of
the will-it-scale benchmark was run with various numbers of threads. The
numbers of operations done before the reader optimistic spinning patches,
before this patch, and after this patch were:

  Threads  Before rspin  Before patch  After patch    %change
  -------  ------------  ------------  -----------    -------
    20        5541068      5345484       5455667    -3.5%/ +2.1%
    40       10185150      7292313       9219276   -28.5%/+26.4%
    60        8196733      6460517       7181209   -21.2%/+11.2%
    80        9508864      6739559       8107025   -29.1%/+20.3%

This patch doesn't recover all the lost performance, but it is more than
half. Given the fact that reader optimistic spinning does benefit some
workloads, this is a good compromise.

Using the rwsem locking microbenchmark with a very short critical section,
this patch doesn't have too much impact on locking performance, as shown
by the locking rates (kops/s) below with equal numbers of readers and
writers before and after this patch:

   # of Threads  Pre-patch    Post-patch
   ------------  ---------    ----------
        2          4,730        4,969
        4          4,814        4,786
        8          4,866        4,815
       16          4,715        4,511
       32          3,338        3,500
       64          3,212        3,389
       80          3,110        3,044

When running the locking microbenchmark with 40 dedicated reader and writer
threads, however, the reader performance is curtailed to favor the writer.

Before patch:

  40 readers, Iterations Min/Mean/Max = 204,026/234,309/254,816
  40 writers, Iterations Min/Mean/Max = 88,515/95,884/115,644

After patch:

  40 readers, Iterations Min/Mean/Max = 33,813/35,260/36,791
  40 writers, Iterations Min/Mean/Max = 95,368/96,565/97,798

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: huang ying <huang.ying.caritas@gmail.com>
Link: https://lkml.kernel.org/r/20190520205918.22251-16-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
72 lines
3.2 KiB
C
/* SPDX-License-Identifier: GPL-2.0 */
/*
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * Authors: Waiman Long <longman@redhat.com>
 */

#ifndef LOCK_EVENT
#define LOCK_EVENT(name)	LOCKEVENT_ ## name,
#endif

#ifdef CONFIG_QUEUED_SPINLOCKS
#ifdef CONFIG_PARAVIRT_SPINLOCKS
/*
 * Locking events for PV qspinlock.
 */
LOCK_EVENT(pv_hash_hops)	/* Average # of hops per hashing operation */
LOCK_EVENT(pv_kick_unlock)	/* # of vCPU kicks issued at unlock time */
LOCK_EVENT(pv_kick_wake)	/* # of vCPU kicks for pv_latency_wake */
LOCK_EVENT(pv_latency_kick)	/* Average latency (ns) of vCPU kick */
LOCK_EVENT(pv_latency_wake)	/* Average latency (ns) of kick-to-wakeup */
LOCK_EVENT(pv_lock_stealing)	/* # of lock stealing operations */
LOCK_EVENT(pv_spurious_wakeup)	/* # of spurious wakeups in non-head vCPUs */
LOCK_EVENT(pv_wait_again)	/* # of wait's after queue head vCPU kick */
LOCK_EVENT(pv_wait_early)	/* # of early vCPU wait's */
LOCK_EVENT(pv_wait_head)	/* # of vCPU wait's at the queue head */
LOCK_EVENT(pv_wait_node)	/* # of vCPU wait's at non-head queue node */
#endif /* CONFIG_PARAVIRT_SPINLOCKS */

/*
 * Locking events for qspinlock
 *
 * Subtracting lock_use_node[234] from lock_slowpath will give you
 * lock_use_node1.
 */
LOCK_EVENT(lock_pending)	/* # of locking ops via pending code */
LOCK_EVENT(lock_slowpath)	/* # of locking ops via MCS lock queue */
LOCK_EVENT(lock_use_node2)	/* # of locking ops that use 2nd percpu node */
LOCK_EVENT(lock_use_node3)	/* # of locking ops that use 3rd percpu node */
LOCK_EVENT(lock_use_node4)	/* # of locking ops that use 4th percpu node */
LOCK_EVENT(lock_no_node)	/* # of locking ops w/o using percpu node */
#endif /* CONFIG_QUEUED_SPINLOCKS */

/*
 * Locking events for rwsem
 */
LOCK_EVENT(rwsem_sleep_reader)	/* # of reader sleeps */
LOCK_EVENT(rwsem_sleep_writer)	/* # of writer sleeps */
LOCK_EVENT(rwsem_wake_reader)	/* # of reader wakeups */
LOCK_EVENT(rwsem_wake_writer)	/* # of writer wakeups */
LOCK_EVENT(rwsem_opt_rlock)	/* # of opt-acquired read locks */
LOCK_EVENT(rwsem_opt_wlock)	/* # of opt-acquired write locks */
LOCK_EVENT(rwsem_opt_fail)	/* # of failed optspins */
LOCK_EVENT(rwsem_opt_nospin)	/* # of disabled optspins */
LOCK_EVENT(rwsem_opt_norspin)	/* # of disabled reader-only optspins */
LOCK_EVENT(rwsem_opt_rlock2)	/* # of opt-acquired 2ndary read locks */
LOCK_EVENT(rwsem_rlock)	/* # of read locks acquired */
LOCK_EVENT(rwsem_rlock_fast)	/* # of fast read locks acquired */
LOCK_EVENT(rwsem_rlock_fail)	/* # of failed read lock acquisitions */
LOCK_EVENT(rwsem_rlock_handoff)	/* # of read lock handoffs */
LOCK_EVENT(rwsem_wlock)	/* # of write locks acquired */
LOCK_EVENT(rwsem_wlock_fail)	/* # of failed write lock acquisitions */
LOCK_EVENT(rwsem_wlock_handoff)	/* # of write lock handoffs */
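The header above is an X-macro list: each LOCK_EVENT(name) line expands according to whatever definition of LOCK_EVENT is in effect when the list is expanded, and the default definition produces one LOCKEVENT_<name> enumerator per event. The following is a minimal, self-contained sketch of that pattern, not the kernel's actual consumer code. It assumes a hypothetical stand-in list, DEMO_LOCK_EVENT_LIST, holding a few event names copied from this file (the real header is meant to be pulled in with #include instead), and the demo_* identifiers are invented for illustration. It also shows the subtraction described in the qspinlock comment for deriving lock_use_node1.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical stand-in for the event list. The names are taken from the
 * header above; the macro itself is an assumption made so that this sketch
 * compiles without the kernel tree.
 */
#define DEMO_LOCK_EVENT_LIST		\
	LOCK_EVENT(lock_pending)	\
	LOCK_EVENT(lock_slowpath)	\
	LOCK_EVENT(lock_use_node2)	\
	LOCK_EVENT(lock_use_node3)	\
	LOCK_EVENT(lock_use_node4)	\
	LOCK_EVENT(rwsem_opt_norspin)

/* First expansion: same definition as the header's default, one enumerator per event. */
#define LOCK_EVENT(name)	LOCKEVENT_ ## name,
enum demo_lock_events {
	DEMO_LOCK_EVENT_LIST
	demo_lockevent_num	/* total number of events in the list */
};
#undef LOCK_EVENT

/* Second expansion: a table of printable names indexed by the enum. */
#define LOCK_EVENT(name)	[LOCKEVENT_ ## name] = #name,
static const char * const demo_lockevent_names[demo_lockevent_num] = {
	DEMO_LOCK_EVENT_LIST
};
#undef LOCK_EVENT

/*
 * lock_use_node1 has no counter of its own; per the comment in the header,
 * it is lock_slowpath minus lock_use_node2, lock_use_node3 and lock_use_node4.
 */
static unsigned long demo_lock_use_node1(const unsigned long *counts)
{
	return counts[LOCKEVENT_lock_slowpath]
	     - counts[LOCKEVENT_lock_use_node2]
	     - counts[LOCKEVENT_lock_use_node3]
	     - counts[LOCKEVENT_lock_use_node4];
}

/* Sanity check that the enum and the name table stay in step. */
static void demo_self_check(void)
{
	assert(strcmp(demo_lockevent_names[LOCKEVENT_lock_pending],
		      "lock_pending") == 0);
	assert(strcmp(demo_lockevent_names[LOCKEVENT_rwsem_opt_norspin],
		      "rwsem_opt_norspin") == 0);
}
```

Because the enum and the name table are both generated from one list, adding an event such as rwsem_opt_norspin or rwsem_opt_rlock2 takes a single new LOCK_EVENT() line, and the two cannot drift apart.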