linux/kernel/locking
Waiman Long 19c5d690e4 locking/rwsem: Add reader-owned state to the owner field
Currently, it is not possible to determine for sure if a reader
owns a rwsem by looking at the content of the rwsem data structure.
This patch adds a new state RWSEM_READER_OWNED to the owner field
to indicate that readers currently own the lock. This enables us to
address the following 2 issues in the rwsem optimistic spinning code:

 1) rwsem_can_spin_on_owner() will disallow optimistic spinning if
    the owner field is NULL which can mean either the readers own
    the lock or the owning writer hasn't set the owner field yet.
    In the latter case, we miss the chance to do optimistic spinning.

 2) While a writer is waiting in the OSQ and a reader takes the lock,
    the writer will continue to spin when out of the OSQ in the main
    rwsem_optimistic_spin() loop as the owner field is NULL wasting
    CPU cycles if some of readers are sleeping.

Adding the new state will allow optimistic spinning to go forward as
long as the owner field is not RWSEM_READER_OWNED and the owner is
running, if set, but stop immediately when that state has been reached.

On a 4-socket Haswell machine running on a 4.6-rc1 based kernel, the
fio test with multithreaded randrw and randwrite tests on the same
file on a XFS partition on top of a NVDIMM were run, the aggregated
bandwidths before and after the patch were as follows:

  Test      BW before patch     BW after patch  % change
  ----      ---------------     --------------  --------
  randrw         988 MB/s          1192 MB/s      +21%
  randwrite     1513 MB/s          1623 MB/s      +7.3%

The perf profile of the rwsem_down_write_failed() function in randrw
before and after the patch were:

   19.95%  5.88%  fio  [kernel.vmlinux]  [k] rwsem_down_write_failed
   14.20%  1.52%  fio  [kernel.vmlinux]  [k] rwsem_down_write_failed

The actual CPU cycles spend in rwsem_down_write_failed() dropped from
5.88% to 1.52% after the patch.

The xfstests was also run and no regression was observed.

Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Jason Low <jason.low2@hp.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Douglas Hatch <doug.hatch@hpe.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott J Norton <scott.norton@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1463534783-38814-2-git-send-email-Waiman.Long@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-08 15:16:59 +02:00
..
lglock.c sched/stop_machine: Fix deadlock between multiple stop_two_cpus() 2015-06-19 10:03:12 +02:00
lockdep_internals.h lockdep: Increase static allocations 2014-04-18 14:20:50 +02:00
lockdep_proc.c lockdep: Fix lock_chain::base size 2016-04-23 13:53:03 +02:00
lockdep_states.h
lockdep.c locking/lockdep: Use __jhash_mix() for iterate_chain_key() 2016-06-08 14:22:00 +02:00
locktorture.c lcoking/locktorture: Simplify the torture_runnable computation 2016-04-28 10:57:51 +02:00
Makefile kernel: add kcov code coverage 2016-03-22 15:36:02 -07:00
mcs_spinlock.h locking/mcs: Fix mcs_spin_lock() ordering 2016-02-29 10:02:41 +01:00
mutex-debug.c mutex: Always clear owner field upon mutex_unlock() 2015-01-09 11:20:39 +01:00
mutex-debug.h locking/mutex: Set and clear owner using WRITE_ONCE() 2016-06-03 12:06:10 +02:00
mutex.c locking/ww_mutex: Report recursive ww_mutex locking early 2016-06-03 08:37:26 +02:00
mutex.h locking/mutex: Set and clear owner using WRITE_ONCE() 2016-06-03 12:06:10 +02:00
osq_lock.c locking/osq: Fix ordering of node initialisation in osq_lock 2015-12-17 11:40:29 -08:00
percpu-rwsem.c ext4: fix races between changing inode journal mode and ext4_writepages 2016-04-25 23:22:35 -04:00
qrwlock.c locking/qrwlock: Rename ->lock to ->wait_lock 2015-09-18 09:27:29 +02:00
qspinlock_paravirt.h locking/pvqspinlock: Enable slowpath locking count tracking 2016-02-29 10:02:42 +01:00
qspinlock_stat.h locking/pvqspinlock: Robustify init_qspinlock_stat() 2016-05-05 09:58:51 +02:00
qspinlock.c locking/qspinlock: Add comments 2016-06-08 14:44:01 +02:00
rtmutex_common.h rtmutex: Delete scriptable tester 2015-07-20 11:45:45 +02:00
rtmutex-debug.c rtmutex: Cleanup deadlock detector debug logic 2014-06-21 22:05:30 +02:00
rtmutex-debug.h rtmutex: Cleanup deadlock detector debug logic 2014-06-21 22:05:30 +02:00
rtmutex.c locking/rtmutex: Only warn once on a trylock from bad context 2016-06-08 14:22:00 +02:00
rtmutex.h rtmutex: Cleanup deadlock detector debug logic 2014-06-21 22:05:30 +02:00
rwsem-spinlock.c locking/rwsem: Introduce basis for down_write_killable() 2016-04-13 10:42:20 +02:00
rwsem-xadd.c locking/rwsem: Add reader-owned state to the owner field 2016-06-08 15:16:59 +02:00
rwsem.c locking/rwsem: Add reader-owned state to the owner field 2016-06-08 15:16:59 +02:00
rwsem.h locking/rwsem: Add reader-owned state to the owner field 2016-06-08 15:16:59 +02:00
semaphore.c locking/semaphore: Resolve some shadow warnings 2014-09-04 07:17:24 +02:00
spinlock_debug.c locking: Move the spinlock code to kernel/locking/ 2013-11-06 07:55:21 +01:00
spinlock.c spinlock: Add spin_lock_bh_nested() 2015-01-03 14:32:57 -05:00