linux/include
Waiman Long 364f784f04 locking/rwsem: Optimize rwsem structure for uncontended lock acquisition
For an uncontended rwsem, count and owner are the only fields a task
needs to touch when acquiring the rwsem. So they are put next to each
other to increase the chance that they will share the same cacheline.
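The layout change can be sketched as a simplified userspace model of the two field orders. This is not the kernel's struct rw_semaphore. The struct names are hypothetical, and the model makes these assumptions:

- `long` stands in for atomic_long_t and `void *` for the owner pointer.
- A pair of pointers stands in for struct list_head.
- A 4-byte int stands in for both raw_spinlock_t and the optimistic spinning queue (a non-debug build).
- The pre-patch order (count, wait_list, wait_lock, osq, owner) is an assumption about the old layout.
- The target is LP64.

```c
#include <assert.h>
#include <stddef.h>

/* The model assumes LP64: 8-byte long and 8-byte pointers. */
static_assert(sizeof(long) == 8 && sizeof(void *) == 8, "model assumes LP64");

/* Hypothetical stand-in for struct list_head. */
struct list_head_model { void *next, *prev; };

/* Assumed pre-patch order: owner comes after wait_list, wait_lock and osq. */
struct rwsem_model_pre {
	long count;                       /* stands in for atomic_long_t */
	struct list_head_model wait_list;
	int wait_lock;                    /* stands in for raw_spinlock_t */
	int osq;                          /* stands in for the OSQ lock */
	void *owner;                      /* stands in for struct task_struct * */
};

/* Post-patch order: owner immediately follows count. */
struct rwsem_model_post {
	long count;
	void *owner;
	int osq;
	int wait_lock;
	struct list_head_model wait_list;
};

/* Distance from count to owner in the pre-patch model, in "long"s. */
size_t count_owner_gap_pre(void)
{
	return (offsetof(struct rwsem_model_pre, owner) -
		offsetof(struct rwsem_model_pre, count)) / sizeof(long);
}

/* Distance from count to owner in the post-patch model, in "long"s. */
size_t count_owner_gap_post(void)
{
	return (offsetof(struct rwsem_model_post, owner) -
		offsetof(struct rwsem_model_post, count)) / sizeof(long);
}
```

On an LP64 target, count_owner_gap_pre() returns 4 and count_owner_gap_post() returns 1. Both model structs are 40 bytes, so the reordering only moves owner closer to count without growing the structure.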

On a ThunderX2 99xx (arm64) system with 32K L1 cache and 256K L2
cache, a single-threaded rwsem locking microbenchmark was run to
write-lock and write-unlock an array of rwsems spaced 2 cachelines
apart in a 1 MB memory block. The locking rates (kops/s) of the
microbenchmark before and after the patch, with the rwsems placed at
various "long" (8-byte) offsets from the beginning of the cacheline,
were as follows:

  Cacheline Offset   Pre-patch    Post-patch
  (in "long"s)       (kops/s)     (kops/s)
  ----------------   ---------    ----------
        0             17,449        16,588
        1             17,450        16,465
        2             17,450        16,460
        3             17,453        16,462
        4             14,867        16,471
        5             14,867        16,470
        6             14,853        16,464
        7             14,867        13,172

Before the patch, the count and owner are 4 "long"s apart. After the
patch, they are only 1 "long" apart.
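The table's offsets can be cross-checked with simple address arithmetic. The sketch assumes a 64-byte cacheline, which is 8 "long"s and matches offsets 0 to 7 in the table. It also assumes an 8-byte long. The helper names are hypothetical. For a given starting offset of count and a given count-to-owner gap, the helper reports whether owner starts in the same cacheline as count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define LONG_BYTES      8u   /* assumed sizeof(long) on arm64 */
#define CACHELINE_BYTES 64u  /* assumed cacheline size */

static_assert(CACHELINE_BYTES % LONG_BYTES == 0, "line must hold whole longs");

/*
 * True if count (offset_longs from the start of a cacheline) and owner
 * (gap_longs further on) fall within the same cacheline. Both fields are
 * 8-byte aligned, so comparing their starting bytes is sufficient.
 */
bool same_cacheline(unsigned offset_longs, unsigned gap_longs)
{
	unsigned count_byte = offset_longs * LONG_BYTES;
	unsigned owner_byte = (offset_longs + gap_longs) * LONG_BYTES;

	return count_byte / CACHELINE_BYTES == owner_byte / CACHELINE_BYTES;
}

/* Print the result for every table row, using gaps of 4 and 1 "long"s. */
void print_split_table(void)
{
	for (unsigned off = 0; off < CACHELINE_BYTES / LONG_BYTES; off++)
		printf("%u  pre (gap 4): %s  post (gap 1): %s\n", off,
		       same_cacheline(off, 4) ? "same" : "split",
		       same_cacheline(off, 1) ? "same" : "split");
}
```

With a gap of 4, offsets 4 to 7 place owner in the next cacheline. With a gap of 1, only offset 7 does. These are the rows with the lower rates in the table. The sketch performs only the address arithmetic and does not model this chip's cache behavior.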

The rwsem data have to be loaded from the L3 cache for each access. It
can be seen that the locking rates are more consistent after the patch
than before. Note that for this particular system, the performance
drop happens whenever the count and owner are an odd multiple of
"long"s apart. No performance drop was observed when only a single rwsem
was used (hot cache). So the drop is likely just an idiosyncrasy of the
cache architecture of this chip rather than an inherent problem with the
patch.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/20190404174320.22416-12-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-10 10:56:06 +02:00
acpi ACPI: use different default debug value than ACPICA 2019-03-25 10:45:59 +01:00
asm-generic Merge branch 'linus' into locking/core, to pick up fixes 2019-04-10 09:14:42 +02:00
clocksource
crypto
drm drm i915, amdgpu, qxl and etnaviv fixes 2019-03-15 13:58:35 -07:00
dt-bindings dt-bindings: reset: meson-g12a: Add missing USB2 PHY resets 2019-03-25 16:22:10 +01:00
keys KEYS: trusted: fix -Wvarags warning 2019-04-08 15:58:54 -07:00
kvm ARM: some cleanups, direct physical timer assignment, cache sanitization 2019-03-15 15:00:28 -07:00
linux locking/rwsem: Optimize rwsem structure for uncontended lock acquisition 2019-04-10 10:56:06 +02:00
math-emu
media
memory
misc auxdisplay: charlcd: Introduce charlcd_free() helper 2019-03-17 08:48:16 +01:00
net nfc: nci: Potential off by one in ->pipes[] array 2019-04-06 15:05:07 -07:00
pcmcia
ras
rdma
scsi
soc IOMMU Updates for Linux v5.1 2019-03-10 12:29:52 -07:00
sound sound fixes for 5.1-rc1 2019-03-15 14:05:00 -07:00
target
trace syscalls: Remove start and number from syscall_get_arguments() args 2019-04-05 09:26:43 -04:00
uapi ethtool: avoid signed-unsigned comparison in ethtool_validate_speed() 2019-04-08 16:30:43 -07:00
video media updates for v5.1-rc1 2019-03-09 14:45:54 -08:00
xen