linux/include/asm-generic
Sebastian Andrzej Siewior cede88418b locking/rtmutex: Drop usage of __HAVE_ARCH_CMPXCHG
The rtmutex code is the only user of __HAVE_ARCH_CMPXCHG and we have a few
other user of cmpxchg() which do not care about __HAVE_ARCH_CMPXCHG. This
define was first introduced in 23f78d4a0 ("[PATCH] pi-futex: rt mutex core")
which is v2.6.18. The generic cmpxchg was introduced later in 068fbad288
("Add cmpxchg_local to asm-generic for per cpu atomic operations") which is
v2.6.25.
Back then something was required to get rtmutex working with the fast
path on architectures without cmpxchg and this seems to be the result.

It popped up recently on rt-users because ARM (v6+) does not define
__HAVE_ARCH_CMPXCHG (even that it implements it) which results in slower
locking performance in the fast path.
To put some numbers on it: preempt -RT, am335x, 10 loops of
100000 invocations of rt_spin_lock() + rt_spin_unlock() (time "total" is
the average of the 10 loops for the 100000 invocations, "loop" is
"total / 100000 * 1000"):

     cmpxchg |    slowpath used  ||    cmpxchg used
             |   total   | loop  ||   total    | loop
     --------|-----------|-------||------------|-------
     ARMv6   | 9129.4 us | 91 ns ||  3311.9 us |  33 ns
     generic | 9360.2 us | 94 ns || 10834.6 us | 108 ns
     ----------------------------||--------------------

Forcing it to generic cmpxchg() made things worse for the slowpath and
even worse in cmpxchg() path. It boils down to 14ns more per lock+unlock
in a cache hot loop so it might not be that much in real world.
The last test was a substitute for pre ARMv6 machine but then I was able
to perform the comparison on imx28 which is ARMv5 and therefore is
always is using the generic cmpxchg implementation. And the numbers:

              |   total     | loop
     -------- |-----------  |--------
     slowpath | 263937.2 us | 2639 ns
     cmpxchg  |  16934.2 us |  169 ns
     --------------------------------

The numbers are larger since the machine is slower in general. However,
letting rtmutex use cmpxchg() instead the slowpath seem to improve things.

Since from the ARM (tested on am335x + imx28) point of view always
using cmpxchg() in rt_mutex_lock() + rt_mutex_unlock() makes sense I
would drop the define.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: will.deacon@arm.com
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20150225175613.GE6823@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-05-13 10:51:28 +02:00
..
bitops arch: Mass conversion of smp_mb__*() 2014-04-18 14:20:48 +02:00
4level-fixup.h mm, asm-generic: define PUD_SHIFT in <asm-generic/4level-fixup.h> 2015-02-11 17:06:03 -08:00
atomic64.h locking,arch: Rewrite generic atomic support 2014-08-14 12:48:14 +02:00
atomic-long.h
atomic.h locking,arch: Use ACCESS_ONCE() instead of cast to volatile in atomic_read() 2014-10-03 06:06:23 +02:00
audit_change_attr.h audit: Modify a set of system calls in audit class definitions 2014-01-17 17:01:46 -05:00
audit_dir_write.h
audit_read.h
audit_signal.h
audit_write.h audit: Modify a set of system calls in audit class definitions 2014-01-17 17:01:46 -05:00
barrier.h arch: Add lightweight memory barriers dma_rmb() and dma_wmb() 2014-12-11 21:15:06 -05:00
bitops.h arch: Prepare for smp_mb__{before,after}_atomic() 2014-04-18 11:40:30 +02:00
bitsperlong.h UAPI: (Scripted) Disintegrate include/asm-generic 2012-10-04 18:20:15 +01:00
bug.h bug: Make BUG() always stop the machine 2014-04-07 16:36:10 -07:00
bugs.h
cache.h
cacheflush.h
checksum.h asm-generic headers: Allow yet more arch overrides in checksum.h 2013-02-11 20:00:33 +05:30
clkdev.h asm-generic: COMMON_CLK defines __clk_{get,put} 2014-09-25 18:00:45 -07:00
cmpxchg-local.h LLVMLinux: Remove warning about returning an uninitialized variable 2014-04-09 13:44:35 -07:00
cmpxchg.h locking/rtmutex: Drop usage of __HAVE_ARCH_CMPXCHG 2015-05-13 10:51:28 +02:00
cputime_jiffies.h sched, time: Fix build error with 64 bit cputime_t on 32 bit systems 2014-10-03 05:46:55 +02:00
cputime_nsecs.h sched, time: Fix build error with 64 bit cputime_t on 32 bit systems 2014-10-03 05:46:55 +02:00
cputime.h cputime: Generic on-demand virtual cputime accounting 2013-01-27 19:23:27 +01:00
current.h
delay.h
device.h
div64.h
dma-coherent.h DMA-API: Change dma_declare_coherent_memory() CPU address to phys_addr_t 2014-05-20 16:55:23 -06:00
dma-contiguous.h asm-generic: Add dma-contiguous.h 2014-09-22 13:35:51 +02:00
dma-mapping-broken.h asm-generic/dma-mapping-broken.h: Provide dma_alloc_attrs()/dma_free_attrs() 2012-12-25 20:14:54 +01:00
dma-mapping-common.h asm/dma-mapping-common: Clarify output of dma_map_sg_attrs 2015-03-09 13:05:47 +01:00
dma.h
early_ioremap.h mm: create generic early_ioremap() support 2014-04-07 16:36:15 -07:00
emergency-restart.h
exec.h
fb.h
fixmap.h arm64: fixmap: fix missing sub-page offset for earlyprintk 2014-05-03 22:20:31 +01:00
ftrace.h
futex.h asm-generic: add generic futex for !CONFIG_SMP 2014-12-08 12:55:48 +08:00
getorder.h
gpio.h gpio: move pincontrol calls to <linux/gpio/driver.h> 2015-03-19 09:45:54 +01:00
hardirq.h
hugetlb.h mm: Fix generic hugetlb pte check return type. 2013-10-02 20:02:35 -04:00
hw_irq.h
ide_iops.h
int-ll64.h UAPI: (Scripted) Disintegrate include/asm-generic 2012-10-04 18:20:15 +01:00
io-64-nonatomic-hi-lo.h readq/writeq: Add explicit lo_hi_[read|write]_q and hi_lo_[read|write]_q 2014-07-04 13:27:30 +02:00
io-64-nonatomic-lo-hi.h readq/writeq: Add explicit lo_hi_[read|write]_q and hi_lo_[read|write]_q 2014-07-04 13:27:30 +02:00
io.h Merge branch 'io' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into asm-generic 2014-11-11 19:55:45 +01:00
ioctl.h include/asm-generic/ioctl.h: fix _IOC_TYPECHECK sparse error 2014-06-06 16:08:13 -07:00
iomap.h Kconfig: rename HAS_IOPORT to HAS_IOPORT_MAP 2014-04-07 16:36:11 -07:00
irq_regs.h
irq_work.h irq_work: Introduce arch_irq_work_has_interrupt() 2014-09-13 18:38:07 +02:00
irq.h
irqflags.h
Kbuild.asm UAPI: Set up uapi/asm/Kbuild.asm 2012-10-02 18:01:56 +01:00
kdebug.h
kmap_types.h asm-generic: remove km_type definitions 2012-07-24 15:27:30 +08:00
kvm_para.h KVM: add kvm_para_available to asm-generic/kvm_para.h 2013-06-05 13:21:29 +03:00
libata-portmap.h
linkage.h
local64.h
local.h
mcs_spinlock.h locking/mcs: Allow architecture specific asm files to be used for contended case 2014-02-09 21:18:52 +01:00
memory_model.h __page_to_pfn: Fix typo in comment 2013-10-14 15:28:29 +02:00
mm_hooks.h mm: Make arch_unmap()/bprm_mm_init() available to all architectures 2014-11-19 11:54:13 +01:00
mmu_context.h asm-generic: Remove asm-generic arch_bprm_mm_init() 2014-11-22 21:52:08 +01:00
mmu.h asm-generic/mmu.h: Add support for FDPIC 2012-12-09 23:14:14 +01:00
module.h Make most arch asm/module.h files use asm-generic/module.h 2012-09-28 14:31:03 +09:30
msi.h asm-generic: Add msi.h 2014-11-23 13:01:47 +01:00
mutex-dec.h arch: Make __mutex_fastpath_lock_retval return whether fastpath succeeded or not 2013-06-26 12:10:55 +02:00
mutex-null.h arch: Make __mutex_fastpath_lock_retval return whether fastpath succeeded or not 2013-06-26 12:10:55 +02:00
mutex-xchg.h arch: Make __mutex_fastpath_lock_retval return whether fastpath succeeded or not 2013-06-26 12:10:55 +02:00
mutex.h
page.h
param.h UAPI: (Scripted) Disintegrate include/asm-generic 2012-10-04 18:20:15 +01:00
parport.h include: remove __dev* attributes. 2013-01-03 15:57:16 -08:00
pci_iomap.h pci: add pci_iomap_range 2015-01-21 16:28:49 +10:30
pci-bridge.h
pci-dma-compat.h pci-dma-compat: add pci_zalloc_consistent helper 2014-08-08 15:57:28 -07:00
pci.h
percpu.h percpu: preffity percpu header files 2014-06-17 19:12:40 -04:00
pgalloc.h
pgtable-nopmd.h
pgtable-nopud.h
pgtable.h mm: change vunmap to tear down huge KVA mappings 2015-04-14 16:49:04 -07:00
preempt.h sched: Kill task_preempt_count() 2014-10-28 10:47:56 +01:00
ptrace.h
qrwlock_types.h locking/rwlocks: Introduce 'qrwlocks' - fair, queued rwlocks 2014-06-06 07:58:28 +02:00
qrwlock.h locking/rwlocks: Introduce 'qrwlocks' - fair, queued rwlocks 2014-06-06 07:58:28 +02:00
qspinlock_types.h locking/qspinlock: Optimize for smaller NR_CPUS 2015-05-08 12:36:48 +02:00
qspinlock.h locking/qspinlock: Revert to test-and-set on hypervisors 2015-05-08 12:36:58 +02:00
resource.h asm-generic: remove _STK_LIM_MAX 2014-05-15 00:32:09 +01:00
rtc.h
rwsem.h asm-generic: rwsem: de-PPCify rwsem.h 2014-03-14 18:02:08 +00:00
scatterlist.h
seccomp.h seccomp: allow COMPAT sigreturn overrides 2015-04-17 09:04:09 -04:00
sections.h nosave: consolidate __nosave_{begin,end} in <asm/sections.h> 2014-10-09 22:26:04 -04:00
segment.h
serial.h
siginfo.h constify copy_siginfo_to_user{,32}() 2013-11-09 00:16:29 -05:00
signal.h unify default ptrace_signal_deliver 2012-11-29 00:01:23 -05:00
simd.h crypto: create generic version of ablk_helper 2013-09-24 06:02:24 +10:00
sizes.h ARM: 7430/1: sizes.h: move from asm-generic to <linux/sizes.h> 2012-06-28 17:14:34 +01:00
spinlock.h
statfs.h UAPI: (Scripted) Disintegrate include/asm-generic 2012-10-04 18:20:15 +01:00
string.h
switch_to.h
syscall.h syscall.h: fix doc text for syscall_get_arch() 2014-09-23 16:20:00 -04:00
syscalls.h burying unused conditionals 2013-02-14 09:21:15 -05:00
termios-base.h
termios.h UAPI: (Scripted) Disintegrate include/asm-generic 2012-10-04 18:20:15 +01:00
timex.h
tlb.h mm: mmu_gather: use tlb->end != 0 only for TLB invalidation 2015-01-13 15:20:40 +13:00
tlbflush.h
topology.h
trace_clock.h tracing,x86: Add a TSC trace_clock 2012-11-13 15:48:27 -05:00
uaccess-unaligned.h
uaccess.h asm-generic: uaccess: Spelling s/a ny/any/ 2014-01-02 10:45:23 +01:00
unaligned.h asm-generic: allow generic unaligned access if the arch supports it 2014-05-08 10:22:23 +02:00
unistd.h We get rid of the general module prefix confusion with a binary config option, 2013-05-05 10:58:06 -07:00
user.h
vga.h
vmlinux.lds.h TTY/Serial patches for 4.1-rc1 2015-04-21 09:33:10 -07:00
vtime.h include/asm-generic/vtime.h: avoid zero-length file 2013-09-30 14:31:02 -07:00
word-at-a-time.h word-at-a-time: simplify big-endian zero_bytemask macro 2014-05-01 08:57:44 -07:00
xor.h asm-generic: xor: mark static functions as __maybe_unused 2012-10-03 21:21:06 +02:00