linux/include/asm-generic
Peter Zijlstra (Intel) 69f9cae909 locking/qspinlock: Optimize for smaller NR_CPUS
When we allow for a max NR_CPUS < 2^14 we can optimize the pending
wait-acquire and the xchg_tail() operations.

By growing the pending bit to a byte, we reduce the tail to 16bit.
This means we can use xchg16 for the tail part and do away with all
the repeated compxchg() operations.

This in turn allows us to unconditionally acquire; the locked state
as observed by the wait loops cannot change. And because both locked
and pending are now a full byte we can use simple stores for the
state transition, obviating one atomic operation entirely.

This optimization is needed to make the qspinlock achieve performance
parity with ticket spinlock at light load.

All this is horribly broken on Alpha pre EV56 (and any other arch that
cannot do single-copy atomic byte stores).

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Daniel J Blueman <daniel@numascale.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Douglas Hatch <doug.hatch@hp.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1429901803-29771-6-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08 12:36:48 +02:00
..
bitops arch: Mass conversion of smp_mb__*() 2014-04-18 14:20:48 +02:00
4level-fixup.h mm, asm-generic: define PUD_SHIFT in <asm-generic/4level-fixup.h> 2015-02-11 17:06:03 -08:00
atomic64.h locking,arch: Rewrite generic atomic support 2014-08-14 12:48:14 +02:00
atomic-long.h
atomic.h locking,arch: Use ACCESS_ONCE() instead of cast to volatile in atomic_read() 2014-10-03 06:06:23 +02:00
audit_change_attr.h audit: Modify a set of system calls in audit class definitions 2014-01-17 17:01:46 -05:00
audit_dir_write.h audit: support the "standard" <asm-generic/unistd.h> 2011-05-04 14:41:28 -04:00
audit_read.h audit: support the "standard" <asm-generic/unistd.h> 2011-05-04 14:41:28 -04:00
audit_signal.h
audit_write.h audit: Modify a set of system calls in audit class definitions 2014-01-17 17:01:46 -05:00
barrier.h arch: Add lightweight memory barriers dma_rmb() and dma_wmb() 2014-12-11 21:15:06 -05:00
bitops.h arch: Prepare for smp_mb__{before,after}_atomic() 2014-04-18 11:40:30 +02:00
bitsperlong.h UAPI: (Scripted) Disintegrate include/asm-generic 2012-10-04 18:20:15 +01:00
bug.h bug: Make BUG() always stop the machine 2014-04-07 16:36:10 -07:00
bugs.h
cache.h
cacheflush.h asm-generic/cacheflush.h: flush icache when copying to user pages 2011-05-25 08:39:37 -07:00
checksum.h asm-generic headers: Allow yet more arch overrides in checksum.h 2013-02-11 20:00:33 +05:30
clkdev.h asm-generic: COMMON_CLK defines __clk_{get,put} 2014-09-25 18:00:45 -07:00
cmpxchg-local.h LLVMLinux: Remove warning about returning an uninitialized variable 2014-04-09 13:44:35 -07:00
cmpxchg.h asm-generic: move cmpxchg*_local defs to cmpxchg.h 2013-03-13 06:11:05 +01:00
cputime_jiffies.h sched, time: Fix build error with 64 bit cputime_t on 32 bit systems 2014-10-03 05:46:55 +02:00
cputime_nsecs.h sched, time: Fix build error with 64 bit cputime_t on 32 bit systems 2014-10-03 05:46:55 +02:00
cputime.h cputime: Generic on-demand virtual cputime accounting 2013-01-27 19:23:27 +01:00
current.h
delay.h asm-generic: delay.h fix udelay and ndelay for 8 bit args 2011-07-22 18:45:33 +02:00
device.h
div64.h
dma-coherent.h DMA-API: Change dma_declare_coherent_memory() CPU address to phys_addr_t 2014-05-20 16:55:23 -06:00
dma-contiguous.h asm-generic: Add dma-contiguous.h 2014-09-22 13:35:51 +02:00
dma-mapping-broken.h asm-generic/dma-mapping-broken.h: Provide dma_alloc_attrs()/dma_free_attrs() 2012-12-25 20:14:54 +01:00
dma-mapping-common.h asm/dma-mapping-common: Clarify output of dma_map_sg_attrs 2015-03-09 13:05:47 +01:00
dma.h
early_ioremap.h mm: create generic early_ioremap() support 2014-04-07 16:36:15 -07:00
emergency-restart.h
exec.h Split arch_align_stack() out from asm-generic/system.h 2012-03-28 18:30:03 +01:00
fb.h
fixmap.h arm64: fixmap: fix missing sub-page offset for earlyprintk 2014-05-03 22:20:31 +01:00
ftrace.h asm-generic headers: add ftrace.h 2011-03-17 09:19:04 +08:00
futex.h asm-generic: add generic futex for !CONFIG_SMP 2014-12-08 12:55:48 +08:00
getorder.h bitops: Add missing parentheses to new get_order macro 2012-02-24 10:39:27 -08:00
gpio.h gpio: move pincontrol calls to <linux/gpio/driver.h> 2015-03-19 09:45:54 +01:00
hardirq.h
hugetlb.h mm: Fix generic hugetlb pte check return type. 2013-10-02 20:02:35 -04:00
hw_irq.h
ide_iops.h
int-ll64.h UAPI: (Scripted) Disintegrate include/asm-generic 2012-10-04 18:20:15 +01:00
io-64-nonatomic-hi-lo.h readq/writeq: Add explicit lo_hi_[read|write]_q and hi_lo_[read|write]_q 2014-07-04 13:27:30 +02:00
io-64-nonatomic-lo-hi.h readq/writeq: Add explicit lo_hi_[read|write]_q and hi_lo_[read|write]_q 2014-07-04 13:27:30 +02:00
io.h Merge branch 'io' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into asm-generic 2014-11-11 19:55:45 +01:00
ioctl.h include/asm-generic/ioctl.h: fix _IOC_TYPECHECK sparse error 2014-06-06 16:08:13 -07:00
iomap.h Kconfig: rename HAS_IOPORT to HAS_IOPORT_MAP 2014-04-07 16:36:11 -07:00
irq_regs.h core: Replace __get_cpu_var with __this_cpu_read if not used for an address. 2010-12-17 15:07:19 +01:00
irq_work.h irq_work: Introduce arch_irq_work_has_interrupt() 2014-09-13 18:38:07 +02:00
irq.h
irqflags.h
Kbuild.asm UAPI: Set up uapi/asm/Kbuild.asm 2012-10-02 18:01:56 +01:00
kdebug.h
kmap_types.h asm-generic: remove km_type definitions 2012-07-24 15:27:30 +08:00
kvm_para.h KVM: add kvm_para_available to asm-generic/kvm_para.h 2013-06-05 13:21:29 +03:00
libata-portmap.h
linkage.h
local64.h atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
local.h atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
mcs_spinlock.h locking/mcs: Allow architecture specific asm files to be used for contended case 2014-02-09 21:18:52 +01:00
memory_model.h __page_to_pfn: Fix typo in comment 2013-10-14 15:28:29 +02:00
mm_hooks.h mm: Make arch_unmap()/bprm_mm_init() available to all architectures 2014-11-19 11:54:13 +01:00
mmu_context.h asm-generic: Remove asm-generic arch_bprm_mm_init() 2014-11-22 21:52:08 +01:00
mmu.h asm-generic/mmu.h: Add support for FDPIC 2012-12-09 23:14:14 +01:00
module.h Make most arch asm/module.h files use asm-generic/module.h 2012-09-28 14:31:03 +09:30
msi.h asm-generic: Add msi.h 2014-11-23 13:01:47 +01:00
mutex-dec.h arch: Make __mutex_fastpath_lock_retval return whether fastpath succeeded or not 2013-06-26 12:10:55 +02:00
mutex-null.h arch: Make __mutex_fastpath_lock_retval return whether fastpath succeeded or not 2013-06-26 12:10:55 +02:00
mutex-xchg.h arch: Make __mutex_fastpath_lock_retval return whether fastpath succeeded or not 2013-06-26 12:10:55 +02:00
mutex.h
page.h The following changes since commit 3ee72ca992 2012-01-10 17:39:40 -08:00
param.h UAPI: (Scripted) Disintegrate include/asm-generic 2012-10-04 18:20:15 +01:00
parport.h include: remove __dev* attributes. 2013-01-03 15:57:16 -08:00
pci_iomap.h pci: add pci_iomap_range 2015-01-21 16:28:49 +10:30
pci-bridge.h PCI: work around Stratus ftServer broken PCIe hierarchy 2012-04-30 15:21:02 -06:00
pci-dma-compat.h pci-dma-compat: add pci_zalloc_consistent helper 2014-08-08 15:57:28 -07:00
pci.h PCI: collapse pcibios_resource_to_bus 2012-02-23 20:19:04 -07:00
percpu.h percpu: preffity percpu header files 2014-06-17 19:12:40 -04:00
pgalloc.h
pgtable-nopmd.h
pgtable-nopud.h
pgtable.h mm: change vunmap to tear down huge KVA mappings 2015-04-14 16:49:04 -07:00
preempt.h sched: Kill task_preempt_count() 2014-10-28 10:47:56 +01:00
ptrace.h asm-generic/ptrace.h: start a common low level ptrace helper 2011-05-26 17:12:36 -07:00
qrwlock_types.h locking/rwlocks: Introduce 'qrwlocks' - fair, queued rwlocks 2014-06-06 07:58:28 +02:00
qrwlock.h locking/rwlocks: Introduce 'qrwlocks' - fair, queued rwlocks 2014-06-06 07:58:28 +02:00
qspinlock_types.h locking/qspinlock: Optimize for smaller NR_CPUS 2015-05-08 12:36:48 +02:00
qspinlock.h locking/qspinlock: Introduce a simple generic 4-byte queued spinlock 2015-05-08 12:36:25 +02:00
resource.h asm-generic: remove _STK_LIM_MAX 2014-05-15 00:32:09 +01:00
rtc.h
rwsem.h asm-generic: rwsem: de-PPCify rwsem.h 2014-03-14 18:02:08 +00:00
scatterlist.h
seccomp.h seccomp: allow COMPAT sigreturn overrides 2015-04-17 09:04:09 -04:00
sections.h nosave: consolidate __nosave_{begin,end} in <asm/sections.h> 2014-10-09 22:26:04 -04:00
segment.h
serial.h
siginfo.h constify copy_siginfo_to_user{,32}() 2013-11-09 00:16:29 -05:00
signal.h unify default ptrace_signal_deliver 2012-11-29 00:01:23 -05:00
simd.h crypto: create generic version of ablk_helper 2013-09-24 06:02:24 +10:00
sizes.h ARM: 7430/1: sizes.h: move from asm-generic to <linux/sizes.h> 2012-06-28 17:14:34 +01:00
spinlock.h
statfs.h UAPI: (Scripted) Disintegrate include/asm-generic 2012-10-04 18:20:15 +01:00
string.h
switch_to.h Split the switch_to() wrapper out of asm-generic/system.h 2012-03-28 18:30:03 +01:00
syscall.h syscall.h: fix doc text for syscall_get_arch() 2014-09-23 16:20:00 -04:00
syscalls.h burying unused conditionals 2013-02-14 09:21:15 -05:00
termios-base.h
termios.h UAPI: (Scripted) Disintegrate include/asm-generic 2012-10-04 18:20:15 +01:00
timex.h
tlb.h mm: mmu_gather: use tlb->end != 0 only for TLB invalidation 2015-01-13 15:20:40 +13:00
tlbflush.h BUG: headers with BUG/BUG_ON etc. need linux/bug.h 2012-03-04 17:54:34 -05:00
topology.h
trace_clock.h tracing,x86: Add a TSC trace_clock 2012-11-13 15:48:27 -05:00
uaccess-unaligned.h
uaccess.h asm-generic: uaccess: Spelling s/a ny/any/ 2014-01-02 10:45:23 +01:00
unaligned.h asm-generic: allow generic unaligned access if the arch supports it 2014-05-08 10:22:23 +02:00
unistd.h We get rid of the general module prefix confusion with a binary config option, 2013-05-05 10:58:06 -07:00
user.h asm-generic/user.h: Fix spelling in comment 2011-03-01 15:49:39 +01:00
vga.h
vmlinux.lds.h TTY/Serial patches for 4.1-rc1 2015-04-21 09:33:10 -07:00
vtime.h include/asm-generic/vtime.h: avoid zero-length file 2013-09-30 14:31:02 -07:00
word-at-a-time.h word-at-a-time: simplify big-endian zero_bytemask macro 2014-05-01 08:57:44 -07:00
xor.h asm-generic: xor: mark static functions as __maybe_unused 2012-10-03 21:21:06 +02:00