__my_cpu_offset is non-volatile, since we want its value to be cached
when we access several per-cpu variables in a row with preemption
disabled. This means that we rely on preempt_{en,dis}able to hazard
with the operation via the barrier() macro, so that we can't end up
migrating CPUs without reloading the per-cpu offset.
Unfortunately, GCC doesn't treat a "memory" clobber on a non-volatile
asm block as a side-effect, and will happily re-order it before other
memory clobbers (including those in prempt_disable()) and cache the
value. This has been observed to break the cmpxchg logic in the slub
allocator, leading to livelock in kmem_cache_alloc in mainline kernels.
This patch adds a dummy memory input operand to __my_cpu_offset,
forcing it to be ordered with respect to the barrier() macro.
Cc: <stable@vger.kernel.org>
Cc: Rob Herring <rob.herring@calxeda.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Use the previously unused TPIDRPRW register to store percpu offsets.
TPIDRPRW is only accessible in PL1, so it can only be used in the kernel.
This replaces 2 loads with a mrc instruction for each percpu variable
access. With hackbench, the performance improvement is 1.4% on Cortex-A9
(highbank). Taking an average of 30 runs of "hackbench -l 1000" yields:
Before: 6.2191
After: 6.1348
Will Deacon reported similar delta on v6 with 11MPCore.
The asm "memory clobber" are needed here to ensure the percpu offset
gets reloaded. Testing by Will found that this would not happen in
__schedule() which is a bit of a special case as preemption is disabled
but the execution can move cores.
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
With d8ecc5c (kbuild: asm-generic support, 2011-04-27) we can
remove a handful of asm-generic wrappers in ARM code. Since the
generic version of sizes.h doesn't contain SZ_48M, we replace
the 4 users of SZ_48M with the equivalent SZ_32M + SZ_16M.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: Imre Kaloz <kaloz@openwrt.org>
Acked-by: Krzysztof Halasa <khc@pm.waw.pl>
Cc: Eric Miao <eric.y.miao@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Move platform independent header files to arch/arm/include/asm, leaving
those in asm/arch* and asm/plat* alone.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>