linux/arch
Anton Blanchard ae01f84b93 powerpc: Optimise per cpu accesses on 64bit
Now that we dynamically allocate the paca array, it takes an extra load
whenever we want to access another cpu's paca. One place we do that a lot
is per cpu variable access. A simple example:

DEFINE_PER_CPU(unsigned long, vara);
unsigned long test4(int cpu)
{
	return per_cpu(vara, cpu);
}

This takes 4 loads, 5 if you include the actual load of the per cpu variable:

    ld r11,-32760(r30)  # load address of paca pointer
    ld r9,-32768(r30)   # load link address of percpu variable
    sldi r3,r29,9       # get offset into paca (each entry is 512 bytes)
    ld r0,0(r11)        # load paca pointer
    add r3,r0,r3        # paca + offset
    ld r11,64(r3)       # load paca[cpu].data_offset

    ldx r3,r9,r11       # load per cpu variable
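The listing above can be modelled in userspace to make the chain of dependent loads concrete. This is a sketch under stated assumptions. `struct model_paca`, `paca_setup()`, `paca_per_cpu_offset()`, `paca_per_cpu_offset_raw()` and NR_CPUS are hypothetical names. The struct reproduces only two facts taken from the disassembly: each entry is 512 bytes, and data_offset sits at byte 64. It is not the real paca layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model; NR_CPUS is an illustrative value. */
#define NR_CPUS 64

/* Only the two layout facts visible in the disassembly are modelled. */
struct model_paca {
	char before[64];            /* fields preceding data_offset */
	unsigned long data_offset;  /* this cpu's per cpu offset */
	char after[512 - 64 - sizeof(unsigned long)];
};

_Static_assert(offsetof(struct model_paca, data_offset) == 64,
	       "data_offset must sit at byte 64");
_Static_assert(sizeof(struct model_paca) == 512,
	       "each paca entry must be 512 bytes");

/* Allocated at run time, so code only ever sees a pointer to it. */
struct model_paca *paca;

int paca_setup(void)
{
	paca = calloc(NR_CPUS, sizeof(*paca));
	return paca ? 0 : -1;
}

/* In C terms: load the pointer `paca`, then load paca[cpu].data_offset.
 * On ppc64 the address of `paca` itself also comes from the TOC. */
unsigned long paca_per_cpu_offset(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return paca[cpu].data_offset;
}

/* The same lookup spelled out like the assembly:
 * shift by 9 (x 512), add to the paca base, load at +64. */
unsigned long paca_per_cpu_offset_raw(int cpu)
{
	const char *entry = (const char *)paca + ((size_t)cpu << 9);
	return *(const unsigned long *)(entry + 64);
}
```

Because the array is reachable only through a pointer, that pointer has to be loaded before the array can be indexed. That is the extra load described above.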

If we remove the ppc64-specific per_cpu_offset(), we get the generic one,
which indexes into a statically allocated array. This removes one load and
one add:

    ld r11,-32760(r30)  # load address of __per_cpu_offset
    ld r9,-32768(r30)   # load link address of percpu variable
    sldi r3,r29,3       # get offset into __per_cpu_offset (each entry 8 bytes)
    ldx r11,r11,r3      # load __per_cpu_offset[cpu]

    ldx r3,r9,r11       # load per cpu variable
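The generic scheme can be sketched the same way. `per_cpu_offset_table` stands in for `__per_cpu_offset`. `vara`, `vara_copies`, `setup_per_cpu_offsets()` and `per_cpu_ptr_model()` are hypothetical helpers that mimic DEFINE_PER_CPU()/per_cpu() in userspace. The sketch assumes a flat address space in which pointer/integer round trips through uintptr_t behave as expected.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model; NR_CPUS is an illustrative value. */
#define NR_CPUS 4

/* Stand-in for DEFINE_PER_CPU(unsigned long, vara). */
unsigned long vara;                 /* link-time copy ("link address") */
unsigned long vara_copies[NR_CPUS]; /* one real instance per cpu */

/* Mirrors __per_cpu_offset: a static array, one long per cpu. */
uintptr_t per_cpu_offset_table[NR_CPUS];

/* Record, for each cpu, the distance from the link address to its copy.
 * Unsigned wraparound is intended: adding it back restores the address. */
void setup_per_cpu_offsets(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		per_cpu_offset_table[cpu] =
			(uintptr_t)&vara_copies[cpu] - (uintptr_t)&vara;
}

/* One load of the offset (the array address is a link-time constant),
 * one add; the caller then performs the load of the variable itself. */
unsigned long *per_cpu_ptr_model(unsigned long *link_addr, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return (unsigned long *)((uintptr_t)link_addr +
				 per_cpu_offset_table[cpu]);
}
```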

Having all the offsets in one array also helps when iterating over a per cpu
variable across a number of cpus, such as in the scheduler. Previously we
needed to load one paca cacheline for each per cpu offset we calculated. Now
each 128-byte cacheline holds 16 (128 / sizeof(long)) per cpu offsets.
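The cacheline claim can be checked with simple arithmetic. `lines_touched()` is a hypothetical helper that counts the distinct 128-byte cachelines touched when reading one offset per cpu, assuming each array starts on a cacheline boundary. A 512-byte stride with the field at byte 64 models the paca, and an 8-byte stride models the flat offset array.

```c
#include <assert.h>
#include <stddef.h>

#define CACHELINE 128  /* cacheline size assumed in the text above */

/* Count distinct cachelines touched when reading a field located
 * field_offset bytes into each of ncpus consecutive entries of size
 * stride. Line numbers never decrease as cpu grows, so counting
 * changes counts distinct lines. */
size_t lines_touched(size_t ncpus, size_t stride, size_t field_offset)
{
	size_t lines = 0;
	size_t last = (size_t)-1;

	assert(stride > 0);
	for (size_t cpu = 0; cpu < ncpus; cpu++) {
		size_t line = (cpu * stride + field_offset) / CACHELINE;
		if (line != last) {
			lines++;
			last = line;
		}
	}
	return lines;
}
```

For 64 cpus, lines_touched(64, 512, 64) gives 64 lines for the paca layout, while lines_touched(64, 8, 0) gives 4 for the flat array.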

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-09 11:28:30 +10:00
alpha alpha: Detect Super IO chip, no IDE on Avanti, enable EPP19 2010-06-15 14:19:08 -04:00
arm ARM: SAMSUNG: Fix on wrong function name for S5PV210 sdhci0 2010-07-05 16:01:04 +09:00
avr32 avr32: use asm-generic/scatterlist.h 2010-05-27 09:12:54 -07:00
blackfin blackfin: use use asm-generic/scatterlist.h 2010-05-27 09:12:55 -07:00
cris Merge branch 'for-linus' of git://www.jni.nu/cris 2010-06-01 08:51:25 -07:00
frv FRV: Reinstate null behaviour for the GDB remote protocol 'p' command 2010-06-09 12:42:44 -07:00
h8300 Merge branch 'for-35' of git://repo.or.cz/linux-kbuild 2010-06-01 08:55:52 -07:00
ia64 [IA64] Fix spinaphore down_spin() 2010-06-30 10:46:16 -07:00
m32r m32r: invoke oom-killer from page fault 2010-06-04 15:21:44 -07:00
m68k Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k 2010-05-27 10:19:19 -07:00
m68knommu Merge branch 'for-35' of git://repo.or.cz/linux-kbuild 2010-06-01 08:55:52 -07:00
microblaze Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 2010-06-11 14:15:44 -07:00
mips MIPS: Return after handling coprocessor 2 exception 2010-07-05 17:17:33 +01:00
mn10300 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 2010-06-11 14:15:44 -07:00
parisc Merge branch 'for-35' of git://repo.or.cz/linux-kbuild 2010-06-01 08:55:52 -07:00
powerpc powerpc: Optimise per cpu accesses on 64bit 2010-07-09 11:28:30 +10:00
s390 [S390] Update default configuration. 2010-06-08 18:58:23 +02:00
score asm-generic: remove ISA_DMA_THRESHOLD in scatterlist.h 2010-05-27 09:12:54 -07:00
sh arch/sh/mm: Eliminate a double lock 2010-06-21 13:46:53 +09:00
sparc Merge branch 'for-35' of git://repo.or.cz/linux-kbuild 2010-06-01 08:55:52 -07:00
um um: os-linux/mem.c needs sys/stat.h 2010-06-29 15:29:32 -07:00
x86 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2010-07-06 17:16:09 -07:00
xtensa xtensa: invoke oom-killer from page fault 2010-06-04 15:21:44 -07:00
.gitignore
Kconfig hw-breakpoints: Separate constraint space for data and instruction breakpoints 2010-05-01 04:32:11 +02:00