linux/arch/arm/include/asm
Rob Herring d25c881aa3 ARM: 7493/1: use generic unaligned.h
This moves ARM over to the asm-generic/unaligned.h header. This has the
benefit of better code generated especially for ARMv7 on gcc 4.7+
compilers.

As Arnd Bergmann, points out: The asm-generic version uses the "struct"
version for native-endian unaligned access and the "byteshift" version
for the opposite endianess. The current ARM version however uses the
"byteshift" implementation for both.

Thanks to Nicolas Pitre for the excellent analysis:

Test case:

int foo (int *x) { return get_unaligned(x); }
long long bar (long long *x) { return get_unaligned(x); }

With the current ARM version:

foo:
	ldrb	r3, [r0, #2]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 2B], MEM[(const u8 *)x_1(D) + 2B]
	ldrb	r1, [r0, #1]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 1B], MEM[(const u8 *)x_1(D) + 1B]
	ldrb	r2, [r0, #0]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D)], MEM[(const u8 *)x_1(D)]
	mov	r3, r3, asl #16	@ tmp154, MEM[(const u8 *)x_1(D) + 2B],
	ldrb	r0, [r0, #3]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 3B], MEM[(const u8 *)x_1(D) + 3B]
	orr	r3, r3, r1, asl #8	@, tmp155, tmp154, MEM[(const u8 *)x_1(D) + 1B],
	orr	r3, r3, r2	@ tmp157, tmp155, MEM[(const u8 *)x_1(D)]
	orr	r0, r3, r0, asl #24	@,, tmp157, MEM[(const u8 *)x_1(D) + 3B],
	bx	lr	@

bar:
	stmfd	sp!, {r4, r5, r6, r7}	@,
	mov	r2, #0	@ tmp184,
	ldrb	r5, [r0, #6]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 6B], MEM[(const u8 *)x_1(D) + 6B]
	ldrb	r4, [r0, #5]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 5B], MEM[(const u8 *)x_1(D) + 5B]
	ldrb	ip, [r0, #2]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 2B], MEM[(const u8 *)x_1(D) + 2B]
	ldrb	r1, [r0, #4]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 4B], MEM[(const u8 *)x_1(D) + 4B]
	mov	r5, r5, asl #16	@ tmp175, MEM[(const u8 *)x_1(D) + 6B],
	ldrb	r7, [r0, #1]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 1B], MEM[(const u8 *)x_1(D) + 1B]
	orr	r5, r5, r4, asl #8	@, tmp176, tmp175, MEM[(const u8 *)x_1(D) + 5B],
	ldrb	r6, [r0, #7]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 7B], MEM[(const u8 *)x_1(D) + 7B]
	orr	r5, r5, r1	@ tmp178, tmp176, MEM[(const u8 *)x_1(D) + 4B]
	ldrb	r4, [r0, #0]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D)], MEM[(const u8 *)x_1(D)]
	mov	ip, ip, asl #16	@ tmp188, MEM[(const u8 *)x_1(D) + 2B],
	ldrb	r1, [r0, #3]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 3B], MEM[(const u8 *)x_1(D) + 3B]
	orr	ip, ip, r7, asl #8	@, tmp189, tmp188, MEM[(const u8 *)x_1(D) + 1B],
	orr	r3, r5, r6, asl #24	@,, tmp178, MEM[(const u8 *)x_1(D) + 7B],
	orr	ip, ip, r4	@ tmp191, tmp189, MEM[(const u8 *)x_1(D)]
	orr	ip, ip, r1, asl #24	@, tmp194, tmp191, MEM[(const u8 *)x_1(D) + 3B],
	mov	r1, r3	@,
	orr	r0, r2, ip	@ tmp171, tmp184, tmp194
	ldmfd	sp!, {r4, r5, r6, r7}
	bx	lr

In both cases the code is slightly suboptimal.  One may wonder why
wasting r2 with the constant 0 in the second case for example.  And all
the mov's could be folded in subsequent orr's, etc.

Now with the asm-generic version:

foo:
	ldr	r0, [r0, #0]	@ unaligned	@,* x
	bx	lr	@

bar:
	mov	r3, r0	@ x, x
	ldr	r0, [r0, #0]	@ unaligned	@,* x
	ldr	r1, [r3, #4]	@ unaligned	@,
	bx	lr	@

This is way better of course, but only because this was compiled for
ARMv7. In this case the compiler knows that the hardware can do
unaligned word access.  This isn't that obvious for foo(), but if we
remove the get_unaligned() from bar as follows:

long long bar (long long *x) {return *x; }

then the resulting code is:

bar:
	ldmia	r0, {r0, r1}	@ x,,
	bx	lr	@

So this proves that the presumed aligned vs unaligned cases does have
influence on the instructions the compiler may use and that the above
unaligned code results are not just an accident.

Still... this isn't fully conclusive without at least looking at the
resulting assembly fron a pre ARMv6 compilation.  Let's see with an
ARMv5 target:

foo:
	ldrb	r3, [r0, #0]	@ zero_extendqisi2	@ tmp139,* x
	ldrb	r1, [r0, #1]	@ zero_extendqisi2	@ tmp140,
	ldrb	r2, [r0, #2]	@ zero_extendqisi2	@ tmp143,
	ldrb	r0, [r0, #3]	@ zero_extendqisi2	@ tmp146,
	orr	r3, r3, r1, asl #8	@, tmp142, tmp139, tmp140,
	orr	r3, r3, r2, asl #16	@, tmp145, tmp142, tmp143,
	orr	r0, r3, r0, asl #24	@,, tmp145, tmp146,
	bx	lr	@

bar:
	stmfd	sp!, {r4, r5, r6, r7}	@,
	ldrb	r2, [r0, #0]	@ zero_extendqisi2	@ tmp139,* x
	ldrb	r7, [r0, #1]	@ zero_extendqisi2	@ tmp140,
	ldrb	r3, [r0, #4]	@ zero_extendqisi2	@ tmp149,
	ldrb	r6, [r0, #5]	@ zero_extendqisi2	@ tmp150,
	ldrb	r5, [r0, #2]	@ zero_extendqisi2	@ tmp143,
	ldrb	r4, [r0, #6]	@ zero_extendqisi2	@ tmp153,
	ldrb	r1, [r0, #7]	@ zero_extendqisi2	@ tmp156,
	ldrb	ip, [r0, #3]	@ zero_extendqisi2	@ tmp146,
	orr	r2, r2, r7, asl #8	@, tmp142, tmp139, tmp140,
	orr	r3, r3, r6, asl #8	@, tmp152, tmp149, tmp150,
	orr	r2, r2, r5, asl #16	@, tmp145, tmp142, tmp143,
	orr	r3, r3, r4, asl #16	@, tmp155, tmp152, tmp153,
	orr	r0, r2, ip, asl #24	@,, tmp145, tmp146,
	orr	r1, r3, r1, asl #24	@,, tmp155, tmp156,
	ldmfd	sp!, {r4, r5, r6, r7}
	bx	lr

Compared to the initial results, this is really nicely optimized and I
couldn't do much better if I were to hand code it myself.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-08-25 09:22:30 +01:00
..
hardware Viresh has moved 2012-06-20 14:39:36 -07:00
mach ARM: fiq: change FIQ_START to a variable 2012-07-01 21:59:19 +08:00
a.out-core.h ARM: 6798/1: aout-core: zero thread debug registers in a.out core dump 2011-03-10 15:16:29 +00:00
a.out.h
arch_timer.h ARM: 7451/1: arch timer: implement read_current_timer and get_cycles 2012-07-09 17:42:23 +01:00
asm-offsets.h
assembler.h ARM: cleanups of io includes 2012-03-29 18:02:10 -07:00
atomic.h ARM: fix warnings about atomic64_read 2012-07-05 13:06:32 +01:00
barrier.h ARM: OMAP2+: Fix omap2+ build error 2012-04-03 10:59:10 -07:00
bitops.h Disintegrate asm/system.h for ARM 2012-03-28 18:30:01 +01:00
bug.h Disintegrate asm/system.h for ARM 2012-03-28 18:30:01 +01:00
bugs.h
byteorder.h
cache.h
cacheflush.h ARM: 7479/1: mm: avoid NULL dereference when flushing gate_vma with VIVT caches 2012-07-31 12:04:47 +01:00
cachetype.h ARM: 7062/1: cache: detect PIPT I-cache using CTR 2011-10-17 09:13:41 +01:00
checksum.h
clkdev.h ARM: Consolidate the clkdev header files 2011-07-19 18:09:45 +02:00
cmpxchg.h ARM: 7404/1: cmpxchg64: use atomic64 and local64 routines for cmpxchg64 2012-04-28 17:32:44 +01:00
compiler.h Disintegrate asm/system.h for ARM 2012-03-28 18:30:01 +01:00
cp15.h Disintegrate asm/system.h for ARM 2012-03-28 18:30:01 +01:00
cpu.h arm: Use generic idle thread allocation 2012-04-26 12:06:11 +02:00
cpuidle.h cpuidle: Add common time keeping and irq enabling 2012-03-21 01:59:40 -04:00
cputype.h ARM: 7011/1: Add ARM cpu topology definition 2011-10-17 09:02:43 +01:00
cti.h arm: introduce cross trigger interface helpers 2011-12-02 15:16:33 +00:00
delay.h ARM: 7452/1: delay: allow timer-based delay implementation to be selected 2012-07-09 17:42:23 +01:00
device.h ARM: dma-mapping: add support for IOMMU mapper 2012-05-21 15:06:23 +02:00
div64.h Disintegrate asm/system.h for ARM 2012-03-28 18:30:01 +01:00
dma-contiguous.h ARM: integrate CMA with DMA-mapping subsystem 2012-05-21 15:09:38 +02:00
dma-iommu.h ARM: dma-mapping: add support for IOMMU mapper 2012-05-21 15:06:23 +02:00
dma-mapping.h ARM: dma-mapping: add support for dma_get_sgtable() 2012-07-30 12:25:47 +02:00
dma.h Disintegrate asm/system.h for ARM 2012-03-28 18:30:01 +01:00
domain.h ARM: fix set_domain() macro 2012-07-05 09:50:55 +01:00
ecard.h ARM: io: ecard: move ioaddr() inside __ecard_address 2011-08-17 08:44:16 +01:00
edac.h ARM: 7201/1: add EDAC atomic_scrub function 2011-12-11 08:35:50 +00:00
elf.h ARM: 7294/1: vectors: use gate_vma for vectors user mapping 2012-03-24 09:38:51 +00:00
entry-macro-multi.S ARM: gic: consolidate PPI handling 2011-10-23 13:32:29 +01:00
exception.h ARM: 7115/4: move __exception and friends to asm/exception.h 2011-10-17 09:02:44 +01:00
fb.h
fcntl.h
fiq.h ARM: 6940/1: fiq: Briefly document driver responsibilities for suspend/resume 2011-05-26 10:31:06 +01:00
fixmap.h
flat.h
floppy.h
fncpy.h
fpstate.h Fix common misspellings 2011-03-31 11:26:23 -03:00
ftrace.h
futex.h ARM: 7425/1: extable: ensure fixup entries are 4-byte aligned 2012-06-16 16:30:25 +01:00
glue-cache.h Fix common misspellings 2011-03-31 11:26:23 -03:00
glue-df.h ARM: Remove support for ARMv3 ARM610 and ARM710 CPUs 2012-05-05 05:50:50 +01:00
glue-pf.h ARM: move cache/processor/fault glue to separate include files 2011-02-12 11:52:21 +00:00
glue-proc.h ARM: Remove support for ARMv3 ARM610 and ARM710 CPUs 2012-05-05 05:50:50 +01:00
glue.h Fix common misspellings 2011-03-31 11:26:23 -03:00
gpio.h ARM: 7271/1: Fix typo in conversion of ARCH_NR_GPIOS to Kconfig 2012-01-08 09:27:19 +00:00
hardirq.h ARM: 7140/1: remove NR_IRQS dependency for ARM-specific HARDIRQ_BITS definition 2011-12-06 11:14:01 +00:00
highmem.h highmem: kill all __kmap_atomic() 2012-03-20 21:48:30 +08:00
hw_breakpoint.h ARM: hw_breakpoint: add support for multiple watchpoints 2011-08-31 10:42:48 +01:00
hw_irq.h arm: dove: Use proper irq accessor functions 2011-03-29 14:47:57 +02:00
hwcap.h UAPI: Split trivial #if defined(__KERNEL__) && X conditionals 2011-12-13 15:07:49 +00:00
ide.h
idmap.h ARM: SMP: use idmap_pgd for mapping MMU enable during secondary booting 2011-12-06 14:04:15 +00:00
io.h ARM: fix out[bwl]() 2012-05-25 08:39:25 +01:00
ioctls.h
irq.h ARM: only include mach/irqs.h for !SPARSE_IRQ 2012-01-30 13:24:37 -06:00
irqflags.h
jump_label.h ARM: 7386/1: jump_label: fixup for rename to static_key 2012-04-15 22:00:31 +01:00
Kbuild ARM: 7493/1: use generic unaligned.h 2012-08-25 09:22:30 +01:00
kexec.h [ARM] add machine-specific hook to machine_kexec 2011-03-03 16:26:55 -05:00
kgdb.h
kmap_types.h arm: remove km_type definitions 2012-07-24 15:27:28 +08:00
kprobes.h Kernel: Audit Support For The ARM Platform 2012-01-17 16:17:01 -05:00
kvm_para.h kvmclock: Add functions to check if the host has stopped the vm 2012-04-08 12:48:59 +03:00
leds.h
limits.h
linkage.h
localtimer.h ARM: local timers: make the runtime registration interface mandatory 2012-03-13 13:45:55 +00:00
mach-types.h
mc146818rtc.h ARM: mc146818rtc: remove unnecessary include of mach/irqs.h 2012-01-25 20:37:45 -06:00
memblock.h ARM: Add arm_memblock_steal() to allocate memory away from the kernel 2012-01-13 15:02:35 +00:00
memory.h ARM: 7432/1: use the new linux/sizes.h 2012-06-28 17:14:35 +01:00
mman.h
mmu_context.h ARM: Remove __ARCH_WANT_INTERRUPTS_ON_CTXSW on pre-ARMv6 CPUs 2012-04-17 15:29:44 +01:00
mmu.h ARM: Remove __ARCH_WANT_INTERRUPTS_ON_CTXSW on pre-ARMv6 CPUs 2012-04-17 15:29:44 +01:00
module.h ARM: 7013/1: P2V: Remove ARM_PATCH_PHYS_VIRT_16BIT 2011-08-13 11:26:40 +01:00
mtd-xip.h
mutex.h ARM: 7467/1: mutex: use generic xchg-based implementation for ARMv6+ 2012-07-31 10:30:41 +01:00
nwflash.h
opcodes.h ARM: 7311/1: Add generic instruction opcode manipulation helpers 2012-03-24 09:38:51 +00:00
outercache.h ARM: 7114/1: cache-l2x0: add resume entry for l2 in secure mode 2011-10-17 09:11:51 +01:00
page-nommu.h
page.h ARM: Remove support for ARMv3 ARM610 and ARM710 CPUs 2012-05-05 05:50:50 +01:00
pci.h PCI: collapse pcibios_resource_to_bus 2012-02-23 20:19:04 -07:00
perf_event.h ARM: 7448/1: perf: remove arm_perf_pmu_ids global enumeration 2012-07-09 17:41:10 +01:00
pgalloc.h ARM: LPAE: Page table maintenance for the 3-level format 2011-12-08 10:30:39 +00:00
pgtable-2level-hwdef.h ARM: 7077/1: LPAE: Use a mask for physical addresses in page table entries 2011-10-06 15:40:06 +01:00
pgtable-2level-types.h ARM: 7076/1: LPAE: Add (pte|pmd)val_t type definitions as u32 2011-10-06 15:40:05 +01:00
pgtable-2level.h ARM: LPAE: Move page table maintenance macros to pgtable-2level.h 2011-12-08 10:30:37 +00:00
pgtable-3level-hwdef.h ARM: LPAE: Introduce the 3-level page table format definitions 2011-12-08 10:30:39 +00:00
pgtable-3level-types.h ARM: LPAE: Introduce the 3-level page table format definitions 2011-12-08 10:30:39 +00:00
pgtable-3level.h ARM: 7416/1: LPAE: Remove unused L_PTE_(BUFFERABLE|CACHEABLE) macros 2012-05-12 14:38:21 +01:00
pgtable-hwdef.h ARM: LPAE: Introduce the 3-level page table format definitions 2011-12-08 10:30:39 +00:00
pgtable-nommu.h Remove remaining bits of io_remap_page_range() 2012-03-23 16:58:31 -07:00
pgtable.h Merge branch 'devel-stable' into for-linus 2012-01-05 13:24:33 +00:00
pmu.h ARM: 7448/1: perf: remove arm_perf_pmu_ids global enumeration 2012-07-09 17:41:10 +01:00
posix_types.h bury __kernel_nlink_t, make internal nlink_t consistent 2012-05-30 21:04:50 -04:00
proc-fns.h ARM: LPAE: Page table maintenance for the 3-level format 2011-12-08 10:30:39 +00:00
processor.h Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-05-23 10:59:07 -07:00
procinfo.h
prom.h ARM: get rid of asm/irq.h in asm/prom.h 2012-03-24 09:38:54 +00:00
ptrace.h ARM: 7374/1: add TRACEHOOK support 2012-04-25 19:49:28 +01:00
scatterlist.h ARM: Allow SoCs to enable scatterlist chaining 2011-06-02 11:16:22 +01:00
sched_clock.h ARM: 7205/2: sched_clock: allow sched_clock to be selected at runtime 2011-12-18 23:00:26 +00:00
seccomp.h
setup.h ARM: 7465/1: Handle >4GB memory sizes in device tree and mem=size@start option 2012-07-29 22:19:14 +01:00
shmparam.h
sigcontext.h
signal.h
smp_plat.h ARM: 7293/1: logical_cpu_map: decouple CPU mapping from SMP 2012-01-23 10:20:05 +00:00
smp_scu.h
smp_twd.h ARM: smp_twd: remove old local timer interface 2012-03-13 13:45:54 +00:00
smp.h ARM: 7293/1: logical_cpu_map: decouple CPU mapping from SMP 2012-01-23 10:20:05 +00:00
sparsemem.h
spinlock_types.h ARM: 7446/1: spinlock: use ticket algorithm for ARMv6+ locking implementation 2012-07-09 17:41:10 +01:00
spinlock.h ARM: 7447/1: rwlocks: remove unused branch labels from trylock routines 2012-07-09 17:41:10 +01:00
stackprotector.h
stacktrace.h
stat.h
statfs.h
string.h
suspend.h ARM: pm: preallocate a page table for suspend/resume 2011-09-20 23:33:36 +01:00
swab.h Merge branch 'for-next' of git://git.infradead.org/users/dhowells/linux-headers 2012-01-14 18:03:30 -08:00
switch_to.h Disintegrate asm/system.h for ARM 2012-03-28 18:30:01 +01:00
syscall.h ARM: 7373/1: add support for the generic syscall.h interface 2012-04-25 19:49:27 +01:00
system_info.h Disintegrate asm/system.h for ARM 2012-03-28 18:30:01 +01:00
system_misc.h Disintegrate and delete asm/system.h 2012-03-28 15:58:21 -07:00
system.h Disintegrate asm/system.h for ARM 2012-03-28 18:30:01 +01:00
tcm.h ARM: 6985/1: export functions to determine the presence of I/DTCM 2011-07-06 20:49:45 +01:00
termios.h
therm.h
thread_info.h ARM: 7443/1: Revert "new way of handling ERESTART_RESTARTBLOCK" 2012-07-05 09:50:56 +01:00
thread_notify.h ARM: 6867/1: Introduce THREAD_NOTIFY_COPY for copy_thread() hooks 2011-04-10 21:13:36 +01:00
timex.h ARM: 7491/1: use generic version of identical asm headers 2012-08-25 09:22:30 +01:00
tlb.h ARM: 7302/1: Add TLB flushing for both entries in a PMD 2012-02-02 17:37:42 +00:00
tlbflush.h ARM: Remove support for ARMv3 ARM610 and ARM710 CPUs 2012-05-05 05:50:50 +01:00
tls.h ARM: 7403/1: tls: remove covert channel via TPIDRURW 2012-04-28 11:01:30 +01:00
topology.h ARM: 7182/1: ARM cpu topology: fix warning 2011-11-30 23:55:21 +00:00
traps.h ARM: earlier initialization of vectors page 2012-01-23 10:24:11 +00:00
uaccess.h ARM: 7449/1: use generic strnlen_user and strncpy_from_user functions 2012-07-09 17:41:11 +01:00
ucontext.h Fix common misspellings 2011-03-31 11:26:23 -03:00
unified.h ARM: make BSYM macro assembly only 2012-01-16 08:56:25 -06:00
unistd.h ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION 2012-07-30 17:25:21 -07:00
unwind.h ARM: 7187/1: fix unwinding for XIP kernels 2011-12-06 11:16:13 +00:00
user.h ARM: 6798/1: aout-core: zero thread debug registers in a.out core dump 2011-03-10 15:16:29 +00:00
vfp.h
vfpmacros.h
vga.h ARM: set vga memory base at run-time 2011-07-12 11:19:29 -05:00
word-at-a-time.h ARM: 7450/1: dcache: select DCACHE_WORD_ACCESS for little-endian ARMv6+ CPUs 2012-07-09 17:41:11 +01:00
xor.h