linux

History

Rob Herring d25c881aa3 ARM: 7493/1: use generic unaligned.h	2012-08-25 09:22:30 +01:00

This moves ARM over to the asm-generic/unaligned.h header. This has the benefit of better generated code, especially for ARMv7 with gcc 4.7+ compilers.

As Arnd Bergmann points out:

The asm-generic version uses the "struct" version for native-endian unaligned access and the "byteshift" version for the opposite endianness. The current ARM version, however, uses the "byteshift" implementation for both.

Thanks to Nicolas Pitre for the excellent analysis:

Test case:

    int foo (int *x) { return get_unaligned(x); }
    long long bar (long long *x) { return get_unaligned(x); }

With the current ARM version:

    foo:
        ldrb r3, [r0, #2] @ zero_extendqisi2 @ MEM[(const u8 *)x_1(D) + 2B], MEM[(const u8 *)x_1(D) + 2B]
        ldrb r1, [r0, #1] @ zero_extendqisi2 @ MEM[(const u8 *)x_1(D) + 1B], MEM[(const u8 *)x_1(D) + 1B]
        ldrb r2, [r0, #0] @ zero_extendqisi2 @ MEM[(const u8 *)x_1(D)], MEM[(const u8 *)x_1(D)]
        mov r3, r3, asl #16 @ tmp154, MEM[(const u8 *)x_1(D) + 2B],
        ldrb r0, [r0, #3] @ zero_extendqisi2 @ MEM[(const u8 *)x_1(D) + 3B], MEM[(const u8 *)x_1(D) + 3B]
        orr r3, r3, r1, asl #8 @, tmp155, tmp154, MEM[(const u8 *)x_1(D) + 1B],
        orr r3, r3, r2 @ tmp157, tmp155, MEM[(const u8 *)x_1(D)]
        orr r0, r3, r0, asl #24 @,, tmp157, MEM[(const u8 *)x_1(D) + 3B],
        bx lr @

    bar:
        stmfd sp!, {r4, r5, r6, r7} @,
        mov r2, #0 @ tmp184,
        ldrb r5, [r0, #6] @ zero_extendqisi2 @ MEM[(const u8 *)x_1(D) + 6B], MEM[(const u8 *)x_1(D) + 6B]
        ldrb r4, [r0, #5] @ zero_extendqisi2 @ MEM[(const u8 *)x_1(D) + 5B], MEM[(const u8 *)x_1(D) + 5B]
        ldrb ip, [r0, #2] @ zero_extendqisi2 @ MEM[(const u8 *)x_1(D) + 2B], MEM[(const u8 *)x_1(D) + 2B]
        ldrb r1, [r0, #4] @ zero_extendqisi2 @ MEM[(const u8 *)x_1(D) + 4B], MEM[(const u8 *)x_1(D) + 4B]
        mov r5, r5, asl #16 @ tmp175, MEM[(const u8 *)x_1(D) + 6B],
        ldrb r7, [r0, #1] @ zero_extendqisi2 @ MEM[(const u8 *)x_1(D) + 1B], MEM[(const u8 *)x_1(D) + 1B]
        orr r5, r5, r4, asl #8 @, tmp176, tmp175, MEM[(const u8 *)x_1(D) + 5B],
        ldrb r6, [r0, #7] @ zero_extendqisi2 @ MEM[(const u8 *)x_1(D) + 7B], MEM[(const u8 *)x_1(D) + 7B]
        orr r5, r5, r1 @ tmp178, tmp176, MEM[(const u8 *)x_1(D) + 4B]
        ldrb r4, [r0, #0] @ zero_extendqisi2 @ MEM[(const u8 *)x_1(D)], MEM[(const u8 *)x_1(D)]
        mov ip, ip, asl #16 @ tmp188, MEM[(const u8 *)x_1(D) + 2B],
        ldrb r1, [r0, #3] @ zero_extendqisi2 @ MEM[(const u8 *)x_1(D) + 3B], MEM[(const u8 *)x_1(D) + 3B]
        orr ip, ip, r7, asl #8 @, tmp189, tmp188, MEM[(const u8 *)x_1(D) + 1B],
        orr r3, r5, r6, asl #24 @,, tmp178, MEM[(const u8 *)x_1(D) + 7B],
        orr ip, ip, r4 @ tmp191, tmp189, MEM[(const u8 *)x_1(D)]
        orr ip, ip, r1, asl #24 @, tmp194, tmp191, MEM[(const u8 *)x_1(D) + 3B],
        mov r1, r3 @,
        orr r0, r2, ip @ tmp171, tmp184, tmp194
        ldmfd sp!, {r4, r5, r6, r7}
        bx lr

In both cases the code is slightly suboptimal. One may wonder, for example, why r2 is wasted on the constant 0 in the second case. And all the movs could be folded into subsequent orrs, etc.

Now with the asm-generic version:

    foo:
        ldr r0, [r0, #0] @ unaligned @,* x
        bx lr @

    bar:
        mov r3, r0 @ x, x
        ldr r0, [r0, #0] @ unaligned @,* x
        ldr r1, [r3, #4] @ unaligned @,
        bx lr @

This is way better, of course, but only because this was compiled for ARMv7. In this case the compiler knows that the hardware can do unaligned word access. This isn't that obvious for foo(), but if we remove the get_unaligned() from bar as follows:

    long long bar (long long *x) { return *x; }

then the resulting code is:

    bar:
        ldmia r0, {r0, r1} @ x,,
        bx lr @

So this proves that the presumed aligned vs. unaligned cases do have an influence on the instructions the compiler may use, and that the above unaligned code results are not just an accident.

Still... this isn't fully conclusive without at least looking at the resulting assembly from a pre-ARMv6 compilation. Let's see with an ARMv5 target:

    foo:
        ldrb r3, [r0, #0] @ zero_extendqisi2 @ tmp139,* x
        ldrb r1, [r0, #1] @ zero_extendqisi2 @ tmp140,
        ldrb r2, [r0, #2] @ zero_extendqisi2 @ tmp143,
        ldrb r0, [r0, #3] @ zero_extendqisi2 @ tmp146,
        orr r3, r3, r1, asl #8 @, tmp142, tmp139, tmp140,
        orr r3, r3, r2, asl #16 @, tmp145, tmp142, tmp143,
        orr r0, r3, r0, asl #24 @,, tmp145, tmp146,
        bx lr @

    bar:
        stmfd sp!, {r4, r5, r6, r7} @,
        ldrb r2, [r0, #0] @ zero_extendqisi2 @ tmp139,* x
        ldrb r7, [r0, #1] @ zero_extendqisi2 @ tmp140,
        ldrb r3, [r0, #4] @ zero_extendqisi2 @ tmp149,
        ldrb r6, [r0, #5] @ zero_extendqisi2 @ tmp150,
        ldrb r5, [r0, #2] @ zero_extendqisi2 @ tmp143,
        ldrb r4, [r0, #6] @ zero_extendqisi2 @ tmp153,
        ldrb r1, [r0, #7] @ zero_extendqisi2 @ tmp156,
        ldrb ip, [r0, #3] @ zero_extendqisi2 @ tmp146,
        orr r2, r2, r7, asl #8 @, tmp142, tmp139, tmp140,
        orr r3, r3, r6, asl #8 @, tmp152, tmp149, tmp150,
        orr r2, r2, r5, asl #16 @, tmp145, tmp142, tmp143,
        orr r3, r3, r4, asl #16 @, tmp155, tmp152, tmp153,
        orr r0, r2, ip, asl #24 @,, tmp145, tmp146,
        orr r1, r3, r1, asl #24 @,, tmp155, tmp156,
        ldmfd sp!, {r4, r5, r6, r7}
        bx lr

Compared to the initial results, this is really nicely optimized and I couldn't do much better if I were to hand-code it myself.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
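The "struct" versus "byteshift" distinction in this commit can be illustrated with a small userspace C sketch. It is loosely modeled on the idea behind the kernel's include/linux/unaligned/packed_struct.h and be_byteshift.h headers but is not copied from them: every function and macro name below (get_unaligned_native32, get_unaligned_le32_bytes, sketch_get_unaligned_le32 and so on) is hypothetical, and the code assumes GCC or Clang for __attribute__((packed)) and the __BYTE_ORDER__ macros.

```c
#include <stdint.h>

/*
 * "struct" strategy: a one-member packed struct tells the compiler the
 * access may be misaligned. It is then free to emit a single unaligned
 * load where the CPU supports one, or byte loads where it does not.
 * Reading a byte buffer through this type relies on the compiler
 * tolerating the aliasing (the kernel builds with -fno-strict-aliasing).
 */
struct una_u32 { uint32_t x; } __attribute__((packed));

static inline uint32_t get_unaligned_native32(const void *p)
{
	const struct una_u32 *ptr = (const struct una_u32 *)p;
	return ptr->x;	/* value in the host's native byte order */
}

/* "byteshift" strategy: always safe, but always byte loads plus shifts. */
static inline uint32_t get_unaligned_le32_bytes(const void *p)
{
	const uint8_t *b = p;
	return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
	       (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

static inline uint32_t get_unaligned_be32_bytes(const void *p)
{
	const uint8_t *b = p;
	return (uint32_t)b[0] << 24 | (uint32_t)b[1] << 16 |
	       (uint32_t)b[2] << 8 | (uint32_t)b[3];
}

/*
 * Selection in the spirit of asm-generic/unaligned.h: "struct" for the
 * native endianness, "byteshift" for the opposite one. Without the
 * byte-order macros, fall back to "byteshift" for both, which is what
 * the pre-patch ARM header did.
 */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define sketch_get_unaligned_le32(p) get_unaligned_native32(p)
#define sketch_get_unaligned_be32(p) get_unaligned_be32_bytes(p)
#elif defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define sketch_get_unaligned_le32(p) get_unaligned_le32_bytes(p)
#define sketch_get_unaligned_be32(p) get_unaligned_native32(p)
#else
#define sketch_get_unaligned_le32(p) get_unaligned_le32_bytes(p)
#define sketch_get_unaligned_be32(p) get_unaligned_be32_bytes(p)
#endif
```

Given bytes 0x11 0x22 0x33 0x44 at an odd address, both little-endian helpers return 0x44332211 and both big-endian helpers return 0x11223344. Only the packed-struct path gives the compiler the freedom seen in the listings above: a single unaligned ldr when targeting ARMv7, and byte loads when targeting ARMv5.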
alpha	ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION	2012-07-30 17:25:21 -07:00
arm	ARM: 7493/1: use generic unaligned.h	2012-08-25 09:22:30 +01:00
avr32	ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION	2012-07-30 17:25:21 -07:00
blackfin	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k	2012-08-03 10:52:41 -07:00
c6x	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next	2012-07-24 10:01:50 -07:00
cris	ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION	2012-07-30 17:25:21 -07:00
frv	Merge branch 'akpm' (Andrew's patch-bomb)	2012-07-30 17:25:34 -07:00
h8300	ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION	2012-07-30 17:25:21 -07:00
hexagon	Merge branch 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild	2012-07-30 11:24:53 -07:00
ia64	ACPI: Only count valid srat memory structures	2012-08-03 00:15:53 -04:00
m32r	ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION	2012-07-30 17:25:21 -07:00
m68k	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k	2012-08-03 10:52:41 -07:00
microblaze	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k	2012-08-03 10:52:41 -07:00
mips	Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus	2012-08-01 16:47:15 -07:00
mn10300	Merge branch 'akpm' (Andrew's patch-bomb)	2012-07-30 17:25:34 -07:00
openrisc	Remove useless wrappers of asm-generic/rmap.h	2012-06-28 11:29:26 +02:00
parisc	PCI changes for the 3.6 merge window:	2012-07-24 16:17:07 -07:00
powerpc	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2012-08-01 10:26:23 -07:00
s390	Merge branch 'akpm' (Andrew's patch-bomb)	2012-07-31 19:25:39 -07:00
score
sh	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k	2012-08-03 10:52:41 -07:00
sparc	This patch series contains a major revamp of how we collect entropy	2012-07-31 19:07:42 -07:00
tile	memcg: rename config variables	2012-07-31 18:42:43 -07:00
um	Merge branch 'for-linus-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml	2012-08-01 16:45:02 -07:00
unicore32	PCI changes for the 3.6 merge window:	2012-07-24 16:17:07 -07:00
x86	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux	2012-08-03 14:10:00 -07:00
xtensa	xtensa: select generic atomic64_t support	2012-07-31 18:42:39 -07:00
.gitignore
Kconfig	ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION	2012-07-30 17:25:21 -07:00