linux/arch/arm/lib
Fabio Estevam 11d4bb1bd0 ARM: 7907/1: lib: delay-loop: Add align directive to fix BogoMIPS calculation
Currently mx53 (CortexA8) running at 1GHz reports:
Calibrating delay loop... 663.55 BogoMIPS (lpj=3317760)

Tom Evans verified that alignments of 0x0 and 0x8 run the two instructions of __loop_delay in one clock cycle (1 clock/loop), while alignments of 0x4 and 0xc take 3 clocks to run the loop twice. (1.5 clock/loop)

The original object code looks like this:

00000010 <__loop_const_udelay>:
  10:	e3e01000 	mvn	r1, #0
  14:	e51f201c 	ldr	r2, [pc, #-28]	; 0 <__loop_udelay-0x8>
  18:	e5922000 	ldr	r2, [r2]
  1c:	e0800921 	add	r0, r0, r1, lsr #18
  20:	e1a00720 	lsr	r0, r0, #14
  24:	e0822b21 	add	r2, r2, r1, lsr #22
  28:	e1a02522 	lsr	r2, r2, #10
  2c:	e0000092 	mul	r0, r2, r0
  30:	e0800d21 	add	r0, r0, r1, lsr #26
  34:	e1b00320 	lsrs	r0, r0, #6
  38:	01a0f00e 	moveq	pc, lr

0000003c <__loop_delay>:
  3c:	e2500001 	subs	r0, r0, #1
  40:	8afffffe 	bhi	3c <__loop_delay>
  44:	e1a0f00e 	mov	pc, lr

After adding the 'align 3' directive to __loop_delay (align to 8 bytes):

00000010 <__loop_const_udelay>:
  10:	e3e01000 	mvn	r1, #0
  14:	e51f201c 	ldr	r2, [pc, #-28]	; 0 <__loop_udelay-0x8>
  18:	e5922000 	ldr	r2, [r2]
  1c:	e0800921 	add	r0, r0, r1, lsr #18
  20:	e1a00720 	lsr	r0, r0, #14
  24:	e0822b21 	add	r2, r2, r1, lsr #22
  28:	e1a02522 	lsr	r2, r2, #10
  2c:	e0000092 	mul	r0, r2, r0
  30:	e0800d21 	add	r0, r0, r1, lsr #26
  34:	e1b00320 	lsrs	r0, r0, #6
  38:	01a0f00e 	moveq	pc, lr
  3c:	e320f000 	nop	{0}

00000040 <__loop_delay>:
  40:	e2500001 	subs	r0, r0, #1
  44:	8afffffe 	bhi	40 <__loop_delay>
  48:	e1a0f00e 	mov	pc, lr
  4c:	e320f000 	nop	{0}

, which now reports:
Calibrating delay loop... 996.14 BogoMIPS (lpj=4980736)

Some more test results:

On mx31 (ARM1136) running at 532 MHz, before the patch:
Calibrating delay loop... 351.43 BogoMIPS (lpj=1757184)

On mx31 (ARM1136) running at 532 MHz after the patch:
Calibrating delay loop... 528.79 BogoMIPS (lpj=2643968)

Also tested on mx6 (CortexA9) and on mx27 (ARM926), which shows the same
BogoMIPS value before and after this patch.

Reported-by: Tom Evans <tom_usenet@optusnet.com.au>
Suggested-by: Tom Evans <tom_usenet@optusnet.com.au>
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2013-11-30 22:21:03 +00:00
..
ashldi3.S
ashrdi3.S
backtrace.S ARM: 7068/1: process: change from __backtrace to dump_stack in show_regs 2011-10-17 09:12:41 +01:00
bitops.h ARM: 7893/1: bitops: only emit .arch_extension mp if CONFIG_SMP 2013-11-20 23:05:53 +00:00
call_with_stack.S ARM: lib: add call_with_stack function for safely changing stack 2011-12-12 16:07:35 +00:00
changebit.S ARM: 7171/1: unwind: add unwind directives to bitops assembly macros 2011-11-26 21:58:53 +00:00
clear_user.S
clearbit.S ARM: 7171/1: unwind: add unwind directives to bitops assembly macros 2011-11-26 21:58:53 +00:00
copy_from_user.S
copy_page.S
copy_template.S
copy_to_user.S
csumipv6.S
csumpartial.S
csumpartialcopy.S
csumpartialcopygeneric.S
csumpartialcopyuser.S
delay-loop.S ARM: 7907/1: lib: delay-loop: Add align directive to fix BogoMIPS calculation 2013-11-30 22:21:03 +00:00
delay.c arm: delete __cpuinit/__CPUINIT usage from all ARM users 2013-07-14 19:36:52 -04:00
div64.S ARM: 7125/1: Add unwinding annotations for 64bit division functions 2011-10-17 09:13:42 +01:00
ecard.S ARM: remove unnecessary mach/hardware.h includes 2011-07-12 11:19:27 -05:00
findbit.S
floppydma.S
getuser.S ARM: 7527/1: uaccess: explicitly check __user pointer when !CPU_USE_DOMAINS 2012-09-09 17:28:47 +01:00
io-acorn.S arch: remove direct definitions of KERN_<LEVEL> uses 2012-07-30 17:25:13 -07:00
io-readsb.S
io-readsl.S
io-readsw-armv3.S ARM: Bring back ARMv3 IO and user access code 2012-08-13 11:44:13 +01:00
io-readsw-armv4.S
io-writesb.S
io-writesl.S
io-writesw-armv3.S ARM: Bring back ARMv3 IO and user access code 2012-08-13 11:44:13 +01:00
io-writesw-armv4.S
lib1funcs.S
lshrdi3.S
Makefile ARM: delete mach-shark 2013-09-17 12:34:36 +02:00
memchr.S
memcpy.S
memmove.S
memset.S ARM: 7670/1: fix the memset fix 2013-03-12 12:18:47 +00:00
memzero.S
muldi3.S
putuser.S ARM: 7527/1: uaccess: explicitly check __user pointer when !CPU_USE_DOMAINS 2012-09-09 17:28:47 +01:00
setbit.S ARM: 7171/1: unwind: add unwind directives to bitops assembly macros 2011-11-26 21:58:53 +00:00
strchr.S
strrchr.S
testchangebit.S ARM: 7171/1: unwind: add unwind directives to bitops assembly macros 2011-11-26 21:58:53 +00:00
testclearbit.S ARM: 7171/1: unwind: add unwind directives to bitops assembly macros 2011-11-26 21:58:53 +00:00
testsetbit.S ARM: 7171/1: unwind: add unwind directives to bitops assembly macros 2011-11-26 21:58:53 +00:00
uaccess_with_memcpy.c ARM: 7858/1: mm: make UACCESS_WITH_MEMCPY huge page aware 2013-10-29 11:06:15 +00:00
uaccess.S ARM: Bring back ARMv3 IO and user access code 2012-08-13 11:44:13 +01:00
ucmpdi2.S
xor-neon.c ARM: 7835/2: fix modular build of xor_blocks() with NEON enabled 2013-09-09 15:24:47 +01:00