linux/arch
Jane Chu 9d53caec84 sparc64: Measure receiver forward progress to avoid send mondo timeout
A large sun4v SPARC system may have moments of intensive xcall activity,
usually caused by unmapping many pages on many CPUs concurrently. This can
flood receivers with CPU mondo interrupts for an extended period, causing
some unlucky senders to hit send-mondo timeout. This problem gets worse
as the CPU count increases because sometimes mappings must be invalidated on
all CPUs, and sometimes all CPUs may gang up on a single CPU.

But a busy system is not a broken system. In the above scenario, as long
as the receiver is making forward progress processing mondo interrupts,
the sender should continue to retry.

This patch implements the receiver's forward progress meter by introducing
a per-CPU counter 'cpu_mondo_counter[cpu]', where 'cpu' is in the range
0..NR_CPUS-1. The receiver increments its counter as soon as it receives
a mondo, and the sender tracks the receiver's counter. If the receiver has
stopped making forward progress when the retry limit is reached, the sender
declares send-mondo-timeout and panics; otherwise, the sender keeps
retrying while the receiver continues to make forward progress.
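
The retry rule described above can be illustrated with a small user-space
sketch. This is not the kernel code from the patch: the function
send_with_progress_check(), the callback types, and the fake_receiver model
are all hypothetical names invented for illustration, and the retry limit is
passed in as a parameter rather than taken from the kernel. Only the idea is
drawn from the description: when the retry limit is reached, compare the
receiver's counter with the last snapshot and give up only if it has not moved.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callbacks: how a sender attempts delivery and reads the
 * receiver's progress counter (standing in for cpu_mondo_counter[cpu]). */
typedef bool (*try_send_fn)(void *ctx);          /* true if the mondo was accepted */
typedef uint64_t (*read_counter_fn)(void *ctx);  /* receiver's current counter */

enum send_result { SEND_OK, SEND_TIMEOUT };

/* Retry until the mondo is accepted. Each time retry_limit failed attempts
 * accumulate, check whether the receiver's counter moved since the last
 * snapshot. If it moved, the receiver is busy but alive, so start a new
 * retry window. If it did not move, report a timeout. */
enum send_result send_with_progress_check(try_send_fn try_send,
                                          read_counter_fn read_counter,
                                          void *ctx, int retry_limit)
{
    uint64_t last_seen = read_counter(ctx);
    int retries = 0;

    for (;;) {
        if (try_send(ctx))
            return SEND_OK;
        if (++retries < retry_limit)
            continue;

        uint64_t now = read_counter(ctx);
        if (now == last_seen)
            return SEND_TIMEOUT;   /* no forward progress: give up */
        last_seen = now;           /* progress seen: keep retrying */
        retries = 0;
    }
}

/* Hypothetical receiver model used only to exercise the sketch. */
struct fake_receiver {
    uint64_t counter;     /* bumped whenever the receiver handles a mondo */
    int busy_attempts;    /* attempts rejected while serving other senders */
    bool hung;            /* true: receiver handles nothing at all */
};

bool fake_try_send(void *ctx)
{
    struct fake_receiver *r = ctx;

    if (r->hung)
        return false;
    r->counter++;                 /* some mondo (ours or another's) was handled */
    if (r->busy_attempts > 0) {
        r->busy_attempts--;       /* it was another sender's mondo */
        return false;
    }
    return true;
}

uint64_t fake_read_counter(void *ctx)
{
    return ((struct fake_receiver *)ctx)->counter;
}
```

With a fake receiver that rejects 100 attempts while still counting, and a
retry limit of 10, the sender eventually succeeds because every window shows
progress. With a hung receiver, the first window shows no progress and the
sender reports a timeout.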

In addition, it's been observed that PCIe hotplug events generate Correctable
Errors that are handled by the hypervisor and then the OS. The hypervisor
'borrows' a guest CPU strand briefly to provide the service. If that strand is
simultaneously the only CPU targeted by a mondo, it may not be available
for the mondo within 20 msec, causing a SUN4V mondo timeout. It appears that
1 second is the agreed wait time between the hypervisor and the guest OS, so
this patch adjusts the timeout accordingly.

Orabug: 25476541
Orabug: 26417466

Signed-off-by: Jane Chu <jane.chu@oracle.com>
Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Reviewed-by: Rob Gardner <rob.gardner@oracle.com>
Reviewed-by: Thomas Tai <thomas.tai@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-14 11:18:02 -07:00
alpha Merge branch 'uaccess.strlen' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-06 22:07:44 -07:00
arc Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-07-03 16:14:51 -07:00
arm arm: move ELF_ET_DYN_BASE to 4MB 2017-07-10 16:32:36 -07:00
arm64 arm64: move ELF_ET_DYN_BASE to 4GB / 4MB 2017-07-10 16:32:36 -07:00
blackfin - Core Frameworks 2017-07-07 13:38:26 -07:00
c6x This is the first pull request for the new dma-mapping subsystem 2017-07-06 19:20:54 -07:00
cris Merge branch 'uaccess.strlen' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-06 22:07:44 -07:00
frv frv: cmpxchg: implement cmpxchg64() 2017-07-10 16:32:34 -07:00
h8300 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-07-03 16:14:51 -07:00
hexagon Merge branch 'akpm' (patches from Andrew) 2017-07-06 22:27:08 -07:00
ia64 Kbuild thin archives updates for v4.13 2017-07-07 15:11:12 -07:00
m32r Merge branch 'uaccess.strlen' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-06 22:07:44 -07:00
m68k Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu 2017-07-11 09:52:56 -07:00
metag Merge branch 'akpm' (patches from Andrew) 2017-07-06 22:27:08 -07:00
microblaze Merge branch 'uaccess.strlen' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-06 22:07:44 -07:00
mips lib/extable.c: use bsearch() library function in search_extable() 2017-07-10 16:32:35 -07:00
mn10300 Merge branch 'akpm' (patches from Andrew) 2017-07-06 22:27:08 -07:00
nios2 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-07-03 16:14:51 -07:00
openrisc OpenRISC fixes for 4.13 2017-07-07 13:58:49 -07:00
parisc Merge branch 'akpm' (patches from Andrew) 2017-07-06 22:27:08 -07:00
powerpc powerpc: move ELF_ET_DYN_BASE to 4GB / 4MB 2017-07-10 16:32:36 -07:00
s390 s390: reduce ELF_ET_DYN_BASE 2017-07-10 16:32:36 -07:00
score Merge branch 'uaccess.strlen' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-06 22:07:44 -07:00
sh lib/extable.c: use bsearch() library function in search_extable() 2017-07-10 16:32:35 -07:00
sparc sparc64: Measure receiver forward progress to avoid send mondo timeout 2017-07-14 11:18:02 -07:00
tile Kbuild thin archives updates for v4.13 2017-07-07 15:11:12 -07:00
um Merge branch 'uaccess.strlen' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-06 22:07:44 -07:00
unicore32
x86 binfmt_elf: use ELF_ET_DYN_BASE only for PIE 2017-07-10 16:32:36 -07:00
xtensa Merge branch 'uaccess.strlen' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-06 22:07:44 -07:00
.gitignore
Kconfig Kbuild thin archives updates for v4.13 2017-07-07 15:11:12 -07:00