linux/arch/sparc/mm
Bob Picco 4ccb927289 sparc64: sun4v TLB error power off events
We've seen a few TLB error events power off the machine because of
prom_halt(). In one case it was an NFS-related area during rmmod. Another
was an mmapper of /dev/mem. A more recent one is an ITLB error with a bad
page size, which could be a hardware bug. Bugs happen, but we should try
not to power off or hang the machine when we can avoid it.

This is a DTLB error from an mmapper of /dev/mem:
[root@sparcie ~]# SUN4V-DTLB: Error at TPC[fffff80100903e6c], tl 1
SUN4V-DTLB: TPC<0xfffff80100903e6c>
SUN4V-DTLB: O7[fffff801081979d0]
SUN4V-DTLB: O7<0xfffff801081979d0>
SUN4V-DTLB: vaddr[fffff80100000000] ctx[1250] pte[98000000000f0610] error[2]
.

This is an ITLB error on recent mainline:
[ 3708.179864] SUN4V-ITLB: TPC<0xfffffc010071cefc>
[ 3708.188866] SUN4V-ITLB: O7[fffffc010071cee8]
[ 3708.197377] SUN4V-ITLB: O7<0xfffffc010071cee8>
[ 3708.206539] SUN4V-ITLB: vaddr[e0003] ctx[1a3c] pte[2900000dcc800eeb] error[4]
.
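The error[] field in both reports is a sun4v hypervisor status code. As a reading aid, here is a minimal C sketch that maps that code to a name. It assumes the numbering below, which is recalled from the HV_E* names in the sparc hypervisor header rather than taken from this commit, so check it against your tree. Under that assumption, error[2] in the DTLB report is HV_ENORADDR and error[4] in the ITLB report is HV_EBADPGSZ, which would fit the /dev/mem and bad-pagesize cases described above. `hv_status_name()` is a hypothetical helper, not a kernel function.

```c
#include <stddef.h>

/* sun4v hypervisor status codes, mirroring the HV_E* names used by the
 * sparc kernel headers. The numeric values are an assumption (recalled,
 * not verified against this commit). */
enum hv_status {
	HV_EOK       = 0,
	HV_ENOCPU    = 1,
	HV_ENORADDR  = 2,
	HV_ENOINTR   = 3,
	HV_EBADPGSZ  = 4,
	HV_EBADTSB   = 5,
	HV_EINVAL    = 6,
	HV_EBADTRAP  = 7,
	HV_EBADALIGN = 8,
};

static const char *const hv_status_names[] = {
	[HV_EOK]       = "HV_EOK (success)",
	[HV_ENOCPU]    = "HV_ENOCPU (invalid CPU id)",
	[HV_ENORADDR]  = "HV_ENORADDR (invalid real address)",
	[HV_ENOINTR]   = "HV_ENOINTR (invalid interrupt id)",
	[HV_EBADPGSZ]  = "HV_EBADPGSZ (invalid page size encoding)",
	[HV_EBADTSB]   = "HV_EBADTSB (invalid TSB description)",
	[HV_EINVAL]    = "HV_EINVAL (invalid argument)",
	[HV_EBADTRAP]  = "HV_EBADTRAP (invalid function number)",
	[HV_EBADALIGN] = "HV_EBADALIGN (invalid address alignment)",
};

/* Hypothetical helper: turn the error[] field of a SUN4V-ITLB/DTLB
 * report into a readable name; codes outside the table are "unknown". */
const char *hv_status_name(unsigned long err)
{
	size_t n = sizeof(hv_status_names) / sizeof(hv_status_names[0]);

	return err < n ? hv_status_names[err] : "unknown HV status";
}
```

For example, `hv_status_name(2)` returns the HV_ENORADDR entry and `hv_status_name(4)` the HV_EBADPGSZ entry.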

Normally sun4v_itlb_error_report() and sun4v_dtlb_error_report() would call
prom_halt() and drop us to the Open Firmware (OF) "ok" prompt. That isn't
what happens under LDOMs: there the machine powers off instead.

When the hypervisor returns HV_ENORADDR for the HV_MMU_MAP_ADDR_TRAP call, we
raise SIGBUS by tagging the fault with the FAULT_CODE_BAD_RA bit and checking
for it in do_sparc64_fault(). This is done only when the trap level (%tl) is
less than or equal to 1. Otherwise, for %tl > 1, we eventually proceed to
die_if_kernel().
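The routing described above can be modeled in a few lines of user-space C. This is a hedged sketch, not the kernel code: `handle_tlb_error()`, `fault_path()`, `route_to_fault_path()` and the outcome enum are hypothetical, the HV_ENORADDR and FAULT_CODE_BAD_RA values are assumptions, and the real logic runs in the sun4v TLB trap and fault paths. The commit message does not spell out what happens to other hypervisor errors at %tl <= 1; the sketch assumes they also take the error-report path.

```c
#include <stdbool.h>

/* Assumed values for illustration; check them against your kernel tree. */
#define HV_ENORADDR       2UL   /* hypervisor status: invalid real address */
#define FAULT_CODE_BAD_RA 0x20u /* assumed fault-code bit */

enum tlb_error_outcome {
	OUTCOME_SIGBUS,        /* fault path delivers SIGBUS */
	OUTCOME_HANDLED,       /* normal fault handling (not modeled) */
	OUTCOME_ERROR_REPORT,  /* sun4v_*tlb_error_report(), ending in die_if_kernel() */
};

/* Stand-in for the FAULT_CODE_BAD_RA check in do_sparc64_fault(). */
static enum tlb_error_outcome fault_path(unsigned int fault_code)
{
	if (fault_code & FAULT_CODE_BAD_RA)
		return OUTCOME_SIGBUS;
	return OUTCOME_HANDLED;
}

/* Only a bad real address taken at %tl <= 1 goes through the fault path. */
static bool route_to_fault_path(unsigned long hv_err, int tl)
{
	return hv_err == HV_ENORADDR && tl <= 1;
}

/* hv_err: status from the HV_MMU_MAP_ADDR_TRAP call; tl: trap level. */
enum tlb_error_outcome handle_tlb_error(unsigned long hv_err, int tl,
					unsigned int fault_code)
{
	if (route_to_fault_path(hv_err, tl))
		return fault_path(fault_code | FAULT_CODE_BAD_RA);

	/* %tl > 1 (or another error): report, then die_if_kernel(). */
	return OUTCOME_ERROR_REPORT;
}
```

For example, HV_ENORADDR at tl 1 yields OUTCOME_SIGBUS, while the same error at tl 2 yields OUTCOME_ERROR_REPORT. Keeping the %tl check separate from the fault-code check mirrors the two conditions the commit message names.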

The logic of this patch was partially inspired by David Miller's feedback.

Powering off large sparc64 machines is painful, and die_if_kernel() provides
more context. A reset sequence isn't brief on large sparc64 machines either,
but it is better than a power-off/power-on sequence.

Cc: sparclinux@vger.kernel.org
Signed-off-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-16 17:46:44 -07:00
extable.c sparc: Add module.h to files previously implicitly using it. 2011-10-31 19:30:54 -04:00
fault_32.c sparc32: fix sparse warnings in unaligned_32.c 2014-04-29 01:12:26 -04:00
fault_64.c sparc64: sun4v TLB error power off events 2014-09-16 17:46:44 -07:00
gup.c sparc64: Fix bugs in get_user_pages_fast() wrt. THP. 2014-05-03 22:32:37 -07:00
highmem.c sparc32: move kmap_init() to highmem.c 2012-07-26 16:46:17 -07:00
hugetlbpage.c hugetlb: restrict hugepage_migration_support() to x86_64 2014-06-04 16:53:51 -07:00
hypersparc.S [PATCH] sparc32: vm_area_struct access for old Sun SPARCs. 2013-07-10 13:56:10 -07:00
init_32.c sparc32: fix sparse warning in devices.c 2014-04-29 01:12:26 -04:00
init_64.c sparc64: Fix up merge thinko. 2014-08-05 19:09:19 -07:00
init_64.h sparc: drop use of extern for prototypes in arch/sparc/* 2014-05-18 19:01:29 -07:00
io-unit.c sparc32: fix sparse warning in io-unit.c 2014-05-18 19:01:26 -07:00
iommu.c sparc32: fix sparse warning in iommu.c 2014-05-18 19:01:26 -07:00
leon_mm.c sparc32: fix sparse "Should it be static?" in mm/ 2014-04-29 01:12:25 -04:00
Makefile sparc32: introduce run-time patching of srmmu access functions 2012-05-27 23:52:49 -07:00
mm_32.h sparc32: fix sparse "Should it be static?" in mm/ 2014-04-29 01:12:25 -04:00
srmmu_access.S sparc32: introduce run-time patching of srmmu access functions 2012-05-27 23:52:49 -07:00
srmmu.c sparc32: fix sparse "Should it be static?" in mm/ 2014-04-29 01:12:25 -04:00
swift.S [PATCH] sparc32: vm_area_struct access for old Sun SPARCs. 2013-07-10 13:56:10 -07:00
tlb.c sparc64: Fix huge PMD invalidation. 2014-05-03 22:31:52 -07:00
tsb.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next 2014-06-19 07:50:07 -10:00
tsunami.S [PATCH] sparc32: vm_area_struct access for old Sun SPARCs. 2013-07-10 13:56:10 -07:00
ultra.S sparc64: Make PAGE_OFFSET variable. 2013-11-12 15:22:34 -08:00
viking.S [PATCH] sparc32: vm_area_struct access for old Sun SPARCs. 2013-07-10 13:56:10 -07:00