linux/arch/powerpc
Mahesh Salgaonkar 75ecfb4951 powerpc/mce: Fix a bug where mce loops on memory UE.
The current code extracts the physical address for UE errors and then
hooks it up into memory failure infrastructure. On successful
extraction of physical address it wrongly sets "handled = 1" which
means this UE error has been recovered. Since MCE handler gets return
value as handled = 1, it assumes that error has been recovered and
goes back to same NIP. This causes MCE interrupt again and again in a
loop leading to hard lockup.

Also, initialize phys_addr to ULONG_MAX so that we don't end up
queuing undesired page to hwpoison.

Without this patch we see:
  Severe Machine check interrupt [Recovered]
    NIP: [000000001002588c] PID: 7109 Comm: find
    Initiator: CPU
    Error type: UE [Load/Store]
      Effective address: 00007fffd2755940
      Physical address:  000020181a080000
  ...
  Severe Machine check interrupt [Recovered]
    NIP: [000000001002588c] PID: 7109 Comm: find
    Initiator: CPU
    Error type: UE [Load/Store]
      Effective address: 00007fffd2755940
      Physical address:  000020181a080000
  Severe Machine check interrupt [Recovered]
    NIP: [000000001002588c] PID: 7109 Comm: find
    Initiator: CPU
    Error type: UE [Load/Store]
      Effective address: 00007fffd2755940
      Physical address:  000020181a080000
  Memory failure: 0x20181a08: recovery action for dirty LRU page: Recovered
  Memory failure: 0x20181a08: already hardware poisoned
  Memory failure: 0x20181a08: already hardware poisoned
  Memory failure: 0x20181a08: already hardware poisoned
  Memory failure: 0x20181a08: already hardware poisoned
  Memory failure: 0x20181a08: already hardware poisoned
  Memory failure: 0x20181a08: already hardware poisoned
  ...
  Watchdog CPU:38 Hard LOCKUP

After this patch we see:

  Severe Machine check interrupt [Not recovered]
    NIP: [00007fffaae585f4] PID: 7168 Comm: find
    Initiator: CPU
    Error type: UE [Load/Store]
      Effective address: 00007fffaafe28ac
      Physical address:  00002017c0bd0000
  find[7168]: unhandled signal 7 at 00007fffaae585f4 nip 00007fffaae585f4 lr 00007fffaae585e0 code 4
  Memory failure: 0x2017c0bd: recovery action for dirty LRU page: Recovered

Fixes: 01eaac2b05 ("powerpc/mce: Hookup ierror (instruction) UE errors")
Fixes: ba41e1e1cc ("powerpc/mce: Hookup derror (load/store) UE errors")
Cc: stable@vger.kernel.org # v4.15+
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-24 13:54:51 +10:00
..
boot powerpc updates for 4.17 2018-04-07 12:08:19 -07:00
configs scsi: remove the fdomain and fdomain_cs drivers 2018-03-19 22:54:47 -04:00
crypto crypto: hash - annotate algorithms taking optional key 2018-01-12 23:03:35 +11:00
include powerpc/powernv/npu: Prevent overwriting of pnv_npu2_init_contex() callback parameters 2018-04-24 09:46:57 +10:00
kernel powerpc/mce: Fix a bug where mce loops on memory UE. 2018-04-24 13:54:51 +10:00
kvm powerpc fixes for 4.17 #2 2018-04-15 11:57:12 -07:00
lib powerpc/lib: Fix off-by-one in alternate feature patching 2018-04-17 00:37:48 +10:00
math-emu License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mm powerpc/mm: Flush cache on memory hot(un)plug 2018-04-24 09:46:56 +10:00
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-03-05 11:29:24 -08:00
oprofile powerpc: Use sizeof(*foo) rather than sizeof(struct foo) 2018-03-20 16:47:53 +11:00
perf powerpc updates for 4.17 2018-04-07 12:08:19 -07:00
platforms powerpc/powernv/npu: Do a PID GPU TLB flush when invalidating a large address range 2018-04-24 09:46:57 +10:00
purgatory License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
sysdev powerpc/xive: Fix trying to "push" an already active pool VP 2018-04-19 00:49:45 +10:00
tools License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
xmon Merge branch 'topic/paca' into next 2018-03-31 09:09:36 +11:00
Kconfig kexec_file: make use of purgatory optional 2018-04-13 17:10:27 -07:00
Kconfig.debug powerpc: Add new kconfig CONFIG_PPC_IRQ_SOFT_MASK_DEBUG 2018-01-19 22:37:03 +11:00
Makefile powerpc/64s: Add POWER9 CPU type selection 2018-04-01 22:15:32 +10:00
Makefile.postlink License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00