linux/drivers/acpi/apei
James Morse 7f17b4a121 ACPI: APEI: Kick the memory_failure() queue for synchronous errors
memory_failure() offlines or repairs pages of memory that have been
discovered to be corrupt. These may be detected by an external
component (e.g. the memory controller) and notified via an IRQ.
In this case the work is queued, as not all of memory_failure()'s work
can happen in IRQ context.

If the error was detected as a result of user-space accessing a
corrupt memory location, the CPU may take an abort instead. On arm64
this is a 'synchronous external abort', and on a firmware-first
system it is replayed using NOTIFY_SEA.

This notification has NMI-like properties (it can interrupt
IRQ-masked code), so the memory_failure() work is queued. If we
return to user-space before the queued memory_failure() work is
processed, we will take the fault again. This loop may cause platform
firmware to exceed some threshold and reboot when Linux could have
recovered from this error.

For NMI-like notifications, keep track of whether memory_failure() work
was queued, and make task_work pending to flush out the queue.
To save memory allocations, the task_work is allocated as part of
the ghes_estatus_node, and free()ing it back to the pool is deferred.
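
The flow described above can be pictured with a small user-space model.
This is only a sketch under stated assumptions, not the code in ghes.c:
every name in it (mf_queue, struct task, nmi_like_notify(),
return_to_user() and so on) is hypothetical and chosen for illustration.
It models three things: the NMI-like handler only records the error,
it notes that a flush is owed, and the flush runs before the task gets
back to user-space, so the faulting access is not simply retried.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only: none of these names are kernel APIs. */

#define QUEUE_MAX 8

/* Stand-in for the queue of deferred memory_failure() work. */
struct mf_queue {
	unsigned long pfn[QUEUE_MAX];
	size_t len;
};

/* Stand-in for a task; flush_pending models a pending task_work. */
struct task {
	bool flush_pending;
};

static struct mf_queue queue;
static size_t handled;	/* number of queued errors processed so far */

/* NMI-like context: we may only record the page, not handle it. */
static bool queue_memory_failure(unsigned long pfn)
{
	if (queue.len == QUEUE_MAX)
		return false;
	queue.pfn[queue.len++] = pfn;
	return true;
}

/* Notification handler: remember whether any work was queued. */
static void nmi_like_notify(struct task *t, unsigned long pfn)
{
	if (queue_memory_failure(pfn))
		t->flush_pending = true;
}

/* Process everything that was queued (the "kick"). */
static void flush_queue(void)
{
	while (queue.len > 0) {
		queue.len--;
		handled++;	/* a real handler would offline the page */
	}
}

/* Runs on the way back to user-space, before the access is retried. */
static void return_to_user(struct task *t)
{
	if (t->flush_pending) {
		flush_queue();
		t->flush_pending = false;
	}
}
```

In this model, calling nmi_like_notify() and then return_to_user() for
the same task leaves the queue empty before the task runs again, which
is the property the commit message is after. The deferred free of the
ghes_estatus_node is not modelled here.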

Signed-off-by: James Morse <james.morse@arm.com>
Tested-by: Tyler Baicar <baicar@os.amperecomputing.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-05-19 19:51:11 +02:00
apei-base.c acpi: Use pr_warn instead of pr_warning 2019-10-18 15:00:19 +02:00
apei-internal.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
bert.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 437 2019-06-05 17:37:17 +02:00
einj.c acpi: Use pr_warn instead of pr_warning 2019-10-18 15:00:19 +02:00
erst-dbg.c acpi: Use pr_warn instead of pr_warning 2019-10-18 15:00:19 +02:00
erst.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 174 2019-05-30 11:26:41 -07:00
ghes.c ACPI: APEI: Kick the memory_failure() queue for synchronous errors 2020-05-19 19:51:11 +02:00
hest.c acpi: Use pr_warn instead of pr_warning 2019-10-18 15:00:19 +02:00
Kconfig ACPI / APEI: Switch NOTIFY_SEA to use the estatus queue 2019-02-07 23:10:45 +01:00
Makefile License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00