011d826111
Introduce a simple data structure for collecting correctable errors
along with accessors. More detailed description in the code itself.

The error decoding is done with the decoding chain now and
mce_first_notifier() gets to see the error first. The CEC then decides
whether to log it, in which case the rest of the chain doesn't hear
about it (basically the main reason for the CE collector), or to
continue running the notifiers.

When the CEC hits the action threshold, it will try to soft-offline the
page containing the ECC error and then the whole decoding chain gets to
see the error.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170327093304.10683-5-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
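The chain behaviour described in the commit message can be illustrated with a small user-space sketch. It is not the kernel code: every name in it (`chain_verdict`, `err_event`, `first_notifier`, `decode_notifier`, `run_chain`, `TOY_THRESHOLD`) is hypothetical, the collector is reduced to a single global counter, and letting non-correctable events always pass through is an assumption made for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical verdicts, loosely modelled on "keep going" vs. "stop the chain". */
enum chain_verdict { CHAIN_CONTINUE, CHAIN_STOP };

struct err_event {
	unsigned long long pfn;
	bool correctable;
};

typedef enum chain_verdict (*notifier_fn)(const struct err_event *ev);

#define TOY_THRESHOLD 3		/* hypothetical action threshold */

static unsigned int toy_count;		/* stand-in for the collector's state */
static unsigned int toy_decoded;	/* how often the decoder was reached */

/* First in the chain: absorb correctable errors until the threshold is hit. */
static enum chain_verdict first_notifier(const struct err_event *ev)
{
	if (!ev->correctable)
		return CHAIN_CONTINUE;	/* assumption: only CEs are collected */

	if (++toy_count < TOY_THRESHOLD)
		return CHAIN_STOP;	/* logged by the collector, chain ends here */

	/* Threshold reached: a real collector would try to soft-offline here. */
	toy_count = 0;
	return CHAIN_CONTINUE;		/* now the rest of the chain sees the error */
}

/* Stand-in for the remaining decoders in the chain. */
static enum chain_verdict decode_notifier(const struct err_event *ev)
{
	(void)ev;
	toy_decoded++;
	return CHAIN_CONTINUE;
}

/* Run notifiers in order; return how many were called before a stop. */
static size_t run_chain(const notifier_fn *chain, size_t n,
			const struct err_event *ev)
{
	size_t called = 0;

	for (size_t i = 0; i < n; i++) {
		called++;
		if (chain[i](ev) == CHAIN_STOP)
			break;
	}
	return called;
}
```

With a chain of `{ first_notifier, decode_notifier }`, the first correctable errors stop at the first notifier, and only the one that reaches the threshold is passed on to the decoder.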
24 lines
916 B
Plaintext
config MCE_AMD_INJ
	tristate "Simple MCE injection interface for AMD processors"
	depends on RAS && X86_MCE && DEBUG_FS && AMD_NB
	default n
	help
	  This is a simple debugfs interface to inject MCEs and test different
	  aspects of the MCE handling code.

	  WARNING: Do not even assume this interface is staying stable!

config RAS_CEC
	bool "Correctable Errors Collector"
	depends on X86_MCE && MEMORY_FAILURE && DEBUG_FS
	---help---
	  This is a small cache which collects correctable memory errors per 4K
	  page PFN and counts their repeated occurrence. Once the counter for a
	  PFN overflows, we try to soft-offline that page as we take it to mean
	  that it has reached a relatively high error count and would probably
	  be best if we don't use it anymore.

	  Bear in mind that this is absolutely useless if your platform doesn't
	  have ECC DIMMs and doesn't have DRAM ECC checking enabled in the BIOS.
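The per-page counting described in the RAS_CEC help text can be sketched in plain C as follows. This is a rough user-space illustration under stated assumptions, not the in-kernel data structure: the names (`cec_sketch`, `cec_slot`, `cec_sketch_add`), the slot count, the threshold value and the naive eviction rule are all hypothetical. Only the 4K page granularity (a shift by 12 bits) and the "count per PFN, act on reaching a limit" idea come from the help text.

```c
#include <stdbool.h>
#include <stddef.h>

#define CEC_SKETCH_SLOTS	8	/* hypothetical cache size */
#define CEC_SKETCH_THRESHOLD	4	/* hypothetical action threshold */
#define CEC_SKETCH_PAGE_SHIFT	12	/* 4K pages, as in the help text */

struct cec_slot {
	unsigned long long pfn;
	unsigned int count;
	bool used;
};

struct cec_sketch {
	struct cec_slot slots[CEC_SKETCH_SLOTS];
};

/*
 * Record one correctable error at physical address paddr.
 * Returns true when the page's count reaches the threshold, i.e. when a
 * caller would try to soft-offline the page; the entry is then dropped.
 */
static bool cec_sketch_add(struct cec_sketch *c, unsigned long long paddr)
{
	unsigned long long pfn = paddr >> CEC_SKETCH_PAGE_SHIFT;
	struct cec_slot *free_slot = NULL;

	for (size_t i = 0; i < CEC_SKETCH_SLOTS; i++) {
		struct cec_slot *s = &c->slots[i];

		if (s->used && s->pfn == pfn) {
			if (++s->count >= CEC_SKETCH_THRESHOLD) {
				s->used = false;
				s->count = 0;
				return true;
			}
			return false;
		}
		if (!s->used && !free_slot)
			free_slot = s;	/* remember the first free slot */
	}

	/* Naive eviction (assumption): overwrite slot 0 when the cache is full. */
	if (!free_slot)
		free_slot = &c->slots[0];

	free_slot->pfn = pfn;
	free_slot->count = 1;
	free_slot->used = true;
	return false;
}
```

Errors at different addresses within the same 4K page share one counter, so repeated hits anywhere in a page push that page toward the threshold.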