2010-05-18 06:35:12 +00:00
config ACPI_APEI
bool "ACPI Platform Error Interface (APEI)"
2011-04-13 17:48:12 +00:00
select MISC_FILESYSTEMS
2011-01-03 22:22:11 +00:00
select PSTORE
2010-05-18 06:35:12 +00:00
depends on X86
help
APEI allows errors (for example from the chipset) to be
reported to the operating system. In particular, this improves
NMI handling. In addition, it supports error serialization and
error injection.
2010-05-18 06:35:14 +00:00
ACPI, APEI, Generic Hardware Error Source memory error support
Generic Hardware Error Source provides a way to report platform
hardware errors (such as those from the chipset). It works in so-called
"Firmware First" mode: hardware errors are reported to firmware
first, then reported to Linux by firmware. This way, non-standard
hardware error registers or non-standard hardware links can be checked
by firmware to produce more valuable hardware error information for
Linux.
Currently, only the SCI notification type and memory errors are
supported. More notification types and hardware error types will be
added later. These memory errors are reported to user space through
/dev/mcelog by faking a corrected Machine Check, so that the faulty
memory page can be offlined by /sbin/mcelog if the error count for one
page exceeds the threshold.
On some machines, Machine Check cannot report the physical address for
some corrected memory errors, but GHES can. So this simplified GHES is
implemented first.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2010-05-18 06:35:20 +00:00
config ACPI_APEI_GHES
2011-07-13 05:14:18 +00:00
bool "APEI Generic Hardware Error Source"
2010-05-18 06:35:20 +00:00
depends on ACPI_APEI && X86
select ACPI_HED
2011-08-10 02:46:22 +00:00
select IRQ_WORK
ACPI, APEI, GHES, printk support for recoverable error via NMI
Some APEI GHES recoverable errors are reported via NMI, but printk is
not safe in NMI context.
To solve the issue, a lock-less memory allocator is used to allocate
memory in the NMI handler, the error record is saved into the allocated
memory, and the record is put onto a lock-less list. An irq_work is
then used to defer the operation from NMI context to IRQ context. The
irq_work IRQ handler removes nodes from the lock-less list, printks the
error record, does further processing including recovery operations,
and then frees the memory.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
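The deferral pattern described in this commit message can be sketched in user-space C. This is only an illustration under stated assumptions, not the kernel code: it uses C11 atomics instead of the kernel's lock-less list, generic allocator, and irq_work facilities, and every name in it (`err_node`, `report_error`, `drain_errors`, `print_record`, the fixed-size pool) is hypothetical. The pool is a simple bump allocator that never frees, which a real implementation would not do.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define POOL_SIZE 8   /* hypothetical capacity of the record pool */
#define MSG_LEN   64  /* hypothetical maximum record length */

struct err_node {
    struct err_node *next;
    char msg[MSG_LEN];
};

/* Preallocated records, handed out with an atomic counter (no locks). */
static struct err_node pool[POOL_SIZE];
static atomic_size_t pool_next;

/* Head of the lock-less list of records waiting to be processed. */
static _Atomic(struct err_node *) pending;

/* Producer side (stands in for the NMI handler): no locks, no printing. */
static int report_error(const char *msg)
{
    assert(msg != NULL);

    size_t idx = atomic_fetch_add(&pool_next, 1);
    if (idx >= POOL_SIZE)
        return -1;  /* pool exhausted: drop the record */

    struct err_node *n = &pool[idx];
    strncpy(n->msg, msg, MSG_LEN - 1);
    n->msg[MSG_LEN - 1] = '\0';

    /* Push onto the list head with a compare-and-swap retry loop. */
    struct err_node *head = atomic_load(&pending);
    do {
        n->next = head;
    } while (!atomic_compare_exchange_weak(&pending, &head, n));
    return 0;
}

/*
 * Consumer side (stands in for the deferred IRQ-context handler):
 * detach the whole list in one atomic exchange, restore arrival order,
 * then process each record where printing is safe.
 * Returns the number of records processed.
 */
static size_t drain_errors(void (*handle)(const char *msg))
{
    struct err_node *n = atomic_exchange(&pending, (struct err_node *)NULL);
    struct err_node *ordered = NULL;

    /* The list is newest-first; reverse it. */
    while (n) {
        struct err_node *next = n->next;
        n->next = ordered;
        ordered = n;
        n = next;
    }

    size_t count = 0;
    for (n = ordered; n; n = n->next) {
        if (handle)
            handle(n->msg);
        count++;
    }
    return count;
}

/* Example handler: print one record per line. */
static void print_record(const char *msg)
{
    printf("hardware error: %s\n", msg);
}
```

A caller would invoke `report_error()` from the restricted context and later call `drain_errors(print_record)` from a context where printing is allowed. Detaching the whole list with a single exchange keeps the consumer from contending with producers node by node.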
2011-07-13 05:14:25 +00:00
select GENERIC_ALLOCATOR
2010-05-18 06:35:20 +00:00
help
Generic Hardware Error Source provides a way to report
platform hardware errors (such as those from the chipset). It
works in so-called "Firmware First" mode: hardware errors
are reported to firmware first, then reported to Linux by
firmware. This way, non-standard hardware error registers
or non-standard hardware links can be checked by firmware
to produce more valuable hardware error information for
Linux.
2011-02-21 05:54:43 +00:00
config ACPI_APEI_PCIEAER
bool "APEI PCIe AER logging/recovering support"
depends on ACPI_APEI && PCIEAER
help
PCIe AER errors may be reported via APEI firmware first mode.
Turn on this option to enable the corresponding support.
2011-07-13 05:14:28 +00:00
config ACPI_APEI_MEMORY_FAILURE
bool "APEI memory error recovering support"
depends on ACPI_APEI && MEMORY_FAILURE
help
Memory errors may be reported via APEI firmware first mode.
Turn on this option to enable memory error recovery support.
2010-05-18 06:35:14 +00:00
config ACPI_APEI_EINJ
tristate "APEI Error INJection (EINJ)"
depends on ACPI_APEI && DEBUG_FS
help
EINJ provides a hardware error injection mechanism. It is
mainly used for debugging and testing the other parts of
APEI and some other RAS features.
2010-08-12 03:55:17 +00:00
config ACPI_APEI_ERST_DEBUG
tristate "APEI Error Record Serialization Table (ERST) Debug Support"
depends on ACPI_APEI
help
ERST is a way provided by APEI to save and retrieve hardware
2010-09-07 16:49:45 +00:00
error information to and from a persistent store. Enable this
2010-08-12 03:55:17 +00:00
if you want to debug and test the ERST kernel support
and firmware implementation.