ace3647afb
To ensure EINJ working well when injecting errors via EINJ table, add some restrictions: param1 must be a valid physical RAM address and param2 must specify page granularity or narrower. Signed-off-by: Chen Gong <gong.chen@linux.intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
113 lines
4.7 KiB
Plaintext
APEI Error INJection
|
|
~~~~~~~~~~~~~~~~~~~~
|
|
|
|
EINJ provides a hardware error injection mechanism. It is very useful
for debugging and testing other APEI and RAS features.

To use EINJ, make sure the following are enabled in your kernel
configuration:

CONFIG_DEBUG_FS
CONFIG_ACPI_APEI
CONFIG_ACPI_APEI_EINJ

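As a quick sanity check of the options listed above, the following sketch
prints any of the three options that are not set to y or m in a kernel
config file. The helper name einj_missing_config is hypothetical, and the
default path /boot/config-$(uname -r) is a common distribution convention,
not something EINJ guarantees; pass the path to your own config file if it
lives elsewhere.

```shell
# Hypothetical helper: list required EINJ options missing from a config file.
# $1 = path to a kernel config file (default is an assumed distro location).
einj_missing_config() {
    cfg=${1:-/boot/config-$(uname -r)}
    for opt in CONFIG_DEBUG_FS CONFIG_ACPI_APEI CONFIG_ACPI_APEI_EINJ; do
        # Accept built-in (=y) or modular (=m); print the option otherwise.
        grep -Eq "^${opt}=[ym]\$" "$cfg" || echo "$opt"
    done
}
```

No output means all three options are enabled in the file that was checked.
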
The user interface of EINJ is exposed through the debug file system,
under the directory apei/einj. The following files are provided.

- available_error_type
  Reading this file returns the error injection capability of the
  platform, that is, which error types are supported. The error types
  are defined as follows; the left field is the error type value, the
  right field is the error description.

    0x00000001	Processor Correctable
    0x00000002	Processor Uncorrectable non-fatal
    0x00000004	Processor Uncorrectable fatal
    0x00000008	Memory Correctable
    0x00000010	Memory Uncorrectable non-fatal
    0x00000020	Memory Uncorrectable fatal
    0x00000040	PCI Express Correctable
    0x00000080	PCI Express Uncorrectable non-fatal
    0x00000100	PCI Express Uncorrectable fatal
    0x00000200	Platform Correctable
    0x00000400	Platform Uncorrectable non-fatal
    0x00000800	Platform Uncorrectable fatal

  The file contents use the format above, except that only the lines
  for the available error types are present.

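A script can check whether a given error type is available before trying
to inject it. The sketch below assumes each line of the file starts with
a hexadecimal value followed by whitespace and a description, as described
above. The helper name einj_type_available is hypothetical; the default
path assumes debugfs is mounted at /sys/kernel/debug.

```shell
# Hypothetical helper: succeed if error type $1 is listed in the file $2.
# $2 defaults to the available_error_type file under an assumed debugfs mount.
einj_type_available() {
    want=$(( $1 ))
    file=${2:-/sys/kernel/debug/apei/einj/available_error_type}
    while read -r value _desc; do
        # Skip anything that does not look like a hexadecimal type value.
        case $value in
            0x*) ;;
            *) continue ;;
        esac
        [ $(( $value )) -eq "$want" ] && return 0
    done < "$file"
    return 1
}
```

For example, "einj_type_available 0x8 && echo supported" reports whether
correctable memory errors can be injected.
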
- error_type
  This file is used to set the error type value. The error type values
  are defined in the "available_error_type" description.

- error_inject
  Write any integer to this file to trigger the error injection.
  Before doing so, make sure all necessary error parameters have been
  specified.

- param1
  This file is used to set the first error parameter value. The effect
  of the parameter depends on the error_type specified. For example, if
  the error type is memory related, param1 should be a valid physical
  memory address.

- param2
  This file is used to set the second error parameter value. The effect
  of the parameter depends on the error_type specified. For example, if
  the error type is memory related, param2 should be a physical memory
  address mask. Linux requires page or narrower granularity, for
  example 0xfffffffffffff000.

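When pairing param1 with a page-granular mask such as 0xfffffffffffff000,
it can help to round the target address down to its page boundary first.
The sketch below does that arithmetic. The helper name einj_page_align is
hypothetical, and the page size is taken from "getconf PAGESIZE" unless
given explicitly; the mask in the text corresponds to a 4 KiB page.

```shell
# Hypothetical helper: round a physical address down to its page boundary.
# $1 = address (decimal or 0x-prefixed hex)
# $2 = page size in bytes, a power of two (default: this system's page size)
einj_page_align() {
    pagesize=${2:-$(getconf PAGESIZE)}
    # Clear the offset-within-page bits of the address.
    printf '0x%x\n' $(( $1 & ~(pagesize - 1) ))
}
```

For instance, "einj_page_align 0x12345678 4096" yields the page-aligned
address that could then be written to param1.
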
- notrigger
  The EINJ mechanism is a two-step process. First inject the error, then
  perform some actions to trigger it. Setting "notrigger" to 1 skips the
  trigger phase, which *may* allow the user to cause the error in some other
  context by a simple access to the cpu, memory location, or device that is
  the target of the error injection. Whether this actually works depends
  on what operations the BIOS actually includes in the trigger phase.

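The files described above can be combined to inject a memory error
without running the trigger phase. The following is a minimal sketch, not
a definitive procedure: the helper name einj_arm_memory_error is
hypothetical, the default directory assumes debugfs is mounted at
/sys/kernel/debug, and the mask is the 4 KiB page example from the param2
description. Whether a later access actually raises the error depends on
the BIOS, as noted above.

```shell
# Hypothetical helper: inject a memory error with the trigger phase skipped.
# $1 = physical address, $2 = error type value (e.g. 0x8)
# $3 = einj directory (default assumes debugfs at /sys/kernel/debug)
einj_arm_memory_error() {
    dir=${3:-/sys/kernel/debug/apei/einj}
    echo "$1" > "$dir/param1" &&                 # target address
    echo 0xfffffffffffff000 > "$dir/param2" &&   # page-granular mask
    echo "$2" > "$dir/error_type" &&             # which error to inject
    echo 1 > "$dir/notrigger" &&                 # skip the trigger phase
    echo 1 > "$dir/error_inject"                 # inject now
}
```

After this returns, the error would be consumed, if at all, by whatever
later accesses the target address.
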
BIOS versions based on the ACPI 4.0 specification have limited options
to control where the errors are injected. Your BIOS may support an
extension (enabled with the param_extension=1 module parameter, or
boot command line einj.param_extension=1). This allows the address
and mask for memory injections to be specified by the param1 and
param2 files in apei/einj.

BIOS versions using the ACPI 5.0 specification have more control over
the target of the injection. For processor related errors (type 0x1,
0x2 and 0x4) the APICID of the target should be provided using the
param1 file in apei/einj. For memory errors (type 0x8, 0x10 and 0x20)
the address is set using param1 with a mask in param2 (0x0 is equivalent
to all ones). For PCI Express errors (type 0x40, 0x80 and 0x100) the
segment, bus, device and function are specified using param1:

         31     24 23    16 15    11 10      8  7        0
	+-------------------------------------------------+
	| segment |   bus  | device | function | reserved |
	+-------------------------------------------------+

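The bit layout above can be packed with shell arithmetic. This sketch
follows the diagram directly: segment in bits 31-24, bus in bits 23-16,
device in bits 15-11, function in bits 10-8, and the reserved low byte
left as zero. The helper name einj_pcie_param1 is hypothetical.

```shell
# Hypothetical helper: build the param1 value for a PCI Express injection.
# $1 = segment, $2 = bus, $3 = device, $4 = function
einj_pcie_param1() {
    # Mask each field to its width, then shift it into position.
    printf '0x%08x\n' $(( (($1 & 0xff) << 24) | (($2 & 0xff) << 16) |
                          (($3 & 0x1f) << 11) | (($4 & 0x7) << 8) ))
}
```

The result can be written to param1, for example
"einj_pcie_param1 0 1 2 3 > param1" for segment 0, bus 1, device 2,
function 3.
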
An ACPI 5.0 BIOS may also allow vendor specific errors to be injected.
In this case a file named vendor will contain identifying information
from the BIOS that hopefully will allow an application wishing to use
the vendor specific extension to tell that it is running on a BIOS
that supports it. All vendor extensions have the 0x80000000 bit set in
error_type. A file vendor_flags controls the interpretation of param1
and param2 (1 = PROCESSOR, 2 = MEMORY, 4 = PCI). See your BIOS vendor
documentation for details (and expect changes to this API if vendors'
creativity in using this feature expands beyond our expectations).

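The vendor_flags values listed above are powers of two, so a script might
decode them as bit flags. The sketch below does so under that assumption;
whether a given BIOS combines the bits is not stated here, so check your
vendor documentation. The helper name einj_decode_vendor_flags is
hypothetical.

```shell
# Hypothetical helper: print the names of the bits set in a vendor_flags value.
# $1 = flags value (decimal or 0x-prefixed hex)
# Assumption: 1, 2 and 4 are independent bit flags that may be combined.
einj_decode_vendor_flags() {
    flags=$(( $1 ))
    out=""
    if [ $(( flags & 1 )) -ne 0 ]; then out="$out PROCESSOR"; fi
    if [ $(( flags & 2 )) -ne 0 ]; then out="$out MEMORY"; fi
    if [ $(( flags & 4 )) -ne 0 ]; then out="$out PCI"; fi
    # Strip the leading space left by the first appended name.
    echo "${out# }"
}
```
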
Example:
# cd /sys/kernel/debug/apei/einj
# cat available_error_type             # See which errors can be injected
0x00000002	Processor Uncorrectable non-fatal
0x00000008	Memory Correctable
0x00000010	Memory Uncorrectable non-fatal
# echo 0x12345000 > param1             # Set memory address for injection
# echo 0xfffffffffffff000 > param2     # Mask - anywhere in this page
# echo 0x8 > error_type                # Choose correctable memory error
# echo 1 > error_inject                # Inject now


For more information about EINJ, please refer to the ACPI specification
version 4.0, section 17.5, and ACPI 5.0, section 18.6.