linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-22 20:22:09 +00:00

Sourabh Jain c6c5b14dac powerpc: make fadump resilient with memory add/remove events

Due to changes in memory resources caused by either memory hotplug or online/offline events, the elfcorehdr, which describes the CPUs and memory of the crashed kernel to the kernel that collects the dump (known as the second/fadump kernel), becomes outdated. Consequently, attempting dump collection with an outdated elfcorehdr can lead to failed or inaccurate dump collection.

Memory hotplug and online/offline events are referred to as memory add/remove events in the rest of this commit message.

The current solution to the aforementioned issue is as follows: monitor memory add/remove events in userspace using udev rules, and re-register fadump whenever there are changes in memory resources. This leads to the creation of a new elfcorehdr with updated system memory information.

There are several notable issues associated with re-registering fadump for every memory add/remove event.

1. Bulk memory add/remove events with udev-based fadump re-registration can lead to race conditions and, more importantly, create a wide window during which fadump is inactive until all memory add/remove events have settled.
2. Re-registering fadump for every memory add/remove event is inefficient.
3. The memory for elfcorehdr is allocated based on the memblock regions available during early boot and remains fixed thereafter. However, if elfcorehdr is later recreated with additional memblock regions, its size will increase, potentially leading to memory corruption.

Address the aforementioned challenges by shifting the creation of elfcorehdr from the first kernel (also referred to as the crashed kernel), where it was created and frequently recreated for every memory add/remove event, to the fadump kernel. As a result, the elfcorehdr only needs to be created once, eliminating the need to re-register fadump during memory add/remove events.

At present, the first kernel prepares the fadump header and stores it in the fadump reserved area. The fadump header includes the start address of the elfcorehdr, crashing CPU details, and other relevant information. In the event of a crash in the first kernel, the second/fadump kernel boots and accesses the fadump header prepared by the first kernel. It then performs the following steps in a platform-specific function [rtas|opal]_fadump_process:

1. Sanity check for fadump header
2. Update CPU notes in elfcorehdr

Along with the above, update setup_fadump() in fadump.c to create the elfcorehdr and set its address in the global variable elfcorehdr_addr so that the vmcore module can process it in the second/fadump kernel.

The section below outlines the information required to create the elfcorehdr and the changes made to make it available to the fadump kernel where it is not already.

To create the elfcorehdr, the following crashed kernel information is required: CPU notes, vmcoreinfo, and memory ranges.

At present, the CPU notes are already prepared in the fadump kernel, so no changes are needed in that regard. The fadump kernel has access to all crashed kernel memory regions, including boot memory regions that are relocated by firmware to fadump reserved areas, so no changes are needed for that either. However, it is necessary to add new members to the fadump header, i.e., the 'fadump_crash_info_header' structure, in order to pass the crashed kernel's vmcoreinfo address and its size to the fadump kernel.

In addition to the vmcoreinfo address and size, a few other attributes are also added to the fadump_crash_info_header structure.

1. version: Stores the fadump header version, which is currently set to 1. This provides flexibility to update the fadump crash info header in the future without changing the magic number. For each change in the fadump header, the version will be increased. This will help the updated kernel determine how to handle kernel dumps from older kernels. The magic number remains relevant for checking fadump header corruption.
2. pt_regs_sz/cpu_mask_sz: Store the sizes of the pt_regs and cpu_mask structures of the first kernel. These attributes are used to prevent dump processing if the sizes of the pt_regs or cpu_mask structures differ between the first and fadump kernels.

Note: if either the first/crashed kernel or the second/fadump kernel does not have the changes introduced here, the kernel fails to collect the dump and prints a relevant error message on the console.

Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240422195932.1583833-2-sourabhjain@linux.ibm.com
2024-04-29 23:51:15 +10:00
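The header checks described in the commit message can be illustrated with a small userspace sketch. This is not the kernel's code: the struct layout, the `demo_` names, the magic value, and the status codes are all hypothetical, and only the field names `version`, `pt_regs_sz`, and `cpu_mask_sz` come from the commit message. The sketch also assumes, as a simplification, that any version other than the current one is rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values for illustration; not the kernel's real constants. */
#define DEMO_FADUMP_MAGIC   0x4641445548445231ULL
#define DEMO_FADUMP_VERSION 1u

/* Simplified stand-in for the fadump crash info header (layout is assumed). */
struct demo_crash_info_header {
    uint64_t magic_number;    /* detects header corruption */
    uint32_t version;         /* bumped on every header change */
    uint32_t pt_regs_sz;      /* sizeof(struct pt_regs) in the first kernel */
    uint32_t cpu_mask_sz;     /* sizeof(struct cpumask) in the first kernel */
    uint64_t vmcoreinfo_addr; /* crashed kernel's vmcoreinfo address */
    uint64_t vmcoreinfo_size; /* crashed kernel's vmcoreinfo size */
};

enum demo_status {
    DEMO_OK = 0,
    DEMO_BAD_MAGIC,
    DEMO_BAD_VERSION,
    DEMO_SIZE_MISMATCH
};

/* What the first kernel would do: fill in the header before a crash. */
void demo_fill_header(struct demo_crash_info_header *h,
                      uint64_t vmcoreinfo_addr, uint64_t vmcoreinfo_size,
                      uint32_t pt_regs_sz, uint32_t cpu_mask_sz)
{
    assert(h != NULL);
    h->magic_number = DEMO_FADUMP_MAGIC;
    h->version = DEMO_FADUMP_VERSION;
    h->pt_regs_sz = pt_regs_sz;
    h->cpu_mask_sz = cpu_mask_sz;
    h->vmcoreinfo_addr = vmcoreinfo_addr;
    h->vmcoreinfo_size = vmcoreinfo_size;
}

/*
 * What the fadump kernel would do: refuse to process the dump unless the
 * header is intact, has the expected version, and was written by a kernel
 * whose pt_regs and cpu_mask sizes match the local ones.
 */
enum demo_status demo_check_header(const struct demo_crash_info_header *h,
                                   uint32_t local_pt_regs_sz,
                                   uint32_t local_cpu_mask_sz)
{
    assert(h != NULL);
    if (h->magic_number != DEMO_FADUMP_MAGIC)
        return DEMO_BAD_MAGIC;
    if (h->version != DEMO_FADUMP_VERSION)
        return DEMO_BAD_VERSION; /* assumption: only the current version is accepted */
    if (h->pt_regs_sz != local_pt_regs_sz ||
        h->cpu_mask_sz != local_cpu_mask_sz)
        return DEMO_SIZE_MISMATCH;
    return DEMO_OK;
}
```

In this sketch the magic number is checked first, so a corrupted header is reported as corruption rather than as a version or size problem, which mirrors the commit message's point that the magic number stays relevant for corruption checks while the version handles format changes.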
..
boot	USB/Thunderbolt changes for 6.9-rc1	2024-03-21 12:35:20 -07:00
configs	powerpc: Add allmodconfig for all 32-bit sub-arches	2024-03-03 22:20:29 +11:00
crypto	crypto: vmx - Move to arch/powerpc/crypto	2024-01-26 16:36:57 +08:00
include	powerpc: make fadump resilient with memory add/remove events	2024-04-29 23:51:15 +10:00
kernel	powerpc: make fadump resilient with memory add/remove events	2024-04-29 23:51:15 +10:00
kexec	powerpc/kdump: Split KEXEC_CORE and CRASH_DUMP dependency	2024-03-17 13:34:00 +11:00
kvm	powerpc updates for 6.9	2024-03-15 17:53:48 -07:00
lib	powerpc: Add static_key_feature_checks_initialized flag	2024-04-15 12:53:39 +10:00
math-emu
mm	powerpc/ptdump: Fix walk_vmemmap() to also print first vmemmap entry	2024-04-18 15:35:40 +10:00
net	powerpc/bpf: use bpf_jit_binary_pack_[alloc|finalize|free]	2023-10-23 20:33:19 +11:00
perf	powerpc/hv-gpci: Fix the H_GET_PERF_COUNTER_INFO hcall return value checks	2024-03-03 23:05:21 +11:00
platforms	powerpc: make fadump resilient with memory add/remove events	2024-04-29 23:51:15 +10:00
purgatory	powerpc/purgatory: remove PGO flags	2023-06-12 11:31:50 -07:00
sysdev	powerpc/fsl-soc: hide unused const variable	2024-04-03 21:23:23 +11:00
tools	powerpc/tools: Pass -mabi=elfv2 to gcc-check-mprofile-kernel.sh	2023-10-20 17:46:33 +11:00
xmon	powerpc updates for 6.9	2024-03-15 17:53:48 -07:00
Kbuild	powerpc: Fix fatal warnings flag for LLVM's integrated assembler	2024-04-08 16:06:41 +10:00
Kconfig	powerpc/kdump: Split KEXEC_CORE and CRASH_DUMP dependency	2024-03-17 13:34:00 +11:00
Kconfig.debug	powerpc/ps3: move udbg_shutdown_ps3gelic prototype	2023-11-21 12:06:50 +11:00
Makefile	powerpc updates for 6.9	2024-03-15 17:53:48 -07:00
Makefile.postlink	kbuild: remove ARCH_POSTLINK from module builds	2023-10-28 21:10:08 +09:00