linux/drivers
zhenwei pi 67f22ba775 mm/memory-failure: disable unpoison once hw error happens
Currently unpoison_memory(unsigned long pfn) is designed for soft
poison(hwpoison-inject) only.  Since 17fae1294a, the KPTE gets cleared
on a x86 platform once hardware memory corrupts.

Unpoisoning a hardware corrupted page puts page back buddy only, the
kernel has a chance to access the page with *NOT PRESENT* KPTE.  This
leads BUG during accessing on the corrupted KPTE.

Suggested by David&Naoya, disable unpoison mechanism when a real HW error
happens to avoid BUG like this:

 Unpoison: Software-unpoisoned page 0x61234
 BUG: unable to handle page fault for address: ffff888061234000
 #PF: supervisor write access in kernel mode
 #PF: error_code(0x0002) - not-present page
 PGD 2c01067 P4D 2c01067 PUD 107267063 PMD 10382b063 PTE 800fffff9edcb062
 Oops: 0002 [#1] PREEMPT SMP NOPTI
 CPU: 4 PID: 26551 Comm: stress Kdump: loaded Tainted: G   M       OE     5.18.0.bm.1-amd64 #7
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ...
 RIP: 0010:clear_page_erms+0x7/0x10
 Code: ...
 RSP: 0000:ffffc90001107bc8 EFLAGS: 00010246
 RAX: 0000000000000000 RBX: 0000000000000901 RCX: 0000000000001000
 RDX: ffffea0001848d00 RSI: ffffea0001848d40 RDI: ffff888061234000
 RBP: ffffea0001848d00 R08: 0000000000000901 R09: 0000000000001276
 R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000001
 R13: 0000000000000000 R14: 0000000000140dca R15: 0000000000000001
 FS:  00007fd8b2333740(0000) GS:ffff88813fd00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffff888061234000 CR3: 00000001023d2005 CR4: 0000000000770ee0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 PKRU: 55555554
 Call Trace:
  <TASK>
  prep_new_page+0x151/0x170
  get_page_from_freelist+0xca0/0xe20
  ? sysvec_apic_timer_interrupt+0xab/0xc0
  ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
  __alloc_pages+0x17e/0x340
  __folio_alloc+0x17/0x40
  vma_alloc_folio+0x84/0x280
  __handle_mm_fault+0x8d4/0xeb0
  handle_mm_fault+0xd5/0x2a0
  do_user_addr_fault+0x1d0/0x680
  ? kvm_read_and_reset_apf_flags+0x3b/0x50
  exc_page_fault+0x78/0x170
  asm_exc_page_fault+0x27/0x30

Link: https://lkml.kernel.org/r/20220615093209.259374-2-pizhenwei@bytedance.com
Fixes: 847ce401df ("HWPOISON: Add unpoisoning support")
Fixes: 17fae1294a ("x86/{mce,mm}: Unmap the entire page if the whole page is affected and poisoned")
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org>	[5.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-16 19:11:32 -07:00
..
accessibility Revert "speakup: Generate speakupmap.h automatically" 2022-05-20 21:07:05 +02:00
acpi More power management updates for 5.19-rc1 2022-05-30 11:37:26 -07:00
amba Driver core changes for 5.19-rc1 2022-06-03 11:48:47 -07:00
android fix for breakage in #work.fd this window 2022-06-05 17:14:03 -07:00
ata ata: libata-transport: fix {dma|pio|xfer}_mode sysfs files 2022-06-09 09:25:25 +09:00
atm
auxdisplay
base mm/memory-failure: disable unpoison once hw error happens 2022-06-16 19:11:32 -07:00
bcma
block xen: branch for v5.19-rc1b 2022-06-04 13:42:53 -07:00
bluetooth Bluetooth: btmtksdio: fix the reset takes too long 2022-05-13 13:19:01 +02:00
bus Driver core changes for 5.19-rc1 2022-06-03 11:48:47 -07:00
cdrom cdrom: remove obsolete TODO list 2022-05-15 18:31:28 -06:00
char Random number generator fixes for Linux 5.19-rc2. 2022-06-12 10:33:38 -07:00
clk Driver core changes for 5.19-rc1 2022-06-03 11:48:47 -07:00
clocksource Clockevent/clocksource updates: 2022-06-05 10:47:06 -07:00
comedi drivers: comedi: replace ternary operator with min() 2022-05-19 18:54:45 +02:00
connector
counter
cpufreq ARM: multiplatform changes, part 2 2022-06-02 15:23:54 -07:00
cpuidle Merge branches 'pm-em' and 'pm-cpuidle' 2022-05-23 19:18:51 +02:00
crypto virtio-crypto: enable retry for virtio-crypto-dev 2022-05-31 12:45:09 -04:00
cxl cxl/port: Enable HDM Capability after validating DVSEC Ranges 2022-05-20 12:30:53 -07:00
dax dax: add .recovery_write dax_operation 2022-05-16 13:37:59 -07:00
dca
devfreq PM / devfreq: passive: Return non-error when not-supported event is required 2022-05-19 19:32:19 +02:00
dio drivers: dio: add missing iounmap() in dio_init() 2022-05-19 18:56:51 +02:00
dma dmaengine updates for v5.19-rc1 2022-05-29 11:38:27 -07:00
dma-buf drm for 5.19-rc1 2022-05-25 16:18:27 -07:00
edac - A gargen variety of fixes which don't fit any other tip bucket: 2022-05-23 19:32:59 -07:00
eisa
extcon
firewire Merge branch 'for-linus' into for-next 2022-05-23 07:48:27 +02:00
firmware Follow-up tweaks for the EFI changes in v5.19 2022-06-03 13:39:30 -07:00
fpga
fsi
gnss
gpio gpio: dwapb: Don't print error on -EPROBE_DEFER 2022-06-10 14:26:15 +02:00
gpu drm fixes for 5.19-rc2 2022-06-10 10:13:24 -07:00
greybus
hid USB / Thunderbolt changes for 5.19-rc1 2022-06-03 11:17:49 -07:00
hsi
hte hte: Uninitialized variable in hte_ts_get() 2022-05-20 15:54:41 +02:00
hv Driver core changes for 5.19-rc1 2022-06-03 11:48:47 -07:00
hwmon hwmon: (aquacomputer_d5next) Fix an error handling path in aqc_probe() 2022-05-22 12:25:55 -07:00
hwspinlock
hwtracing
i2c i2c: ismt: prevent memory corruption in ismt_access() 2022-06-02 08:40:56 -07:00
i3c i3c: master: svc: fix returnvar.cocci warning 2022-05-17 22:34:42 +02:00
idle cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE 2022-06-08 18:05:45 +02:00
iio Bitmap patches for 5.19-rc1 2022-06-04 14:04:27 -07:00
infiniband v5.19 pull request 2022-05-26 21:08:40 -07:00
input Input updates for v5.19-rc1 2022-06-07 15:00:29 -07:00
interconnect Char / Misc / Other smaller driver subsystem updates for 5.19-rc1 2022-06-03 11:36:34 -07:00
iommu IOMMU Updates for Linux v5.19 2022-05-31 09:56:54 -07:00
ipack
irqchip irq: mips: replace cpumask_weight with cpumask_empty where appropriate 2022-06-03 06:52:57 -07:00
isdn
leds ARM: multiplatform changes, part 2 2022-06-02 15:23:54 -07:00
macintosh macintosh: via-pmu and via-cuda need RTC_LIB 2022-05-22 15:58:30 +10:00
mailbox mailbox: qcom-ipcc: Fix -Wunused-function with CONFIG_PM_SLEEP=n 2022-05-24 08:08:24 -05:00
mcb
md dm: fix zoned locking imbalance due to needless check in clone_endio 2022-06-10 15:23:54 -04:00
media USB / Thunderbolt changes for 5.19-rc1 2022-06-03 11:17:49 -07:00
memory More power management updates for 5.19-rc1 2022-05-30 11:37:26 -07:00
memstick
message
mfd ARM: multiplatform changes, part 2 2022-06-02 15:23:54 -07:00
misc Char / Misc / Other smaller driver subsystem updates for 5.19-rc1 2022-06-03 11:36:34 -07:00
mmc MMC core: 2022-06-07 14:24:30 -07:00
most
mtd This pull request contains fixes for JFFS2, UBI and UBIFS 2022-06-03 14:42:24 -07:00
mux
net mlx5-fixes-2022-06-08 2022-06-09 22:05:37 -07:00
nfc nfc: nfcmrvl: Fix memory leak in nfcmrvl_play_deferred 2022-06-08 10:18:10 -07:00
ntb
nubus
nvdimm cxl for 5.19 2022-05-27 21:24:19 -07:00
nvme SCSI misc on 20220604 2022-06-05 09:25:12 -07:00
nvmem
of drm for 5.19-rc1 2022-05-25 16:18:27 -07:00
opp OPP updates for 5.19-rc1 2022-05-25 15:02:26 +02:00
parisc
parport
pci Driver core changes for 5.19-rc1 2022-06-03 11:48:47 -07:00
pcmcia ARM: multiplatform changes, part 2 2022-06-02 15:23:54 -07:00
peci
perf arm64 updates for 5.19: 2022-05-23 21:06:11 -07:00
phy phy-for-5.19 2022-05-19 16:56:17 +02:00
pinctrl Pin control bulk changes for the v5.19 series: 2022-05-28 11:15:54 -07:00
platform platform-drivers-x86 for v5.19-2 2022-06-12 11:33:42 -07:00
pnp
power Char / Misc / Other smaller driver subsystem updates for 5.19-rc1 2022-06-03 11:36:34 -07:00
powercap Merge branches 'pm-em' and 'pm-cpuidle' 2022-05-23 19:18:51 +02:00
pps
ps3
ptp ptp: ptp_clockmatrix: fix is_single_shot 2022-05-25 21:51:32 -07:00
pwm pwm: pwm-cros-ec: Add channel type support 2022-05-20 16:40:01 +02:00
rapidio
ras
regulator Merge back reboot/poweroff notifiers rework for 5.19-rc1. 2022-05-25 14:38:29 +02:00
remoteproc
reset
rpmsg Driver core changes for 5.19-rc1 2022-06-03 11:48:47 -07:00
rtc ARM: multiplatform changes, part 2 2022-06-02 15:23:54 -07:00
s390 Driver core changes for 5.19-rc1 2022-06-03 11:48:47 -07:00
sbus
scsi scsi: pmcraid: Fix missing resource cleanup in error case 2022-06-07 22:05:14 -04:00
sh
siox
slimbus Driver core changes for 5.19-rc1 2022-06-03 11:48:47 -07:00
soc Char / Misc / Other smaller driver subsystem updates for 5.19-rc1 2022-06-03 11:36:34 -07:00
soundwire
spi Driver core changes for 5.19-rc1 2022-06-03 11:48:47 -07:00
spmi
ssb
staging Char / Misc / Other smaller driver subsystem updates for 5.19-rc1 2022-06-03 11:36:34 -07:00
target blk-mq: remove the done argument to blk_execute_rq_nowait 2022-05-28 06:15:27 -06:00
tc
tee Fix a compiler warning in OP-TEE driver 2022-05-30 14:44:27 +02:00
thermal Additional thermal control update for 5.19-rc1 2022-05-30 11:34:13 -07:00
thunderbolt USB / Thunderbolt changes for 5.19-rc1 2022-06-03 11:17:49 -07:00
tty xen: branch for v5.19-rc1b 2022-06-04 13:42:53 -07:00
ufs SCSI misc on 20220604 2022-06-05 09:25:12 -07:00
uio
usb Char / Misc / Other smaller driver subsystem updates for 5.19-rc1 2022-06-03 11:36:34 -07:00
vdpa vduse: Fix NULL pointer dereference on sysfs access 2022-06-08 08:56:03 -04:00
vfio VFIO updates for v5.19-rc1 2022-06-01 13:49:15 -07:00
vhost vdpa: make get_vq_group and set_group_asid optional 2022-06-09 00:26:35 -04:00
video parisc architecture fixes & updates for kernel v5.19-rc1 2022-06-04 13:50:23 -07:00
virt Char / Misc / Other smaller driver subsystem updates for 5.19-rc1 2022-06-03 11:36:34 -07:00
virtio virtio,vdpa: fixes 2022-06-11 16:32:47 -07:00
vlynq
vme
w1
watchdog ARM: SoC changes, part 2 2022-06-02 15:27:44 -07:00
xen xen: unexport __init-annotated xen_xlate_map_ballooned_pages() 2022-06-07 08:11:35 +02:00
zorro
Kconfig SCSI misc on 20220604 2022-06-05 09:25:12 -07:00
Makefile SCSI misc on 20220604 2022-06-05 09:25:12 -07:00