linux/drivers
Laurent Dufour f85086f95f mm: don't rely on system state to detect hot-plug operations
In register_mem_sect_under_node() the system_state's value is checked to
detect whether the call is made during boot time or during an hot-plug
operation.  Unfortunately, that check against SYSTEM_BOOTING is wrong
because regular memory is registered at SYSTEM_SCHEDULING state.  In
addition, memory hot-plug operation can be triggered at this system
state by the ACPI [1].  So checking against the system state is not
enough.

The consequence is that on system with interleaved node's ranges like this:

 Early memory node ranges
   node   1: [mem 0x0000000000000000-0x000000011fffffff]
   node   2: [mem 0x0000000120000000-0x000000014fffffff]
   node   1: [mem 0x0000000150000000-0x00000001ffffffff]
   node   0: [mem 0x0000000200000000-0x000000048fffffff]
   node   2: [mem 0x0000000490000000-0x00000007ffffffff]

This can be seen on PowerPC LPAR after multiple memory hot-plug and
hot-unplug operations are done.  At the next reboot the node's memory
ranges can be interleaved and since the call to link_mem_sections() is
made in topology_init() while the system is in the SYSTEM_SCHEDULING
state, the node's id is not checked, and the sections registered to
multiple nodes:

  $ ls -l /sys/devices/system/memory/memory21/node*
  total 0
  lrwxrwxrwx 1 root root     0 Aug 24 05:27 node1 -> ../../node/node1
  lrwxrwxrwx 1 root root     0 Aug 24 05:27 node2 -> ../../node/node2

In that case, the system is able to boot but if later one of theses
memory blocks is hot-unplugged and then hot-plugged, the sysfs
inconsistency is detected and this is triggering a BUG_ON():

  kernel BUG at /Users/laurent/src/linux-ppc/mm/memory_hotplug.c:1084!
  Oops: Exception in kernel mode, sig: 5 [#1]
  LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in: rpadlpar_io rpaphp pseries_rng rng_core vmx_crypto gf128mul binfmt_misc ip_tables x_tables xfs libcrc32c crc32c_vpmsum autofs4
  CPU: 8 PID: 10256 Comm: drmgr Not tainted 5.9.0-rc1+ #25
  Call Trace:
    add_memory_resource+0x23c/0x340 (unreliable)
    __add_memory+0x5c/0xf0
    dlpar_add_lmb+0x1b4/0x500
    dlpar_memory+0x1f8/0xb80
    handle_dlpar_errorlog+0xc0/0x190
    dlpar_store+0x198/0x4a0
    kobj_attr_store+0x30/0x50
    sysfs_kf_write+0x64/0x90
    kernfs_fop_write+0x1b0/0x290
    vfs_write+0xe8/0x290
    ksys_write+0xdc/0x130
    system_call_exception+0x160/0x270
    system_call_common+0xf0/0x27c

This patch addresses the root cause by not relying on the system_state
value to detect whether the call is due to a hot-plug operation.  An
extra parameter is added to link_mem_sections() detailing whether the
operation is due to a hot-plug operation.

[1] According to Oscar Salvador, using this qemu command line, ACPI
memory hotplug operations are raised at SYSTEM_SCHEDULING state:

  $QEMU -enable-kvm -machine pc -smp 4,sockets=4,cores=1,threads=1 -cpu host -monitor pty \
        -m size=$MEM,slots=255,maxmem=4294967296k  \
        -numa node,nodeid=0,cpus=0-3,mem=512 -numa node,nodeid=1,mem=512 \
        -object memory-backend-ram,id=memdimm0,size=134217728 -device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0 \
        -object memory-backend-ram,id=memdimm1,size=134217728 -device pc-dimm,node=0,memdev=memdimm1,id=dimm1,slot=1 \
        -object memory-backend-ram,id=memdimm2,size=134217728 -device pc-dimm,node=0,memdev=memdimm2,id=dimm2,slot=2 \
        -object memory-backend-ram,id=memdimm3,size=134217728 -device pc-dimm,node=0,memdev=memdimm3,id=dimm3,slot=3 \
        -object memory-backend-ram,id=memdimm4,size=134217728 -device pc-dimm,node=1,memdev=memdimm4,id=dimm4,slot=4 \
        -object memory-backend-ram,id=memdimm5,size=134217728 -device pc-dimm,node=1,memdev=memdimm5,id=dimm5,slot=5 \
        -object memory-backend-ram,id=memdimm6,size=134217728 -device pc-dimm,node=1,memdev=memdimm6,id=dimm6,slot=6 \

Fixes: 4fbce63391 ("mm/memory_hotplug.c: make register_mem_sect_under_node() a callback of walk_memory_range()")
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Scott Cheloha <cheloha@linux.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200915094143.79181-3-ldufour@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-26 10:33:57 -07:00
..
accessibility Char/Misc driver fixes for 5.9-rc3 2020-08-26 10:50:50 -07:00
acpi ACPI: processor: Fix build for ARCH_APICTIMER_STOPS_ON_C3 unset 2020-09-23 13:50:12 +02:00
amba
android drivers: android: Fix the SPDX comment style 2020-07-29 17:05:44 +02:00
ata libata-5.9-2020-09-04 2020-09-04 13:19:19 -07:00
atm atm: eni: fix the missed pci_disable_device() for eni_init_one() 2020-09-04 21:42:57 -07:00
auxdisplay A trivial patch for auxdisplay: 2020-09-05 14:22:46 -07:00
base mm: don't rely on system state to detect hot-plug operations 2020-09-26 10:33:57 -07:00
bcma bcma: gpio: Use irqchip template 2020-08-02 18:26:51 +03:00
block rbd: require global CAP_SYS_ADMIN for mapping and unmapping 2020-09-07 13:14:30 +02:00
bluetooth Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2020-07-31 15:11:52 -07:00
bus treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
cdrom
char Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2020-08-30 15:53:44 -07:00
clk clk: qcom: lpass: Correct goto target in lpass_core_sc7180_probe() 2020-09-10 13:42:35 -07:00
clocksource RISC-V Fixes for 5.9-rc6 (or shortly after) 2020-09-20 10:51:11 -07:00
connector
counter counter: microchip-tcb-capture: check the correct variable 2020-08-22 11:38:42 +01:00
cpufreq cpufreq: intel_pstate: Fix intel_pstate_get_hwp_max() for turbo disabled 2020-09-01 21:15:00 +02:00
cpuidle cpuidle: Drop misleading comments about RCU usage 2020-09-22 19:32:03 +02:00
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2020-08-30 15:53:44 -07:00
dax dax: Fix stack overflow when mounting fsdax pmem device 2020-09-20 08:57:36 -07:00
dca
devfreq PM / devfreq: tegra30: Disable clock on error in probe 2020-09-23 13:35:58 +09:00
dio
dma dmaengine fixes for v5.9-rc4 2020-09-04 12:12:39 -07:00
dma-buf dmabuf: fix NULL pointer dereference in dma_buf_release() 2020-09-21 11:17:06 +02:00
edac EDAC/ghes: Check whether the driver is on the safe list correctly 2020-09-15 09:42:15 +02:00
eisa
extcon
firewire treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
firmware Single EFI fix for v5.9-rc: 2020-09-20 15:18:11 -07:00
fpga Linux 5.8-rc7 2020-07-27 11:49:37 +02:00
fsi
gnss
gpio treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
gpu Merge tag 'drm-intel-fixes-2020-09-24' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes 2020-09-25 11:07:01 +10:00
greybus greybus: Use fallthrough pseudo-keyword 2020-07-29 16:58:08 +02:00
hid Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid 2020-09-02 12:55:46 -07:00
hsi treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
hv hyperv-fixes for 5.9-rc6 2020-09-15 16:20:43 -07:00
hwmon hwmon fixes for v5.9-rc3 2020-08-29 12:37:00 -07:00
hwspinlock
hwtracing treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
i2c i2c: mxs: use MXS_DMA_CTRL_WAIT4END instead of DMA_CTRL_ACK 2020-09-18 23:11:44 +02:00
i3c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
ide treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
idle cpuidle: Make CPUIDLE_FLAG_TLB_FLUSHED generic 2020-08-26 12:41:53 +02:00
iio Staging / IIO driver fixes for 5.9-rc5 2020-09-13 09:15:20 -07:00
infiniband RDMA/core: Fix ordering of CQ pool destruction 2020-09-14 15:20:18 -03:00
input Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2020-09-20 10:40:43 -07:00
interconnect interconnect: qcom: Fix small BW votes being truncated to zero 2020-09-04 00:07:12 +03:00
iommu iommu/amd: Restore IRTE.RemapEn bit for amd_iommu_activate_guest_mode 2020-09-18 11:17:19 +02:00
ipack
irqchip A set of fixes for interrupt chip drivers: 2020-08-30 11:56:54 -07:00
isdn treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
leds LEDs changes for 5.9-rc1. 2020-08-05 19:24:27 -07:00
lightnvm treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
macintosh treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
mailbox iomap: constify ioreadX() iomem argument (as in generic implementation) 2020-08-14 19:56:57 -07:00
mcb
md dm: fix comment in dm_process_bio() 2020-09-21 19:49:15 -04:00
media media fixes for v5.9-rc7 2020-09-24 09:05:04 -07:00
memory treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
memstick treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
message treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
mfd - Bug Fixes 2020-08-28 10:15:33 -07:00
misc Char / Misc driver fixes for 5.9-rc5 2020-09-13 08:52:21 -07:00
mmc mmc: mmc_spi: Fix mmc_spi_dma_alloc() return type for !HAS_DMA 2020-09-14 11:46:16 +02:00
most drivers: most: add USB adapter driver 2020-07-31 14:38:12 +02:00
mtd Revert "mtd: spi-nor: Add capability to disable flash quad mode" 2020-09-14 20:58:27 +05:30
mux treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
net net: mscc: ocelot: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries 2020-09-21 17:40:52 -07:00
nfc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-09-03 18:50:48 -07:00
ntb treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
nubus
nvdimm libnvdimm: KASAN: global-out-of-bounds Read in internal_create_group 2020-08-17 14:47:38 -06:00
nvme block-5.9-2020-09-22 2020-09-22 14:31:38 -07:00
nvmem nvmem: qcom-spmi-sdam: Enable multiple devices 2020-07-29 17:12:09 +02:00
of of: address: Work around missing device_type property in pcie nodes 2020-08-19 16:30:57 -06:00
opp Merge branch 'opp/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm 2020-09-01 19:44:20 +02:00
oprofile
parisc Merge branch 'parisc-5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux 2020-08-12 12:41:15 -07:00
parport treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
pci treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
pcmcia treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
perf treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
phy phy: fixes for 5.9 2020-09-04 12:41:55 +02:00
pinctrl This is the bulk of the pin control changes for the v5.9 2020-08-09 12:52:28 -07:00
platform treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
pnp
power treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
powercap powercap: RAPL: Add support for Lakefield 2020-09-16 14:16:04 +02:00
pps
ps3 treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
ptp ptp: ptp_clockmatrix: use i2c_master_send for i2c write 2020-08-19 16:23:22 -07:00
pwm pwm: Changes for v5.9-rc1 2020-08-14 16:00:09 -07:00
rapidio rapidio: Replace 'select' DMAENGINES 'with depends on' 2020-09-05 19:52:54 +03:00
ras
regulator regulator: Fix for v5.9 2020-09-25 15:16:01 -07:00
remoteproc treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
reset treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
rpmsg treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
rtc treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
s390 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-09-22 14:43:50 -07:00
sbus
scsi SCSI fixes on 20200915 2020-09-15 16:30:20 -07:00
sfi
sh iomap: constify ioreadX() iomem argument (as in generic implementation) 2020-08-14 19:56:57 -07:00
siox
slimbus
soc treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
soundwire soundwire: fix double free of dangling pointer 2020-09-03 14:10:19 +05:30
spi spi: Fixes for v5.9 2020-09-25 15:21:54 -07:00
spmi
ssb treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
staging Staging / IIO driver fixes for 5.9-rc5 2020-09-13 09:15:20 -07:00
target SCSI fixes on 20200908 2020-09-08 11:42:58 -07:00
tc
tee
thermal - Fix bogus thermal shutdowns for omap4430 where bogus values 2020-09-04 12:49:03 -07:00
thunderbolt thunderbolt: Fix for v5.9-rc6 2020-09-15 13:52:14 +02:00
tty serial: 8250_pci: Add Realtek 816a and 816b 2020-09-16 13:23:33 +02:00
uio
usb usblp: fix race between disconnect() and read() 2020-09-17 18:45:30 +02:00
vdpa vdpa/mlx5: Avoid warnings about shifts on 32-bit platforms 2020-08-26 08:13:59 -04:00
vfio treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
vhost Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-09-03 18:50:48 -07:00
video TTY/Serial/fbcon fixes for 5.9-rc6 2020-09-20 10:46:26 -07:00
virt
virtio virtio: pci: constify ioreadX() iomem argument (as in generic implementation) 2020-08-14 19:56:57 -07:00
visorbus
vlynq
vme
w1
watchdog treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
xen xen: branch for v5.9-rc4 2020-09-06 09:59:27 -07:00
zorro
Kconfig
Makefile