linux/drivers
David Hildenbrand 63341ab037 virtio-balloon: fix managed page counts when migrating pages between zones
In case we have to migrate a ballon page to a newpage of another zone, the
managed page count of both zones is wrong. Paired with memory offlining
(which will adjust the managed page count), we can trigger kernel crashes
and all kinds of different symptoms.

One way to reproduce:
1. Start a QEMU guest with 4GB, no NUMA
2. Hotplug a 1GB DIMM and online the memory to ZONE_NORMAL
3. Inflate the balloon to 1GB
4. Unplug the DIMM (be quick, otherwise unmovable data ends up on it)
5. Observe /proc/zoneinfo
  Node 0, zone   Normal
    pages free     16810
          min      24848885473806
          low      18471592959183339
          high     36918337032892872
          spanned  262144
          present  262144
          managed  18446744073709533486
6. Do anything that requires some memory (e.g., inflate the balloon some
more). The OOM goes crazy and the system crashes
  [  238.324946] Out of memory: Killed process 537 (login) total-vm:27584kB, anon-rss:860kB, file-rss:0kB, shmem-rss:00
  [  238.338585] systemd invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
  [  238.339420] CPU: 0 PID: 1 Comm: systemd Tainted: G      D W         5.4.0-next-20191204+ #75
  [  238.340139] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu4
  [  238.341121] Call Trace:
  [  238.341337]  dump_stack+0x8f/0xd0
  [  238.341630]  dump_header+0x61/0x5ea
  [  238.341942]  oom_kill_process.cold+0xb/0x10
  [  238.342299]  out_of_memory+0x24d/0x5a0
  [  238.342625]  __alloc_pages_slowpath+0xd12/0x1020
  [  238.343024]  __alloc_pages_nodemask+0x391/0x410
  [  238.343407]  pagecache_get_page+0xc3/0x3a0
  [  238.343757]  filemap_fault+0x804/0xc30
  [  238.344083]  ? ext4_filemap_fault+0x28/0x42
  [  238.344444]  ext4_filemap_fault+0x30/0x42
  [  238.344789]  __do_fault+0x37/0x1a0
  [  238.345087]  __handle_mm_fault+0x104d/0x1ab0
  [  238.345450]  handle_mm_fault+0x169/0x360
  [  238.345790]  do_user_addr_fault+0x20d/0x490
  [  238.346154]  do_page_fault+0x31/0x210
  [  238.346468]  async_page_fault+0x43/0x50
  [  238.346797] RIP: 0033:0x7f47eba4197e
  [  238.347110] Code: Bad RIP value.
  [  238.347387] RSP: 002b:00007ffd7c0c1890 EFLAGS: 00010293
  [  238.347834] RAX: 0000000000000002 RBX: 000055d196a20a20 RCX: 00007f47eba4197e
  [  238.348437] RDX: 0000000000000033 RSI: 00007ffd7c0c18c0 RDI: 0000000000000004
  [  238.349047] RBP: 00007ffd7c0c1c20 R08: 0000000000000000 R09: 0000000000000033
  [  238.349660] R10: 00000000ffffffff R11: 0000000000000293 R12: 0000000000000001
  [  238.350261] R13: ffffffffffffffff R14: 0000000000000000 R15: 00007ffd7c0c18c0
  [  238.350878] Mem-Info:
  [  238.351085] active_anon:3121 inactive_anon:51 isolated_anon:0
  [  238.351085]  active_file:12 inactive_file:7 isolated_file:0
  [  238.351085]  unevictable:0 dirty:0 writeback:0 unstable:0
  [  238.351085]  slab_reclaimable:5565 slab_unreclaimable:10170
  [  238.351085]  mapped:3 shmem:111 pagetables:155 bounce:0
  [  238.351085]  free:720717 free_pcp:2 free_cma:0
  [  238.353757] Node 0 active_anon:12484kB inactive_anon:204kB active_file:48kB inactive_file:28kB unevictable:0kB iss
  [  238.355979] Node 0 DMA free:11556kB min:36kB low:48kB high:60kB reserved_highatomic:0KB active_anon:152kB inactivB
  [  238.358345] lowmem_reserve[]: 0 2955 2884 2884 2884
  [  238.358761] Node 0 DMA32 free:2677864kB min:7004kB low:10028kB high:13052kB reserved_highatomic:0KB active_anon:0B
  [  238.361202] lowmem_reserve[]: 0 0 72057594037927865 72057594037927865 72057594037927865
  [  238.361888] Node 0 Normal free:193448kB min:99395541895224kB low:73886371836733356kB high:147673348131571488kB reB
  [  238.364765] lowmem_reserve[]: 0 0 0 0 0
  [  238.365101] Node 0 DMA: 7*4kB (U) 5*8kB (UE) 6*16kB (UME) 2*32kB (UM) 1*64kB (U) 2*128kB (UE) 3*256kB (UME) 2*512B
  [  238.366379] Node 0 DMA32: 0*4kB 1*8kB (U) 2*16kB (UM) 2*32kB (UM) 2*64kB (UM) 1*128kB (U) 1*256kB (U) 1*512kB (U)B
  [  238.367654] Node 0 Normal: 1985*4kB (UME) 1321*8kB (UME) 844*16kB (UME) 524*32kB (UME) 300*64kB (UME) 138*128kB (B
  [  238.369184] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
  [  238.369915] 130 total pagecache pages
  [  238.370241] 0 pages in swap cache
  [  238.370533] Swap cache stats: add 0, delete 0, find 0/0
  [  238.370981] Free swap  = 0kB
  [  238.371239] Total swap = 0kB
  [  238.371488] 1048445 pages RAM
  [  238.371756] 0 pages HighMem/MovableOnly
  [  238.372090] 306992 pages reserved
  [  238.372376] 0 pages cma reserved
  [  238.372661] 0 pages hwpoisoned

In another instance (older kernel), I was able to observe this
(negative page count :/):
  [  180.896971] Offlined Pages 32768
  [  182.667462] Offlined Pages 32768
  [  184.408117] Offlined Pages 32768
  [  186.026321] Offlined Pages 32768
  [  187.684861] Offlined Pages 32768
  [  189.227013] Offlined Pages 32768
  [  190.830303] Offlined Pages 32768
  [  190.833071] Built 1 zonelists, mobility grouping on.  Total pages: -36920272750453009

In another instance (older kernel), I was no longer able to start any
process:
  [root@vm ~]# [  214.348068] Offlined Pages 32768
  [  215.973009] Offlined Pages 32768
  cat /proc/meminfo
  -bash: fork: Cannot allocate memory
  [root@vm ~]# cat /proc/meminfo
  -bash: fork: Cannot allocate memory

Fix it by properly adjusting the managed page count when migrating if
the zone changed. The managed page count of the zones now looks after
unplug of the DIMM (and after deflating the balloon) just like before
inflating the balloon (and plugging+onlining the DIMM).

We'll temporarily modify the totalram page count. If this ever becomes a
problem, we can fine tune by providing helpers that don't touch
the totalram pages (e.g., adjust_zone_managed_page_count()).

Please note that fixing up the managed page count is only necessary when
we adjusted the managed page count when inflating - only if we
don't have VIRTIO_BALLOON_F_DEFLATE_ON_OOM. With that feature, the
managed page count is not touched when inflating/deflating.

Reported-by: Yumei Huang <yuhuang@redhat.com>
Fixes: 3dcc0571cd ("mm: correctly update zone->managed_pages")
Cc: <stable@vger.kernel.org> # v3.11+
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-12-11 08:14:06 -05:00
..
accessibility
acpi Additional ACPI updates for 5.5-rc1 2019-12-04 10:56:35 -08:00
amba
android compat_ioctl: remove most of fs/compat_ioctl.c 2019-12-01 13:46:15 -08:00
ata pci-v5.5-changes 2019-12-03 13:58:22 -08:00
atm
auxdisplay auxdisplay: charlcd: deduplicate simple_strtoul() 2019-12-04 19:44:12 -08:00
base ARM: SoC-related driver updates 2019-12-05 11:43:31 -08:00
bcma
block xen: branch for v5.5-rc1 2019-12-07 14:49:20 -08:00
bluetooth Bluetooth: btbcm: Use the BDADDR_PROPERTY quirk 2019-11-22 13:35:20 +01:00
bus Few ti-sysc related fixes for v5.5 merge window 2019-12-06 08:26:50 -08:00
cdrom cdrom: respect device capabilities during opening action 2019-11-26 13:02:24 -07:00
char drm msm + fixes for 5.5-rc1 2019-12-06 10:28:09 -08:00
clk ARM: SoC platform updates 2019-12-05 11:38:40 -08:00
clocksource Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-12-03 12:20:25 -08:00
connector
counter
cpufreq cpufreq: tegra: Changes for v5.5-rc1 2019-12-06 08:28:13 -08:00
cpuidle cpuidle: minor Kconfig help text fixes 2019-11-29 11:49:32 +01:00
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2019-12-02 17:23:21 -08:00
dax libnvdimm for 5.5 2019-12-01 18:43:25 -08:00
dca
devfreq PM / devfreq: Add missing locking while setting suspend_freq 2019-11-29 12:50:34 +01:00
dio
dma dmaengine: Fix Kconfig indentation 2019-11-22 11:16:26 +05:30
dma-buf compat_ioctl: remove most of fs/compat_ioctl.c 2019-12-01 13:46:15 -08:00
edac EDAC/altera: Use the Altera System Manager driver 2019-11-22 10:18:29 +01:00
eisa
extcon Char/Misc driver patches for 5.5-rc1 2019-11-27 10:53:50 -08:00
firewire FireWire (IEEE 1394) subsystem updates: 2019-12-02 14:13:00 -08:00
firmware ARM: SoC fixes 2019-12-06 14:19:37 -08:00
fpga
fsi
gnss
gpio gpio: pca953x: tighten up indentation 2019-12-04 19:44:14 -08:00
gpu drm msm + fixes for 5.5-rc1 2019-12-06 10:28:09 -08:00
greybus
hid Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid 2019-12-01 18:20:54 -08:00
hsi
hv Merge branch 'akpm' (patches from Andrew) 2019-12-01 20:36:41 -08:00
hwmon compat_ioctl: remove most of fs/compat_ioctl.c 2019-12-01 13:46:15 -08:00
hwspinlock
hwtracing compat_ioctl: remove most of fs/compat_ioctl.c 2019-12-01 13:46:15 -08:00
i2c Merge branch 'i2c/for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux 2019-12-01 18:29:36 -08:00
i3c
ide compat_ioctl: remove most of fs/compat_ioctl.c 2019-12-01 13:46:15 -08:00
idle cpuidle: Drop disabled field from struct cpuidle_state 2019-11-29 11:48:39 +01:00
iio chrome platform changes for v5.5 2019-12-03 14:37:12 -08:00
infiniband net: ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookup 2019-12-04 12:27:13 -08:00
input Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2019-12-07 18:33:01 -08:00
interconnect
iommu pci-v5.5-changes 2019-12-03 13:58:22 -08:00
ipack
irqchip pci-v5.5-changes 2019-12-03 13:58:22 -08:00
isdn compat_ioctl: remove most of fs/compat_ioctl.c 2019-12-01 13:46:15 -08:00
leds Merge tag 'leds-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/pavel/linux-leds 2019-12-01 16:09:28 -08:00
lightnvm
macintosh powerpc updates for 5.5 2019-11-30 14:35:43 -08:00
mailbox mailbox changes for v5.5 2019-12-01 18:42:02 -08:00
mcb
md block: don't handle bio based drivers in blk_revalidate_disk_zones 2019-12-03 08:51:25 -07:00
media compat_ioctl: remove most of fs/compat_ioctl.c 2019-12-01 13:46:15 -08:00
memory memory: tegra: Fixes for v5.5-rc1 2019-12-06 08:28:51 -08:00
memstick pci-v5.5-changes 2019-12-03 13:58:22 -08:00
message
mfd chrome platform changes for v5.5 2019-12-03 14:37:12 -08:00
misc lib/genalloc.c: rename addr_in_gen_pool to gen_pool_has_addr 2019-12-04 19:44:13 -08:00
mmc Driver core patches for 5.5-rc1 2019-11-27 11:06:20 -08:00
mtd TTY/Serial patches for 5.5-rc1 2019-12-03 14:09:14 -08:00
mux
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-12-08 13:28:11 -08:00
nfc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-11-22 16:27:24 -08:00
ntb Add Hygon Device ID to the AMD NTB device driver 2019-12-07 18:38:17 -08:00
nubus
nvdimm libnvdimm for 5.5 2019-12-01 18:43:25 -08:00
nvme pci-v5.5-changes 2019-12-03 13:58:22 -08:00
nvmem ARM: SoC-related driver updates 2019-12-05 11:43:31 -08:00
of Devicetree updates for v5.5: 2019-12-02 11:41:35 -08:00
opp
oprofile Printk changes for 5.5 2019-11-25 19:40:40 -08:00
parisc
parport
pci pci-v5.5-changes 2019-12-03 13:58:22 -08:00
pcmcia pcmcia: remove unused dprintk definition 2019-11-22 07:03:45 +01:00
perf
phy ARM: SoC-related driver updates 2019-12-05 11:43:31 -08:00
pinctrl Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-12-03 09:29:50 -08:00
platform chrome platform changes for v5.5 2019-12-03 14:37:12 -08:00
pnp
power Additional power management updates for 5.5-rc1 2019-12-04 10:48:09 -08:00
powercap
pps
ps3
ptp
pwm pwm: Changes for v5.5-rc1 2019-12-05 11:28:14 -08:00
rapidio drivers/rapidio/rio-access.c: fix missing include of <linux/rio_drv.h> 2019-12-04 19:44:13 -08:00
ras
regulator Merge branch 'regulator-5.5' into regulator-next 2019-11-22 19:56:20 +00:00
remoteproc
reset ARM: SoC-related driver updates 2019-12-05 11:43:31 -08:00
rpmsg rpmsg updates for v5.5 2019-12-01 18:39:24 -08:00
rtc RTC for 5.5 2019-12-03 13:31:08 -08:00
s390 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-12-08 13:28:11 -08:00
sbus
scsi SCSI misc on 20191207 2019-12-08 12:23:42 -08:00
sfi
sh
siox
slimbus
soc ARM: SoC fixes 2019-12-06 14:19:37 -08:00
soundwire
spi Merge branch 'spi-5.5' into spi-next 2019-11-22 19:56:35 +00:00
spmi
ssb
staging pci-v5.5-changes 2019-12-03 13:58:22 -08:00
target SCSI misc on 20191130 2019-12-02 13:37:02 -08:00
tc
tee Merge mainline/master into arm/fixes 2019-12-05 13:18:54 -08:00
thermal Merge branch 'thermal/next' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux 2019-12-05 11:21:24 -08:00
thunderbolt
tty TTY/Serial patches for 5.5-rc1 2019-12-03 14:09:14 -08:00
uio
usb usb, kcov: collect coverage from hub_event 2019-12-04 19:44:14 -08:00
vfio VFIO updates for v5.5-rc1 2019-12-07 14:51:04 -08:00
vhost Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-12-08 13:28:11 -08:00
video pci-v5.5-changes 2019-12-03 13:58:22 -08:00
virt compat_ioctl: remove most of fs/compat_ioctl.c 2019-12-01 13:46:15 -08:00
virtio virtio-balloon: fix managed page counts when migrating pages between zones 2019-12-11 08:14:06 -05:00
visorbus
vlynq
vme
w1
watchdog linux-watchdog 5.5-rc1 tag 2019-12-01 18:01:03 -08:00
xen xen: branch for v5.5-rc1 2019-12-07 14:49:20 -08:00
zorro
Kconfig
Makefile