linux/drivers
David Hildenbrand da10329cb0 virtio-balloon: switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM
Commit 71994620bb ("virtio_balloon: replace oom notifier with shrinker")
changed the behavior when deflation happens automatically.  Instead of
deflating when called by the OOM handler, the shrinker is used.

However, the balloon is not simply some other slab cache that should be
shrunk when under memory pressure.  The shrinker does not have a concept
of priorities yet, so this behavior cannot be configured.  Eventually once
that is in place, we might want to switch back after doing proper testing.

There was a report that this results in undesired side effects when
inflating the balloon to shrink the page cache. [1]
	"When inflating the balloon against page cache (i.e. no free memory
	 remains) vmscan.c will both shrink page cache, but also invoke the
	 shrinkers -- including the balloon's shrinker. So the balloon
	 driver allocates memory which requires reclaim, vmscan gets this
	 memory by shrinking the balloon, and then the driver adds the
	 memory back to the balloon. Basically a busy no-op."

The name "deflate on OOM" makes it pretty clear when deflation should
happen - after other approaches to reclaim memory failed, not while
reclaiming. This minimizes the footprint of a guest - memory will only
be taken out of the balloon when really needed.

Keep using the shrinker for VIRTIO_BALLOON_F_FREE_PAGE_HINT, because
this has no such side effects. Always register the shrinker with
VIRTIO_BALLOON_F_FREE_PAGE_HINT now. We are always allowed to reuse free
pages that are still to be processed by the guest. The hypervisor takes
care of identifying and resolving possible races between processing a
hinting request and the guest reusing a page.

Unlike the code before commit 71994620bb ("virtio_balloon: replace oom
notifier with shrinker"), don't add a module parameter to configure the
number of pages to deflate on OOM. It can be re-added if really needed.
Also, note that leak_balloon() returns the number of 4k pages, so
convert it properly in virtio_balloon_oom_notify().
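
The unit conversion mentioned above can be sketched in user space as
follows. This is an illustration, not the driver code:
balloon_pages_per_page() and balloon_to_guest_pages() are hypothetical
helpers, and the guest page size is passed in as a parameter, whereas the
kernel would derive it from PAGE_SIZE. The only fact taken from the virtio
balloon interface is that balloon pages are 4k units (a shift of 12).

```c
#include <assert.h>

/* Balloon pages are always 4 KiB, independent of the guest page size. */
#define BALLOON_PFN_SHIFT 12

/* Hypothetical helper: number of 4 KiB balloon pages per guest page. */
static unsigned long balloon_pages_per_page(unsigned long guest_page_size)
{
	/* A guest page smaller than 4 KiB would make the divisor zero. */
	assert(guest_page_size >= (1UL << BALLOON_PFN_SHIFT));
	return guest_page_size >> BALLOON_PFN_SHIFT;
}

/*
 * Hypothetical helper: convert a count of 4 KiB balloon pages (the unit
 * the commit message says leak_balloon() returns) into guest pages (the
 * unit an OOM notifier would report as freed).
 */
static unsigned long balloon_to_guest_pages(unsigned long balloon_pages,
					    unsigned long guest_page_size)
{
	return balloon_pages / balloon_pages_per_page(guest_page_size);
}
```

With 4k guest pages the conversion is a no-op. With 64k guest pages, 256
balloon pages correspond to 16 guest pages, so reporting the raw count
would overstate the freed memory by a factor of 16.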

Testing done by Tyler for future reference:
  Test setup: VM with 16 CPUs and 64GB RAM, running Debian 10. We have a 42
  GB file full of random bytes that we continually cat to /dev/null.
  This fills the page cache as the file is read. Meanwhile, we trigger
  the balloon to inflate, with a target size of 53 GB. This setup causes
  the balloon inflation to pressure the page cache as the page cache is
  also trying to grow. Afterwards we shrink the balloon back to zero (so
  total deflate == total inflate).

  Without this patch (kernel 4.19.0-5):
  Inflation never reaches the target until we stop the "cat file >
  /dev/null" process. Total inflation time was 542 seconds. The longest
  period that made no net forward progress was 315 seconds.
    Result of "grep balloon /proc/vmstat" after the test:
    balloon_inflate 154828377
    balloon_deflate 154828377

  With this patch (kernel 5.6.0-rc4+):
  Total inflation duration was 63 seconds. No deflate-queue activity
  occurs when pressuring the page-cache.
    Result of "grep balloon /proc/vmstat" after the test:
    balloon_inflate 12968539
    balloon_deflate 12968539

  Conclusion: This patch fixes the issue.  In the test it reduced
  inflate/deflate activity by 12x, and reduced inflation time by 8.6x.
  But more importantly, if we hadn't killed the "cat file > /dev/null"
  process then, without the patch, the inflation process would never reach
  the target.
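
The factors quoted in the conclusion can be re-derived from the numbers
reported above. A minimal sketch, where improvement_factor() is a
hypothetical helper introduced only for this check:

```c
#include <assert.h>

/*
 * Hypothetical helper: how many times larger the "before" value is than
 * the "after" value.
 */
static double improvement_factor(double before, double after)
{
	assert(after > 0.0);
	return before / after;
}
```

Applied to the vmstat counters (154828377 vs. 12968539) the result is just
under 12, and applied to the inflation times (542 s vs. 63 s) it is about
8.6, which matches the figures in the conclusion.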

[1] https://www.spinics.net/lists/linux-virtualization/msg40863.html

Link: http://lkml.kernel.org/r/20200311135523.18512-2-david@redhat.com
Fixes: 71994620bb ("virtio_balloon: replace oom notifier with shrinker")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Tyler Sanderson <tysand@google.com>
Tested-by: Tyler Sanderson <tysand@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Wei Wang <wei.w.wang@intel.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Michal Hocko <mhocko@kernel.org>

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:39 -07:00
accessibility
acpi Additional ACPI updates for 5.7-rc1 2020-04-06 10:35:06 -07:00
amba Revert "amba: Initialize dma_parms for amba devices" 2020-04-01 08:03:28 +02:00
android Merge 5.6-rc7 into char-misc-next 2020-03-23 07:59:38 +01:00
ata ata: make "libata.force" kernel parameter optional 2020-03-26 10:28:20 -06:00
atm .gitignore: add SPDX License Identifier 2020-03-25 11:50:48 +01:00
auxdisplay Merge 5.6-rc7 into char-misc-next 2020-03-23 07:59:38 +01:00
base Additional power management updates for 5.7-rc1 2020-04-06 10:14:39 -07:00
bcma
block for-5.7/drivers-2020-03-29 2020-03-30 11:43:51 -07:00
bluetooth
bus ARM: driver updates 2020-04-03 15:05:35 -07:00
cdrom
char sysfs: remove redundant __compat_only_sysfs_link_entry_to_kobj fn 2020-04-05 11:34:35 -07:00
clk There's not much to see in the core framework this time around. Instead the 2020-04-05 10:43:32 -07:00
clocksource clocksource/drivers/timer-vf-pit: Add missing parenthesis 2020-04-05 09:24:58 +02:00
connector
counter
cpufreq Additional power management updates for 5.7-rc1 2020-04-06 10:14:39 -07:00
cpuidle ARM: SoC updates 2020-04-03 15:02:35 -07:00
crypto SPDX patches for 5.7-rc1. 2020-04-03 13:12:26 -07:00
dax
dca
devfreq PM / devfreq: Fix handling dev_pm_qos_remove_request result 2020-03-25 08:35:03 +09:00
dio
dma dmaengine updates for v5.7-rc1 2020-04-02 16:04:42 -07:00
dma-buf
edac Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2020-03-30 16:40:08 -07:00
eisa .gitignore: add SPDX License Identifier 2020-03-25 11:50:48 +01:00
extcon Char/Misc driver patches for 5.7-rc1 2020-04-03 13:22:40 -07:00
firewire
firmware ARM: driver updates 2020-04-03 15:05:35 -07:00
fpga
fsi
gnss
gpio This is the bulk of GPIO development for the v5.7 kernel cycle. 2020-04-04 10:27:00 -07:00
gpu New tracing features: 2020-04-05 10:36:18 -07:00
greybus
hid Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid 2020-04-01 15:18:42 -07:00
hsi
hv
hwmon Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2020-03-30 16:40:08 -07:00
hwspinlock hwspinlock: hwspinlock_internal.h: Replace zero-length array with flexible-array member 2020-03-25 22:30:46 -07:00
hwtracing intel_th: msu: Make stopping the trace optional 2020-03-24 13:45:24 +01:00
i2c Merge branch 'i2c/for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux 2020-04-02 15:54:13 -07:00
i3c i3c: convert to use i2c_new_client_device() 2020-03-29 10:35:50 +02:00
ide Linux 5.6 2020-03-30 13:31:37 +02:00
idle Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2020-03-30 16:40:08 -07:00
iio staging: iio: adc: ad7192: Re-indent enum labels 2020-03-23 11:43:42 +01:00
infiniband RDMA 5.7 pull request 2020-04-01 18:18:18 -07:00
input Driver core patches for 5.7-rc1 2020-03-30 13:59:52 -07:00
interconnect interconnect changes for 5.7 2020-03-20 13:45:25 +01:00
iommu
ipack
irqchip Two reverts addressing regressions of the Xilinx interrupt controller 2020-04-05 11:57:12 -07:00
isdn
leds
lightnvm for-5.7/drivers-2020-03-29 2020-03-30 11:43:51 -07:00
macintosh Char/Misc driver patches for 5.7-rc1 2020-04-03 13:22:40 -07:00
mailbox mailbox: imx: add SCU MU support 2020-03-19 23:04:32 -05:00
mcb
md - Fix excessive bio splitting that caused performance regressions. 2020-04-03 14:44:48 -07:00
media Power management updates for 5.7-rc1 2020-03-30 15:05:01 -07:00
memory ARM: driver updates 2020-04-03 15:05:35 -07:00
memstick
message scsi: message: fusion: Replace zero-length array with flexible-array member 2020-03-26 22:40:47 -04:00
mfd
misc pci-v5.7-changes 2020-04-03 14:25:02 -07:00
mmc MMC core: 2020-03-31 16:13:09 -07:00
most staging: most: move core files out of the staging area 2020-03-24 13:42:44 +01:00
mtd mtd: Convert fallthrough comments into statements 2020-03-30 10:14:54 +02:00
mux
net pci-v5.7-changes 2020-04-03 14:25:02 -07:00
nfc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-03-25 18:58:11 -07:00
ntb pci-v5.7-changes 2020-04-03 14:25:02 -07:00
nubus
nvdimm block: simplify queue allocation 2020-03-27 10:23:43 -06:00
nvme SCSI misc on 20200402 2020-04-02 17:03:53 -07:00
nvmem nvmem: core: remove nvmem_sysfs_get_groups() 2020-03-25 19:23:49 +01:00
of Devicetree updates for v5.7: 2020-04-02 17:32:52 -07:00
opp
oprofile
parisc
parport
pci powerpc updates for 5.7 2020-04-05 11:12:59 -07:00
pcmcia
perf arm64 updates for 5.7: 2020-03-31 10:05:01 -07:00
phy pci-v5.7-changes 2020-04-03 14:25:02 -07:00
pinctrl This is the bulk of GPIO development for the v5.7 kernel cycle. 2020-04-04 10:27:00 -07:00
platform Additional ACPI updates for 5.7-rc1 2020-04-06 10:35:06 -07:00
pnp
power power supply and reset changes for the v5.7 series 2020-04-05 13:47:57 -07:00
powercap Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2020-03-30 16:40:08 -07:00
pps
ps3
ptp ptp: Avoid deadlocks in the programmable pin code. 2020-03-30 11:16:38 -07:00
pwm clocksource/drivers/timer-ti-dm: Enable autoreload in set_pwm 2020-03-16 12:40:51 +01:00
rapidio
ras
regulator spi/regulator: Updates for v5.7 2020-03-30 14:58:26 -07:00
remoteproc remoteproc/omap: Fix set_load call in omap_rproc_request_timer 2020-04-03 10:47:21 -07:00
reset
rpmsg
rtc RTC for 5.7 2020-04-04 10:38:01 -07:00
s390 s390 updates for the 5.7 merge window 2020-04-04 09:45:50 -07:00
sbus misc: move FLASH_MINOR into miscdevice.h and fix conflicts 2020-03-18 12:27:04 +01:00
scsi pci-v5.7-changes 2020-04-03 14:25:02 -07:00
sfi
sh
siox
slimbus
soc ARM: driver updates 2020-04-03 15:05:35 -07:00
soundwire Char/Misc driver patches for 5.7-rc1 2020-04-03 13:22:40 -07:00
spi sound updates for 5.7-rc1 2020-04-02 15:50:04 -07:00
spmi
ssb
staging Char/Misc driver patches for 5.7-rc1 2020-04-03 13:22:40 -07:00
target SCSI misc on 20200402 2020-04-02 17:03:53 -07:00
tc
tee ARM: driver updates 2020-04-03 15:05:35 -07:00
thermal Additional ACPI updates for 5.7-rc1 2020-04-06 10:35:06 -07:00
thunderbolt Merge 5.6-rc7 into usb-next 2020-03-23 08:04:08 +01:00
tty powerpc updates for 5.7 2020-04-05 11:12:59 -07:00
uio uio: uio_pdrv_genirq: use new devm_uio_register_device() function 2020-03-18 12:34:10 +01:00
usb SCSI misc on 20200402 2020-04-02 17:03:53 -07:00
vfio vfio: Ignore -ENODEV when getting MSI cookie 2020-04-01 13:51:51 -06:00
vhost
video Char/Misc driver patches for 5.7-rc1 2020-04-03 13:22:40 -07:00
virt virt: vbox: Use fallthrough; 2020-03-19 07:41:03 +01:00
virtio virtio-balloon: switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM 2020-04-07 10:43:39 -07:00
visorbus
vlynq
vme
w1
watchdog
xen xen: branch for v5.7-rc1 2020-04-03 12:51:46 -07:00
zorro SPDX patches for 5.7-rc1. 2020-04-03 13:12:26 -07:00
Kconfig New tracing features: 2020-04-05 10:36:18 -07:00
Makefile staging: most: move core files out of the staging area 2020-03-24 13:42:44 +01:00