linux/include
Mel Gorman 1b4e3f26f9 mm: vmscan: Reduce throttling due to a failure to make progress
Mike Galbraith, Alexey Avramov and Darrick Wong all reported similar
problems due to reclaim throttling for excessive lengths of time.  In
Alexey's case, a memory hog that should go OOM quickly stalls for
several minutes before going OOM.  In Mike and Darrick's cases, a small
memcg environment stalled excessively even though the system had enough
memory overall.

Commit 69392a403f ("mm/vmscan: throttle reclaim when no progress is
being made") introduced the problem although commit a19594ca4a
("mm/vmscan: increase the timeout if page reclaim is not making
progress") made it worse.  Systems at or near an OOM state that cannot
be recovered must reach OOM quickly and memcg should kill tasks if a
memcg is near OOM.

To address this, only stall for the first zone in the zonelist, reduce
the timeout to 1 tick for VMSCAN_THROTTLE_NOPROGRESS and only stall if
the scan control nr_reclaimed is 0, kswapd is still active and there
were excessive pages pending for writeback.  If kswapd has stopped
reclaiming due to excessive failures, do not stall at all so that OOM
triggers relatively quickly.  Similarly, if an LRU is simply congested,
only lightly throttle similar to NOPROGRESS.
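
The stall conditions described above can be sketched as a small user-space model. This is not the code from mm/vmscan.c: every name below (the `sketch_` enum, struct, fields and function) is hypothetical, and the logic is one literal reading of this changelog. In particular, applying the "kswapd has stopped" check to the congested case as well is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reasons, loosely named after the throttle types in the text. */
enum sketch_throttle_reason {
    SKETCH_THROTTLE_NOPROGRESS,
    SKETCH_THROTTLE_CONGESTED,
};

/* Hypothetical snapshot of the state the changelog says the decision uses. */
struct sketch_reclaim_snapshot {
    unsigned long nr_reclaimed;        /* pages reclaimed so far in this pass */
    bool first_zone_in_zonelist;       /* are we looking at the first zone? */
    bool kswapd_active;                /* false once kswapd gave up after failures */
    bool excessive_writeback_pending;  /* too many pages waiting on writeback */
};

/*
 * Return how many ticks a reclaimer would stall in this model.
 * 0 means "do not stall", so an unrecoverable system reaches OOM quickly.
 */
unsigned int sketch_stall_ticks(const struct sketch_reclaim_snapshot *snap,
                                enum sketch_throttle_reason reason)
{
    assert(snap != NULL);

    /* kswapd stopped reclaiming: never stall (assumed for both reasons). */
    if (!snap->kswapd_active)
        return 0;

    switch (reason) {
    case SKETCH_THROTTLE_NOPROGRESS:
        /* Stall one tick only when all listed conditions hold. */
        if (snap->first_zone_in_zonelist &&
            snap->nr_reclaimed == 0 &&
            snap->excessive_writeback_pending)
            return 1;
        return 0;
    case SKETCH_THROTTLE_CONGESTED:
        /* A congested LRU is throttled lightly, like NOPROGRESS. */
        return 1;
    }
    return 0;
}
```

In this model the longest stall is a single tick, which matches the stated goal that a system near OOM should not wait minutes before tasks are killed.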

Alexey's original case was the most straightforward:

	for i in {1..3}; do tail /dev/zero; done

On vanilla 5.16-rc1, this test stalled heavily.  After the patch, the test
completes in a few seconds, similar to 5.15.

Alexey's second test case added watching a youtube video while tail runs
10 times.  On 5.15, playback only jitters slightly, whereas 5.16-rc1 stalls
heavily with many missing frames and numerous audio glitches.  With this
patch applied, the video plays similarly to 5.15.

[lkp@intel.com: Fix W=1 build warning]

Link: https://lore.kernel.org/r/99e779783d6c7fce96448a3402061b9dc1b3b602.camel@gmx.de
Link: https://lore.kernel.org/r/20211124011954.7cab9bb4@mail.inbox.lv
Link: https://lore.kernel.org/r/20211022144651.19914-1-mgorman@techsingularity.net
Link: https://lore.kernel.org/r/20211202150614.22440-1-mgorman@techsingularity.net
Link: https://linux-regtracking.leemhuis.info/regzbot/regression/20211124011954.7cab9bb4@mail.inbox.lv/
Reported-and-tested-by: Alexey Avramov <hakavlad@inbox.lv>
Reported-and-tested-by: Mike Galbraith <efault@gmx.de>
Reported-and-tested-by: Darrick J. Wong <djwong@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Hugh Dickins <hughd@google.com>
Tracked-by: Thorsten Leemhuis <regressions@leemhuis.info>
Fixes: 69392a403f ("mm/vmscan: throttle reclaim when no progress is being made")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-12-31 11:17:07 -08:00
acpi Merge branches 'acpica', 'acpi-ec', 'acpi-pmic' and 'acpi-video' 2021-11-10 14:03:14 +01:00
asm-generic Add linux/cacheflush.h 2021-11-17 10:36:15 -05:00
clocksource ARM: 2021-11-02 11:24:14 -07:00
crypto crypto: ecc - Export additional helper functions 2021-10-29 21:04:03 +08:00
drm Removed the TTM Huge Page functionality to address a crash, a timeout 2021-11-11 08:14:19 +10:00
dt-bindings dt-bindings: Rename Ingenic CGU headers to ingenic,*.h 2021-11-11 22:27:14 -06:00
keys
kunit include/kunit/test.h: replace kernel.h with the necessary inclusions 2021-11-09 10:02:49 -08:00
kvm
linux mm: vmscan: Reduce throttling due to a failure to make progress 2021-12-31 11:17:07 -08:00
math-emu
media Merge branch 'akpm' (patches from Andrew) 2021-11-09 10:11:53 -08:00
memory
misc
net sctp: use call_rcu to free endpoint 2021-12-25 17:13:37 +00:00
pcmcia
ras
rdma RDMA/netlink: Add __maybe_unused to static inline in C file 2021-11-16 13:13:08 -04:00
scsi SCSI misc on 20211112 2021-11-12 12:25:50 -08:00
soc net: mscc: ocelot: create a function that replaces an existing VCAP filter 2021-11-26 11:38:20 -08:00
sound ASoC: Fixes for v5.16 2021-11-25 14:35:24 +01:00
target
trace mm: vmscan: Reduce throttling due to a failure to make progress 2021-12-31 11:17:07 -08:00
uapi Networking fixes for 5.16-rc8, including fixes from.. Santa? 2021-12-30 11:12:12 -08:00
vdso
video
xen xen/console: harden hvc_xen against event channel storms 2021-12-16 08:24:08 +01:00