A mirror of the official Linux kernel repository just in case
Go to file
Johannes Weiner e82553c10b Revert "mm: memcontrol: avoid workload stalls when lowering memory.high"
This reverts commit 536d3bf261, as it can
cause writers to memory.high to get stuck in the kernel forever,
performing page reclaim and consuming excessive amounts of CPU cycles.

Before the patch, a write to memory.high would first put the new limit
in place for the workload, and then reclaim the requested delta.  After
the patch, the kernel tries to reclaim the delta before putting the new
limit into place, in order to not overwhelm the workload with a sudden,
large excess over the limit.  However, if reclaim is actively racing
with new allocations from the uncurbed workload, it can keep the write()
working inside the kernel indefinitely.

This is causing problems in Facebook production.  A privileged
system-level daemon that adjusts memory.high for various workloads
running on a host can get unexpectedly stuck in the kernel and
essentially turn into a sort of involuntary kswapd for one of the
workloads.  We've observed that daemon busy-spin in a write() for
minutes at a time, neglecting its other duties on the system, and
expending privileged system resources on behalf of a workload.

To remedy this, we have first considered changing the reclaim logic to
break out after a couple of loops - whether the workload has converged
to the new limit or not - and bound the write() call this way.  However,
the root cause that inspired the sequence change in the first place has
been fixed through other means, and so a revert back to the proven
limit-setting sequence, also used by memory.max, is preferable.

The sequence was changed to avoid extreme latencies in the workload when
the limit was lowered: the sudden, large excess created by the limit
lowering would erroneously trigger the penalty sleeping code that is
meant to throttle excessive growth from below.  Allocating threads could
end up sleeping long after the write() had already reclaimed the delta
for which they were being punished.

However, erroneous throttling also caused problems in other scenarios at
around the same time.  This resulted in commit b3ff92916a ("mm, memcg:
reclaim more aggressively before high allocator throttling"), included
in the same release as the offending commit.  When allocating threads
now encounter large excess caused by a racing write() to memory.high,
instead of entering punitive sleeps, they will simply be tasked with
helping reclaim down the excess, and will be held no longer than it
takes to accomplish that.  This is in line with regular limit
enforcement - i.e.  if the workload allocates up against or over an
otherwise unchanged limit from below.

With the patch breaking userspace, and the root cause addressed by other
means already, revert it again.

Link: https://lkml.kernel.org/r/20210122184341.292461-1-hannes@cmpxchg.org
Fixes: 536d3bf261 ("mm: memcontrol: avoid workload stalls when lowering memory.high")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Tejun Heo <tj@kernel.org>
Acked-by: Chris Down <chris@chrisdown.name>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: <stable@vger.kernel.org>	[5.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-09 17:26:44 -08:00
arch - For syscall user dispatch, separate ptctl operation from syscall 2021-02-07 10:16:24 -08:00
block block-5.11-2021-02-05 2021-02-06 14:40:27 -08:00
certs .gitignore: add SPDX License Identifier 2020-03-25 11:50:48 +01:00
crypto X.509: Fix crash caused by NULL pointer 2021-01-20 11:33:51 -08:00
Documentation kasan: fix stack traces dependency for HW_TAGS 2021-02-09 17:26:44 -08:00
drivers libnvdimm for 5.11-rc7 2021-02-07 10:45:26 -08:00
fs tmpfs: disallow CONFIG_TMPFS_INODE64 on alpha 2021-02-09 17:26:44 -08:00
include firmware_loader: align .builtin_fw to 8 2021-02-09 17:26:44 -08:00
init init/gcov: allow CONFIG_CONSTRUCTORS on UML to fix module gcov 2021-02-05 11:03:47 -08:00
ipc Merge branch 'akpm' (patches from Andrew) 2020-12-15 12:53:37 -08:00
kernel tracing: Fix output of top level event "enable" file 2021-02-08 11:32:39 -08:00
lib - Revert an attempt to not spread IRQ threads on isolated CPUs which has 2021-02-07 10:03:43 -08:00
LICENSES LICENSES: Add the CC-BY-4.0 license 2020-12-08 10:33:27 -07:00
mm Revert "mm: memcontrol: avoid workload stalls when lowering memory.high" 2021-02-09 17:26:44 -08:00
net Fixes: 2021-02-05 10:11:14 -08:00
samples ARM: SoC drivers for v5.11 2020-12-16 16:38:41 -08:00
scripts kallsyms: fix nonconverging kallsyms table with lld 2021-02-05 17:53:28 +09:00
security cap: fix conversions on getxattr 2021-01-28 10:22:48 +01:00
sound ALSA: hda/via: Apply the workaround generically for Clevo machines 2021-01-26 18:05:03 +01:00
tools selftests/vm: rename file run_vmtests to run_vmtests.sh 2021-02-09 17:26:44 -08:00
usr Merge branch 'work.fdpic' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-08-07 13:29:39 -07:00
virt KVM/arm64 fixes for 5.11, take #2 2021-01-25 18:52:01 -05:00
.clang-format clang-format: Update with the latest for_each macro list 2021-01-29 15:00:23 +01:00
.cocciconfig
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore .gitignore: docs: ignore sphinx_*/ directories 2020-09-10 10:44:31 -06:00
.mailmap MAINTAINERS: update Andrey Ryabinin's email address 2021-02-09 17:26:44 -08:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS MAINTAINERS: dccp: move Gerrit Renker to CREDITS 2021-01-14 10:53:49 -08:00
Kbuild kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS MAINTAINERS: update Andrey Ryabinin's email address 2021-02-09 17:26:44 -08:00
Makefile Linux 5.11-rc7 2021-02-07 13:57:38 -08:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.