# SPDX-License-Identifier: GPL-2.0-only
menu "Kernel hardening options"

config GCC_PLUGIN_STRUCTLEAK
	bool
	help
	  While the kernel is built with warnings enabled for any missed
	  stack variable initializations, this warning is silenced for
	  anything passed by reference to another function, under the
	  occasionally misguided assumption that the function will do
	  the initialization. As this regularly leads to exploitable
	  flaws, this plugin is available to identify and zero-initialize
	  such variables, depending on the chosen level of coverage.

	  This plugin was originally ported from grsecurity/PaX. More
	  information at:
	   * https://grsecurity.net/
	   * https://pax.grsecurity.net/

menu "Memory initialization"

config CC_HAS_AUTO_VAR_INIT_PATTERN
	def_bool $(cc-option,-ftrivial-auto-var-init=pattern)

config CC_HAS_AUTO_VAR_INIT_ZERO_BARE
	def_bool $(cc-option,-ftrivial-auto-var-init=zero)

config CC_HAS_AUTO_VAR_INIT_ZERO_ENABLER
	# Clang 16 and later warn about using the -enable flag, but it
	# is required before then.
	def_bool $(cc-option,-ftrivial-auto-var-init=zero -enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang)
	depends on !CC_HAS_AUTO_VAR_INIT_ZERO_BARE

config CC_HAS_AUTO_VAR_INIT_ZERO
	def_bool CC_HAS_AUTO_VAR_INIT_ZERO_BARE || CC_HAS_AUTO_VAR_INIT_ZERO_ENABLER
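
# Reading aid (derived from the CC_HAS_AUTO_VAR_INIT_ZERO* symbols above;
# not an upstream comment): how they are expected to resolve, depending
# on which flag spellings the compiler accepts.
#
#   compiler accepts bare "=zero"     ZERO_BARE=y  ZERO_ENABLER=n  ZERO=y
#   compiler needs the -enable flag   ZERO_BARE=n  ZERO_ENABLER=y  ZERO=y
#   compiler accepts neither          ZERO_BARE=n  ZERO_ENABLER=n  ZERO=n
#
# ZERO_ENABLER depends on !ZERO_BARE, so at most one of the two is ever y.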

choice
	prompt "Initialize kernel stack variables at function entry"
	default GCC_PLUGIN_STRUCTLEAK_BYREF_ALL if COMPILE_TEST && GCC_PLUGINS
	default INIT_STACK_ALL_PATTERN if COMPILE_TEST && CC_HAS_AUTO_VAR_INIT_PATTERN
	default INIT_STACK_ALL_ZERO if CC_HAS_AUTO_VAR_INIT_ZERO
	default INIT_STACK_NONE
	help
	  This option enables initialization of stack variables at
	  function entry time. This has the potential for the
	  greatest coverage (since all functions can have their
	  variables initialized), but the performance impact depends
	  on the function calling complexity of a given workload's
	  syscalls.

	  This chooses the level of coverage over classes of potentially
	  uninitialized variables. The selected class of variable will be
	  initialized before use in a function.

	config INIT_STACK_NONE
		bool "no automatic stack variable initialization (weakest)"
		help
		  Disable automatic stack variable initialization.
		  This leaves the kernel vulnerable to the standard
		  classes of uninitialized stack variable exploits
		  and information exposures.

	config GCC_PLUGIN_STRUCTLEAK_USER
		bool "zero-init structs marked for userspace (weak)"
		# Plugin can be removed once the kernel only supports GCC 12+
		depends on GCC_PLUGINS && !CC_HAS_AUTO_VAR_INIT_ZERO
		select GCC_PLUGIN_STRUCTLEAK
		help
		  Zero-initialize any structures on the stack containing
		  a __user attribute. This can prevent some classes of
		  uninitialized stack variable exploits and information
		  exposures, like CVE-2013-2141:
		  https://git.kernel.org/linus/b9e146d8eb3b9eca

	config GCC_PLUGIN_STRUCTLEAK_BYREF
		bool "zero-init structs passed by reference (strong)"
		# Plugin can be removed once the kernel only supports GCC 12+
		depends on GCC_PLUGINS && !CC_HAS_AUTO_VAR_INIT_ZERO
		depends on !(KASAN && KASAN_STACK)
		select GCC_PLUGIN_STRUCTLEAK
		help
		  Zero-initialize any structures on the stack that may
		  be passed by reference and had not already been
		  explicitly initialized. This can prevent most classes
		  of uninitialized stack variable exploits and information
		  exposures, like CVE-2017-1000410:
		  https://git.kernel.org/linus/06e7e776ca4d3654

		  As a side-effect, this keeps a lot of variables on the
		  stack that can otherwise be optimized out, so combining
		  this with CONFIG_KASAN_STACK can lead to a stack overflow
		  and is disallowed.

	config GCC_PLUGIN_STRUCTLEAK_BYREF_ALL
		bool "zero-init everything passed by reference (very strong)"
		# Plugin can be removed once the kernel only supports GCC 12+
		depends on GCC_PLUGINS && !CC_HAS_AUTO_VAR_INIT_ZERO
		depends on !(KASAN && KASAN_STACK)
		select GCC_PLUGIN_STRUCTLEAK
		help
		  Zero-initialize any stack variables that may be passed
		  by reference and had not already been explicitly
		  initialized. This is intended to eliminate all classes
		  of uninitialized stack variable exploits and information
		  exposures.

		  As a side-effect, this keeps a lot of variables on the
		  stack that can otherwise be optimized out, so combining
		  this with CONFIG_KASAN_STACK can lead to a stack overflow
		  and is disallowed.

	config INIT_STACK_ALL_PATTERN
		bool "pattern-init everything (strongest)"
		depends on CC_HAS_AUTO_VAR_INIT_PATTERN
		depends on !KMSAN
		help
		  Initializes everything on the stack (including padding)
		  with a specific debug value. This is intended to eliminate
		  all classes of uninitialized stack variable exploits and
		  information exposures, even variables that were warned about
		  having been left uninitialized.

		  Pattern initialization is known to provoke many existing bugs
		  related to uninitialized locals, e.g. pointers receive
		  non-NULL values, buffer sizes and indices are very big. The
		  pattern is situation-specific; Clang on 64-bit uses 0xAA
		  repeating for all types and padding except float and double
		  which use 0xFF repeating (-NaN). Clang on 32-bit uses 0xFF
		  repeating for all types and padding.

	config INIT_STACK_ALL_ZERO
		bool "zero-init everything (strongest and safest)"
		depends on CC_HAS_AUTO_VAR_INIT_ZERO
		depends on !KMSAN
		help
		  Initializes everything on the stack (including padding)
		  with a zero value. This is intended to eliminate all
		  classes of uninitialized stack variable exploits and
		  information exposures, even variables that were warned
		  about having been left uninitialized.

		  Zero initialization provides safe defaults for strings
		  (immediately NUL-terminated), pointers (NULL), indices
		  (index 0), and sizes (0 length), and is therefore more
		  suitable as a production security mitigation than pattern
		  initialization.

endchoice
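
# Illustrative sketch only (not part of the upstream file): assuming a
# compiler that accepts both -ftrivial-auto-var-init=zero and
# -ftrivial-auto-var-init=pattern, picking the zero-init entry of the
# choice above would be expected to leave a .config fragment along these
# lines. The GCC_PLUGIN_STRUCTLEAK_* entries depend on
# !CC_HAS_AUTO_VAR_INIT_ZERO, so they would not be offered at all here.
#
#   # CONFIG_INIT_STACK_NONE is not set
#   # CONFIG_INIT_STACK_ALL_PATTERN is not set
#   CONFIG_INIT_STACK_ALL_ZERO=y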

config GCC_PLUGIN_STRUCTLEAK_VERBOSE
	bool "Report forcefully initialized variables"
	depends on GCC_PLUGIN_STRUCTLEAK
	depends on !COMPILE_TEST	# too noisy
	help
	  This option will cause a warning to be printed each time the
	  structleak plugin finds a variable it thinks needs to be
	  initialized. Since not all existing initializers are detected
	  by the plugin, this can produce false positive warnings.

config GCC_PLUGIN_STACKLEAK
	bool "Poison kernel stack before returning from syscalls"
	depends on GCC_PLUGINS
	depends on HAVE_ARCH_STACKLEAK
	help
	  This option makes the kernel erase the kernel stack before
	  returning from system calls. This has the effect of leaving
	  the stack initialized to the poison value, which both reduces
	  the lifetime of any sensitive stack contents and reduces
	  potential for uninitialized stack variable exploits or information
	  exposures (it does not cover functions reaching the same stack
	  depth as prior functions during the same syscall). This blocks
	  most uninitialized stack variable attacks, with the performance
	  impact being driven by the depth of the stack usage, rather than
	  the function calling complexity.

	  On a single-CPU system, kernel compilation sees a 1% slowdown.
	  Other systems and workloads may vary, and you are advised to
	  test this feature on your expected workload before deploying it.

	  This plugin was ported from grsecurity/PaX. More information at:
	   * https://grsecurity.net/
	   * https://pax.grsecurity.net/

config GCC_PLUGIN_STACKLEAK_VERBOSE
	bool "Report stack depth analysis instrumentation" if EXPERT
	depends on GCC_PLUGIN_STACKLEAK
	depends on !COMPILE_TEST	# too noisy
	help
	  This option will cause a warning to be printed each time the
	  stackleak plugin finds a function it thinks needs to be
	  instrumented. This is useful for comparing coverage between
	  builds.

config STACKLEAK_TRACK_MIN_SIZE
	int "Minimum stack frame size of functions tracked by STACKLEAK"
	default 100
	range 0 4096
	depends on GCC_PLUGIN_STACKLEAK
	help
	  The STACKLEAK gcc plugin instruments the kernel code for tracking
	  the lowest border of the kernel stack (and for some other purposes).
	  It inserts the stackleak_track_stack() call for the functions with
	  a stack frame size greater than or equal to this parameter.
	  If unsure, leave the default value 100.

config STACKLEAK_METRICS
	bool "Show STACKLEAK metrics in the /proc file system"
	depends on GCC_PLUGIN_STACKLEAK
	depends on PROC_FS
	help
	  If this is set, STACKLEAK metrics for every task are available in
	  the /proc file system. In particular, /proc/<pid>/stack_depth
	  shows the maximum kernel stack consumption for the current and
	  previous syscalls. Although this information is not precise, it
	  can be useful for estimating the STACKLEAK performance impact for
	  your workloads.

config STACKLEAK_RUNTIME_DISABLE
	bool "Allow runtime disabling of kernel stack erasing"
	depends on GCC_PLUGIN_STACKLEAK
	help
	  This option provides the 'stack_erasing' sysctl, which can be
	  used at runtime to control kernel stack erasing for kernels
	  built with CONFIG_GCC_PLUGIN_STACKLEAK.
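
# Usage sketch (not part of the upstream file): the help text above names
# the 'stack_erasing' sysctl; the kernel.* prefix used below is an
# assumption about where it is exposed. On a kernel built with
# STACKLEAK_RUNTIME_DISABLE, toggling would then look roughly like:
#
#   sysctl kernel.stack_erasing        # read the current setting
#   sysctl -w kernel.stack_erasing=0   # turn stack erasing off
#   sysctl -w kernel.stack_erasing=1   # turn it back on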

config INIT_ON_ALLOC_DEFAULT_ON
	bool "Enable heap memory zeroing on allocation by default"
	depends on !KMSAN
	help
	  This has the effect of setting "init_on_alloc=1" on the kernel
	  command line. This can be disabled with "init_on_alloc=0".
	  When "init_on_alloc" is enabled, all page allocator and slab
	  allocator memory will be zeroed when allocated, eliminating
	  many kinds of "uninitialized heap memory" flaws, especially
	  heap content exposures. The performance impact varies by
	  workload, but most cases see <1% impact. Some synthetic
	  workloads have measured as high as 7%.

config INIT_ON_FREE_DEFAULT_ON
	bool "Enable heap memory zeroing on free by default"
	depends on !KMSAN
	help
	  This has the effect of setting "init_on_free=1" on the kernel
	  command line. This can be disabled with "init_on_free=0".
	  Similar to "init_on_alloc", when "init_on_free" is enabled,
	  all page allocator and slab allocator memory will be zeroed
	  when freed, eliminating many kinds of "uninitialized heap memory"
	  flaws, especially heap content exposures. The primary difference
	  with "init_on_free" is that data lifetime in memory is reduced,
	  as anything freed is wiped immediately, making live forensics or
	  cold boot memory attacks unable to recover freed memory contents.
	  The performance impact varies by workload, but is more expensive
	  than "init_on_alloc" due to the negative cache effects of
	  touching "cold" memory areas. Most cases see 3-5% impact. Some
	  synthetic workloads have measured as high as 8%.
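
# Example sketch (not part of the upstream file): the two *_DEFAULT_ON
# options above only set boot-time defaults, so the parameters quoted in
# their help texts can still override them on the kernel command line.
# For instance, a kernel built with both options enabled could be booted
# with zeroing on free turned off while keeping zeroing on allocation:
#
#   init_on_alloc=1 init_on_free=0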

config CC_HAS_ZERO_CALL_USED_REGS
	def_bool $(cc-option,-fzero-call-used-regs=used-gpr)

config ZERO_CALL_USED_REGS
	bool "Enable register zeroing on function exit"
	depends on CC_HAS_ZERO_CALL_USED_REGS
	help
	  At the end of functions, always zero any caller-used register
	  contents. This helps ensure that temporary values are not
	  leaked beyond the function boundary. This means that register
	  contents are less likely to be available for side channels
	  and information exposures. Additionally, this helps reduce the
	  number of useful ROP gadgets by about 20% (and removes compiler
	  generated "write-what-where" gadgets) in the resulting kernel
	  image. This has a less than 1% performance impact on most
	  workloads. Image size growth depends on architecture, and should
	  be evaluated for suitability. For example, x86_64 grows by less
	  than 1%, and arm64 grows by about 5%.

endmenu

config CC_HAS_RANDSTRUCT
	def_bool $(cc-option,-frandomize-layout-seed-file=/dev/null)

choice
	prompt "Randomize layout of sensitive kernel structures"
	default RANDSTRUCT_FULL if COMPILE_TEST && (GCC_PLUGINS || CC_HAS_RANDSTRUCT)
	default RANDSTRUCT_NONE
	help
	  If you enable this, the layouts of structures that are entirely
	  function pointers (and have not been manually annotated with
	  __no_randomize_layout), or structures that have been explicitly
	  marked with __randomize_layout, will be randomized at compile-time.
	  This can introduce the requirement of an additional information
	  exposure vulnerability for exploits targeting these structure
	  types.

	  Enabling this feature will introduce some performance impact,
	  slightly increase memory usage, and prevent the use of forensic
	  tools like Volatility against the system (unless the kernel
	  source tree isn't cleaned after kernel installation).

	  The seed used for compilation is in scripts/basic/randomize.seed.
	  It remains after a "make clean" to allow for external modules to
	  be compiled with the existing seed and will be removed by a
	  "make mrproper" or "make distclean". This file should not be made
	  public, or the structure layout can be determined.

	config RANDSTRUCT_NONE
		bool "Disable structure layout randomization"
		help
		  Build normally: no structure layout randomization.

	config RANDSTRUCT_FULL
		bool "Fully randomize structure layout"
		depends on CC_HAS_RANDSTRUCT || GCC_PLUGINS
		select MODVERSIONS if MODULES
		help
		  Fully randomize the member layout of sensitive
		  structures as much as possible, which may have both a
		  memory size and performance impact.

		  One difference between the Clang and GCC plugin
		  implementations is the handling of bitfields. The GCC
		  plugin treats them as fully separate variables,
		  introducing sometimes significant padding. Clang tries
		  to keep adjacent bitfields together, but with their bit
		  ordering randomized.

	config RANDSTRUCT_PERFORMANCE
		bool "Limit randomization of structure layout to cache-lines"
		depends on GCC_PLUGINS
		select MODVERSIONS if MODULES
		help
		  Randomization of sensitive kernel structures will make a
		  best effort at restricting randomization to cacheline-sized
		  groups of members. It will further not randomize bitfields
		  in structures. This reduces the performance hit of RANDSTRUCT
		  at the cost of weakened randomization.
endchoice

config RANDSTRUCT
	def_bool !RANDSTRUCT_NONE

config GCC_PLUGIN_RANDSTRUCT
	def_bool GCC_PLUGINS && RANDSTRUCT
	help
	  Use GCC plugin to randomize structure layout.

	  This plugin was ported from grsecurity/PaX. More
	  information at:
	   * https://grsecurity.net/
	   * https://pax.grsecurity.net/
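
# Illustrative sketch only (not part of the upstream file): assuming a
# GCC build with GCC_PLUGINS and MODULES enabled, selecting
# RANDSTRUCT_FULL in the choice above would be expected to resolve the
# derived symbols roughly as follows:
#
#   # CONFIG_RANDSTRUCT_NONE is not set
#   CONFIG_RANDSTRUCT_FULL=y
#   # CONFIG_RANDSTRUCT_PERFORMANCE is not set
#   CONFIG_RANDSTRUCT=y
#   CONFIG_GCC_PLUGIN_RANDSTRUCT=y
#   CONFIG_MODVERSIONS=y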

endmenu