Add a document for the macro language introduced to Kconfig.
The motivation of this work is to move the compiler option tests to
Kconfig from Makefile. A number of kernel features require the
compiler support. Enabling such features blindly in Kconfig ends up
with a lot of nasty build-time testing in Makefiles. If a chosen
feature turns out unsupported by the compiler, what the build system
can do is either to disable it (silently!) or to forcibly break the
build, despite Kconfig has let the user to enable it. By moving the
compiler capability tests to Kconfig, features unsupported by the
compiler will be hidden automatically.
This change was strongly prompted by Linus Torvalds. You can find
his suggestions [1] [2] in ML. The original idea was to add a new
attribute with 'option shell=...', but I found more generalized text
expansion would make Kconfig more powerful and lovely. The basic
ideas are from Make, but there are some differences.
[1]: https://lkml.org/lkml/2016/12/9/577
[2]: https://lkml.org/lkml/2018/2/7/527
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
When using a recursively expanded variable, it is a common mistake
to make circular reference.
For example, Make terminates the following code:
X = $(X)
Y := $(X)
Let's detect the circular expansion in Kconfig, too.
On the other hand, a function that recurses itself is a commonly-used
programming technique. So, Make does not check recursion in the
reference with 'call'. For example, the following code continues
running eternally:
X = $(call X)
Y := $(X)
Kconfig allows circular expansion if one or more arguments are given,
but terminates when the same function is recursively invoked 1000 times,
assuming it is a programming mistake.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
The special variables, $(filename) and $(lineno), are expanded to a
file name and its line number being parsed, respectively.
Suggested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Syntax:
$(info,<text>)
$(warning-if,<condition>,<text>)
$(error-if,<condition>,<text)
The 'info' function prints a message to stdout as in Make.
The 'warning-if' and 'error-if' are similar to 'warning' and 'error'
in Make, but take the condition parameter. They are effective only
when the <condition> part is y.
Kconfig does not implement the lazy expansion as used in the 'if'
'and, 'or' functions in Make. In other words, Kconfig does not
support conditional expansion. The unconditional 'error' function
would always terminate the parsing, hence would be useless in Kconfig.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Make expands the lefthand side of assignment statements. In fact,
Kbuild relies on it since kernel makefiles mostly look like this:
obj-$(CONFIG_FOO) += foo.o
Do likewise in Kconfig.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Support += operator. This appends a space and the text on the
righthand side to a variable.
The timing of the evaluation of the righthand side depends on the
flavor of the variable. If the lefthand side was originally defined
as a simple variable, the righthand side is expanded immediately.
Otherwise, the expansion is deferred. Appending something to an
undefined variable results in a recursive variable.
To implement this, we need to remember the flavor of variables.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
The previous commit added variable and user-defined function. They
work similarly in the sense that the evaluation is deferred until
they are used.
This commit adds another type of variable, simply expanded variable,
as we see in Make.
The := operator defines a simply expanded variable, expanding the
righthand side immediately. This works like traditional programming
language variables.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Now, we got a basic ability to test compiler capability in Kconfig.
config CC_HAS_STACKPROTECTOR
def_bool $(shell,($(CC) -Werror -fstack-protector -E -x c /dev/null -o /dev/null 2>/dev/null) && echo y || echo n)
This works, but it is ugly to repeat this long boilerplate.
We want to describe like this:
config CC_HAS_STACKPROTECTOR
bool
default $(cc-option,-fstack-protector)
It is straight-forward to add a new function, but I do not like to
hard-code specialized functions like that. Hence, here is another
feature, user-defined function. This works as a textual shorthand
with parameterization.
A user-defined function is defined by using the = operator, and can
be referenced in the same way as built-in functions. A user-defined
function in Make is referenced like $(call my-func,arg1,arg2), but I
omitted the 'call' to make the syntax shorter.
The definition of a user-defined function contains $(1), $(2), etc.
in its body to reference the parameters. It is grammatically valid
to pass more or fewer arguments when calling it. We already exploit
this feature in our makefiles; scripts/Kbuild.include defines cc-option
which takes two arguments at most, but most of the callers pass only
one argument.
By the way, a variable is supported as a subset of this feature since
a variable is "a user-defined function with zero argument". In this
context, I mean "variable" as recursively expanded variable. I will
add a different flavored variable in the next commit.
The code above can be written as follows:
[Example Code]
success = $(shell,($(1)) >/dev/null 2>&1 && echo y || echo n)
cc-option = $(success,$(CC) -Werror $(1) -E -x c /dev/null -o /dev/null)
config CC_HAS_STACKPROTECTOR
def_bool $(cc-option,-fstack-protector)
[Result]
$ make -s alldefconfig && tail -n 1 .config
CONFIG_CC_HAS_STACKPROTECTOR=y
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Currently, any statement line starts with a keyword with TF_COMMAND
flag. So, the following three lines are dead code.
alloc_string(yytext, yyleng);
zconflval.string = text;
return T_WORD;
If a T_WORD token is returned in this context, it will cause syntax
error in the parser anyway.
The next commit will support the assignment statement where a line
starts with an arbitrary identifier. So, I want the lexer to switch
to the PARAM state only when it sees a command keyword.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Now that 'shell' function is supported, this can be self-contained in
Kconfig.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Ulf Magnusson <ulfalizer@gmail.com>
This accepts a single command to execute. It returns the standard
output from it.
[Example code]
config HELLO
string
default "$(shell,echo hello world)"
config Y
def_bool $(shell,echo y)
[Result]
$ make -s alldefconfig && tail -n 2 .config
CONFIG_HELLO="hello world"
CONFIG_Y=y
Caveat:
Like environments, functions are expanded in the lexer. You cannot
pass symbols to function arguments. This is a limitation to simplify
the implementation. I want to avoid the dynamic function evaluation,
which would introduce much more complexity.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
This commit adds a new concept 'function' to do more text processing
in Kconfig.
A function call looks like this:
$(function,arg1,arg2,arg3,...)
This commit adds the basic infrastructure to expand functions.
Change the text expansion helpers to take arguments.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
If "mainmenu" is not specified, "Linux Kernel Configuration" is used
as a default prompt.
Given that Kconfig is used in other projects than Linux, let's use
a more generic prompt, "Main menu".
Suggested-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
There is no more caller of sym_expand_string_value().
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Now that environments are expanded in the lexer, conf_parse() does
not need to expand them explicitly.
The hack introduced by commit 0724a7c32a ("kconfig: Don't leak
main menus during parsing") can go away.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Ulf Magnusson <ulfalizer@gmail.com>
There are two callers of file_lookup(), but there is no more reason
to expand the given path.
[1] zconf_initscan()
This is used to open the first Kconfig. sym_expand_string_value()
has never been used in a useful way here; before opening the first
Kconfig file, obviously there is no symbol to expand. If you use
expand_string_value() instead, environments in KBUILD_KCONFIG would
be expanded, but I do not see practical benefits for that.
[2] zconf_nextfile()
This is used to open the next file from 'source' statement.
Symbols in the path like "arch/$SRCARCH/Kconfig" needed expanding,
but it was replaced with the direct environment expansion. The
environment has already been expanded before the token is passed
to the parser.
By the way, file_lookup() was already buggy; it expanded a given path,
but it used the path before expansion for look-up:
if (!strcmp(name, file->name)) {
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Ulf Magnusson <ulfalizer@gmail.com>
To get access to environment variables, Kconfig needs to define a
symbol using "option env=" syntax. It is tedious to add a symbol entry
for each environment variable given that we need to define much more
such as 'CC', 'AS', 'srctree' etc. to evaluate the compiler capability
in Kconfig.
Adding '$' for symbol references is grammatically inconsistent.
Looking at the code, the symbols prefixed with 'S' are expanded by:
- conf_expand_value()
This is used to expand 'arch/$ARCH/defconfig' and 'defconfig_list'
- sym_expand_string_value()
This is used to expand strings in 'source' and 'mainmenu'
All of them are fixed values independent of user configuration. So,
they can be changed into the direct expansion instead of symbols.
This change makes the code much cleaner. The bounce symbols 'SRCARCH',
'ARCH', 'SUBARCH', 'KERNELVERSION' are gone.
sym_init() hard-coding 'UNAME_RELEASE' is also gone. 'UNAME_RELEASE'
should be replaced with an environment variable.
ARCH_DEFCONFIG is a normal symbol, so it should be simply referenced
without '$' prefix.
The new syntax is addicted by Make. The variable reference needs
parentheses, like $(FOO), but you can omit them for single-letter
variables, like $F. Yet, in Makefiles, people tend to use the
parenthetical form for consistency / clarification.
At this moment, only the environment variable is supported, but I will
extend the concept of 'variable' later on.
The variables are expanded in the lexer so we can simplify the token
handling on the parser side.
For example, the following code works.
[Example code]
config MY_TOOLCHAIN_LIST
string
default "My tools: CC=$(CC), AS=$(AS), CPP=$(CPP)"
[Result]
$ make -s alldefconfig && tail -n 1 .config
CONFIG_MY_TOOLCHAIN_LIST="My tools: CC=gcc, AS=as, CPP=gcc -E"
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Kbuild provides a couple of ways to specify CROSS_COMPILE:
[1] Command line
[2] Environment
[3] arch/*/Makefile (only some architectures)
[4] CONFIG_CROSS_COMPILE
[4] is problematic for the compiler capability tests in Kconfig.
CONFIG_CROSS_COMPILE allows users to change the compiler prefix from
'make menuconfig', etc. It means, the compiler options would have
to be all re-calculated everytime CONFIG_CROSS_COMPILE is changed.
To avoid complexity and performance issues, I'd like to evaluate
the shell commands statically, i.e. only parsing Kconfig files.
I guess the majority is [1] or [2]. Currently, there are only
5 defconfig files that specify CONFIG_CROSS_COMPILE.
arch/arm/configs/lpc18xx_defconfig
arch/hexagon/configs/comet_defconfig
arch/nds32/configs/defconfig
arch/openrisc/configs/or1ksim_defconfig
arch/openrisc/configs/simple_smp_defconfig
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
The kbuild cache was introduced to remember the result of shell
commands, some of which are expensive to compute, such as
$(call cc-option,...).
However, this turned out not so clever as I had first expected.
Actually, it is problematic. For example, "$(CC) -print-file-name"
is cached. If the compiler is updated, the stale search path causes
build error, which is difficult to figure out. Another problem
scenario is cache files could be touched while install targets are
running under the root permission. We can patch them if desired,
but the build infrastructure is getting uglier and uglier.
Now, we are going to move compiler flag tests to the configuration
phase. If this is completed, the result of compiler tests will be
naturally cached in the .config file. We will not have performance
issues of incremental building since this testing only happens at
Kconfig time.
To start this work with a cleaner code base, remove the kbuild
cache first.
Revert the following commits:
Commit 9a234a2e38 ("kbuild: create directory for make cache only when necessary")
Commit e17c400ae1 ("kbuild: shrink .cache.mk when it exceeds 1000 lines")
Commit 4e56207130 ("kbuild: Cache a few more calls to the compiler")
Commit 3298b690b2 ("kbuild: Add a cache for generated variables")
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
The localization support is broken and appears unused.
There is no google hits on the update-po-config target.
And there is no recent (5 years) activity related to the localization.
So lets just drop this as it is no longer used.
Suggested-by: Ulf Magnusson <ulfalizer@gmail.com>
Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
The mconf (or its infrastructure, lxdiaglog) depends on the ncurses.
Move and rename check-lxdialog.sh to mconf-cfg.sh to make it work in
the same way as for qconf and gconf.
This commit fixes some more weirdnesses.
The nconf also needs ncurses packages. HOSTLOADLIBES_nconf is set
to the libraries needed for nconf, but the cflags is not explicitly
set. Actually, nconf relies on the check-lxdialog.sh for the proper
cflags:
HOST_EXTRACFLAGS += $(shell $(CONFIG_SHELL) $(check-lxdialog) -ccflags) \
-DLOCALE
The code above passes the ncurses flags to all objects, even for conf,
qconf, gconf. Let's pass the ncurses flags only to mconf and nconf.
Currently, the presence of ncurses is not checked for nconf. Let's
show a prompt like the mconf case.
According to Randy's report, the shell scripts still need to carry
the fallback code in case the pkg-config fails to find the ncurses
packages.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
Refactor the package checks for gconf in the same way as for qconf.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
Currently, the necessary package checks for building qconf is
surrounded by ifeq ($(MAKECMDGOALS),xconfig) ... endif.
Then, Make will restart when .tmp_qtcheck is generated.
To simplify the Makefile, move the scripting to a separate file,
and use filechk. The shell script is executed everytime xconfig
is run, but it is not a costly script.
In the old code, 'pkg-config --exists' only checked Qt5Core / QtCore,
but the set of necessary packages should be checked.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
filechk displays two short logs; CHK for creating a temporary file,
and UPD for really updating the target.
IMHO, the build system can be quiet when the target file has not
been updated.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
- enable -fno-tree-loop-im only when supported
- add -fno-PIE option before the asm-goto test
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJbCkFPAAoJED2LAQed4NsGZVwP/juiS21QZAiQbUdX3FRcyYGs
+FDNLOwNdSU18QkdVrcJ4tG8hxBZqhIU0kq1MVE72Yo10xX8u7ssUJ0ttrUo5qIb
vlXtnqMOaZgLWnoNMGlDVPnNBxZh2UscbvjVGa5m9eqXrCU9AtQiCCoSceRtka12
tOBbfeTeJ8Ab2BKfzHcuqS+DSURkQGTyG4q1ZMxmdtIsltbZIez/zauRtAU/ULKx
Ed6HAdNiiMXRwsXnAwcGnJe9FyW7UPjZOdLn0vSizZQe8BJ+H+EotZy7FO8L407w
lgLVccCSZEFAilJRR+Xa1pMlg1KwSINcMK9BVOjIeeZL0kAIaC1zzVaPEbZ1MyDA
HKtX/MeDGX52ZW9SBCFQYKVsZQecYtyr27Z+c+8Af37sB3/ffBSeQc7YilsIGjSZ
MWARYbkOAcUif8IG6ymnEv2a4IOcD4rYNMkUfs8vXeJjejiP5rhA8zxWYng1DRmw
0g4x2iQeY7erUu/elflNa94e+PSgnwnmzWdloBqcmOtGxV+K+9BVaNsVmchyMAzt
PbQq1T8zodfr2+Jsf+yj1rWv3fLnahYh/WVAKj1rB/+Q31sYfvPlEmzayk2k9enK
Sgu5amtl64tgZD3zcSs1Ik39Ioe7s1Kf0W1Li8f2v1JR5t38UX5zkOa+O5w+sq77
NSBoCCRtn0eY3j/wo5kS
=r3P/
-----END PGP SIGNATURE-----
Merge tag 'kbuild-fixes-v4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull more Kbuild fixes from Masahiro Yamada:
- enable '-fno-tree-loop-im' only when supported
- add '-fno-PIE' option before the asm-goto test
* tag 'kbuild-fixes-v4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
Makefile: disable PIE before testing asm goto
kbuild: gcov: enable -fno-tree-loop-im if supported
A few more fixes for v4.17:
- A fix for a crash in scm_call_atomic on qcom platforms
- Display fix for Allwinner A10
- A fix that re-enables ethernet on Allwinner H3 (C.H.I.P et al)
- A fix for eMMC corruption on hikey
- i2c-gpio descriptor tables for ixp4xx
+ a small typo fix
-----BEGIN PGP SIGNATURE-----
iQJDBAABCAAtFiEElf+HevZ4QCAJmMQ+jBrnPN6EHHcFAlsJynEPHG9sb2ZAbGl4
b20ubmV0AAoJEIwa5zzehBx3QEkP/A5dGXeQkArCWPvWoFr+20KjIS07f7F8olNy
9JKG3R2uEZsqjD3c6HFkd1abTtUQmgg/hmpxakAI8vbypA4gsq9jyFC6TxqsBSyz
uw7hQ5XcGA99pQXp8jYUrazi/XnG9Wm8LLBslsx75wJwNikzlAl6PStKDFcz0Pr6
A9JXWnqFY50YRzUr4y9GrSo3o4dvVniF3PUFEwnYliUI5qszph2/rwaE2zLQt/PT
X0DMA4v+c+4ngS5TGipY4vFjRyvsOv/NeDQzGTvGcU6QMdP4ZEsQBrye6BqowmaD
DqaoSHvsi7Lel4u29p5KyBKrM0bAhtFX+iCGiqTfkKwRWHkh7CHombUk2qX/9OJW
oB9orkKgiP35xAL5xFmB5tf03s0tQ8/qicE72tGW/TVIEBX/l+ymD76DH4rmYvRw
wNZ+HwHrMVkYgVG0TQIxxEgkXbPsyDbk3DbNbQkHf/pV5+PsMrp0iSo7oaglsS9Y
NYTRA/DQCldzhv68YRoMBh5gD4oE5iK3e3c4nLm80vd7zj8YsuXnc4+55a8PrHfs
oVg0PE5fVlP3AVRJW09ikdf03U7m0AFX/fFKHrAwWylT1+Z1KSJhM4ZaXGgdvuOV
asFUenzF3WF6Nsx+smL/vLzr/AvvYeq80Q9OdLWQl4056HurkrpL/E2HVj4MYaoW
WKKRdfzX
=mga+
-----END PGP SIGNATURE-----
Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"A few more fixes for v4.17:
- a fix for a crash in scm_call_atomic on qcom platforms
- display fix for Allwinner A10
- a fix that re-enables ethernet on Allwinner H3 (C.H.I.P et al)
- a fix for eMMC corruption on hikey
- i2c-gpio descriptor tables for ixp4xx
... plus a small typo fix"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: Fix i2c-gpio GPIO descriptor tables
arm64: dts: hikey: Fix eMMC corruption regression
firmware: qcom: scm: Fix crash in qcom_scm_call_atomic1()
ARM: sun8i: v3s: fix spelling mistake: "disbaled" -> "disabled"
ARM: dts: sun4i: Fix incorrect clocks for displays
ARM: dts: sun8i: h3: Re-enable EMAC on Orange Pi One
Pull x86 store buffer fixes from Thomas Gleixner:
"Two fixes for the SSBD mitigation code:
- expose SSBD properly to guests. This got broken when the CPU
feature flags got reshuffled.
- simplify the CPU detection logic to avoid duplicate entries in the
tables"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/speculation: Simplify the CPU bug detection logic
KVM/VMX: Expose SSBD properly to guests
Pull scheduler fixes from Thomas Gleixner:
"Three fixes for scheduler and kthread code:
- allow calling kthread_park() on an already parked thread
- restore the sched_pi_setprio() tracepoint behaviour
- clarify the unclear string for the scheduling domain debug output"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched, tracing: Fix trace_sched_pi_setprio() for deboosting
kthread: Allow kthread_park() on a parked kthread
sched/topology: Clarify root domain(s) debug string
I used bad names in my clumsiness when rewriting many board
files to use GPIO descriptors instead of platform data. A few
had the platform_device ID set to -1 which would indeed give
the device name "i2c-gpio".
But several had it set to >=0 which gives the names
"i2c-gpio.0", "i2c-gpio.1" ...
Fix the offending instances in the ARM tree. Sorry for the
mess.
Fixes: b2e6355559 ("i2c: gpio: Convert to use descriptors")
Cc: Wolfram Sang <wsa@the-dreams.de>
Cc: Simon Guinot <simon.guinot@sequanux.org>
Reported-by: Simon Guinot <simon.guinot@sequanux.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Olof Johansson <olof@lixom.net>
PPC:
- Close a hole which could possibly lead to the host timebase getting
out of sync.
- Three fixes relating to PTEs and TLB entries for radix guests.
- Fix a bug which could lead to an interrupt never getting delivered
to the guest, if it is pending for a guest vCPU when the vCPU gets
offlined.
s390:
- Fix false negatives in VSIE validity check (Cc stable)
x86:
- Fix time drift of VMX preemption timer when a guest uses LAPIC timer
in periodic mode (Cc stable)
- Unconditionally expose CPUID.IA32_ARCH_CAPABILITIES to allow
migration from hosts that don't need retpoline mitigation (Cc stable)
- Fix guest crashes on reboot by properly coupling CR4.OSXSAVE and
CPUID.OSXSAVE (Cc stable)
- Report correct RIP after Hyper-V hypercall #UD (introduced in -rc6)
-----BEGIN PGP SIGNATURE-----
iQEcBAABCAAGBQJbCXxHAAoJEED/6hsPKofon5oIAKTwpbpBi0UKIyYcHQ2pwIoP
+qITTZUGGhEaIfe+aDkzE4vxVIA2ywYCbaC2+OSy4gNVThnytRL8WuhLyV8WLmlC
sDVSQ87RWaN8mW6hEJ95qXMS7FS0TsDJdytaw+c8OpODrsykw1XMSyV2rMLb0sMT
SmfioO2kuDx5JQGyiAPKFFXKHjAnnkH+OtffNemAEHGoPpenJ4qLRuXvrjQU8XT6
tVARIBZsutee5ITIsBKVDmI2n98mUoIe9na21M7N2QaJ98IF+qRz5CxZyL1CgvFk
tHqG8PZ/bqhnmuIIR5Di919UmhamOC3MODsKUVeciBLDS6LHlhado+HEpj6B8mI=
=ygB7
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
"PPC:
- Close a hole which could possibly lead to the host timebase getting
out of sync.
- Three fixes relating to PTEs and TLB entries for radix guests.
- Fix a bug which could lead to an interrupt never getting delivered
to the guest, if it is pending for a guest vCPU when the vCPU gets
offlined.
s390:
- Fix false negatives in VSIE validity check (Cc stable)
x86:
- Fix time drift of VMX preemption timer when a guest uses LAPIC
timer in periodic mode (Cc stable)
- Unconditionally expose CPUID.IA32_ARCH_CAPABILITIES to allow
migration from hosts that don't need retpoline mitigation (Cc
stable)
- Fix guest crashes on reboot by properly coupling CR4.OSXSAVE and
CPUID.OSXSAVE (Cc stable)
- Report correct RIP after Hyper-V hypercall #UD (introduced in
-rc6)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: fix #UD address of failed Hyper-V hypercalls
kvm: x86: IA32_ARCH_CAPABILITIES is always supported
KVM: x86: Update cpuid properly when CR4.OSXAVE or CR4.PKE is changed
x86/kvm: fix LAPIC timer drift when guest uses periodic mode
KVM: s390: vsie: fix < 8k check for the itdba
KVM: PPC: Book 3S HV: Do ptesync in radix guest exit path
KVM: PPC: Book3S HV: XIVE: Resend re-routed interrupts on CPU priority change
KVM: PPC: Book3S HV: Make radix clear pte when unmapping
KVM: PPC: Book3S HV: Make radix use correct tlbie sequence in kvmppc_radix_tlbie_page
KVM: PPC: Book3S HV: Snapshot timebase offset on guest entry
This patch is a partial revert of
commit abd7d0972a ("arm64: dts: hikey: Enable HS200 mode on eMMC")
which has been causing eMMC corruption on my HiKey board.
Symptoms usually looked like:
mmc_host mmc0: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31)
...
mmc_host mmc0: Bus speed (slot 0) = 148800000Hz (slot req 150000000Hz, actual 148800000HZ div = 0)
mmc0: new HS200 MMC card at address 0001
...
dwmmc_k3 f723d000.dwmmc0: Unexpected command timeout, state 3
mmc_host mmc0: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31)
mmc_host mmc0: Bus speed (slot 0) = 148800000Hz (slot req 150000000Hz, actual 148800000HZ div = 0)
mmc_host mmc0: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31)
mmc_host mmc0: Bus speed (slot 0) = 148800000Hz (slot req 150000000Hz, actual 148800000HZ div = 0)
mmc_host mmc0: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31)
mmc_host mmc0: Bus speed (slot 0) = 148800000Hz (slot req 150000000Hz, actual 148800000HZ div = 0)
print_req_error: I/O error, dev mmcblk0, sector 8810504
Aborting journal on device mmcblk0p10-8.
mmc_host mmc0: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31)
mmc_host mmc0: Bus speed (slot 0) = 148800000Hz (slot req 150000000Hz, actual 148800000HZ div = 0)
mmc_host mmc0: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31)
mmc_host mmc0: Bus speed (slot 0) = 148800000Hz (slot req 150000000Hz, actual 148800000HZ div = 0)
mmc_host mmc0: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31)
mmc_host mmc0: Bus speed (slot 0) = 148800000Hz (slot req 150000000Hz, actual 148800000HZ div = 0)
mmc_host mmc0: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31)
mmc_host mmc0: Bus speed (slot 0) = 148800000Hz (slot req 150000000Hz, actual 148800000HZ div = 0)
EXT4-fs error (device mmcblk0p10): ext4_journal_check_start:61: Detected aborted journal
EXT4-fs (mmcblk0p10): Remounting filesystem read-only
And quite often this would result in a disk that wouldn't properly
boot even with older kernels.
It seems the max-frequency property added by the above patch is
causing the problem, so remove it.
Cc: Ryan Grachek <ryan@edited.us>
Cc: Wei Xu <xuwei5@hisilicon.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: YongQin Liu <yongqin.liu@linaro.org>
Cc: Leo Yan <leo.yan@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Tested-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Wei Xu <xuwei04@gmail.com>
Merge misc fixes from Andrew Morton:
"16 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
kasan: fix memory hotplug during boot
kasan: free allocated shadow memory on MEM_CANCEL_ONLINE
checkpatch: fix macro argument precedence test
init/main.c: include <linux/mem_encrypt.h>
kernel/sys.c: fix potential Spectre v1 issue
mm/memory_hotplug: fix leftover use of struct page during hotplug
proc: fix smaps and meminfo alignment
mm: do not warn on offline nodes unless the specific node is explicitly requested
mm, memory_hotplug: make has_unmovable_pages more robust
mm/kasan: don't vfree() nonexistent vm_area
MAINTAINERS: change hugetlbfs maintainer and update files
ipc/shm: fix shmat() nil address after round-down when remapping
Revert "ipc/shm: Fix shmat mmap nil-page protection"
idr: fix invalid ptr dereference on item delete
ocfs2: revert "ocfs2/o2hb: check len for bio_add_page() to avoid getting incorrect bio"
mm: fix nr_rotate_swap leak in swapon() error case
Pull networking fixes from David Miller:
"Let's begin the holiday weekend with some networking fixes:
1) Whoops need to restrict cfg80211 wiphy names even more to 64
bytes. From Eric Biggers.
2) Fix flags being ignored when using kernel_connect() with SCTP,
from Xin Long.
3) Use after free in DCCP, from Alexey Kodanev.
4) Need to check rhltable_init() return value in ipmr code, from Eric
Dumazet.
5) XDP handling fixes in virtio_net from Jason Wang.
6) Missing RTA_TABLE in rtm_ipv4_policy[], from Roopa Prabhu.
7) Need to use IRQ disabling spinlocks in mlx4_qp_lookup(), from Jack
Morgenstein.
8) Prevent out-of-bounds speculation using indexes in BPF, from
Daniel Borkmann.
9) Fix regression added by AF_PACKET link layer cure, from Willem de
Bruijn.
10) Correct ENIC dma mask, from Govindarajulu Varadarajan.
11) Missing config options for PMTU tests, from Stefano Brivio"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (48 commits)
ibmvnic: Fix partial success login retries
selftests/net: Add missing config options for PMTU tests
mlx4_core: allocate ICM memory in page size chunks
enic: set DMA mask to 47 bit
ppp: remove the PPPIOCDETACH ioctl
ipv4: remove warning in ip_recv_error
net : sched: cls_api: deal with egdev path only if needed
vhost: synchronize IOTLB message with dev cleanup
packet: fix reserve calculation
net/mlx5: IPSec, Fix a race between concurrent sandbox QP commands
net/mlx5e: When RXFCS is set, add FCS data into checksum calculation
bpf: properly enforce index mask to prevent out-of-bounds speculation
net/mlx4: Fix irq-unsafe spinlock usage
net: phy: broadcom: Fix bcm_write_exp()
net: phy: broadcom: Fix auxiliary control register reads
net: ipv4: add missing RTA_TABLE to rtm_ipv4_policy
net/mlx4: fix spelling mistake: "Inrerface" -> "Interface" and rephrase message
ibmvnic: Only do H_EOI for mobility events
tuntap: correctly set SOCKWQ_ASYNC_NOSPACE
virtio-net: fix leaking page for gso packet during mergeable XDP
...
Using module_init() is wrong. E.g. ACPI adds and onlines memory before
our memory notifier gets registered.
This makes sure that ACPI memory detected during boot up will not result
in a kernel crash.
Easily reproducible with QEMU, just specify a DIMM when starting up.
Link: http://lkml.kernel.org/r/20180522100756.18478-3-david@redhat.com
Fixes: 786a895991 ("kasan: disable memory hotplug")
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have to free memory again when we cancel onlining, otherwise a later
onlining attempt will fail.
Link: http://lkml.kernel.org/r/20180522100756.18478-2-david@redhat.com
Fixes: fa69b5989b ("mm/kasan: add support for memory hotplug")
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In commit c7753208a9 ("x86, swiotlb: Add memory encryption support") a
call to function `mem_encrypt_init' was added. Include prototype
defined in header <linux/mem_encrypt.h> to prevent a warning reported
during compilation with W=1:
init/main.c:494:20: warning: no previous prototype for `mem_encrypt_init' [-Wmissing-prototypes]
Link: http://lkml.kernel.org/r/20180522195533.31415-1-malat@debian.org
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Laura Abbott <lauraa@codeaurora.org>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Gargi Sharma <gs051095@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
`resource' can be controlled by user-space, hence leading to a potential
exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
kernel/sys.c:1474 __do_compat_sys_old_getrlimit() warn: potential spectre issue 'get_current()->signal->rlim' (local cap)
kernel/sys.c:1455 __do_sys_old_getrlimit() warn: potential spectre issue 'get_current()->signal->rlim' (local cap)
Fix this by sanitizing *resource* before using it to index
current->signal->rlim
Notice that given that speculation windows are large, the policy is to
kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Link: http://lkml.kernel.org/r/20180515030038.GA11822@embeddedor.com
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The case of a new numa node got missed in avoiding using the node info
from page_struct during hotplug. In this path we have a call to
register_mem_sect_under_node (which allows us to specify it is hotplug
so don't change the node), via link_mem_sections which unfortunately
does not.
Fix is to pass check_nid through link_mem_sections as well and disable
it in the new numa node path.
Note the bug only 'sometimes' manifests depending on what happens to be
in the struct page structures - there are lots of them and it only needs
to match one of them.
The result of the bug is that (with a new memory only node) we never
successfully call register_mem_sect_under_node so don't get the memory
associated with the node in sysfs and meminfo for the node doesn't
report it.
It came up whilst testing some arm64 hotplug patches, but appears to be
universal. Whilst I'm triggering it by removing then reinserting memory
to a node with no other elements (thus making the node disappear then
appear again), it appears it would happen on hotplugging memory where
there was none before and it doesn't seem to be related the arm64
patches.
These patches call __add_pages (where most of the issue was fixed by
Pavel's patch). If there is a node at the time of the __add_pages call
then all is well as it calls register_mem_sect_under_node from there
with check_nid set to false. Without a node that function returns
having not done the sysfs related stuff as there is no node to use.
This is expected but it is the resulting path that fails...
Exact path to the problem is as follows:
mm/memory_hotplug.c: add_memory_resource()
The node is not online so we enter the 'if (new_node)' twice, on the
second such block there is a call to link_mem_sections which calls
into
drivers/node.c: link_mem_sections() which calls
drivers/node.c: register_mem_sect_under_node() which calls
get_nid_for_pfn and keeps trying until the output of that matches
the expected node (passed all the way down from
add_memory_resource)
It is effectively the same fix as the one referred to in the fixes tag
just in the code path for a new node where the comments point out we
have to rerun the link creation because it will have failed in
register_new_memory (as there was no node at the time). (actually that
comment is wrong now as we don't have register_new_memory any more it
got renamed to hotplug_memory_register in Pavel's patch).
Link: http://lkml.kernel.org/r/20180504085311.1240-1-Jonathan.Cameron@huawei.com
Fixes: fc44f7f923 ("mm/memory_hotplug: don't read nid from struct page during hotplug")
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The 4.17-rc /proc/meminfo and /proc/<pid>/smaps look ugly: single-digit
numbers (commonly 0) are misaligned.
Remove seq_put_decimal_ull_width()'s leftover optimization for single
digits: it's wrong now that num_to_str() takes care of the width.
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1805241554210.1326@eggly.anvils
Fixes: d1be35cb6f ("proc: add seq_put_decimal_ull_width to speed up /proc/pid/smaps")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Andrei Vagin <avagin@openvz.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oscar has noticed that we splat
WARNING: CPU: 0 PID: 64 at ./include/linux/gfp.h:467 vmemmap_alloc_block+0x4e/0xc9
[...]
CPU: 0 PID: 64 Comm: kworker/u4:1 Tainted: G W E 4.17.0-rc5-next-20180517-1-default+ #66
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Workqueue: kacpi_hotplug acpi_hotplug_work_fn
Call Trace:
vmemmap_populate+0xf2/0x2ae
sparse_mem_map_populate+0x28/0x35
sparse_add_one_section+0x4c/0x187
__add_pages+0xe7/0x1a0
add_pages+0x16/0x70
add_memory_resource+0xa3/0x1d0
add_memory+0xe4/0x110
acpi_memory_device_add+0x134/0x2e0
acpi_bus_attach+0xd9/0x190
acpi_bus_scan+0x37/0x70
acpi_device_hotplug+0x389/0x4e0
acpi_hotplug_work_fn+0x1a/0x30
process_one_work+0x146/0x340
worker_thread+0x47/0x3e0
kthread+0xf5/0x130
ret_from_fork+0x35/0x40
when adding memory to a node that is currently offline.
The VM_WARN_ON is just too loud without a good reason. In this
particular case we are doing
alloc_pages_node(node, GFP_KERNEL|__GFP_RETRY_MAYFAIL|__GFP_NOWARN, order)
so we do not insist on allocating from the given node (it is more a
hint) so we can fall back to any other populated node and moreover we
explicitly ask to not warn for the allocation failure.
Soften the warning only to cases when somebody asks for the given node
explicitly by __GFP_THISNODE.
Link: http://lkml.kernel.org/r/20180523125555.30039-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Oscar Salvador <osalvador@techadventures.net>
Tested-by: Oscar Salvador <osalvador@techadventures.net>
Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oscar has reported:
: Due to an unfortunate setting with movablecore, memblocks containing bootmem
: memory (pages marked by get_page_bootmem()) ended up marked in zone_movable.
: So while trying to remove that memory, the system failed in do_migrate_range
: and __offline_pages never returned.
:
: This can be reproduced by running
: qemu-system-x86_64 -m 6G,slots=8,maxmem=8G -numa node,mem=4096M -numa node,mem=2048M
: and movablecore=4G kernel command line
:
: linux kernel: BIOS-provided physical RAM map:
: linux kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
: linux kernel: BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
: linux kernel: BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
: linux kernel: BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable
: linux kernel: BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
: linux kernel: BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
: linux kernel: BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
: linux kernel: BIOS-e820: [mem 0x0000000100000000-0x00000001bfffffff] usable
: linux kernel: NX (Execute Disable) protection: active
: linux kernel: SMBIOS 2.8 present.
: linux kernel: DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org
: linux kernel: Hypervisor detected: KVM
: linux kernel: e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
: linux kernel: e820: remove [mem 0x000a0000-0x000fffff] usable
: linux kernel: last_pfn = 0x1c0000 max_arch_pfn = 0x400000000
:
: linux kernel: SRAT: PXM 0 -> APIC 0x00 -> Node 0
: linux kernel: SRAT: PXM 1 -> APIC 0x01 -> Node 1
: linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
: linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0xbfffffff]
: linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x13fffffff]
: linux kernel: ACPI: SRAT: Node 1 PXM 1 [mem 0x140000000-0x1bfffffff]
: linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x1c0000000-0x43fffffff] hotplug
: linux kernel: NUMA: Node 0 [mem 0x00000000-0x0009ffff] + [mem 0x00100000-0xbfffffff] -> [mem 0x0
: linux kernel: NUMA: Node 0 [mem 0x00000000-0xbfffffff] + [mem 0x100000000-0x13fffffff] -> [mem 0
: linux kernel: NODE_DATA(0) allocated [mem 0x13ffd6000-0x13fffffff]
: linux kernel: NODE_DATA(1) allocated [mem 0x1bffd3000-0x1bfffcfff]
:
: zoneinfo shows that the zone movable is placed into both numa nodes:
: Node 0, zone Movable
: pages free 160140
: min 1823
: low 2278
: high 2733
: spanned 262144
: present 262144
: managed 245670
: Node 1, zone Movable
: pages free 448427
: min 3827
: low 4783
: high 5739
: spanned 524288
: present 524288
: managed 515766
Note how only Node 0 has a hutplugable memory region which would rule it
out from the early memblock allocations (most likely memmap). Node1
will surely contain memmaps on the same node and those would prevent
offlining to succeed. So this is arguably a configuration issue.
Although one could argue that we should be more clever and rule early
allocations from the zone movable. This would be correct but probably
not worth the effort considering what a hack movablecore is.
Anyway, We could do better for those cases though. We rely on
start_isolate_page_range resp. has_unmovable_pages to do their job.
The first one isolates the whole range to be offlined so that we do not
allocate from it anymore and the later makes sure we are not stumbling
over non-migrateable pages.
has_unmovable_pages is overly optimistic, however. It doesn't check all
the pages if we are withing zone_movable because we rely that those
pages will be always migrateable. As it turns out we are still not
perfect there. While bootmem pages in zonemovable sound like a clear
bug which should be fixed let's remove the optimization for now and warn
if we encounter unmovable pages in zone_movable in the meantime. That
should help for now at least.
Btw. this wasn't a real problem until commit 72b39cfc4d ("mm,
memory_hotplug: do not fail offlining too early") because we used to
have a small number of retries and then failed. This turned out to be
too fragile though.
Link: http://lkml.kernel.org/r/20180523125555.30039-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Oscar Salvador <osalvador@techadventures.net>
Tested-by: Oscar Salvador <osalvador@techadventures.net>
Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KASAN uses different routines to map shadow for hot added memory and
memory obtained in boot process. Attempt to offline memory onlined by
normal boot process leads to this:
Trying to vfree() nonexistent vm area (000000005d3b34b9)
WARNING: CPU: 2 PID: 13215 at mm/vmalloc.c:1525 __vunmap+0x147/0x190
Call Trace:
kasan_mem_notifier+0xad/0xb9
notifier_call_chain+0x166/0x260
__blocking_notifier_call_chain+0xdb/0x140
__offline_pages+0x96a/0xb10
memory_subsys_offline+0x76/0xc0
device_offline+0xb8/0x120
store_mem_state+0xfa/0x120
kernfs_fop_write+0x1d5/0x320
__vfs_write+0xd4/0x530
vfs_write+0x105/0x340
SyS_write+0xb0/0x140
Obviously we can't call vfree() to free memory that wasn't allocated via
vmalloc(). Use find_vm_area() to see if we can call vfree().
Unfortunately it's a bit tricky to properly unmap and free shadow
allocated during boot, so we'll have to keep it. If memory will come
online again that shadow will be reused.
Matthew asked: how can you call vfree() on something that isn't a
vmalloc address?
vfree() is able to free any address returned by
__vmalloc_node_range(). And __vmalloc_node_range() gives you any
address you ask. It doesn't have to be an address in [VMALLOC_START,
VMALLOC_END] range.
That's also how the module_alloc()/module_memfree() works on
architectures that have designated area for modules.
[aryabinin@virtuozzo.com: improve comments]
Link: http://lkml.kernel.org/r/dabee6ab-3a7a-51cd-3b86-5468718e0390@virtuozzo.com
[akpm@linux-foundation.org: fix typos, reflow comment]
Link: http://lkml.kernel.org/r/20180201163349.8700-1-aryabinin@virtuozzo.com
Fixes: fa69b5989b ("mm/kasan: add support for memory hotplug")
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reported-by: Paul Menzel <pmenzel+linux-kasan-dev@molgen.mpg.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current hugetlbfs maintainer has not been active for more than a few
years. I have been been active in this area for more than two years and
plan to remain active in the foreseeable future.
Also, update the hugetlbfs entry to include linux-mm mail list and
additional hugetlbfs related files. hugetlb.c and hugetlb.h are not
100% hugetlbfs, but a majority of their content is hugetlbfs related.
Link: http://lkml.kernel.org/r/20180518225236.19079-1-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Nadia Yvette Chambers <nyc@holomorphy.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
shmat()'s SHM_REMAP option forbids passing a nil address for; this is in
fact the very first thing we check for. Andrea reported that for
SHM_RND|SHM_REMAP cases we can end up bypassing the initial addr check,
but we need to check again if the address was rounded down to nil. As
of this patch, such cases will return -EINVAL.
Link: http://lkml.kernel.org/r/20180503204934.kk63josdu6u53fbd@linux-n805
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "ipc/shm: shmat() fixes around nil-page".
These patches fix two issues reported[1] a while back by Joe and Andrea
around how shmat(2) behaves with nil-page.
The first reverts a commit that it was incorrectly thought that mapping
nil-page (address=0) was a no no with MAP_FIXED. This is not the case,
with the exception of SHM_REMAP; which is address in the second patch.
I chose two patches because it is easier to backport and it explicitly
reverts bogus behaviour. Both patches ought to be in -stable and ltp
testcases need updated (the added testcase around the cve can be
modified to just test for SHM_RND|SHM_REMAP).
[1] lkml.kernel.org/r/20180430172152.nfa564pvgpk3ut7p@linux-n805
This patch (of 2):
Commit 95e91b831f ("ipc/shm: Fix shmat mmap nil-page protection")
worked on the idea that we should not be mapping as root addr=0 and
MAP_FIXED. However, it was reported that this scenario is in fact
valid, thus making the patch both bogus and breaks userspace as well.
For example X11's libint10.so relies on shmat(1, SHM_RND) for lowmem
initialization[1].
[1] https://cgit.freedesktop.org/xorg/xserver/tree/hw/xfree86/os-support/linux/int10/linux.c#n347
Link: http://lkml.kernel.org/r/20180503203243.15045-2-dave@stgolabs.net
Fixes: 95e91b831f ("ipc/shm: Fix shmat mmap nil-page protection")
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reported-by: Joe Lawrence <joe.lawrence@redhat.com>
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the radix tree underlying the IDR happens to be full and we attempt
to remove an id which is larger than any id in the IDR, we will call
__radix_tree_delete() with an uninitialised 'slot' pointer, at which
point anything could happen. This was easiest to hit with a single
entry at id 0 and attempting to remove a non-0 id, but it could have
happened with 64 entries and attempting to remove an id >= 64.
Roman said:
The syzcaller test boils down to opening /dev/kvm, creating an
eventfd, and calling a couple of KVM ioctls. None of this requires
superuser. And the result is dereferencing an uninitialized pointer
which is likely a crash. The specific path caught by syzbot is via
KVM_HYPERV_EVENTD ioctl which is new in 4.17. But I guess there are
other user-triggerable paths, so cc:stable is probably justified.
Matthew added:
We have around 250 calls to idr_remove() in the kernel today. Many of
them pass an ID which is embedded in the object they're removing, so
they're safe. Picking a few likely candidates:
drivers/firewire/core-cdev.c looks unsafe; the ID comes from an ioctl.
drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c is similar
drivers/atm/nicstar.c could be taken down by a handcrafted packet
Link: http://lkml.kernel.org/r/20180518175025.GD6361@bombadil.infradead.org
Fixes: 0a835c4f09 ("Reimplement IDR and IDA using the radix tree")
Reported-by: <syzbot+35666cba7f0a337e2e79@syzkaller.appspotmail.com>
Debugged-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>