commit 33def8498f ("treewide: Convert macro and uses of
__section(foo) to __section("foo")") removed the stringification of the
section name and now requires quotes around the named section.
Update checkpatch to not remove any quotes when suggesting conversion
of __attribute__((section("name"))) to __section("name")
Miscellanea:
o Add section to the hash with __section replacement
o Remove separate test for __attribute__((section
o Remove the limitation on converting attributes containing only
known, possible conversions. Any unknown attribute types are now
left as-is and known types are converted and moved before
__attribute__ and removed from within the __attribute__((list...)).
[joe@perches.com: eliminate the separate test below the possible conversions loop]
Link: https://lkml.kernel.org/r/58e9d55e933dc8fdc6af489f2ad797fa8eb13e44.camel@perches.com
Link: https://lkml.kernel.org/r/c04dd1c810e8d6a68e6a632e3191ae91651c8edf.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Dwaipayan Ray <dwaipayanray1@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove the trailing error message from the fixed lines.
Link: https://lkml.kernel.org/r/20201017142546.28988-1-trix@redhat.com
Signed-off-by: Tom Rix <trix@redhat.com>
Acked-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It is generally preferred that the macros from
include/linux/compiler_attributes.h are used, unless there is a reason not
to.
checkpatch currently checks __attribute__ for each of packed, aligned,
section, printf, scanf, and weak. Other declarations in
compiler_attributes.h are not handled.
Add a generic test to check the presence of such attributes. Some
attributes require more specific handling and are kept separate.
Also add fixes to the generic attributes check to substitute the correct
conversions.
New attributes which are now handled are:
__always_inline__
__assume_aligned__(a, ## __VA_ARGS__)
__cold__
__const__
__copy__(symbol)
__designated_init__
__externally_visible__
__gnu_inline__
__malloc__
__mode__(x)
__no_caller_saved_registers__
__noclone__
__noinline__
__nonstring__
__noreturn__
__pure__
__unused__
__used__
Declarations which contain multiple attributes like
__attribute__((__packed__, __cold__)) are also handled except when proper
conversions for one or more attributes of the list cannot be determined.
Link: https://lore.kernel.org/linux-kernel-mentees/3ec15b41754b01666d94b76ce51b9832c2dd577a.camel@perches.com/
Link: https://lkml.kernel.org/r/20201025193103.23223-1-dwaipayanray1@gmail.com
Signed-off-by: Dwaipayan Ray <dwaipayanray1@gmail.com>
Suggested-by: Joe Perches <joe@perches.com>
Acked-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
switch/case use of break after a return, goto or break is unnecessary.
There is an existing warning for the return and goto uses, so add
break and a --fix option too.
Link: https://lkml.kernel.org/r/d9ea654104d55f590fb97d252d64a66b23c1a096.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Julia Lawall <julia.lawall@inria.fr>
Cc: Tom Rix <trix@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are about 100,000 uses of 'static const <type>' but about 400 uses
of 'static <type> const' in the kernel where type is not a pointer.
The kernel almost always uses "static const" over "const static" as there
is a compiler warning for that declaration style.
But there is no compiler warning for "static <type> const".
So add a checkpatch warning for the atypical declaration uses of.
const static <type> <foo>
and
static <type> const <foo>
For example:
$ ./scripts/checkpatch.pl -f --emacs --quiet --nosummary -types=static_const arch/arm/crypto/aes-ce-glue.c
arch/arm/crypto/aes-ce-glue.c:75: WARNING: Move const after static - use 'static const u8'
#75: FILE: arch/arm/crypto/aes-ce-glue.c:75:
+ static u8 const rcon[] = {
Link: https://lkml.kernel.org/r/4b863be68e679546b40d50b97a4a806c03056a1c.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Presence of hexadecimal address or symbol results in false warning
message by checkpatch.pl.
For example, running checkpatch on commit b8ad540dd4 ("mptcp: fix
memory leak in mptcp_subflow_create_socket()") results in warning:
WARNING:REPEATED_WORD: Possible repeated word: 'ff'
00 00 00 00 00 00 00 00 00 2f 30 0a 81 88 ff ff ........./0.....
Similarly, the presence of list command output in commit results in
an unnecessary warning.
For example, running checkpatch on commit 899e5ffbf2 ("perf record:
Introduce --switch-output-event") gives:
WARNING:REPEATED_WORD: Possible repeated word: 'root'
dr-xr-x---. 12 root root 4096 Apr 27 17:46 ..
Here, it reports 'ff' and 'root' to be repeated, but it is in fact part
of some address or code, where it has to be repeated.
In these cases, the intent of the warning to find stylistic issues in
commit messages is not met and the warning is just completely wrong in
this case.
To avoid these warnings, add an additional regex check for the directory
permission pattern and avoid checking the line for this class of
warning. Similarly, to avoid hex pattern, check if the word consists of
hex symbols and skip this warning if it is not among the common english
words formed using hex letters.
A quick evaluation on v5.6..v5.8 showed that this fix reduces
REPEATED_WORD warnings by the frequency of 1890.
A quick manual check found all cases are related to hex output or list
command outputs in commit messages.
Link: https://lkml.kernel.org/r/20201024102253.13614-1-yashsri421@gmail.com
Signed-off-by: Aditya Srivastava <yashsri421@gmail.com>
Acked-by: Joe Perches <joe@perches.com>
Cc: Dwaipayan Ray <dwaipayanray1@gmail.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Recently, commit 4f6ad8aa1eac ("checkpatch: move repeated word test")
moved the repeated word test to check for more file types. But after
this, if checkpatch.pl is run on MAINTAINERS, it generates several
new warnings of the type:
WARNING: Possible repeated word: 'git'
For example:
WARNING: Possible repeated word: 'git'
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml.git
So, the pattern "git git://..." is a false positive in this case.
There are several other combinations which may produce a wrong warning
message, such as "@size size", ":Begin begin", etc.
Extend repeated word check to compare the characters before and after
the word matches.
If there is a non whitespace character before the first word or a non
whitespace character excluding punctuation characters after the second
word, then the check is skipped and the warning is avoided.
Also add case insensitive word matching to the repeated word check.
Link: https://lore.kernel.org/linux-kernel-mentees/81b6a0bb2c7b9256361573f7a13201ebcd4876f1.camel@perches.com/
Link: https://lkml.kernel.org/r/20201017162732.152351-1-dwaipayanray1@gmail.com
Signed-off-by: Dwaipayan Ray <dwaipayanray1@gmail.com>
Suggested-by: Joe Perches <joe@perches.com>
Suggested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Acked-by: Joe Perches <joe@perches.com>
Cc: Aditya Srivastava <yashsri421@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
LZ4 final literal copy could be overlapped when doing
in-place decompression, so it's unsafe to just use memcpy()
on an optimized memcpy approach but memmove() instead.
Upstream LZ4 has updated this years ago [1] (and the impact
is non-sensible [2] plus only a few bytes remain), this commit
just synchronizes LZ4 upstream code to the kernel side as well.
It can be observed as EROFS in-place decompression failure
on specific files when X86_FEATURE_ERMS is unsupported,
memcpy() optimization of commit 59daa706fb ("x86, mem:
Optimize memcpy by avoiding memory false dependece") will
be enabled then.
Currently most modern x86-CPUs support ERMS, these CPUs just
use "rep movsb" approach so no problem at all. However, it can
still be verified with forcely disabling ERMS feature...
arch/x86/lib/memcpy_64.S:
ALTERNATIVE_2 "jmp memcpy_orig", "", X86_FEATURE_REP_GOOD, \
- "jmp memcpy_erms", X86_FEATURE_ERMS
+ "jmp memcpy_orig", X86_FEATURE_ERMS
We didn't observe any strange on arm64/arm/x86 platform before
since most memcpy() would behave in an increasing address order
("copy upwards" [3]) and it's the correct order of in-place
decompression but it really needs an update to memmove() for sure
considering it's an undefined behavior according to the standard
and some unique optimization already exists in the kernel.
[1] 33cb8518ac
[2] https://github.com/lz4/lz4/pull/717#issuecomment-497818921
[3] https://sourceware.org/bugzilla/show_bug.cgi?id=12518
Link: https://lkml.kernel.org/r/20201122030749.2698994-1-hsiangkao@redhat.com
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Reviewed-by: Nick Terrell <terrelln@fb.com>
Cc: Yann Collet <yann.collet.73@gmail.com>
Cc: Miao Xie <miaoxie@huawei.com>
Cc: Chao Yu <yuchao0@huawei.com>
Cc: Li Guifu <bluce.liguifu@huawei.com>
Cc: Guo Xuenan <guoxuenan@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use proper conversion functions. kstrto*() variants exist for all
standard types.
Link: https://lkml.kernel.org/r/20201122123410.GB92364@localhost.localdomain
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In lkdtm.h, files targeted in comments are named "lkdtm_file.c" while
there are named "file.c" in directory.
Link: https://lkml.kernel.org/r/20201122162451.27551-6-laniel_francis@privacyrequired.com
Signed-off-by: Francis Laniel <laniel_francis@privacyrequired.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This new test ensures that fortified strscpy has the same behavior than
vanilla strscpy (e.g. returning -E2BIG when src content is truncated).
Finally, it generates a crash at runtime because there is a write overflow
in destination string.
Link: https://lkml.kernel.org/r/20201122162451.27551-5-laniel_francis@privacyrequired.com
Signed-off-by: Francis Laniel <laniel_francis@privacyrequired.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The fortified version of strscpy ensures the following before vanilla strscpy
is called:
1. There is no read overflow because we either size is smaller than
src length or we shrink size to src length by calling fortified
strnlen.
2. There is no write overflow because we either failed during
compilation or at runtime by checking that size is smaller than dest
size.
Link: https://lkml.kernel.org/r/20201122162451.27551-4-laniel_francis@privacyrequired.com
Signed-off-by: Francis Laniel <laniel_francis@privacyrequired.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add code to test both:
- runtime detection of the overrun of a structure. This covers the
__builtin_object_size(x, 0) case. This test is called FORTIFY_OBJECT.
- runtime detection of the overrun of a char array within a structure.
This covers the __builtin_object_size(x, 1) case which can be used
for some string functions. This test is called FORTIFY_SUBOBJECT.
Link: https://lkml.kernel.org/r/20201122162451.27551-3-laniel_francis@privacyrequired.com
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Francis Laniel <laniel_francis@privacyrequired.com>
Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "Fortify strscpy()", v7.
This patch implements a fortified version of strscpy() enabled by setting
CONFIG_FORTIFY_SOURCE=y. The new version ensures the following before
calling vanilla strscpy():
1. There is no read overflow because either size is smaller than src
length or we shrink size to src length by calling fortified strnlen().
2. There is no write overflow because we either failed during
compilation or at runtime by checking that size is smaller than dest
size. Note that, if src and dst size cannot be got, the patch defaults
to call vanilla strscpy().
The patches adds the following:
1. Implement the fortified version of strscpy().
2. Add a new LKDTM test to ensures the fortified version still returns
the same value as the vanilla one while panic'ing when there is a write
overflow.
3. Correct some typos in LKDTM related file.
I based my modifications on top of two patches from Daniel Axtens which
modify calls to __builtin_object_size, in fortified string functions, to
ensure the true size of char * are returned and not the surrounding
structure size.
About performance, I measured the slow down of fortified strscpy(), using
the vanilla one as baseline. The hardware I used is an Intel i3 2130 CPU
clocked at 3.4 GHz. I ran "Linux 5.10.0-rc4+ SMP PREEMPT" inside qemu
3.10 with 4 CPU cores. The following code, called through LKDTM, was used
as a benchmark:
#define TIMES 10000
char *src;
char dst[7];
int i;
ktime_t begin;
src = kstrdup("foobar", GFP_KERNEL);
if (src == NULL)
return;
begin = ktime_get();
for (i = 0; i < TIMES; i++)
strscpy(dst, src, strlen(src));
pr_info("%d fortified strscpy() tooks %lld", TIMES, ktime_get() - begin);
begin = ktime_get();
for (i = 0; i < TIMES; i++)
__real_strscpy(dst, src, strlen(src));
pr_info("%d vanilla strscpy() tooks %lld", TIMES, ktime_get() - begin);
kfree(src);
I called the above code 30 times to compute stats for each version (in ns,
round to int):
| version | mean | std | median | 95th |
| --------- | ------- | ------ | ------- | ------- |
| fortified | 245_069 | 54_657 | 216_230 | 331_122 |
| vanilla | 172_501 | 70_281 | 143_539 | 219_553 |
On average, fortified strscpy() is approximately 1.42 times slower than
vanilla strscpy(). For the 95th percentile, the fortified version is
about 1.50 times slower.
So, clearly the stats are not in favor of fortified strscpy(). But, the
fortified version loops the string twice (one in strnlen() and another in
vanilla strscpy()) while the vanilla one only loops once. This can
explain why fortified strscpy() is slower than the vanilla one.
This patch (of 5):
When the fortify feature was first introduced in commit 6974f0c455
("include/linux/string.h: add the option of fortified string.h
functions"), Daniel Micay observed:
* It should be possible to optionally use __builtin_object_size(x, 1) for
some functions (C strings) to detect intra-object overflows (like
glibc's _FORTIFY_SOURCE=2), but for now this takes the conservative
approach to avoid likely compatibility issues.
This is a case that often cannot be caught by KASAN. Consider:
struct foo {
char a[10];
char b[10];
}
void test() {
char *msg;
struct foo foo;
msg = kmalloc(16, GFP_KERNEL);
strcpy(msg, "Hello world!!");
// this copy overwrites foo.b
strcpy(foo.a, msg);
}
The questionable copy overflows foo.a and writes to foo.b as well. It
cannot be detected by KASAN. Currently it is also not detected by
fortify, because strcpy considers __builtin_object_size(x, 0), which
considers the size of the surrounding object (here, struct foo). However,
if we switch the string functions over to use __builtin_object_size(x, 1),
the compiler will measure the size of the closest surrounding subobject
(here, foo.a), rather than the size of the surrounding object as a whole.
See https://gcc.gnu.org/onlinedocs/gcc/Object-Size-Checking.html for more
info.
Only do this for string functions: we cannot use it on things like memcpy,
memmove, memcmp and memchr_inv due to code like this which purposefully
operates on multiple structure members: (arch/x86/kernel/traps.c)
/*
* regs->sp points to the failing IRET frame on the
* ESPFIX64 stack. Copy it to the entry stack. This fills
* in gpregs->ss through gpregs->ip.
*
*/
memmove(&gpregs->ip, (void *)regs->sp, 5*8);
This change passes an allyesconfig on powerpc and x86, and an x86 kernel
built with it survives running with syz-stress from syzkaller, so it seems
safe so far.
Link: https://lkml.kernel.org/r/20201122162451.27551-1-laniel_francis@privacyrequired.com
Link: https://lkml.kernel.org/r/20201122162451.27551-2-laniel_francis@privacyrequired.com
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Francis Laniel <laniel_francis@privacyrequired.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A few architecture specific string.h functions used to be implemented in
terms of preprocessor defines to the corresponding compiler builtins.
Since this is no longer the case, remove unused #undefs.
Only memcmp is still defined in terms of builtins for a few arches.
Link: https://github.com/ClangBuiltLinux/linux/issues/428
Link: https://lkml.kernel.org/r/20201120041113.89382-1-ndesaulniers@google.com
Fixes: 5f074f3e19 ("lib/string.c: implement a basic bcmp")
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Yury Norov <yury.norov@gmail.com>
Cc: Alexandru Ardelean <alexandru.ardelean@analog.com>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As discussed in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445 the
const_ilog2 macro generates a lot of code which interferes badly with GCC
inlining heuristics, until it can be proven that the ilog2 argument can or
can't be simplified into a constant.
It can be expressed using __builtin_clzll builtin which is supported by
GCC 3.4 and later and when used only in the __builtin_constant_p guarded
code it ought to always fold back to a constant. Other compilers support
the same builtin for many years too.
Other option would be to change the const_ilog2 macro, though as the
description says it is meant to be used also in C constant expressions,
and while GCC will fold it to constant with constant argument even in
those, perhaps it is better to avoid using extensions in that case.
[akpm@linux-foundation.org: coding style fixes]
Link: https://lkml.kernel.org/r/20201120125154.GB3040@hirez.programming.kicks-ass.net
Link: https://lkml.kernel.org/r/20201021132718.GB2176@tucnak
Signed-off-by: Jakub Jelinek <jakub@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Test get_option() for a starter which is provided by cmdline.c.
[akpm@linux-foundation.org: fix warning by constifying cmdline_test_values]
[andriy.shevchenko@linux.intel.com: type of expected returned values should be int]
Link: https://lkml.kernel.org/r/20201116104244.15472-1-andriy.shevchenko@linux.intel.com
[andriy.shevchenko@linux.intel.com: provide meaningful MODULE_LICENSE()]
Link: https://lkml.kernel.org/r/20201116104257.15527-1-andriy.shevchenko@linux.intel.com
Link: https://lkml.kernel.org/r/20201112180732.75589-6-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Vitor Massaru Iha <vitor@massaru.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Gow <davidgow@google.com>
Cc: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In the future we would like to use get_option() to only validate the
string and parse it separately. To achieve this, allow NULL to be an
output for get_option().
Link: https://lkml.kernel.org/r/20201112180732.75589-5-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Gow <davidgow@google.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Vitor Massaru Iha <vitor@massaru.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When string doesn't have an integer and starts from hyphen get_option()
may return interesting results. Fix it to return 0.
The simple_strtoull() is used due to absence of simple_strtoul() in a boot
code on some architectures.
Note, the Fixes tag below is rather for anthropological curiosity.
Link: https://lkml.kernel.org/r/20201112180732.75589-4-andriy.shevchenko@linux.intel.com
Fixes: f68565831e72 ("Import 2.4.0-test2pre3")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Gow <davidgow@google.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Vitor Massaru Iha <vitor@massaru.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On PREEMPT_RT the locks are quite different so they can't be tested as it
is done below. The alternative is to test for the waitlock within
rtmutex.
This is the bare minimun to get it compiled. Problems which exist on
PREEMP_RT:
- none of the locks (spinlock_t, rwlock_t, mutex_t, rw_semaphore) may
be acquired with disabled preemption or interrupts.
If I read the code correct the it is possible to acquire a mutex_t
with disabled interrupts.
I don't know how to obtain a lock pointer. Technically they are not
exported to userland.
- memory can not be allocated with disabled preemption or interrupts
even with GFP_ATOMIC.
Link: https://lkml.kernel.org/r/20201028181041.xyeothhkouc3p4md@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use array_size() helper instead of the open-coded version in jhash2().
These sorts of multiplication factors need to be wrapped in array_size().
Also, use the preferred form for passing the size of an object type.
Link: https://lkml.kernel.org/r/cb8a682e4bba4dbddd2bd8aca7f8c02fea89639b.1601565471.git.gustavoars@kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make use of the flex_array_size() helper to calculate the size of a
flexible array member within an enclosing structure.
This helper offers defense-in-depth against potential integer overflows,
while at the same time makes it explicitly clear that we are dealing with
a flexible array member.
Link: https://lkml.kernel.org/r/186e37fe07196ee41a0e562fa8a8cb7a01112ec5.1601565471.git.gustavoars@kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The test module to check that free_pages() does not leak memory does not
provide any feedback whatsoever its state or progress, but may take some
time on slow machines. Add the printing of messages upon starting each
phase of the test, and upon completion.
Link: https://lkml.kernel.org/r/20201018140445.20972-1-geert@linux-m68k.org
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is no need to return int type out of boolean expression.
Link: https://lkml.kernel.org/r/20201027180936.20806-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Yury Norov <yury.norov@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cleanup: use #elif instead of #end and #elif.
Link: https://lkml.kernel.org/r/20201015150736.GA91603@rlk
Signed-off-by: Hui Su <sh_def@163.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kernel.h is being used as a dump for all kinds of stuff for a long time.
Here is the attempt to start cleaning it up by splitting out
mathematical helpers.
At the same time convert users in header and lib folder to use new
header. Though for time being include new header back to kernel.h to
avoid twisted indirected includes for existing users.
[sfr@canb.auug.org.au: fix powerpc build]
Link: https://lkml.kernel.org/r/20201029150809.13059608@canb.auug.org.au
Link: https://lkml.kernel.org/r/20201028173212.41768-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When building mpc885_ads_defconfig with gcc 10.1,
the function get_order() appears 50 times in vmlinux:
[linux]# ppc-linux-objdump -x vmlinux | grep get_order | wc -l
50
[linux]# size vmlinux
text data bss dec hex filename
3842620 675624 135160 4653404 47015c vmlinux
In the old days, marking a function 'static inline' was forcing GCC to
inline, but since commit ac7c3e4ff4 ("compiler: enable
CONFIG_OPTIMIZE_INLINING forcibly") GCC may decide to not inline a
function.
It looks like GCC 10 is taking poor decisions on this.
get_order() compiles into the following tiny function, occupying 20
bytes of text.
0000007c <get_order>:
7c: 38 63 ff ff addi r3,r3,-1
80: 54 63 a3 3e rlwinm r3,r3,20,12,31
84: 7c 63 00 34 cntlzw r3,r3
88: 20 63 00 20 subfic r3,r3,32
8c: 4e 80 00 20 blr
By forcing get_order() to be __always_inline, the size of text is
reduced by 1940 bytes, that is almost twice the space occupied by
50 times get_order()
[linux-powerpc]# size vmlinux
text data bss dec hex filename
3840680 675588 135176 4651444 46f9b4 vmlinux
Link: https://lkml.kernel.org/r/96c6172d619c51acc5c1c4884b80785c59af4102.1602949927.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We don't need pde_get()'s return value, so make pde_get() return nothing
Link: https://lkml.kernel.org/r/20201211061944.GA2387571@rlk
Signed-off-by: Hui Su <sh_def@163.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 1fde6f21d9 ("proc: fix /proc/net/* after setns(2)") only forced
revalidation of regular files under /proc/net/
However, /proc/net/ is unusual in the sense of /proc/net/foo handlers
take netns pointer from parent directory which is old netns.
Steps to reproduce:
(void)open("/proc/net/sctp/snmp", O_RDONLY);
unshare(CLONE_NEWNET);
int fd = open("/proc/net/sctp/snmp", O_RDONLY);
read(fd, &c, 1);
Read will read wrong data from original netns.
Patch forces lookup on every directory under /proc/net .
Link: https://lkml.kernel.org/r/20201205160916.GA109739@localhost.localdomain
Fixes: 1da4d377f9 ("proc: revalidate misc dentries")
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reported-by: "Rantala, Tommi T. (Nokia - FI/Espoo)" <tommi.t.rantala@nokia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Similar to speculation store bypass, show information about the indirect
branch speculation mode of a task in /proc/$pid/status.
For testing/benchmarking, I needed to see whether IB (Indirect Branch)
speculation (see Spectre-v2) is enabled on a task, to see whether an
IBPB instruction should be executed on an address space switch.
Unfortunately, this information isn't available anywhere else and
currently the only way to get it is to hack the kernel to expose it
(like this change). It also helped expose a bug with conditional IB
speculation on certain CPUs.
Another place this could be useful is to audit the system when using
sanboxing. With this change, I can confirm that seccomp-enabled
process have IB speculation force disabled as expected when the kernel
command line parameter `spectre_v2_user=seccomp`.
Since there's already a 'Speculation_Store_Bypass' field, I used that
as precedent for adding this one.
[amistry@google.com: remove underscores from field name to workaround documentation issue]
Link: https://lkml.kernel.org/r/20201106131015.v2.1.I7782b0cedb705384a634cfd8898eb7523562da99@changeid
Link: https://lkml.kernel.org/r/20201030172731.1.I7782b0cedb705384a634cfd8898eb7523562da99@changeid
Signed-off-by: Anand K Mistry <amistry@google.com>
Cc: Anthony Steinhauser <asteinhauser@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Anand K Mistry <amistry@google.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Alexey Gladkov <gladkov.alexey@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: NeilBrown <neilb@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Delete repeated words in fs/proc/.
{the, which}
where "which which" was changed to "with which".
Link: https://lkml.kernel.org/r/20201028191525.13413-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
in_interrupt() is true for a variety of things including bottom half
disabled regions. Deducing hard interrupt context from it is dubious at
best.
Use in_irq() which is true if called in hard interrupt context. Otherwise
calling irq_exit() would do more harm than good.
Link: https://lkml.kernel.org/r/20201113135832.2202833-1-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Serge Belyshev <belyshev@depni.sinp.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On PowerPC, when dymically removing memory from a system we can see in the
console a lot of messages like this:
[ 186.575389] Offlined Pages 4096
This message is displayed on each LMB (256MB) removed, which means that we
removing 1TB of memory, this message is displayed 4096 times.
Moving it to DEBUG to not flood the console.
Link: https://lkml.kernel.org/r/20201211150157.91399-1-ldufour@linux.ibm.com
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Scott Cheloha <cheloha@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The scenario on which "Free swap = -4kB" happens in my system, which is caused
by several get_swap_pages racing with each other and show_swap_cache_info
happens simutaniously. No need to add a lock on get_swap_page_of_type as we
remove "Presub/PosAdd" here.
ProcessA ProcessB ProcessC
ngoals = 1 ngoals = 1
avail = nr_swap_pages(1) avail = nr_swap_pages(1)
nr_swap_pages(1) -= ngoals
nr_swap_pages(0) -= ngoals
nr_swap_pages = -1
Link: https://lkml.kernel.org/r/1607050340-4535-1-git-send-email-zhaoyang.huang@unisoc.com
Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull exec-update-lock update from Eric Biederman:
"The key point of this is to transform exec_update_mutex into a
rw_semaphore so readers can be separated from writers.
This makes it easier to understand what the holders of the lock are
doing, and makes it harder to contend or deadlock on the lock.
The real deadlock fix wound up in perf_event_open"
* 'exec-update-lock-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
exec: Transform exec_update_mutex into a rw_semaphore
Pull execve updates from Eric Biederman:
"This set of changes ultimately fixes the interaction of posix file
lock and exec. Fundamentally most of the change is just moving where
unshare_files is called during exec, and tweaking the users of
files_struct so that the count of files_struct is not unnecessarily
played with.
Along the way fcheck and related helpers were renamed to more
accurately reflect what they do.
There were also many other small changes that fell out, as this is the
first time in a long time much of this code has been touched.
Benchmarks haven't turned up any practical issues but Al Viro has
observed a possibility for a lot of pounding on task_lock. So I have
some changes in progress to convert put_files_struct to always rcu
free files_struct. That wasn't ready for the merge window so that will
have to wait until next time"
* 'exec-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (27 commits)
exec: Move io_uring_task_cancel after the point of no return
coredump: Document coredump code exclusively used by cell spufs
file: Remove get_files_struct
file: Rename __close_fd_get_file close_fd_get_file
file: Replace ksys_close with close_fd
file: Rename __close_fd to close_fd and remove the files parameter
file: Merge __alloc_fd into alloc_fd
file: In f_dupfd read RLIMIT_NOFILE once.
file: Merge __fd_install into fd_install
proc/fd: In fdinfo seq_show don't use get_files_struct
bpf/task_iter: In task_file_seq_get_next use task_lookup_next_fd_rcu
proc/fd: In proc_readfd_common use task_lookup_next_fd_rcu
file: Implement task_lookup_next_fd_rcu
kcmp: In get_file_raw_ptr use task_lookup_fd_rcu
proc/fd: In tid_fd_mode use task_lookup_fd_rcu
file: Implement task_lookup_fd_rcu
file: Rename fcheck lookup_fd_rcu
file: Replace fcheck_files with files_lookup_fd_rcu
file: Factor files_lookup_fd_locked out of fcheck_files
file: Rename __fcheck_files to files_lookup_fd_raw
...
Pull signal cleanup from Eric Biederman:
"Remove a never used HP-UX compatibility from parisc headers and
consolidating the SA_* flags definitions into a generic header as much
as possible.
We only have 32 SA_* flag bits total, so we need to be careful. But as
this is the first addition in a decade or so I think we are fine for
the forseeable future"
* 'signal-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
signal/parisc: Remove parisc specific definition of __ARCH_UAPI_SA_FLAGS
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCX9dpfgAKCRCRxhvAZXjc
oo5kAP9PrqQAfEe9+CNlnOb4ZawcZaa3osUkr/ZkfoxI/dO2awEAgGCgWQ5PLtQF
gtfz6I5IT2sc3G4D+nGZxef6Q29J2Qc=
=fZNu
-----END PGP SIGNATURE-----
Merge tag 'close-range-openat2-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull close_range/openat2 updates from Christian Brauner:
"This contains a fix for openat2() to make RESOLVE_BENEATH and
RESOLVE_IN_ROOT mutually exclusive. It doesn't make sense to specify
both at the same time. The openat2() selftests have been extended to
verify that these two flags can't be specified together.
This also adds the CLOSE_RANGE_CLOEXEC flag to close_range() which
allows to mark a range of file descriptors as close-on-exec without
actually closing them.
This is useful in general but the use-case that triggered the patch is
installing a seccomp profile in the calling task before exec. If the
seccomp profile wants to block the close_range() syscall it obviously
can't use it to close all fds before exec. If it calls close_range()
before installing the seccomp profile it needs to take care not to
close fds that it will still need before the exec meaning it would
have to call close_range() multiple times on different ranges and then
still fall back to closing fds one by one right before the exec.
CLOSE_RANGE_CLOEXEC allows to solve this problem relying on the exec
codepath to get rid of the unwanted fds. The close_range() tests have
been expanded to verify that CLOSE_RANGE_CLOEXEC works"
* tag 'close-range-openat2-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
selftests: core: add tests for CLOSE_RANGE_CLOEXEC
fs, close_range: add flag CLOSE_RANGE_CLOEXEC
selftests: openat2: add RESOLVE_ conflict test
openat2: reject RESOLVE_BENEATH|RESOLVE_IN_ROOT
Pull regset updates from Al Viro:
"Dead code removal, mostly.
The only exception is a bit of cleanups on itanic (getting rid of
redundant stack unwinds - each access_uarea() call does it and we call
that 7 times in a row in ptrace_[sg]etregs(), *after* having done it
ourselves in the caller; location where the user registers have been
spilled won't change under us, and we can bloody well just call
access_elf_reg() directly, giving it the unw_frame_info we'd
calculated for our own purposes)"
* 'regset.followup' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
c6x: kill ELF_CORE_COPY_FPREGS
whack-a-mole: USE_ELF_CORE_DUMP
[ia64] ptrace_[sg]etregs(): use access_elf_reg() instead of access_uarea()
[ia64] missed cleanups from switch to regset coredumps
arm: kill dump_task_regs()
Pull epoll updates from Al Viro:
"Deal with epoll loop check/removal races sanely (among other things).
The solution merged last cycle (pinning a bunch of struct file
instances) had been forced by the wrong data structures; untangling
that takes a bunch of preparations, but it's worth doing - control
flow in there is ridiculously overcomplicated. Memory footprint has
also gone down, while we are at it.
This is not all I want to do in the area, but since I didn't get
around to posting the followups they'll have to wait for the next
cycle"
* 'work.epoll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (27 commits)
epoll: take epitem list out of struct file
epoll: massage the check list insertion
lift rcu_read_lock() into reverse_path_check()
convert ->f_ep_links/->fllink to hlist
ep_insert(): move creation of wakeup source past the fl_ep_links insertion
fold ep_read_events_proc() into the only caller
take the common part of ep_eventpoll_poll() and ep_item_poll() into helper
ep_insert(): we only need tep->mtx around the insertion itself
ep_insert(): don't open-code ep_remove() on failure exits
lift locking/unlocking ep->mtx out of ep_{start,done}_scan()
ep_send_events_proc(): fold into the caller
lift the calls of ep_send_events_proc() into the callers
lift the calls of ep_read_events_proc() into the callers
ep_scan_ready_list(): prepare to splitup
ep_loop_check_proc(): saner calling conventions
get rid of ep_push_nested()
ep_loop_check_proc(): lift pushing the cookie into callers
clean reverse_path_check_proc() a bit
reverse_path_check_proc(): don't bother with cookies
reverse_path_check_proc(): sane arguments
...
- get rid of magical page->mapping type marks;
- switch to inplace I/O under low memory scenario;
- return the correct block number for bmap();
- some minor cleanups.
-----BEGIN PGP SIGNATURE-----
iIsEABYIADMWIQThPAmQN9sSA0DVxtI5NzHcH7XmBAUCX9iA5xUcaHNpYW5na2Fv
QHJlZGhhdC5jb20ACgkQOTcx3B+15gT+gAD+N8HcFqJk0vgLih5ud1TmM9tWlYY0
7UYQnvRn6OuXDEUA+waXg+zutWVzHBP6cnVHGmcZVp3elsZB1U05sg2TIKkP
=m8At
-----END PGP SIGNATURE-----
Merge tag 'erofs-for-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs updates from Gao Xiang:
"This cycle we got rid of magical page->mapping type marks for
temporary pages which had some concern before, now such usage is
replaced with specific page->private.
Also switch to inplace I/O instead of allocating extra cached pages to
avoid direct reclaim under low memory scenario.
There are some bmap bugfix and minor cleanups as well"
* tag 'erofs-for-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: avoid using generic_block_bmap
erofs: force inplace I/O under low memory scenario
erofs: simplify try_to_claim_pcluster()
erofs: insert to managed cache after adding to pcl
erofs: get rid of magical Z_EROFS_MAPPING_STAGING
erofs: remove a void EROFS_VERSION macro set in Makefile
- Improve support for re-exporting NFS mounts
- Replace NFSv4 XDR decoding C macros with xdr_stream helpers
- Support for multiple RPC/RDMA chunks per RPC transaction
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAl/Q4dIACgkQM2qzM29m
f5fInw//eDrmXBEhxbzcgeqNilGU5Qkn4INJtAcOGwPcw5Kjp4UVNGFpZNPqIDSf
FP0Yw0d/rW7UggwCviPcs/adLTasU9skq1jgAv8d0ig4DtPbeqFo6BvbY+G2JxVF
EfTeHzr6w6er8HRqyuLN4hjm1rQIpQlDHaYU4QcMs4fjPVv88eYLiwnYGYf3X46i
vBYstu1IRxHhg2x4O833xmiL6VbkZDQoWwDjGICylxUBcNUtAmq/sETjTa4JVEJj
4vgXdcJmAFjNgAOrmoR3DISsr9mvCvKN9g3C0+hHiRERTGEon//HzvscWH74wT48
o0LUW0ZWgpmunTcmiSNeeiHNsUXJyy3A/xyEdteqqnvSxulxlqkQzb15Eb+92+6n
BHGT/sOz1zz+/l9NCpdeEl5AkSA9plV8Iqd/kzwFwe1KwHMjldeMw/mhMut8EM2j
b54EMsp40ipITAwBHvcygCXiWAn/mPex6bCr17Dijo6MsNLsyd+cDsazntbNzwz3
RMGMf2TPOi8tWswrTUS9J5xKk5LAEWX/6Z/hTA1YlsB3PfrhXO97ztrytxvoO/bp
M0NREA+NNMn/JyyL8FT3ID5peaLVHhA1GHw9CcUw3C7OVzmsEg29D4zNo02dF1TC
LIyekp0kbSGGY1jLOeMLsa6Jr+2+40CcctsooVkRA+3rN0tJQvw=
=1uP3
-----END PGP SIGNATURE-----
Merge tag 'nfsd-5.11' of git://git.linux-nfs.org/projects/cel/cel-2.6
Pull nfsd updates from Chuck Lever:
"Several substantial changes this time around:
- Previously, exporting an NFS mount via NFSD was considered to be an
unsupported feature. With v5.11, the community has attempted to
make re-exporting a first-class feature of NFSD.
This would enable the Linux in-kernel NFS server to be used as an
intermediate cache for a remotely-located primary NFS server, for
example, even with other NFS server implementations, like a NetApp
filer, as the primary.
- A short series of patches brings support for multiple RPC/RDMA data
chunks per RPC transaction to the Linux NFS server's RPC/RDMA
transport implementation.
This is a part of the RPC/RDMA spec that the other premiere
NFS/RDMA implementation (Solaris) has had for a very long time, and
completes the implementation of RPC/RDMA version 1 in the Linux
kernel's NFS server.
- Long ago, NFSv4 support was introduced to NFSD using a series of C
macros that hid dprintk's and goto's. Over time, the kernel's XDR
implementation has been greatly improved, but these C macros have
remained and become fallow. A series of patches in this pull
request completely replaces those macros with the use of current
kernel XDR infrastructure. Benefits include:
- More robust input sanitization in NFSD's NFSv4 XDR decoders.
- Make it easier to use common kernel library functions that use
XDR stream APIs (for example, GSS-API).
- Align the structure of the source code with the RFCs so it is
easier to learn, verify, and maintain our XDR implementation.
- Removal of more than a hundred hidden dprintk() call sites.
- Removal of some explicit manipulation of pages to help make the
eventual transition to xdr->bvec smoother.
- On top of several related fixes in 5.10-rc, there are a few more
fixes to get the Linux NFSD implementation of NFSv4.2 inter-server
copy up to speed.
And as usual, there is a pinch of seasoning in the form of a
collection of unrelated minor bug fixes and clean-ups.
Many thanks to all who contributed this time around!"
* tag 'nfsd-5.11' of git://git.linux-nfs.org/projects/cel/cel-2.6: (131 commits)
nfsd: Record NFSv4 pre/post-op attributes as non-atomic
nfsd: Set PF_LOCAL_THROTTLE on local filesystems only
nfsd: Fix up nfsd to ensure that timeout errors don't result in ESTALE
exportfs: Add a function to return the raw output from fh_to_dentry()
nfsd: close cached files prior to a REMOVE or RENAME that would replace target
nfsd: allow filesystems to opt out of subtree checking
nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations
Revert "nfsd4: support change_attr_type attribute"
nfsd4: don't query change attribute in v2/v3 case
nfsd: minor nfsd4_change_attribute cleanup
nfsd: simplify nfsd4_change_info
nfsd: only call inode_query_iversion in the I_VERSION case
nfs_common: need lock during iterate through the list
NFSD: Fix 5 seconds delay when doing inter server copy
NFSD: Fix sparse warning in nfs4proc.c
SUNRPC: Remove XDRBUF_SPARSE_PAGES flag in gss_proxy upcall
sunrpc: clean-up cache downcall
nfsd: Fix message level for normal termination
NFSD: Remove macros that are no longer used
NFSD: Replace READ* macros in nfsd4_decode_compound()
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEIodevzQLVs53l6BhNqiEXrVAjGQFAl/Xo/YACgkQNqiEXrVA
jGSxhQ//b9s9DQUW6TNumGl5+zN4AAokiWXNo69fyqVVZMnSO2+QS64aY0bOxQh3
56pfSul1Ay1gkLw8urlaJ8RZX1SZu+FUkbn8SGgw31ypy9UbqAeq4eT1UgvIUs0L
COO5fGgpF2M+WGW33RWyWstvZBvldqNzn/I4886fCevZtVmm5hC8yZFvOfMqWyKr
UnfTDpPlG1aBBcBMJWGPjxuGE0Aqd+WBfElC1gXM6sAcbucdPeLV2DEtrDl1WKDU
5lkfV2+f+LvfO7ZzHmh3iJg4DCjFTkfP71mi/DX4uRwJrPEEE9SeMQYob4DygZRb
l14xskFrKvbjRdj+R0HEhyAqiiDFTm8zLPjR3JL3xXFrvypoBzh6r1LI+KaQaIs1
N7/cUHEyBXv6QXRnnzbAOM0jxQzkNknAxP4IULg7bGA5xm37TP5cc7M6VpaM+rKU
ZY3Re4ja7adLq8MwvNP4jCUoAmu6B8b7PYESicKglcty+yzm+TzIIIBcZUPXoJQA
w5U8ZSGqI9uD47DrUR+KwVfGnJwqogyGrhJscgvA4PNnxprF35BShDWSCGjrZIQC
AQ3bt2BRcZ8Ad8XQWvsgAdhxeUWV22s3Dz6U5gwMxAKVzksVJrPuA6EKMeDlls+w
cxp6hOMfjU1RGvSCK8nczT0C+EB+LKi6GBvQvGnVS28w7+WeTWY=
=lMUu
-----END PGP SIGNATURE-----
Merge tag 'jfs-5.11' of git://github.com/kleikamp/linux-shaggy
Pull jfs updates from David Kleikamp:
"A few jfs fixes"
* tag 'jfs-5.11' of git://github.com/kleikamp/linux-shaggy:
jfs: Fix array index bounds check in dbAdjTree
jfs: Fix memleak in dbAdjCtl
jfs: delete duplicated words + other fixes
This set includes more low level communication layer cleanups.
The main change is the listening socket is no longer handled as
a special case of node connection sockets. There is one small
fix for checking the number of local connections.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJf16V5AAoJEDgbc8f8gGmq3Q0P/1dkBfle74X26xyACzKlb4iS
s/1wN1s0eMcxv2gRwWsw1q30OuNKkvZhVaI2a5djx8KimrR3Xizy6jcnopLrlna9
NHL9chLu1NbBI98ad8UugYciwukyRBI0ZgE5hMuef1LeLXlSFWWu8VTv92ejXTxf
O2B7/oJuFAjQtWGysOVKNOJLoUIKKmQngyRVfMuhC8p6bauL1+ljAQv/s3dfM6wg
dNHAb6BKgBmEmbzdBDuUahBfdh528Ih1+zMuaRx3yIUoVcGagnACjypcNefX8Bhk
IU+JHjgHg8v3KazovIxFV6KDoS8c0aQ9Lt3GO0zfDqN/joHNtAF4FtSdfM5SQUzB
LSbVYXS5nr5sfJ+Jg+dA7EzVF7ZtQEcfkFQLkwVhbOCLkDoTAQ0Qulqk7n7I5wMZ
4FQRqCKNhawAvTS9Bv5CoVldk/48c9Bhbum6Y9FsisiMxXtoJeG1H0UWOlLqYexV
eF99fxT+fWW85QPb0zeZblzO0uw8hbTuc4EYCW1ZsRXV7nxblG/T6SK1keZqKL6C
+edWJNIJgP7PPyfLkVrsTSNYwRUOBKYIyNjy/iZfist6ueWYtH3p2pbKuVt1UT5O
R6UGFlGP/NgtcSibQkSBW/FXXgCzy056+jVYFJVcniKnSO9sq5519DB3kSWnm7Uk
ZxjfxMCG9pWxLVrwftcl
=ORtZ
-----END PGP SIGNATURE-----
Merge tag 'dlm-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm updates from David Teigland:
"This set includes more low level communication layer cleanups.
The main change is the listening socket is no longer handled as a
special case of node connection sockets. There is one small fix for
checking the number of local connections"
* tag 'dlm-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
fs: dlm: check on existing node address
fs: dlm: constify addr_compare
fs: dlm: fix check for multi-homed hosts
fs: dlm: listen socket out of connection hash
fs: dlm: refactor sctp sock parameter
fs: dlm: move shutdown action to node creation
fs: dlm: move connect callback in node creation
fs: dlm: add helper for init connection
fs: dlm: handle non blocked connect event
fs: dlm: flush othercon at close
fs: dlm: add get buffer error handling
fs: dlm: define max send buffer
fs: dlm: fix proper srcu api call
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAl/XdB4ACgkQxWXV+ddt
WDv41g//dOkrwjAVBfDUwRT/yKqojyEsZB1aNyHlPHFw8KEw5oIW7wxR4oqXi2ed
/i9KIJe4E9AfqAiexhLvA+Wyt/Sgwz+k4ys82PKhhRNQn7LE4tvhSBUu6JYJDU09
6I1jagya7ILa8akFXZTmVbXdliI4Ab+pcXWAmQYK/xPVDxYTSsBf4o4MilNBA9FS
lTwwBh5GTEtIkubr2yVd3pKfF4fT2g1hd+yglpHaOzpcrLMNN4hj4sUFlLbx/FlJ
MWo+914cSNKJoebbnqhK9djD9hggaaXnNooqfBOXUhZN0VN9rQoKb5tW+TREQmFm
shrmBSqN7CaqKfSOMZs7WOnTuTvmV/825PnLqDqcTUaLw+BgdyacpO9WflgfSs16
Cdvagr1SqbrSQ/3WYCpbqPLDNP3XuZ6+m5OWizf6fhyo8xdFcUHZgRC8qejDlycy
V/zP0c5OYOMi5vo6x/zhrD7Uft7xoFUVcSJCe8WPri082d9LbA2BqwCsullD60PQ
K/fsmlHs5Uxxy3MFgBPVDdWGgaa9rQ2vXequezbozBIIeeVL+Q9zkeyBFSYuFeE8
HToRE9B9BUEUh+p1JxPjOdFH/m+sKe1WMdmRLQthMzfOiNWW7pp/nL5rl4BUVmjm
58dQS73Cj/YNdBomRJXPPtgKIJPAWRrzU/JBcwAdMoKy57oh9NQ=
=5YAS
-----END PGP SIGNATURE-----
Merge tag 'for-5.11-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"We have a mix of all kinds of changes, feature updates, core stuff,
performance improvements and lots of cleanups and preparatory changes.
User visible:
- export filesystem generation in sysfs
- new features for mount option 'rescue':
- what's currently supported is exported in sysfs
- 'ignorebadroots'/'ibadroots' - continue even if some essential
tree roots are not usable (extent, uuid, data reloc, device,
csum, free space)
- 'ignoredatacsums'/'idatacsums' - skip checksum verification on
data
- 'all' - now enables 'ignorebadroots' + 'ignoredatacsums' +
'nologreplay'
- export read mirror policy settings to sysfs, new policies will be
added in the future
- remove inode number cache feature (mount -o inode_cache), obsoleted
in 5.9
User visible fixes:
- async discard scheduling fixes on high loads
- update inode byte counter atomically so stat() does not report
wrong value in some cases
- free space tree fixes:
- correctly report status of v2 after remount
- clear v1 cache inodes when v2 is newly enabled after remount
Core:
- switch own tree lock implementation to standard rw semaphore:
- one-level lock nesting is not required anymore, the last use of
this was in free space that's now loaded asynchronously
- own implementation of adaptive spinning before taking mutex has
been part of rwsem
- performance seems to be better in general, much better (+tens
of percents) for some workloads
- lockdep does not complain
- finish direct IO conversion to iomap infrastructure, remove
temporary workaround for DSYNC after iomap API updates
- preparatory work to support data and metadata blocks smaller than
page:
- generalize code that assumes sectorsize == PAGE_SIZE, lots of
refactoring
- planned namely for 64K pages (eg. arm64, ppc64)
- scrub read-only support
- preparatory work for zoned allocation mode (SMR/ZBC/ZNS friendly):
- disable incompatible features
- round-robin superblock write
- free space cache (v1) is loaded asynchronously, remove tree path
recursion
- slightly improved time tacking for transaction kthread wake ups
Performance improvements (note that the numbers depend on load type or
other features and weren't run on the same machine):
- skip unnecessary work:
- do not start readahead for csum tree when scrubbing non-data
block groups
- do not start and wait for delalloc on snapshot roots on
transaction commit
- fix race when defragmenting leads to unnecessary IO
- dbench speedups (+throughput%/-max latency%):
- skip unnecessary searches for xattrs when logging an inode
(+10.8/-8.2)
- stop incrementing log batch when joining log transaction (1-2)
- unlock path before checking if extent is shared during nocow
writeback (+5.0/-20.5), on fio load +9.7% throughput/-9.8%
runtime
- several tree log improvements, eg. removing unnecessary
operations, fixing races that lead to additional work
(+12.7/-8.2)
- tree-checker error branches annotated with unlikely() (+3%
throughput)
Other:
- cleanups
- lockdep fixes
- more btrfs_inode conversions
- error variable cleanups"
* tag 'for-5.11-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (198 commits)
btrfs: scrub: allow scrub to work with subpage sectorsize
btrfs: scrub: support subpage data scrub
btrfs: scrub: support subpage tree block scrub
btrfs: scrub: always allocate one full page for one sector for RAID56
btrfs: scrub: reduce width of extent_len/stripe_len from 64 to 32 bits
btrfs: refactor btrfs_lookup_bio_sums to handle out-of-order bvecs
btrfs: remove btrfs_find_ordered_sum call from btrfs_lookup_bio_sums
btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors
btrfs: update num_extent_pages to support subpage sized extent buffer
btrfs: don't allow tree block to cross page boundary for subpage support
btrfs: calculate inline extent buffer page size based on page size
btrfs: factor out btree page submission code to a helper
btrfs: make btrfs_verify_data_csum follow sector size
btrfs: pass bio_offset to check_data_csum() directly
btrfs: rename bio_offset of extent_submit_bio_start_t to dio_file_offset
btrfs: fix lockdep warning when creating free space tree
btrfs: skip space_cache v1 setup when not using it
btrfs: remove free space items when disabling space cache v1
btrfs: warn when remount will not change the free space tree
btrfs: use superblock state to print space_cache mount option
...
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEES8DXskRxsqGE6vXTAA5oQRlWghUFAl/XZLITHGpsYXl0b25A
a2VybmVsLm9yZwAKCRAADmhBGVaCFcqBD/9M40l1rZ5cq62f4j9/17jd8TDOfCCu
VFhngc7DzvlVQMSktvoQLJlRs/SDFQGr88RrzWp6xAwJO9F60/4zVFSrbfYfjEid
3hhIq8WioZotsGH3OWLArLUFLlLjtuNAP7WnLmacrqkx3y3BKGe5spKn9bxBxlgf
trRtXITf8fJ5K8eSooRYf28YyugRDa+Ue/Pe0TjWudzgcCp1dlWxQKt9Ag0N+q+E
6t5W5MgWWkfVcCX8Z2foL7I6Iqq4dqBfwZcopYjFHB9B+E6TN9rr6GA88xtKEaWG
4nSZ7GKksu1oNb3amFdE5IWFYuAuLh2+TQGaJdhzcX08CstdhuPPRehuvCCW5I8l
A9719WR6BW+KHHq4Id4eqpFR0g6y5Lx1JqBCsfIORuqna3pu19d9z+idVH50/TUw
gGVRs7txfSU0NPIpQaX2z96S3ZQZZmelSIzj9+sYIPe5u8LCBtO8PVyT/N0qXvzL
nf5t7rZGaTrUcGSeuPki01AhHbUNEx9EFnMJ5QuuXhPRq7WlP+BoQmLolRtuRxiF
KcMvvpWjgD9MfkHWOFDsTnQCquQk8mb0R7YcFWbomMmxI3JQdDly3JjKn519LQvO
mb320naW/oxnXHsaMHMM08azHsB+KhY84tW9c2iPB29swvTmOUrxXyhxdcFE3ayr
UezM2hjt/zT61w==
=rDoO
-----END PGP SIGNATURE-----
Merge tag 'locks-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux
Pull file locking fixes from Jeff Layton:
"A fix for some undefined integer overflow behavior, a typo in a
comment header, and a fix for a potential deadlock involving internal
senders of SIGIO/SIGURG"
* tag 'locks-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
fcntl: Fix potential deadlock in send_sig{io, urg}()
locks: fix a typo at a kernel-doc markup
locks: Fix UBSAN undefined behaviour in flock64_to_posix_lock
This extracts the "nameserver" previoiusly used only by the virtio rpmsg
transport to work ontop of any rpmsg implementation and clarifies the
endianness of the data types used in rpmsg.
-----BEGIN PGP SIGNATURE-----
iQJPBAABCAA5FiEEBd4DzF816k8JZtUlCx85Pw2ZrcUFAl/Y+oUbHGJqb3JuLmFu
ZGVyc3NvbkBsaW5hcm8ub3JnAAoJEAsfOT8Nma3Fv2UP/3VpDHLTD3UTNcqc5uc7
3cyWZRydhecLjLxJ5EfY3rCQ4ELPuHX0cyDgdSg/zJsKvcSgYcGnZ+ZVxCsAxhi1
nKrsOX9SIasdxGOs6+uI4Iu2fRW7UFooxT7K1YHL6CdW5W5RMqn36rniM32XLexw
eZ3Un0bnEjILJptNcZsNr+7IjCqJkJSoVGFBg3iRlAcMo5QqZ95OkEUDBMOijLoX
gKtV5Bnu8/QeMrq/U0MHrkvYgH3+9BIHn8WdueOVENtL3NFlTKZ4SQzSly3eJ2iR
Kohi8Zwt1hmKaS7YM471xM5G3gQTm131LOo2OifXih/Puj2Uj9W5zjPmmmDP6Zi8
WYYsCc7J+GlHjz/Sj20VURq0WI4FMFluhgOQuhByylI68xUph4gARSF7PXiKa3fO
B1KJ3dGnHx8u/xFBF5+oDmj/rQ4oFOj+1qvwx4TF9xISp8d0+Rr5OdqiBWOv9O5E
sktaXT4v0nSl8fRDPZf3abCvAASF1w9qJvnhWvBFY1NYkAbsKc3yXJW8XDKvR9RV
DVaeQBTt4ytpvQGOBs7EJ/D9rnqeqZSf4AlTn/TvsVnzPjaAEHyLVc0fRkDNhwvx
QAYCLd5lY/XyH/Yz+AZSMLM+H5D92kUpqIo0cfE+LgqF8b+sPBJf4Y521c4xtUH5
H1XHkaTlV7xV+vB+13vBuhj/
=yGKj
-----END PGP SIGNATURE-----
Merge tag 'rpmsg-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc
Pull rpmsg updates from Bjorn Andersson:
"This extracts the 'nameserver' previously used only by the virtio
rpmsg transport to work ontop of any rpmsg implementation and
clarifies the endianness of the data types used in rpmsg"
* tag 'rpmsg-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc:
rpmsg: Turn name service into a stand alone driver
rpmsg: Make rpmsg_{register|unregister}_device() public
rpmsg: virtio: Add rpmsg channel device ops
rpmsg: core: Add channel creation internal API
rpmsg: virtio: Rename rpmsg_create_channel
rpmsg: Move structure rpmsg_ns_msg to header file
rpmsg: virtio: Move from virtio to rpmsg byte conversion
rpmsg: Introduce __rpmsg{16|32|64} types