This is a preparatory patch for following work. Move the F_SETPIPE_SZ
limit-checking logic from pipe_fcntl() into pipe_set_size(). This
simplifies the code a little, and allows for reworking required in
a later patch that fixes the limit checking in pipe_set_size().
Link: http://lkml.kernel.org/r/3701b2c5-2c52-2c3e-226d-29b9deb29b50@gmail.com
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Reviewed-by: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: <socketpair@gmail.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Jens Axboe <axboe@fb.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "pipe: fix limit handling", v2.
When changing a pipe's capacity with fcntl(F_SETPIPE_SZ), various limits
defined by /proc/sys/fs/pipe-* files are checked to see if unprivileged
users are exceeding limits on memory consumption.
While documenting and testing the operation of these limits I noticed
that, as currently implemented, these checks have a number of problems:
(1) When increasing the pipe capacity, the checks against the limits
in /proc/sys/fs/pipe-user-pages-{soft,hard} are made against
existing consumption, and exclude the memory required for the
increased pipe capacity. The new increase in pipe capacity can then
push the total memory used by the user for pipes (possibly far) over
a limit. This can also trigger the problem described next.
(2) The limit checks are performed even when the new pipe capacity
is less than the existing pipe capacity. This can lead to problems
if a user sets a large pipe capacity, and then the limits are
lowered, with the result that the user will no longer be able to
decrease the pipe capacity.
(3) As currently implemented, accounting and checking against the
limits is done as follows:
(a) Test whether the user has exceeded the limit.
(b) Make new pipe buffer allocation.
(c) Account new allocation against the limits.
This is racy. Multiple processes may pass point (a) simultaneously,
and then allocate pipe buffers that are accounted for only in step
(c). The race means that the user's pipe buffer allocation could be
pushed over the limit (by an arbitrary amount, depending on how
unlucky we were in the race). [Thanks to Vegard Nossum for spotting
this point, which I had missed.]
This patch series addresses these three problems.
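The ordering problem in (1) and (3) can be sketched in userspace C. This is
only an illustration, assuming a single per-user page counter; user_pages,
limit_pages, charge_racy() and charge_atomic() are hypothetical names, not the
kernel's functions or the way the later patches necessarily implement the fix.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-user accounting state (not the kernel's structures). */
static atomic_long user_pages;
static const long limit_pages = 16;

/* Check-then-account: the test ignores nr_pages and can race with others. */
static bool charge_racy(long nr_pages)
{
	if (atomic_load(&user_pages) >= limit_pages)	/* (a) test */
		return false;
	/* (b) the buffer allocation would happen here */
	atomic_fetch_add(&user_pages, nr_pages);	/* (c) account */
	return true;
}

/* Account-then-check: charge first, roll back if the new total is too big. */
static bool charge_atomic(long nr_pages)
{
	long new_total = atomic_fetch_add(&user_pages, nr_pages) + nr_pages;

	if (new_total > limit_pages) {
		atomic_fetch_sub(&user_pages, nr_pages);
		return false;
	}
	return true;
}
```

With 10 pages already charged against a limit of 16, charge_racy(10) still
succeeds and leaves the counter at 20, while charge_atomic(10) refuses.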
This patch (of 8):
This is a minor preparatory patch. After subsequent patches,
round_pipe_size() will be called from pipe_set_size(), so place
round_pipe_size() above pipe_set_size().
Link: http://lkml.kernel.org/r/91a91fdb-a959-ba7f-b551-b62477cc98a1@gmail.com
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Reviewed-by: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: <socketpair@gmail.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Jens Axboe <axboe@fb.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
cmd part of this struct is the same as an index of itself within
_ioctls[]. In fact this cmd is unused, so we can drop this part.
Link: http://lkml.kernel.org/r/20160831033414.9910.66697.stgit@pluto.themaw.net
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Having this in autofs_i.h gives illusion that uncommenting this enables
pr_debug(), but it doesn't enable all the pr_debug() in autofs because
inclusion order matters.
XFS has the same DEBUG macro in its core header fs/xfs/xfs.h, however XFS
seems to have a rule to include this prior to other XFS headers as well as
kernel headers. This is not the case with autofs, and DEBUG could be
enabled via Makefile, so autofs should just get rid of this comment to
make the code less confusing. It's a comment, so there is literally no
functional difference.
Link: http://lkml.kernel.org/r/20160831033409.9910.77067.stgit@pluto.themaw.net
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since linux/auto_dev-ioctl.h wasn't included in include/linux/Kbuild
it wasn't moved to uapi/linux as part of the uapi series.
Link: http://lkml.kernel.org/r/20160812024901.12352.10984.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
linux/limits.h should be included by uapi instead of linux/auto_fs.h
so as not to cause compile error in userspace.
# cat << EOF > ./test1.c
> #include <stdio.h>
> #include <linux/auto_fs.h>
> int main(void) {
> return 0;
> }
> EOF
# gcc -Wall -g ./test1.c
In file included from ./test1.c:2:0:
/usr/include/linux/auto_fs.h:54:12: error: 'NAME_MAX' undeclared here (not in a function)
char name[NAME_MAX+1];
^
Link: http://lkml.kernel.org/r/20160812024856.12352.24092.stgit@pluto.themaw.net
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Signed-off-by: Ian Kent <ikent@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All other warnings use "cmd(0x%08x)" and this is the only one with
"cmd(%d)". (below comes from my userspace debug program, but not
automount daemon)
[ 1139.905676] autofs4:pid:1640:check_dev_ioctl_version: ioctl control interface version mismatch: kernel(1.0), user(0.0), cmd(-1072131215)
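For comparison, a small userspace sketch of the same value printed with the
format the other warnings use. format_cmd() is a hypothetical helper written
for this illustration; it is not part of autofs.

```c
#include <stdio.h>

/* Format an ioctl command number the way the other warnings do. */
static int format_cmd(char *buf, size_t len, int cmd)
{
	return snprintf(buf, len, "cmd(0x%08x)", (unsigned int)cmd);
}
```

Passing -1072131215 from the log above yields "cmd(0xc0189371)", which is
much easier to match against an ioctl definition than the signed decimal.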
Link: http://lkml.kernel.org/r/20160812024851.12352.75458.stgit@pluto.themaw.net
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Signed-off-by: Ian Kent <ikent@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
No functional changes, based on the following justification.
1. Make the code more consistent using the ioctl vector _ioctls[],
rather than assigning NULL only for this ioctl command.
2. Remove goto done; for better maintainability in the long run.
3. The existing code is based on the fact that validate_dev_ioctl()
sets ioctl version for any command, but AUTOFS_DEV_IOCTL_VERSION_CMD
should explicitly set it regardless of the default behavior.
Link: http://lkml.kernel.org/r/20160812024846.12352.9885.stgit@pluto.themaw.net
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Signed-off-by: Ian Kent <ikent@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The count of miscellaneous device ioctls in fs/autofs4/autofs_i.h is wrong.
The number of ioctls is the difference between AUTOFS_DEV_IOCTL_VERSION_CMD
and AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD (14) not the difference between
AUTOFS_IOC_COUNT and 11 (21).
[kusumi.tomohiro@gmail.com: fix typo that made the count macro negative]
Link: http://lkml.kernel.org/r/20160831033420.9910.16809.stgit@pluto.themaw.net
Link: http://lkml.kernel.org/r/20160812024841.12352.11975.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This isn't a return value, so change the message to indicate the status is
the result of may_umount().
(or locate pr_debug() after put_user() with the same message)
Link: http://lkml.kernel.org/r/20160812024836.12352.74628.stgit@pluto.themaw.net
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Signed-off-by: Ian Kent <ikent@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sync with changes made by commit 730c9eeca9 ("autofs4: improve
parameter usage") which introduced a union for various ioctl commands
instead of having statically named arg1,2.
This commit simply replaces arg1,2 with the corresponding fields without
changing semantics.
Link: http://lkml.kernel.org/r/20160812024831.12352.24667.stgit@pluto.themaw.net
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Signed-off-by: Ian Kent <ikent@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The explanation of how the ioctl handles devid seems incorrect. A userspace
caller of this ioctl has no input regarding devid, and the ioctl implementation
retrieves devid via the superblock.
Link: http://lkml.kernel.org/r/20160812024825.12352.13486.stgit@pluto.themaw.net
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Signed-off-by: Ian Kent <ikent@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This macro was never used by either the kernel or userspace, and also
doesn't represent "devid length" in bytes (unless it was added to mean
something else).
Link: http://lkml.kernel.org/r/20160812024820.12352.21210.stgit@pluto.themaw.net
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Signed-off-by: Ian Kent <ikent@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
These two were left from commit aa55ddf340 ("autofs4: remove unused
ioctls") which removed unused ioctls.
Link: http://lkml.kernel.org/r/20160812024810.12352.96377.stgit@pluto.themaw.net
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Signed-off-by: Ian Kent <ikent@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kfree dentry data allocated by autofs4_new_ino() with autofs4_free_ino()
instead of raw kfree. (since we have the interface to free autofs_info*)
This patch was modified to remove the need to set the dentry info field to
NULL due to a change in the previous patch.
Link: http://lkml.kernel.org/r/20160812024805.12352.43650.stgit@pluto.themaw.net
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The inode allocation failure case in autofs4_dir_symlink() frees the
autofs dentry info of the dentry without setting ->d_fsdata to NULL.
That could lead to a double free so just get rid of the free and leave it
to ->d_release().
Link: http://lkml.kernel.org/r/20160812024759.12352.10653.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It's invalid if the given mode is neither dir nor link, so warn on else
case.
Link: http://lkml.kernel.org/r/20160812024754.12352.8536.stgit@pluto.themaw.net
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Somewhere along the line the error handling gotos have become incorrect.
Link: http://lkml.kernel.org/r/20160812024749.12352.15100.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch does what the below comment says. It could be and it's
considered better to do this first before various functions get called
during initialization.
/* Couldn't this be tested earlier? */
Link: http://lkml.kernel.org/r/20160812024744.12352.43075.stgit@pluto.themaw.net
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
autofs4_kill_sb() doesn't need to be declared as extern, and no other
functions in .h are explicitly declared as extern.
Link: http://lkml.kernel.org/r/20160812024739.12352.99354.stgit@pluto.themaw.net
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
plus minor whitespace fixes.
Link: http://lkml.kernel.org/r/20160812024734.12352.17122.stgit@pluto.themaw.net
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
asm-generic headers are generic implementations for architecture specific
code and should not be included by common code. Thus use the asm/ version
of sections.h to get at the linker sections.
Link: http://lkml.kernel.org/r/1473602302-6208-1-git-send-email-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The function calls with octal permissions commonly span multiple lines.
The current test is line oriented and fails to find some matches.
Make the test use the $stat variable instead of the $line variable to span
multiple lines.
Also add a few functions to the known functions with permissions list.
Move the SYMBOLIC_PERMS test to a separate section to find all the S_<FOO>
permissions in any form not just those that have specific function names.
This can now find and fix permissions uses like:
.mode = S_<FOO> | S_<BAR>;
Link: http://lkml.kernel.org/r/b51bab60530912aae4ac420119d465c5b206f19f.1475030406.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Tested-by: Ramiro Oliveira <roliveir@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It is possible for a multiple line macro definition to have a false positive
report when an argument is used on a line after a continuation \.
This line might have a leading '+' as the initial character that could be
confused by checkpatch as an operator.
Avoid the leading character on multiple line macro definitions.
Link: http://lkml.kernel.org/r/60229d13399f9b6509db5a32e30d4c16951a60cd.1473836073.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a test for macro arguments that have a non-comma leading or trailing
operator where the argument isn't parenthesized to avoid possible precedence
issues.
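A minimal C sketch of the kind of precedence issue this test is meant to
catch. The macro names are made up for the illustration.

```c
/* Argument next to an operator without parentheses: precedence can bite. */
#define TIMES_TWO_UNSAFE(x)	(x * 2)

/* Parenthesized argument: the whole argument expression is multiplied. */
#define TIMES_TWO_SAFE(x)	((x) * 2)

static int unsafe_result(void)
{
	return TIMES_TWO_UNSAFE(1 + 2);	/* expands to (1 + 2 * 2) */
}

static int safe_result(void)
{
	return TIMES_TWO_SAFE(1 + 2);	/* expands to ((1 + 2) * 2) */
}
```

Here unsafe_result() returns 5 while safe_result() returns 6.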
Link: http://lkml.kernel.org/r/47715508972f8d786f435e583ff881dbeee3a114.1473745855.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If a macro argument is used multiple times in the macro definition, the
macro argument may have an unexpected side-effect.
Add a test (MACRO_ARG_REUSE) for that condition which is only
emitted with command-line option --strict.
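A minimal C sketch of the side-effect problem, using hypothetical names
(next_value(), MAX_REUSE(), max_once()) chosen only for this illustration.

```c
static int calls;

/* Hypothetical function with a side effect: counts how often it runs. */
static int next_value(void)
{
	return ++calls;
}

/* The argument 'a' appears twice, so it may be evaluated twice. */
#define MAX_REUSE(a, b)	((a) > (b) ? (a) : (b))

/* A function evaluates each argument exactly once. */
static inline int max_once(int a, int b)
{
	return a > b ? a : b;
}
```

MAX_REUSE(next_value(), 0) calls next_value() twice and yields the second
value, whereas max_once(next_value(), 0) calls it once.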
Link: http://lkml.kernel.org/r/b6d67a87cafcafd15499e91780dc63b15dec0aa0.1473744906.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
An "uninitialized value" is emitted when a block comment starts on
the same line as a statement.
Fix this and make the test use a little fewer cpu cycles too.
Link: http://lkml.kernel.org/r/3c9993320c2182d37f53ac540878cfef59c3f62d.1473365956.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Reported-by: Charlemagne Lasse <charlemagnelasse@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adding -f to the get_maintainer.pl invocation means git isn't invoked
by get_maintainer.pl for known filenames.
This reduces the overall time to run checkpatch.
Link: http://lkml.kernel.org/r/22991e3a295aeb399b43af0478b6e5809106ccee.1472684066.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Using const is generally a good idea.
Julia Lawall has created a list of always const and almost always const
structs in the kernel sources.
Link: https://lkml.org/lkml/2016/8/28/95
Add the most frequently used (> 50 cases) that are almost always or
always const.
Link: http://lkml.kernel.org/r/1e16020f8027654db0095bbfbcc11da51025365c.1472664220.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Cc: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make it easier to add new structs that should be const.
Link: http://lkml.kernel.org/r/e5a8da43e7c11525bafbda1ca69a8323614dd942.1472664220.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
< sigh > Comment these tests out.
These are just too enticing to people that don't verify that
both source and dest addresses really must be __aligned(2).
It helps make Dan Carpenter happy too.
Link: http://lkml.kernel.org/r/dc32ec66d24647f4cdf824c8dfbbc59aa7ce7b7d.1472665676.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Greg <gvrose8192@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
S_<FOO> uses should be avoided where octal is more intelligible.
Linus didst say:
: It's *much* easier to parse and understand the octal numbers, while the
: symbolic macro names are just random line noise and hard as hell to
: understand. You really have to think about it.
:
: So we should rather go the other way: convert existing bad symbolic
: permission bit macro use to just use the octal numbers.
:
: The symbolic names are good for the *other* bits (ie sticky bit, and the
: inode mode _type_ numbers etc), but for the permission bits, the symbolic
: names are just insane crap. Nobody sane should ever use them. Not in the
: kernel, not in user space.
(http://lkml.kernel.org/r/CA+55aFw5v23T-zvDZp-MmD_EYxF8WbafwwB59934FV7g21uMGQ@mail.gmail.com)
Link: http://lkml.kernel.org/r/7232ef011d05a92f4caa86a5e9830d87966a2eaf.1470180926.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use get_maintainer to check the status of individual files. If
"obsolete", suggest leaving the files alone.
Link: http://lkml.kernel.org/r/7ceaa510dc9d2df05ec4b456baed7bb1415550b3.1471889575.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Cc: SF Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Today there are platforms with many CPUs (up to 4K). Trying to boot only
part of the CPUs may result in a very long string.
For example, take the NPS platform that is part of arch/arc. This
platform has an SMP system with 256 cores, each with 16 HW threads (an SMT
machine), where each HW thread appears as a CPU to the kernel. In this
example there is a total of 4K CPUs. When one tries to boot only part of the
HW threads from each core, the string representing the map may be long. For
example, if for the sake of performance we decided to boot only the first
half of the HW threads of each core, the map will look like:
0-7,16-23,32-39,...,4080-4087
This patch introduces new syntax to accommodate such a use case. I
added an optional postfix to a range of CPUs which will choose, according
to a given modulo, the desired range of remainders, i.e.:
<cpus range>:used_size/group_size
For example, above map can be described in new syntax like this:
0-4095:8/16
Note that this patch is backward compatible with current syntax.
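The selection rule can be sketched as a small userspace C predicate. This is
not the kernel's list parser; the function names are hypothetical, and it
assumes groups are counted from the start of the range.

```c
#include <stdbool.h>

/*
 * Return true if 'cpu' is selected by "<start>-<end>:<used>/<group>":
 * within the range, the first 'used' CPUs of every 'group' are taken.
 */
static bool cpu_selected(unsigned int cpu, unsigned int start,
			 unsigned int end, unsigned int used,
			 unsigned int group)
{
	if (cpu < start || cpu > end || group == 0)
		return false;
	return (cpu - start) % group < used;
}

/* Count the CPUs selected by a range with a used/group suffix. */
static unsigned int count_selected(unsigned int start, unsigned int end,
				   unsigned int used, unsigned int group)
{
	unsigned int cpu, n = 0;

	for (cpu = start; cpu <= end; cpu++)
		if (cpu_selected(cpu, start, end, used, group))
			n++;
	return n;
}
```

For 0-4095:8/16 this selects CPUs 0-7, 16-23, and so on up to 4080-4087,
2048 CPUs in total.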
[akpm@linux-foundation.org: rework documentation]
Link: http://lkml.kernel.org/r/1473579629-4283-1-git-send-email-noamca@mellanox.com
Signed-off-by: Noam Camus <noamca@mellanox.com>
Cc: David Decotigny <decot@googlers.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: David S. Miller <davem@davemloft.net>
Cc: Pan Xinhui <xinhui@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Set "overflow" bit upon encountering it instead of postponing to the end
of the conversion. Somehow gcc unwedges itself and generates better code:
$ ./scripts/bloat-o-meter ../vmlinux-000 ../obj/vmlinux
_parse_integer 177 139 -38
Inspired by patch from Zhaoxiu Zeng.
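The idea of flagging overflow at the point where it happens can be sketched
in userspace C as follows. This is a simplified decimal-only illustration
with a hypothetical name, parse_decimal(); it is not the kernel's
_parse_integer().

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Parse decimal digits from 's' into '*res'. '*overflow' is set at the
 * moment an overflow is detected, not in a separate check at the end.
 * Returns the number of characters consumed.
 */
static unsigned int parse_decimal(const char *s, unsigned long long *res,
				  bool *overflow)
{
	unsigned long long acc = 0;
	unsigned int n = 0;

	*overflow = false;
	while (s[n] >= '0' && s[n] <= '9') {
		unsigned int val = s[n] - '0';

		/* Would acc * 10 + val exceed ULLONG_MAX? */
		if (acc > (ULLONG_MAX - val) / 10)
			*overflow = true;
		acc = acc * 10 + val;	/* unsigned wraparound is defined */
		n++;
	}
	*res = acc;
	return n;
}
```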
Link: http://lkml.kernel.org/r/20160826221920.GA1909@p183.telecom.by
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make isdigit into a simple range checking inline function:
return '0' <= c && c <= '9';
This code is 1 branch, not 2 because any reasonable compiler can
optimize this code into SUB+CMP, so the code
while (isdigit((c = *s++)))
...
remains 1 branch per iteration HOWEVER it suddenly doesn't do table
lookup priming cacheline nobody cares about.
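A self-contained userspace sketch of the same idea. isdigit_range() and
count_digits() are illustrative names, not the kernel's.

```c
/* Range check instead of a lookup in a character-class table. */
static inline int isdigit_range(int c)
{
	return '0' <= c && c <= '9';
}

/* Loop in the style quoted above: consume a run of leading digits. */
static unsigned int count_digits(const char *s)
{
	unsigned int n = 0;
	int c;

	while (isdigit_range((c = *s++)))
		n++;
	return n;
}
```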
Link: http://lkml.kernel.org/r/20160826190047.GA12536@p183.telecom.by
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The strncpy_from_user() accessor is effectively a copy_from_user()
specialised to copy strings, terminating early at a NUL byte if possible.
In other respects it is identical, and can be used to copy an arbitrarily
large buffer from userspace into the kernel. Conceptually, it exposes a
similar attack surface.
As with copy_from_user(), we check the destination range when the kernel
is built with KASAN, but unlike copy_from_user() we do not check the
destination buffer when using HARDENED_USERCOPY. As strncpy_from_user()
calls get_user() in a loop, we must call check_object_size() explicitly.
This patch adds this instrumentation to strncpy_from_user(), per the same
rationale as with the regular copy_from_user(). In the absence of
hardened usercopy this will have no impact as the instrumentation expands
to an empty static inline function.
Link: http://lkml.kernel.org/r/1472221903-31181-1-git-send-email-mark.rutland@arm.com
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are four cases I can see where we could end up with a NULL 'slot' in
radix_tree_next_slot(). This unit test exercises all four of them, making
sure that if in the future we have an unsafe path through
radix_tree_next_slot(), we'll catch it.
Here are details on the four cases:
1) radix_tree_iter_retry() via a non-tagged iteration like
radix_tree_for_each_slot(). In this case we currently aren't seeing a bug
because radix_tree_iter_retry() sets
iter->next_index = iter->index;
which means that in the else case in radix_tree_next_slot(), 'count' is
zero, so we skip over the while() loop and effectively just return NULL
without ever dereferencing 'slot'.
2) radix_tree_iter_retry() via tagged iteration like
radix_tree_for_each_tagged(). This case was giving us NULL pointer
dereferences in testing, and was fixed with this commit:
commit 3cb9185c67 ("radix-tree: fix radix_tree_iter_retry() for tagged
iterators.")
This fix doesn't explicitly check for 'slot' being NULL, though, it works
around the NULL pointer dereference by instead zeroing iter->tags in
radix_tree_iter_retry(), which makes us bail out of the if() case in
radix_tree_next_slot() before we dereference 'slot'.
3) radix_tree_iter_next() via a non-tagged iteration like
radix_tree_for_each_slot(). This currently happens in shmem_tag_pins()
and shmem_partial_swap_usage().
As with non-tagged iteration, 'count' in the else case of
radix_tree_next_slot() is zero, so we skip over the while() loop and
effectively just return NULL without ever dereferencing 'slot'.
4) radix_tree_iter_next() via tagged iteration like
radix_tree_for_each_tagged(). This happens in shmem_wait_for_pins().
radix_tree_iter_next() zeros out iter->tags, so we end up exiting
radix_tree_next_slot() here:
if (flags & RADIX_TREE_ITER_TAGGED) {
void *canon = slot;
iter->tags >>= 1;
if (unlikely(!iter->tags))
return NULL;
Link: http://lkml.kernel.org/r/20160815194237.25967-3-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are four cases I can see where we could end up with a NULL 'slot' in
radix_tree_next_slot(). Yet radix_tree_next_slot() never actually checks
whether 'slot' is NULL. It just happens that for the cases where 'slot'
is NULL, some other combination of factors prevents us from dereferencing
it.
It would be very easy for someone to unwittingly change one of these
factors without realizing that we are implicitly depending on it to save
us from a NULL pointer dereference.
Add a comment documenting the things that allow 'slot' to be safely passed
as NULL to radix_tree_next_slot().
Here are details on the four cases:
1) radix_tree_iter_retry() via a non-tagged iteration like
radix_tree_for_each_slot(). In this case we currently aren't seeing a bug
because radix_tree_iter_retry() sets
iter->next_index = iter->index;
which means that in the else case in radix_tree_next_slot(), 'count' is
zero, so we skip over the while() loop and effectively just return NULL
without ever dereferencing 'slot'.
2) radix_tree_iter_retry() via tagged iteration like
radix_tree_for_each_tagged(). This case was giving us NULL pointer
dereferences in testing, and was fixed with this commit:
commit 3cb9185c67 ("radix-tree: fix radix_tree_iter_retry() for tagged
iterators.")
This fix doesn't explicitly check for 'slot' being NULL, though, it works
around the NULL pointer dereference by instead zeroing iter->tags in
radix_tree_iter_retry(), which makes us bail out of the if() case in
radix_tree_next_slot() before we dereference 'slot'.
3) radix_tree_iter_next() via a non-tagged iteration like
radix_tree_for_each_slot(). This currently happens in shmem_tag_pins()
and shmem_partial_swap_usage().
As with non-tagged iteration, 'count' in the else case of
radix_tree_next_slot() is zero, so we skip over the while() loop and
effectively just return NULL without ever dereferencing 'slot'.
4) radix_tree_iter_next() via tagged iteration like
radix_tree_for_each_tagged(). This happens in shmem_wait_for_pins().
radix_tree_iter_next() zeros out iter->tags, so we end up exiting
radix_tree_next_slot() here:
if (flags & RADIX_TREE_ITER_TAGGED) {
void *canon = slot;
iter->tags >>= 1;
if (unlikely(!iter->tags))
return NULL;
Link: http://lkml.kernel.org/r/20160815194237.25967-2-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The select(2) syscall performs a kmalloc(size, GFP_KERNEL) where size grows
with the number of fds passed. We had a customer report page allocation
failures of order-4 for this allocation. This is a costly order, so it might
easily fail, as the VM expects such allocation to have a lower-order fallback.
Such trivial fallback is vmalloc(), as the memory doesn't have to be physically
contiguous and the allocation is temporary for the duration of the syscall
only. There were some concerns, whether this would have negative impact on the
system by exposing vmalloc() to userspace. Although an excessive use of vmalloc
can cause some system wide performance issues - TLB flushes etc. - a large
order allocation is not for free either and an excessive reclaim/compaction can
have a similar effect. Also note that the size is effectively limited by
RLIMIT_NOFILE which defaults to 1024 on the systems I checked. That means the
bitmaps will fit well within single page and thus the vmalloc() fallback could
be only exercised for processes where root allows a higher limit.
Note that the poll(2) syscall seems to use a linked list of order-0 pages, so
it doesn't need this kind of fallback.
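The size argument can be made concrete with a rough userspace calculation.
It assumes six bitmaps per call (three input sets plus three result sets) and
ignores any on-stack buffer used for small sets; the function names are
hypothetical.

```c
#include <stddef.h>
#include <limits.h>

#define BITS_PER_LONG_SKETCH	(sizeof(long) * CHAR_BIT)

/* Bytes in one fd bitmap covering 'nfds' descriptors, rounded up to longs. */
static size_t fd_bitmap_bytes(size_t nfds)
{
	size_t longs = (nfds + BITS_PER_LONG_SKETCH - 1) / BITS_PER_LONG_SKETCH;

	return longs * sizeof(long);
}

/* Assumption: six bitmaps in total (in/out/ex plus one result set each). */
static size_t select_alloc_bytes(size_t nfds)
{
	return 6 * fd_bitmap_bytes(nfds);
}
```

Under these assumptions 1024 descriptors need 768 bytes, well within one
4 KiB page, while about a million descriptors need 768 KiB.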
[eric.dumazet@gmail.com: fix failure path logic]
[akpm@linux-foundation.org: use proper type for size]
Link: http://lkml.kernel.org/r/20160927084536.5923-1-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Jason Baron <jbaron@akamai.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After much discussion, it seems that the fallocate feature flag
FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature
FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been whitelisted
for zeroing SCSI UNMAP. Punch still requires that FALLOC_FL_KEEP_SIZE is
set. A length that goes past the end of the device will be clamped to the
device size if KEEP_SIZE is set; or will return -EINVAL if not. Both
start and length must be aligned to the device's logical block size.
Since the semantics of fallocate are fairly well established already, wire
up the two pieces. The other fallocate variants (collapse range, insert
range, and allocate blocks) are not supported.
Link: http://lkml.kernel.org/r/147518379992.22791.8849838163218235007.stgit@birch.djwong.org
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Mike Snitzer <snitzer@redhat.com> # tweaked header
Cc: Brian Foster <bfoster@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make sure that the offset and length arguments that we're using to
construct WRITE SAME and DISCARD requests are actually aligned to the
logical block size. Failure to do this causes errors in other parts
of the block layer or the SCSI layer because disks don't support partial
logical block writes.
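A minimal sketch of such an alignment test follows. The helper name
range_is_lbs_aligned is hypothetical and the logical block size is assumed
to be a power of two; this is an illustration, not the code from the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if both offset and len are multiples of lbs (a power of two). */
bool range_is_lbs_aligned(uint64_t offset, uint64_t len, uint32_t lbs)
{
    assert(lbs != 0 && (lbs & (lbs - 1)) == 0);

    /* OR-ing merges the low bits of both values, so one mask test covers both. */
    return ((offset | len) & ((uint64_t)lbs - 1)) == 0;
}
```

With this helper, an offset of 512 and a length of 4096 pass for a
512-byte logical block size but are rejected for a 4096-byte one.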
Link: http://lkml.kernel.org/r/147518379026.22791.4437508871355153928.stgit@birch.djwong.org
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Mike Snitzer <snitzer@redhat.com> # tweaked header
Cc: Brian Foster <bfoster@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "fallocate for block devices", v11.
This is a patchset to fix page cache coherency with BLKZEROOUT and
implement fallocate for block devices.
The first patch is a fix to the existing BLKZEROOUT ioctl to invalidate
the page cache if the zeroing command to the underlying device succeeds.
Without this patch we still have the pagecache coherence bug that's been
in the kernel forever.
The second patch changes the internal block device functions to reject
attempts to discard or zeroout that are not aligned to the logical block
size. Previously, we only checked that the start/len parameters were
512-byte aligned, which caused kernel BUG_ONs for unaligned IOs to 4k-LBA
devices.
The third patch creates a fallocate handler for block devices, wires up
the FALLOC_FL_PUNCH_HOLE flag to zeroing-discard, and connects
FALLOC_FL_ZERO_RANGE to write-same so that we can have a consistent
fallocate interface between files and block devices. It also allows the
combination of PUNCH_HOLE and NO_HIDE_STALE to invoke non-zeroing discard.
Test cases for the new block device fallocate are now in xfstests as
generic/349-351.
This patch (of 3):
Invalidate the page cache (as a regular O_DIRECT write would do) to avoid
returning stale cache contents at a later time.
Link: http://lkml.kernel.org/r/147518378313.22791.16649519283678515021.stgit@birch.djwong.org
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Brian Foster <bfoster@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In dlm_migrate_request_handler(), when `ret' is -EEXIST, the mle
should be freed; otherwise the memory will be leaked.
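The leak pattern can be illustrated with a small userspace sketch. None of
the names below (struct entry, table_insert, handle_request) come from the
dlm code; they are hypothetical stand-ins showing that when an insert
reports -EEXIST, the caller still owns the freshly allocated object and
has to free it.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct entry {
    int key;
};

/* Hypothetical one-slot table; it takes ownership of e only on success. */
static struct entry *slot;

int table_insert(struct entry *e)
{
    assert(e != NULL);

    if (slot)
        return -EEXIST;     /* caller keeps ownership of e */
    slot = e;
    return 0;
}

int handle_request(int key)
{
    struct entry *e = malloc(sizeof(*e));
    int ret;

    if (!e)
        return -ENOMEM;
    e->key = key;

    ret = table_insert(e);
    if (ret == -EEXIST)
        free(e);            /* without this, e would be leaked */
    return ret;
}
```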
Link: http://lkml.kernel.org/r/71604351584F6A4EBAE558C676F37CA4A3D3522A@H3CMLB12-EX.srv.huawei-3com.com
Signed-off-by: Guozhonghua <guozhonghua@h3c.com>
Reviewed-by: Mark Fasheh <mfasheh@versity.com>
Cc: Eric Ren <zren@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It actually worked only when the requested area ended on a page boundary...
Reported-by: Marco Grassi <marco.gra@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull networking fixes from David Miller:
1) Netfilter list handling fix, from Linus.
2) RXRPC/AFS bug fixes (oops on call to serviceless endpoints, build
warnings, missing notifications, etc.), from David Howells.
3) Kernel log message missing newlines, from Colin Ian King.
4) Don't enter direct reclaim in netlink dumps. The idea is to try a
high-order allocation first and fall back quickly to a 0-order
allocation if the high-order one cannot be done cheaply and
without reclaim. From Eric Dumazet.
5) Fix firmware download errors in btusb bluetooth driver, from Ethan
Hsieh.
6) Missing Kconfig deps for QCOM_EMAC, from Geert Uytterhoeven.
7) Fix duplicate MDIO_XGENE Kconfig entry, from Laura Abbott.
8) Constrain ipv6 rtr_solicits sysctl values properly, from Maciej
Żenczykowski.
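The allocation strategy described in item 4 can be sketched in userspace
as follows. This is a rough illustration only: alloc_dump_buffer, the
cheap_alloc_fn hook and the two example hooks are hypothetical names, and
malloc() stands in for the ordinary allocation path. The hook models an
allocation attempt that is allowed to fail rather than do expensive work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical allocator hook: returns NULL when the request cannot be
 * satisfied cheaply (stands in for an allocation that must not reclaim).
 */
typedef void *(*cheap_alloc_fn)(size_t size);

/*
 * Try the large buffer through the cheap path first; if that fails, fall
 * back at once to the small size using the ordinary allocator.
 * *got reports the size actually obtained (0 on total failure).
 */
void *alloc_dump_buffer(size_t large, size_t small,
                        cheap_alloc_fn cheap, size_t *got)
{
    void *buf;

    assert(small <= large);

    buf = cheap(large);
    if (buf) {
        *got = large;
        return buf;
    }

    buf = malloc(small);
    *got = buf ? small : 0;
    return buf;
}

/* Example hooks for experimentation. */
void *cheap_always_fails(size_t size)
{
    (void)size;
    return NULL;
}

void *cheap_uses_malloc(size_t size)
{
    return malloc(size);
}
```

The caller has to cope with either buffer size, which is why the sketch
reports the obtained size through got.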
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits)
netfilter: Fix slab corruption.
be2net: Enable VF link state setting for BE3
be2net: Fix TX stats for TSO packets
be2net: Update Copyright string in be_hw.h
be2net: NCSI FW section should be properly updated with ethtool for BE3
be2net: Provide an alternate way to read pf_num for BEx chips
wan/fsl_ucc_hdlc: Fix size used in dma_free_coherent()
net: macb: NULL out phydev after removing mdio bus
xen-netback: make sure that hashes are not send to unaware frontends
Fixing a bug in team driver due to incorrect 'unsigned int' to 'int' conversion
MAINTAINERS: add myself as a maintainer of xen-netback
ipv6 addrconf: disallow rtr_solicits < -1
Bluetooth: btusb: Fix atheros firmware download error
drivers: net: phy: Correct duplicate MDIO_XGENE entry
ethernet: qualcomm: QCOM_EMAC should depend on HAS_DMA and HAS_IOMEM
net: ethernet: mediatek: remove hwlro property in the device tree
net: ethernet: mediatek: get hw lro capability by the chip id instead of by the dtsi
net: ethernet: mediatek: get the chip id by ETHDMASYS registers
net: bgmac: Fix errant feature flag check
netlink: do not enter direct reclaim from netlink_dump()
...