Commit Graph

32820 Commits

Author SHA1 Message Date
Linus Torvalds
20e2aa8126 Merge tag 'perf-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
 "Three fixes/updates for perf:

   - Fix the perf event cgroup tracking which tries to track the cgroup
     even for disabled events.

   - Add Ice Lake server support for uncore events

   - Disable pagefaults when retrieving the physical address in the
     sampling code"

* tag 'perf-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Disable page faults when getting phys address
  perf/x86/intel/uncore: Add Ice Lake server uncore support
  perf/cgroup: Correct indirection in perf_less_group_idx()
  perf/core: Fix event cgroup tracking
2020-04-12 10:05:24 -07:00
Linus Torvalds
652fa53caa Merge tag 'locking-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Thomas Gleixner:
 "Three small fixes/updates for the locking core code:

   - Plug a task struct reference leak in the percpu rswem
     implementation.

   - Document the refcount interaction with PID_MAX_LIMIT

   - Improve the 'invalid wait context' data dump in lockdep so it
     contains all information which is required to decode the problem"

* tag 'locking-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/lockdep: Improve 'invalid wait context' splat
  locking/refcount: Document interaction with PID_MAX_LIMIT
  locking/percpu-rwsem: Fix a task_struct refcount
2020-04-12 09:47:10 -07:00
Linus Torvalds
75e7188397 Merge tag 'dma-mapping-5.7-1' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:

 - fix an integer truncation in dma_direct_get_required_mask
   (Kishon Vijay Abraham)

 - fix the display of dma mapping types (Grygorii Strashko)

* tag 'dma-mapping-5.7-1' of git://git.infradead.org/users/hch/dma-mapping:
  dma-debug: fix displaying of dma allocation type
  dma-direct: fix data truncation in dma_direct_get_required_mask()
2020-04-11 11:34:36 -07:00
Linus Torvalds
5b8b9d0c6d Merge branch 'akpm' (patches from Andrew)
Merge yet more updates from Andrew Morton:

 - Almost all of the rest of MM (memcg, slab-generic, slab, pagealloc,
   gup, hugetlb, pagemap, memremap)

 - Various other things (hfs, ocfs2, kmod, misc, seqfile)

* akpm: (34 commits)
  ipc/util.c: sysvipc_find_ipc() should increase position index
  kernel/gcov/fs.c: gcov_seq_next() should increase position index
  fs/seq_file.c: seq_read(): add info message about buggy .next functions
  drivers/dma/tegra20-apb-dma.c: fix platform_get_irq.cocci warnings
  change email address for Pali Rohár
  selftests: kmod: test disabling module autoloading
  selftests: kmod: fix handling test numbers above 9
  docs: admin-guide: document the kernel.modprobe sysctl
  fs/filesystems.c: downgrade user-reachable WARN_ONCE() to pr_warn_once()
  kmod: make request_module() return an error when autoloading is disabled
  mm/memremap: set caching mode for PCI P2PDMA memory to WC
  mm/memory_hotplug: add pgprot_t to mhp_params
  powerpc/mm: thread pgprot_t through create_section_mapping()
  x86/mm: introduce __set_memory_prot()
  x86/mm: thread pgprot_t through init_memory_mapping()
  mm/memory_hotplug: rename mhp_restrictions to mhp_params
  mm/memory_hotplug: drop the flags field from struct mhp_restrictions
  mm/special: create generic fallbacks for pte_special() and pte_mkspecial()
  mm/vma: introduce VM_ACCESS_FLAGS
  mm/vma: define a default value for VM_DATA_DEFAULT_FLAGS
  ...
2020-04-10 17:57:48 -07:00
Vasily Averin
f4d74ef622 kernel/gcov/fs.c: gcov_seq_next() should increase position index
If seq_file .next function does not change position index, read after
some lseek can generate unexpected output.

https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: NeilBrown <neilb@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Waiman Long <longman@redhat.com>
Link: http://lkml.kernel.org/r/f65c6ee7-bd00-f910-2f8a-37cc67e4ff88@virtuozzo.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-10 15:36:22 -07:00
Eric Biggers
d7d27cfc5c kmod: make request_module() return an error when autoloading is disabled
Patch series "module autoloading fixes and cleanups", v5.

This series fixes a bug where request_module() was reporting success to
kernel code when module autoloading had been completely disabled via
'echo > /proc/sys/kernel/modprobe'.

It also addresses the issues raised on the original thread
(https://lkml.kernel.org/lkml/20200310223731.126894-1-ebiggers@kernel.org/T/#u)
bydocumenting the modprobe sysctl, adding a self-test for the empty path
case, and downgrading a user-reachable WARN_ONCE().

This patch (of 4):

It's long been possible to disable kernel module autoloading completely
(while still allowing manual module insertion) by setting
/proc/sys/kernel/modprobe to the empty string.

This can be preferable to setting it to a nonexistent file since it
avoids the overhead of an attempted execve(), avoids potential
deadlocks, and avoids the call to security_kernel_module_request() and
thus on SELinux-based systems eliminates the need to write SELinux rules
to dontaudit module_request.

However, when module autoloading is disabled in this way,
request_module() returns 0.  This is broken because callers expect 0 to
mean that the module was successfully loaded.

Apparently this was never noticed because this method of disabling
module autoloading isn't used much, and also most callers don't use the
return value of request_module() since it's always necessary to check
whether the module registered its functionality or not anyway.

But improperly returning 0 can indeed confuse a few callers, for example
get_fs_type() in fs/filesystems.c where it causes a WARNING to be hit:

	if (!fs && (request_module("fs-%.*s", len, name) == 0)) {
		fs = __get_fs_type(name, len);
		WARN_ONCE(!fs, "request_module fs-%.*s succeeded, but still no fs?\n", len, name);
	}

This is easily reproduced with:

	echo > /proc/sys/kernel/modprobe
	mount -t NONEXISTENT none /

It causes:

	request_module fs-NONEXISTENT succeeded, but still no fs?
	WARNING: CPU: 1 PID: 1106 at fs/filesystems.c:275 get_fs_type+0xd6/0xf0
	[...]

This should actually use pr_warn_once() rather than WARN_ONCE(), since
it's also user-reachable if userspace immediately unloads the module.
Regardless, request_module() should correctly return an error when it
fails.  So let's make it return -ENOENT, which matches the error when
the modprobe binary doesn't exist.

I've also sent patches to document and test this case.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Jessica Yu <jeyu@kernel.org>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Ben Hutchings <benh@debian.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200310223731.126894-1-ebiggers@kernel.org
Link: http://lkml.kernel.org/r/20200312202552.241885-1-ebiggers@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-10 15:36:22 -07:00
Sergey Senozhatsky
ab6f762f0f printk: queue wake_up_klogd irq_work only if per-CPU areas are ready
printk_deferred(), similarly to printk_safe/printk_nmi, does not
immediately attempt to print a new message on the consoles, avoiding
calls into non-reentrant kernel paths, e.g. scheduler or timekeeping,
which potentially can deadlock the system.

Those printk() flavors, instead, rely on per-CPU flush irq_work to print
messages from safer contexts.  For same reasons (recursive scheduler or
timekeeping calls) printk() uses per-CPU irq_work in order to wake up
user space syslog/kmsg readers.

However, only printk_safe/printk_nmi do make sure that per-CPU areas
have been initialised and that it's safe to modify per-CPU irq_work.
This means that, for instance, should printk_deferred() be invoked "too
early", that is before per-CPU areas are initialised, printk_deferred()
will perform illegal per-CPU access.

Lech Perczak [0] reports that after commit 1b710b1b10 ("char/random:
silence a lockdep splat with printk()") user-space syslog/kmsg readers
are not able to read new kernel messages.

The reason is printk_deferred() being called too early (as was pointed
out by Petr and John).

Fix printk_deferred() and do not queue per-CPU irq_work before per-CPU
areas are initialized.

Link: https://lore.kernel.org/lkml/aa0732c6-5c4e-8a8b-a1c1-75ebe3dca05b@camlintechnologies.com/
Reported-by: Lech Perczak <l.perczak@camlintechnologies.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Tested-by: Jann Horn <jannh@google.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-10 13:18:57 -07:00
Linus Torvalds
87ad46e601 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull proc fix from Eric Biederman:
 "A brown paper bag slipped through my proc changes, and syzcaller
  caught it when the code ended up in your tree.

  I have opted to fix it the simplest cleanest way I know how, so there
  is no reasonable chance for the bug to repeat"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  proc: Use a dedicated lock in struct pid
2020-04-10 12:59:56 -07:00
Linus Torvalds
bbec2a2dc3 Merge tag 'pm-5.7-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki:
 "Rework compat ioctl handling in the user space hibernation interface
  (Christoph Hellwig) and fix a typo in a function name in the cpuidle
  haltpoll driver (Yihao Wu)"

* tag 'pm-5.7-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpuidle-haltpoll: Fix small typo
  PM / sleep: handle the compat case in snapshot_set_swap_area()
  PM / sleep: move SNAPSHOT_SET_SWAP_AREA handling into a helper
2020-04-10 09:50:00 -07:00
Linus Torvalds
c0cc271173 Merge tag 'modules-for-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux
Pull module updates from Jessica Yu:
 "Only a small cleanup this time around: a trivial conversion of
  zero-length arrays to flexible arrays"

* tag 'modules-for-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
  kernel: module: Replace zero-length array with flexible-array member
2020-04-09 12:52:34 -07:00
Eric W. Biederman
63f818f46a proc: Use a dedicated lock in struct pid
syzbot wrote:
> ========================================================
> WARNING: possible irq lock inversion dependency detected
> 5.6.0-syzkaller #0 Not tainted
> --------------------------------------------------------
> swapper/1/0 just changed the state of lock:
> ffffffff898090d8 (tasklist_lock){.+.?}-{2:2}, at: send_sigurg+0x9f/0x320 fs/fcntl.c:840
> but this lock took another, SOFTIRQ-unsafe lock in the past:
>  (&pid->wait_pidfd){+.+.}-{2:2}
>
>
> and interrupts could create inverse lock ordering between them.
>
>
> other info that might help us debug this:
>  Possible interrupt unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(&pid->wait_pidfd);
>                                local_irq_disable();
>                                lock(tasklist_lock);
>                                lock(&pid->wait_pidfd);
>   <Interrupt>
>     lock(tasklist_lock);
>
>  *** DEADLOCK ***
>
> 4 locks held by swapper/1/0:

The problem is that because wait_pidfd.lock is taken under the tasklist
lock.  It must always be taken with irqs disabled as tasklist_lock can be
taken from interrupt context and if wait_pidfd.lock was already taken this
would create a lock order inversion.

Oleg suggested just disabling irqs where I have added extra calls to
wait_pidfd.lock.  That should be safe and I think the code will eventually
do that.  It was rightly pointed out by Christian that sharing the
wait_pidfd.lock was a premature optimization.

It is also true that my pre-merge window testing was insufficient.  So
remove the premature optimization and give struct pid a dedicated lock of
it's own for struct pid things.  I have verified that lockdep sees all 3
paths where we take the new pid->lock and lockdep does not complain.

It is my current day dream that one day pid->lock can be used to guard the
task lists as well and then the tasklist_lock won't need to be held to
deliver signals.  That will require taking pid->lock with irqs disabled.

Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/lkml/00000000000011d66805a25cd73f@google.com/
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Reported-by: syzbot+343f75cdeea091340956@syzkaller.appspotmail.com
Reported-by: syzbot+832aabf700bc3ec920b9@syzkaller.appspotmail.com
Reported-by: syzbot+f675f964019f884dbd0f@syzkaller.appspotmail.com
Reported-by: syzbot+a9fb1457d720a55d6dc5@syzkaller.appspotmail.com
Fixes: 7bc3e6e55a ("proc: Use a list of inodes to flush from proc")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2020-04-09 12:15:35 -05:00
Grygorii Strashko
9bb50ed747 dma-debug: fix displaying of dma allocation type
The commit 2e05ea5cdc ("dma-mapping: implement dma_map_single_attrs using
dma_map_page_attrs") removed "dma_debug_page" enum, but missed to update
type2name string table. This causes incorrect displaying of dma allocation
type.
Fix it by removing "page" string from type2name string table and switch to
use named initializers.

Before (dma_alloc_coherent()):
k3-ringacc 4b800000.ringacc: scather-gather idx 2208 P=d1140000 N=d114 D=d1140000 L=40 DMA_BIDIRECTIONAL dma map error check not applicable
k3-ringacc 4b800000.ringacc: scather-gather idx 2216 P=d1150000 N=d115 D=d1150000 L=40 DMA_BIDIRECTIONAL dma map error check not applicable

After:
k3-ringacc 4b800000.ringacc: coherent idx 2208 P=d1140000 N=d114 D=d1140000 L=40 DMA_BIDIRECTIONAL dma map error check not applicable
k3-ringacc 4b800000.ringacc: coherent idx 2216 P=d1150000 N=d115 D=d1150000 L=40 DMA_BIDIRECTIONAL dma map error check not applicable

Fixes: 2e05ea5cdc ("dma-mapping: implement dma_map_single_attrs using dma_map_page_attrs")
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-04-08 21:46:57 +02:00
Kishon Vijay Abraham I
cdcda0d1f8 dma-direct: fix data truncation in dma_direct_get_required_mask()
The upper 32-bit physical address gets truncated inadvertently
when dma_direct_get_required_mask() invokes phys_to_dma_direct().
This results in dma_addressing_limited() return incorrect value
when used in platforms with LPAE enabled.
Fix it here by explicitly type casting 'max_pfn' to phys_addr_t
in order to prevent overflow of intermediate value while evaluating
'(max_pfn - 1) << PAGE_SHIFT'.

Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-04-08 20:52:24 +02:00
Peter Zijlstra
9a019db0b6 locking/lockdep: Improve 'invalid wait context' splat
The 'invalid wait context' splat doesn't print all the information
required to reconstruct / validate the error, specifically the
irq-context state is missing.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-04-08 12:05:07 +02:00
Qian Cai
d22cc7f67d locking/percpu-rwsem: Fix a task_struct refcount
The following commit:

  7f26482a87 ("locking/percpu-rwsem: Remove the embedded rwsem")

introduced task_struct memory leaks due to messing up the task_struct
refcount.

At the beginning of percpu_rwsem_wake_function(), it calls get_task_struct(),
but if the trylock failed, it will remain in the waitqueue. However, it
will run percpu_rwsem_wake_function() again with get_task_struct() to
increase the refcount but then only call put_task_struct() once the trylock
succeeded.

Fix it by adjusting percpu_rwsem_wake_function() a bit to guard against
when percpu_rwsem_wait() observing !private, terminating the wait and
doing a quick exit() while percpu_rwsem_wake_function() then doing
wake_up_process(p) as a use-after-free.

Fixes: 7f26482a87 ("locking/percpu-rwsem: Remove the embedded rwsem")
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200330213002.2374-1-cai@lca.pw
2020-04-08 12:05:06 +02:00
Jiri Olsa
d3296fb372 perf/core: Disable page faults when getting phys address
We hit following warning when running tests on kernel
compiled with CONFIG_DEBUG_ATOMIC_SLEEP=y:

 WARNING: CPU: 19 PID: 4472 at mm/gup.c:2381 __get_user_pages_fast+0x1a4/0x200
 CPU: 19 PID: 4472 Comm: dummy Not tainted 5.6.0-rc6+ #3
 RIP: 0010:__get_user_pages_fast+0x1a4/0x200
 ...
 Call Trace:
  perf_prepare_sample+0xff1/0x1d90
  perf_event_output_forward+0xe8/0x210
  __perf_event_overflow+0x11a/0x310
  __intel_pmu_pebs_event+0x657/0x850
  intel_pmu_drain_pebs_nhm+0x7de/0x11d0
  handle_pmi_common+0x1b2/0x650
  intel_pmu_handle_irq+0x17b/0x370
  perf_event_nmi_handler+0x40/0x60
  nmi_handle+0x192/0x590
  default_do_nmi+0x6d/0x150
  do_nmi+0x2f9/0x3c0
  nmi+0x8e/0xd7

While __get_user_pages_fast() is IRQ-safe, it calls access_ok(),
which warns on:

  WARN_ON_ONCE(!in_task() && !pagefault_disabled())

Peter suggested disabling page faults around __get_user_pages_fast(),
which gets rid of the warning in access_ok() call.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200407141427.3184722-1-jolsa@kernel.org
2020-04-08 11:33:46 +02:00
Ian Rogers
24fb6b8e7c perf/cgroup: Correct indirection in perf_less_group_idx()
The void* in perf_less_group_idx() is to a member in the array which points
at a perf_event*, as such it is a perf_event**.

Reported-By: John Sperbeck <jsperbeck@google.com>
Fixes: 6eef8a7116 ("perf/core: Use min_heap in visit_groups_merge()")
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200321164331.107337-1-irogers@google.com
2020-04-08 11:33:45 +02:00
Peter Zijlstra
33238c5045 perf/core: Fix event cgroup tracking
Song reports that installing cgroup events is broken since:

  db0503e4f6 ("perf/core: Optimize perf_install_in_event()")

The problem being that cgroup events try to track cpuctx->cgrp even
for disabled events, which is pointless and actively harmful since the
above commit. Rework the code to have explicit enable/disable hooks
for cgroup events, such that we can limit cgroup tracking to active
events.

More specifically, since the above commit disabled events are no
longer added to their context from the 'right' CPU, and we can't
access things like the current cgroup for a remote CPU.

Cc: <stable@vger.kernel.org> # v5.5+
Fixes: db0503e4f6 ("perf/core: Optimize perf_install_in_event()")
Reported-by: Song Liu <songliubraving@fb.com>
Tested-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200318193337.GB20760@hirez.programming.kicks-ass.net
2020-04-08 11:33:44 +02:00
Gustavo A. R. Silva
6524d79413 kernel/gcov/fs.c: replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language extension
to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning in
case the flexible array does not occur last in the structure, which will
help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by this
change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied.  As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
Link: http://lkml.kernel.org/r/20200302224851.GA26467@embeddedor
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:44 -07:00
Gustavo A. R. Silva
7ff87182d1 gcov: gcc_3_4: replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language extension
to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning in
case the flexible array does not occur last in the structure, which will
help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by this
change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied.  As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
Link: http://lkml.kernel.org/r/20200302224501.GA14175@embeddedor
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:44 -07:00
Gustavo A. R. Silva
fba4168ede gcov: gcc_4_7: replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language extension
to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning in
case the flexible array does not occur last in the structure, which will
help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by this
change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied.  As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Link: http://lkml.kernel.org/r/20200213152241.GA877@embeddedor
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:44 -07:00
Qiujun Huang
06d4f8152a kernel/kmod.c: fix a typo "assuems" -> "assumes"
There is a typo in comment.  Fix it.  s/assuems/assumes/

Signed-off-by: Qiujun Huang <hqjagain@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Link: http://lkml.kernel.org/r/1585891029-6450-1-git-send-email-hqjagain@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:44 -07:00
Will Deacon
0bd476e6c6 kallsyms: unexport kallsyms_lookup_name() and kallsyms_on_each_symbol()
kallsyms_lookup_name() and kallsyms_on_each_symbol() are exported to
modules despite having no in-tree users and being wide open to abuse by
out-of-tree modules that can use them as a method to invoke arbitrary
non-exported kernel functions.

Unexport kallsyms_lookup_name() and kallsyms_on_each_symbol().

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Quentin Perret <qperret@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: K.Prasad <prasad@linux.vnet.ibm.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: http://lkml.kernel.org/r/20200221114404.14641-4-will@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:44 -07:00
Masahiro Yamada
889b3c1245 compiler: remove CONFIG_OPTIMIZE_INLINING entirely
Commit ac7c3e4ff4 ("compiler: enable CONFIG_OPTIMIZE_INLINING
forcibly") made this always-on option. We released v5.4 and v5.5
including that commit.

Remove the CONFIG option and clean up the code now.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Miller <davem@davemloft.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200220110807.32534-2-masahiroy@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:42 -07:00
Nathan Chancellor
63174f61df kernel/extable.c: use address-of operator on section symbols
Clang warns:

../kernel/extable.c:37:52: warning: array comparison always evaluates to
a constant [-Wtautological-compare]
        if (main_extable_sort_needed && __stop___ex_table > __start___ex_table) {
                                                          ^
1 warning generated.

These are not true arrays, they are linker defined symbols, which are just
addresses.  Using the address of operator silences the warning and does
not change the resulting assembly with either clang/ld.lld or gcc/ld
(tested with diff + objdump -Dr).

Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://github.com/ClangBuiltLinux/linux/issues/892
Link: http://lkml.kernel.org/r/20200219202036.45702-1-natechancellor@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:42 -07:00
Alexey Dobriyan
d919b33daf proc: faster open/read/close with "permanent" files
Now that "struct proc_ops" exist we can start putting there stuff which
could not fly with VFS "struct file_operations"...

Most of fs/proc/inode.c file is dedicated to make open/read/.../close
reliable in the event of disappearing /proc entries which usually happens
if module is getting removed.  Files like /proc/cpuinfo which never
disappear simply do not need such protection.

Save 2 atomic ops, 1 allocation, 1 free per open/read/close sequence for such
"permanent" files.

Enable "permanent" flag for

	/proc/cpuinfo
	/proc/kmsg
	/proc/modules
	/proc/slabinfo
	/proc/stat
	/proc/sysvipc/*
	/proc/swaps

More will come once I figure out foolproof way to prevent out module
authors from marking their stuff "permanent" for performance reasons
when it is not.

This should help with scalability: benchmark is "read /proc/cpuinfo R times
by N threads scattered over the system".

	N	R	t, s (before)	t, s (after)
	-----------------------------------------------------
	64	4096	1.582458	1.530502	-3.2%
	256	4096	6.371926	6.125168	-3.9%
	1024	4096	25.64888	24.47528	-4.6%

Benchmark source:

#include <chrono>
#include <iostream>
#include <thread>
#include <vector>

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

const int NR_CPUS = sysconf(_SC_NPROCESSORS_ONLN);
int N;
const char *filename;
int R;

int xxx = 0;

int glue(int n)
{
	cpu_set_t m;
	CPU_ZERO(&m);
	CPU_SET(n, &m);
	return sched_setaffinity(0, sizeof(cpu_set_t), &m);
}

void f(int n)
{
	glue(n % NR_CPUS);

	while (*(volatile int *)&xxx == 0) {
	}

	for (int i = 0; i < R; i++) {
		int fd = open(filename, O_RDONLY);
		char buf[4096];
		ssize_t rv = read(fd, buf, sizeof(buf));
		asm volatile ("" :: "g" (rv));
		close(fd);
	}
}

int main(int argc, char *argv[])
{
	if (argc < 4) {
		std::cerr << "usage: " << argv[0] << ' ' << "N /proc/filename R
";
		return 1;
	}

	N = atoi(argv[1]);
	filename = argv[2];
	R = atoi(argv[3]);

	for (int i = 0; i < NR_CPUS; i++) {
		if (glue(i) == 0)
			break;
	}

	std::vector<std::thread> T;
	T.reserve(N);
	for (int i = 0; i < N; i++) {
		T.emplace_back(f, i);
	}

	auto t0 = std::chrono::system_clock::now();
	{
		*(volatile int *)&xxx = 1;
		for (auto& t: T) {
			t.join();
		}
	}
	auto t1 = std::chrono::system_clock::now();
	std::chrono::duration<double> dt = t1 - t0;
	std::cout << dt.count() << '
';

	return 0;
}

P.S.:
Explicit randomization marker is added because adding non-function pointer
will silently disable structure layout randomization.

[akpm@linux-foundation.org: coding style fixes]
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Joe Perches <joe@perches.com>
Link: http://lkml.kernel.org/r/20200222201539.GA22576@avx2
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:42 -07:00
Anshuman Khandual
03911132aa mm/vma: replace all remaining open encodings with is_vm_hugetlb_page()
This replaces all remaining open encodings with is_vm_hugetlb_page().

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Will Deacon <will@kernel.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Paul Burton <paulburton@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/1582520593-30704-4-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:37 -07:00
Anshuman Khandual
3122e80efc mm/vma: make vma_is_accessible() available for general use
Lets move vma_is_accessible() helper to include/linux/mm.h which makes it
available for general use.  While here, this replaces all remaining open
encodings for VMA access check with vma_is_accessible().

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Guo Ren <guoren@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Guo Ren <guoren@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paulburton@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Will Deacon <will@kernel.org>
Link: http://lkml.kernel.org/r/1582520593-30704-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:37 -07:00
Li Xinhai
e39a4b332d mm: set vm_next and vm_prev to NULL in vm_area_dup()
Set ->vm_next and ->vm_prev to NULL to prevent potential misuse from the
new duplicated vma.

Currently, only in fork path there are misuse for handling anon_vma.  No
other bugs been revealed with this patch applied.

Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Link: http://lkml.kernel.org/r/1581150928-3214-4-git-send-email-lixinhai.lxh@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:37 -07:00
Li Xinhai
93949bb21b mm: don't prepare anon_vma if vma has VM_WIPEONFORK
Patch series "mm: Fix misuse of parent anon_vma in dup_mmap path".

This patchset fixes the misuse of parenet anon_vma, which mainly caused by
child vma's vm_next and vm_prev are left same as its parent after
duplicate vma.  Finally, code reached parent vma's neighbor by referring
pointer of child vma and executed wrong logic.

The first two patches fix relevant issues, and the third patch sets
vm_next and vm_prev to NULL when duplicate vma to prevent potential misuse
in future.

Effects of the first bug is that causes rmap code to check both parent and
child's page table, although a page couldn't be mapped by both parent and
child, because child vma has WIPEONFORK so all pages mapped by child are
'new' and not relevant to parent.

Effects of the second bug is that the relationship of anon_vma of parent
and child are totallyconvoluted.  It would cause 'son', 'grandson', ...,
etc, to share 'parent' anon_vma, which disobey the design rule of reusing
anon_vma (the rule to be followed is that reusing should among vma of same
process, and vma should not gone through fork).

So, both issues should cause unnecessary rmap walking and have unexpected
complexity.

These two issues would not be directly visible, I used debugging code to
check the anon_vma pointers of parent and child when inspecting the
suspicious implementation of issue #2, then find the problem.

This patch (of 3):

In dup_mmap(), anon_vma_prepare() is called for vma has VM_WIPEONFORK, and
parameter 'tmp' (i.e., the new vma of child) has same ->vm_next and
->vm_prev as its parent vma.  That allows anon_vma used by parent been
mistakenly shared by child (find_mergeable_anon_vma() will do this reuse
work).

Besides this issue, call anon_vma_prepare() should be avoided because we
don't copy page for this vma.  Preparing anon_vma will be handled during
fault.

Fixes: d2cd9ede6e ("mm,fork: introduce MADV_WIPEONFORK")
Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Link: http://lkml.kernel.org/r/1581150928-3214-2-git-send-email-lixinhai.lxh@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:37 -07:00
Christoph Hellwig
0f5c4c6e0e PM / sleep: handle the compat case in snapshot_set_swap_area()
Use in_compat_syscall to copy directly from the 32-bit ABI structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-04-06 21:42:36 +02:00
Christoph Hellwig
88a77559cc PM / sleep: move SNAPSHOT_SET_SWAP_AREA handling into a helper
Move the handling of the SNAPSHOT_SET_SWAP_AREA ioctl from the main
ioctl helper into a helper function.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-04-06 21:42:36 +02:00
Linus Torvalds
ef05db16bb Merge tag 'pm-5.7-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki:
 "Additional power management updates.

  These fix a corner-case suspend-to-idle wakeup issue on systems where
  the ACPI SCI is shared with another wakeup source, add a kernel
  command line option to set pm_debug_messages via the kernel command
  line, add a document desctibing system-wide suspend and resume code
  flows, modify cpufreq Kconfig to choose schedutil as the preferred
  governor by default in a couple of cases and do some assorted
  cleanups.

  Specifics:

   - Fix corner-case suspend-to-idle wakeup issue on systems where the
     ACPI SCI is shared with another wakeup source (Hans de Goede).

   - Add document describing system-wide suspend and resume code flows
     to the admin guide (Rafael Wysocki).

   - Add kernel command line option to set pm_debug_messages (Chen Yu).

   - Choose schedutil as the preferred scaling governor by default on
     ARM big.LITTLE systems and on x86 systems using the intel_pstate
     driver in the passive mode (Linus Walleij, Rafael Wysocki).

   - Drop racy and redundant checks from the PM core's device_prepare()
     routine (Rafael Wysocki).

   - Make resume from hibernation take the hibernation_restore() return
     value into account (Dexuan Cui)"

* tag 'pm-5.7-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  platform/x86: intel_int0002_vgpio: Use acpi_register_wakeup_handler()
  ACPI: PM: Add acpi_[un]register_wakeup_handler()
  Documentation: PM: sleep: Document system-wide suspend code flows
  cpufreq: Select schedutil when using big.LITTLE
  PM: sleep: Add pm_debug_messages kernel command line option
  PM: sleep: core: Drop racy and redundant checks from device_prepare()
  PM: hibernate: Propagate the return value of hibernation_restore()
  cpufreq: intel_pstate: Select schedutil as the default governor
2020-04-06 10:14:39 -07:00
Linus Torvalds
b6ff10700d Merge tag 'fsnotify_for_v5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify updates from Jan Kara:
 "This implements the fanotify FAN_DIR_MODIFY event.

  This event reports the name in a directory under which a change
  happened and together with the directory filehandle and fstatat()
  allows reliable and efficient implementation of directory
  synchronization"

* tag 'fsnotify_for_v5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  fanotify: Fix the checks in fanotify_fsid_equal
  fanotify: report name info for FAN_DIR_MODIFY event
  fanotify: record name info for FAN_DIR_MODIFY event
  fanotify: Drop fanotify_event_has_fid()
  fanotify: prepare to report both parent and child fid's
  fanotify: send FAN_DIR_MODIFY event flavor with dir inode and name
  fanotify: divorce fanotify_path_event and fanotify_fid_event
  fanotify: Store fanotify handles differently
  fanotify: Simplify create_fd()
  fanotify: fix merging marks masks with FAN_ONDIR
  fanotify: merge duplicate events on parent and child
  fsnotify: replace inode pointer with an object id
  fsnotify: simplify arguments passing to fsnotify_parent()
  fsnotify: use helpers to access data by data_type
  fsnotify: funnel all dirent events through fsnotify_name()
  fsnotify: factor helpers fsnotify_dentry() and fsnotify_file()
  fsnotify: tidy up FS_ and FAN_ constants
2020-04-06 08:58:42 -07:00
Linus Torvalds
c48b07226b Merge tag 'perf-urgent-2020-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull more perf updates from Thomas Gleixner:
 "Perf updates all over the place:

  core:

   - Support for cgroup tracking in samples to allow cgroup based
     analysis

  tools:

   - Support for cgroup analysis

   - Commandline option and hotkey for perf top to change the sort order

   - A set of fixes all over the place

   - Various build system related improvements

   - Updates of the X86 pmu event JSON data

   - Documentation updates"

* tag 'perf-urgent-2020-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits)
  perf python: Fix clang detection to strip out options passed in $CC
  perf tools: Support Python 3.8+ in Makefile
  perf script: Fix invalid read of directory entry after closedir()
  perf script report: Fix SEGFAULT when using DWARF mode
  perf script: add -S/--symbols documentation
  perf pmu-events x86: Use CPU_CLK_UNHALTED.THREAD in Kernel_Utilization metric
  perf events parser: Add missing Intel CPU events to parser
  perf script: Allow --symbol to accept hexadecimal addresses
  perf report/top TUI: Fix title line formatting
  perf top: Support hotkey to change sort order
  perf top: Support --group-sort-idx to change the sort order
  perf symbols: Fix arm64 gap between kernel start and module end
  perf build-test: Honour JOBS to override detection of number of cores
  perf script: Add --show-cgroup-events option
  perf top: Add --all-cgroups option
  perf record: Add --all-cgroups option
  perf record: Support synthesizing cgroup events
  perf report: Add 'cgroup' sort key
  perf cgroup: Maintain cgroup hierarchy
  perf tools: Basic support for CGROUP event
  ...
2020-04-05 12:26:24 -07:00
Linus Torvalds
d5ca32738f Merge tag 'timers-urgent-2020-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
 "Two timer subsystem fixes:

   - Prevent a use after free in the new lockdep state tracking for
     hrtimers

   - Add missing parenthesis in the VF pit timer driver"

* tag 'timers-urgent-2020-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource/drivers/timer-vf-pit: Add missing parenthesis
  hrtimer: Don't dereference the hrtimer pointer after the callback
2020-04-05 12:06:51 -07:00
Linus Torvalds
aa1a8ce533 Merge tag 'trace-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
 "New tracing features:

   - The ring buffer is no longer disabled when reading the trace file.

     The trace_pipe file was made to be used for live tracing and
     reading as it acted like the normal producer/consumer. As the trace
     file would not consume the data, the easy way of handling it was to
     just disable writes to the ring buffer.

     This came to a surprise to the BPF folks who complained about lost
     events due to reading. This is no longer an issue. If someone wants
     to keep the old disabling there's a new option "pause-on-trace"
     that can be set.

   - New set_ftrace_notrace_pid file. PIDs in this file will not be
     traced by the function tracer.

     Similar to set_ftrace_pid, which makes the function tracer only
     trace those tasks with PIDs in the file, the set_ftrace_notrace_pid
     does the reverse.

   - New set_event_notrace_pid file. PIDs in this file will cause events
     not to be traced if triggered by a task with a matching PID.

     Similar to the set_event_pid file but will not be traced. Note,
     sched_waking and sched_switch events may still be traced if one of
     the tasks referenced by those events contains a PID that is allowed
     to be traced.

  Tracing related features:

   - New bootconfig option, that is attached to the initrd file.

     If bootconfig is on the command line, then the initrd file is
     searched looking for a bootconfig appended at the end.

   - New GPU tracepoint infrastructure to help the gfx drivers to get
     off debugfs (acked by Greg Kroah-Hartman)

  And other minor updates and fixes"

* tag 'trace-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (27 commits)
  tracing: Do not allocate buffer in trace_find_next_entry() in atomic
  tracing: Add documentation on set_ftrace_notrace_pid and set_event_notrace_pid
  selftests/ftrace: Add test to test new set_event_notrace_pid file
  selftests/ftrace: Add test to test new set_ftrace_notrace_pid file
  tracing: Create set_event_notrace_pid to not trace tasks
  ftrace: Create set_ftrace_notrace_pid to not trace tasks
  ftrace: Make function trace pid filtering a bit more exact
  ftrace/kprobe: Show the maxactive number on kprobe_events
  tracing: Have the document reflect that the trace file keeps tracing enabled
  ring-buffer/tracing: Have iterator acknowledge dropped events
  tracing: Do not disable tracing when reading the trace file
  ring-buffer: Do not disable recording when there is an iterator
  ring-buffer: Make resize disable per cpu buffer instead of total buffer
  ring-buffer: Optimize rb_iter_head_event()
  ring-buffer: Do not die if rb_iter_peek() fails more than thrice
  ring-buffer: Have rb_iter_head_event() handle concurrent writer
  ring-buffer: Add page_stamp to iterator for synchronization
  ring-buffer: Rename ring_buffer_read() to read_buffer_iter_advance()
  ring-buffer: Have ring_buffer_empty() not depend on tracing stopped
  tracing: Save off entry when peeking at next entry
  ...
2020-04-05 10:36:18 -07:00
Linus Torvalds
6f43bae382 Merge tag 'dma-mapping-5.7' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping updates from Christoph Hellwig:

 - fix an integer overflow in the coherent pool (Kevin Grandemange)

 - provide support for in-place uncached remapping and use that for
   openrisc

 - fix the arm coherent allocator to take the bus limit into account

* tag 'dma-mapping-5.7' of git://git.infradead.org/users/hch/dma-mapping:
  ARM/dma-mapping: merge __dma_supported into arm_dma_supported
  ARM/dma-mapping: take the bus limit into account in __dma_alloc
  ARM/dma-mapping: remove get_coherent_dma_mask
  openrisc: use the generic in-place uncached DMA allocator
  dma-direct: provide a arch_dma_clear_uncached hook
  dma-direct: make uncached_kernel_address more general
  dma-direct: consolidate the error handling in dma_direct_alloc_pages
  dma-direct: remove the cached_kernel_address hook
  dma-coherent: fix integer overflow in the reserved-memory dma allocation
2020-04-04 10:12:47 -07:00
Linus Torvalds
ad0bf4eb91 Merge tag 's390-5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Vasily Gorbik:

 - Update maintainers. Niklas Schnelle takes over zpci and Vineeth
   Vijayan common io code.

 - Extend cpuinfo to include topology information.

 - Add new extended counters for IBM z15 and sampling buffer allocation
   rework in perf code.

 - Add control over zeroing out memory during system restart.

 - CCA protected key block version 2 support and other
   fixes/improvements in crypto code.

 - Convert to new fallthrough; annotations.

 - Replace zero-length arrays with flexible-arrays.

 - QDIO debugfs and other small improvements.

 - Drop 2-level paging support optimization for compat tasks. Varios mm
   cleanups.

 - Remove broken and unused hibernate / power management support.

 - Remove fake numa support which does not bring any benefits.

 - Exclude offline CPUs from CPU topology masks to be more consistent
   with other architectures.

 - Prevent last branching instruction address leaking to userspace.

 - Other small various fixes and improvements all over the code.

* tag 's390-5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (57 commits)
  s390/mm: cleanup init_new_context() callback
  s390/mm: cleanup virtual memory constants usage
  s390/mm: remove page table downgrade support
  s390/qdio: set qdio_irq->cdev at allocation time
  s390/qdio: remove unused function declarations
  s390/ccwgroup: remove pm support
  s390/ap: remove power management code from ap bus and drivers
  s390/zcrypt: use kvmalloc instead of kmalloc for 256k alloc
  s390/mm: cleanup arch_get_unmapped_area() and friends
  s390/ism: remove pm support
  s390/cio: use fallthrough;
  s390/vfio: use fallthrough;
  s390/zcrypt: use fallthrough;
  s390: use fallthrough;
  s390/cpum_sf: Fix wrong page count in error message
  s390/diag: fix display of diagnose call statistics
  s390/ap: Remove ap device suspend and resume callbacks
  s390/pci: Improve handling of unset UID
  s390/pci: Fix zpci_alloc_domain() over allocation
  s390/qdio: pass ISC as parameter to chsc_sadc()
  ...
2020-04-04 09:45:50 -07:00
Ingo Molnar
7dc41b9b99 Merge tag 'perf-urgent-for-mingo-5.7-20200403' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes and improvements from Arnaldo Carvalho de Melo:

perf python:

  Arnaldo Carvalho de Melo:

  - Fix clang detection to strip out options passed in $CC.

build:

  He Zhe:

  - Normalize gcc parameter when generating arch errno table, fixing
    the build by removing options from $(CC).

  Sam Lunt:

  - Support Python 3.8+ in Makefile.

perf report/top:

  Arnaldo Carvalho de Melo:

  - Fix title line formatting.

perf script:

  Andreas Gerstmayr:

  - Fix SEGFAULT when using DWARF mode.

  - Fix invalid read of directory entry after closedir(), found with valgrind.

  Hagen Paul Pfeifer:

  - Introduce --deltatime option.

  Stephane Eranian:

  - Allow --symbol to accept hexadecimal addresses.

  Ian Rogers:

  - Add -S/--symbols documentation

  Namhyung Kim:

  - Add --show-cgroup-events option.

perf python:

  Arnaldo Carvalho de Melo:

  - Include rwsem.c in the python binding, needed by the cgroups improvements.

build-test:

  Arnaldo Carvalho de Melo:

  - Honour JOBS to override detection of number of cores

perf top:

  Jin Yao:

  - Support --group-sort-idx to change the sort order

  - perf top: Support hotkey to change sort order

perf pmu-events x86:

  Jin Yao:

  - Use CPU_CLK_UNHALTED.THREAD in Kernel_Utilization metric

perf symbols arm64:

  Kemeng Shi:

  - Fix arm64 gap between kernel start and module end

kernel perf subsystem:

  Namhyung Kim:

  - Add PERF_RECORD_CGROUP event and Add PERF_SAMPLE_CGROUP feature,
    to allow cgroup tracking, saving a link between cgroup path and
    its id number.

perf cgroup:

  Namhyung Kim:

  - Maintain cgroup hierarchy.

perf report:

  Namhyung Kim:

  - Add 'cgroup' sort key.

perf record:

  Namhyung Kim:

  - Support synthesizing cgroup events for pre-existing cgroups.

  - Add --all-cgroups option

Documentation:

  Tony Jones:

  - Update docs regarding kernel/user space unwinding.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-04-04 10:35:15 +02:00
Linus Torvalds
0ad5b053d4 Merge tag 'char-misc-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver updates from Greg KH:
 "Here is the big set of char/misc/other driver patches for 5.7-rc1.

  Lots of things in here, and it's later than expected due to some
  reverts to resolve some reported issues. All is now clean with no
  reported problems in linux-next.

  Included in here is:
   - interconnect updates
   - mei driver updates
   - uio updates
   - nvmem driver updates
   - soundwire updates
   - binderfs updates
   - coresight updates
   - habanalabs updates
   - mhi new bus type and core
   - extcon driver updates
   - some Kconfig cleanups
   - other small misc driver cleanups and updates

  As mentioned, all have been in linux-next for a while, and with the
  last two reverts, all is calm and good"

* tag 'char-misc-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (174 commits)
  Revert "driver core: platform: Initialize dma_parms for platform devices"
  Revert "amba: Initialize dma_parms for amba devices"
  amba: Initialize dma_parms for amba devices
  driver core: platform: Initialize dma_parms for platform devices
  bus: mhi: core: Drop the references to mhi_dev in mhi_destroy_device()
  bus: mhi: core: Initialize bhie field in mhi_cntrl for RDDM capture
  bus: mhi: core: Add support for reading MHI info from device
  misc: rtsx: set correct pcr_ops for rts522A
  speakup: misc: Use dynamic minor numbers for speakup devices
  mei: me: add cedar fork device ids
  coresight: do not use the BIT() macro in the UAPI header
  Documentation: provide IBM contacts for embargoed hardware
  nvmem: core: remove nvmem_sysfs_get_groups()
  nvmem: core: use is_bin_visible for permissions
  nvmem: core: use device_register and device_unregister
  nvmem: core: add root_only member to nvmem device struct
  extcon: axp288: Add wakeup support
  extcon: Mark extcon_get_edev_name() function as exported symbol
  extcon: palmas: Hide error messages if gpio returns -EPROBE_DEFER
  dt-bindings: extcon: usbc-cros-ec: convert extcon-usbc-cros-ec.txt to yaml format
  ...
2020-04-03 13:22:40 -07:00
Linus Torvalds
ff2ae607c6 Merge tag 'spdx-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx
Pull SPDX updates from Greg KH:
 "Here are three SPDX patches for 5.7-rc1.

  One fixes up the SPDX tag for a single driver, while the other two go
  through the tree and add SPDX tags for all of the .gitignore files as
  needed.

  Nothing too complex, but you will get a merge conflict with your
  current tree, that should be trivial to handle (one file modified by
  two things, one file deleted.)

  All three of these have been in linux-next for a while, with no
  reported issues other than the merge conflict"

* tag 'spdx-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx:
  ASoC: MT6660: make spdxcheck.py happy
  .gitignore: add SPDX License Identifier
  .gitignore: remove too obvious comments
2020-04-03 13:12:26 -07:00
Linus Torvalds
0adb8bc039 Merge branch 'for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:
 "Nothing too interesting. Just two trivial patches"

* 'for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Mark up unlocked access to wq->first_flusher
  workqueue: Make workqueue_init*() return void
2020-04-03 12:27:36 -07:00
Linus Torvalds
d883600523 Merge branch 'for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:

 - Christian extended clone3 so that processes can be spawned into
   cgroups directly.

   This is not only neat in terms of semantics but also avoids grabbing
   the global cgroup_threadgroup_rwsem for migration.

 - Daniel added !root xattr support to cgroupfs.

   Userland already uses xattrs on cgroupfs for bookkeeping. This will
   allow delegated cgroups to support such usages.

 - Prateek tried to make cpuset hotplug handling synchronous but that
   led to possible deadlock scenarios. Reverted.

 - Other minor changes including release_agent_path handling cleanup.

* 'for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  docs: cgroup-v1: Document the cpuset_v2_mode mount option
  Revert "cpuset: Make cpuset hotplug synchronous"
  cgroupfs: Support user xattrs
  kernfs: Add option to enable user xattrs
  kernfs: Add removed_size out param for simple_xattr_set
  kernfs: kvmalloc xattr value instead of kmalloc
  cgroup: Restructure release_agent_path handling
  selftests/cgroup: add tests for cloning into cgroups
  clone3: allow spawning processes into cgroups
  cgroup: add cgroup_may_write() helper
  cgroup: refactor fork helpers
  cgroup: add cgroup_get_from_file() helper
  cgroup: unify attach permission checking
  cpuset: Make cpuset hotplug synchronous
  cgroup.c: Use built-in RCU list checking
  kselftest/cgroup: add cgroup destruction test
  cgroup: Clean up css_set task traversal
2020-04-03 11:30:20 -07:00
Linus Torvalds
f2c3bec3c9 Merge tag 'kgdb-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux
Pull kgdb updates from Daniel Thompson:
 "Pretty quiet this cycle. Just a couple of small fixes from myself both
  of which were reviewed by Doug Anderson to keep me honest (thanks)"

* tag 'kgdb-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux:
  kdb: Censor attempts to set PROMPT without ENABLE_MEM_READ
  kdb: Eliminate strncpy() warnings by replacing with strscpy()
2020-04-03 11:26:32 -07:00
Waiman Long
0c05b9bdbf docs: cgroup-v1: Document the cpuset_v2_mode mount option
The cpuset in cgroup v1 accepts a special "cpuset_v2_mode" mount
option that make cpuset.cpus and cpuset.mems behave more like those in
cgroup v2.  Document it to make other people more aware of this feature
that can be useful in some circumstances.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2020-04-03 11:42:56 -04:00
Tejun Heo
2b729fe7f3 Revert "cpuset: Make cpuset hotplug synchronous"
This reverts commit a49e4629b5 ("cpuset: Make cpuset hotplug synchronous") as
it may deadlock with cpu hotplug path.

Link: http://lkml.kernel.org/r/F0388D99-84D7-453B-9B6B-EEFF0E7BE4CC@lca.pw
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Qian Cai <cai@lca.pw>
Cc: Prateek Sood <prsood@codeaurora.org>
2020-04-03 11:32:13 -04:00
Steven Rostedt (VMware)
8e99cf91b9 tracing: Do not allocate buffer in trace_find_next_entry() in atomic
When dumping out the trace data in latency format, a check is made to peek
at the next event to compare its timestamp to the current one, and if the
delta is of a greater size, it will add a marker showing so. But to do this,
it needs to save the current event otherwise peeking at the next event will
remove the current event. To save the event, a temp buffer is used, and if
the event is bigger than the temp buffer, the temp buffer is freed and a
bigger buffer is allocated.

This allocation is a problem when called in atomic context. The only way
this gets called via atomic context is via ftrace_dump(). Thus, use a static
buffer of 128 bytes (which covers most events), and if the event is bigger
than that, simply return NULL. The callers of trace_find_next_entry() need
to handle a NULL case, as that's what would happen if the allocation failed.

Link: https://lore.kernel.org/r/20200326091256.GR11705@shao2-debian

Fixes: ff895103a8 ("tracing: Save off entry when peeking at next entry")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-04-03 11:30:50 -04:00
Linus Torvalds
6cad420cc6 Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:
 "A large amount of MM, plenty more to come.

  Subsystems affected by this patch series:
   - tools
   - kthread
   - kbuild
   - scripts
   - ocfs2
   - vfs
   - mm: slub, kmemleak, pagecache, gup, swap, memcg, pagemap, mremap,
         sparsemem, kasan, pagealloc, vmscan, compaction, mempolicy,
         hugetlbfs, hugetlb"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (155 commits)
  include/linux/huge_mm.h: check PageTail in hpage_nr_pages even when !THP
  mm/hugetlb: fix build failure with HUGETLB_PAGE but not HUGEBTLBFS
  selftests/vm: fix map_hugetlb length used for testing read and write
  mm/hugetlb: remove unnecessary memory fetch in PageHeadHuge()
  mm/hugetlb.c: clean code by removing unnecessary initialization
  hugetlb_cgroup: add hugetlb_cgroup reservation docs
  hugetlb_cgroup: add hugetlb_cgroup reservation tests
  hugetlb: support file_region coalescing again
  hugetlb_cgroup: support noreserve mappings
  hugetlb_cgroup: add accounting for shared mappings
  hugetlb: disable region_add file_region coalescing
  hugetlb_cgroup: add reservation accounting for private mappings
  mm/hugetlb_cgroup: fix hugetlb_cgroup migration
  hugetlb_cgroup: add interface for charge/uncharge hugetlb reservations
  hugetlb_cgroup: add hugetlb_cgroup reservation counter
  hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race
  hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization
  mm/memblock.c: remove redundant assignment to variable max_addr
  mm: mempolicy: require at least one nodeid for MPOL_PREFERRED
  mm: mempolicy: use VM_BUG_ON_VMA in queue_pages_test_walk()
  ...
2020-04-02 13:55:34 -07:00
Linus Torvalds
d987ca1c6b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull exec/proc updates from Eric Biederman:
 "This contains two significant pieces of work: the work to sort out
  proc_flush_task, and the work to solve a deadlock between strace and
  exec.

  Fixing proc_flush_task so that it no longer requires a persistent
  mount makes improvements to proc possible. The removal of the
  persistent mount solves an old regression that that caused the hidepid
  mount option to only work on remount not on mount. The regression was
  found and reported by the Android folks. This further allows Alexey
  Gladkov's work making proc mount options specific to an individual
  mount of proc to move forward.

  The work on exec starts solving a long standing issue with exec that
  it takes mutexes of blocking userspace applications, which makes exec
  extremely deadlock prone. For the moment this adds a second mutex with
  a narrower scope that handles all of the easy cases. Which makes the
  tricky cases easy to spot. With a little luck the code to solve those
  deadlocks will be ready by next merge window"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (25 commits)
  signal: Extend exec_id to 64bits
  pidfd: Use new infrastructure to fix deadlocks in execve
  perf: Use new infrastructure to fix deadlocks in execve
  proc: io_accounting: Use new infrastructure to fix deadlocks in execve
  proc: Use new infrastructure to fix deadlocks in execve
  kernel/kcmp.c: Use new infrastructure to fix deadlocks in execve
  kernel: doc: remove outdated comment cred.c
  mm: docs: Fix a comment in process_vm_rw_core
  selftests/ptrace: add test cases for dead-locks
  exec: Fix a deadlock in strace
  exec: Add exec_update_mutex to replace cred_guard_mutex
  exec: Move exec_mmap right after de_thread in flush_old_exec
  exec: Move cleanup of posix timers on exec out of de_thread
  exec: Factor unshare_sighand out of de_thread and call it separately
  exec: Only compute current once in flush_old_exec
  pid: Improve the comment about waiting in zap_pid_ns_processes
  proc: Remove the now unnecessary internal mount of proc
  uml: Create a private mount of proc for mconsole
  uml: Don't consult current to find the proc_mnt in mconsole_proc
  proc: Use a list of inodes to flush from proc
  ...
2020-04-02 11:22:17 -07:00