Commit Graph

29187 Commits

Author SHA1 Message Date
Paul E. McKenney
6b3de7a172 rcutorture: Break up too-long rcu_torture_fwd_prog() function
This commit splits rcu_torture_fwd_prog_nr() and rcu_torture_fwd_prog_cr()
functions out of rcu_torture_fwd_prog() in order to reduce indentation
pain and because rcu_torture_fwd_prog() was getting a bit too long.
In addition, this will enable easier conditional execution of the
rcu_torture_fwd_prog_cr() function, which can give false-positive
failures in some NO_HZ_FULL configurations due to overloading the
housekeeping CPUs.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-12-01 12:45:34 -08:00
Paul E. McKenney
fc6f9c5778 rcutorture: Remove cbflood facility
Now that the forward-progress code does a full-bore continuous callback
flood lasting multiple seconds, there is little point in also posting a
mere 60,000 callbacks every second or so.  This commit therefore removes
the old cbflood testing.  Over time, it may be desirable to concurrently
do full-bore continuous callback floods on all CPUs simultaneously, but
one dragon at a time.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-12-01 12:45:33 -08:00
Paul E. McKenney
28cf5952f5 torture: Bring any extra CPUs online during kernel startup
Currently, the torture scripts rely on the initrd/init script to bring
any extra CPUs online, for example, in the case where the kernel and
qemu have different ideas about how many CPUs are present.  This works,
but is an unnecessary dependency on initrd, which needs to vary depending
on the distro.  This commit therefore causes torture_onoff() to check
for additional CPUs, attempting to bring any found online. Errors are
ignored, just as they are by the initrd/init script.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-12-01 12:45:32 -08:00
Paul E. McKenney
4871848531 rcutorture: Add call_rcu() flooding forward-progress tests
This commit adds a call_rcu() flooding loop to the forward-progress test.
This emulates tight userspace loops that force call_rcu() invocations,
for example, the infamous loop containing close(open()) that instigated
the addition of blimit.  If RCU does not make sufficient forward progress
in invoking the resulting flood of callbacks, rcutorture emits a warning.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-12-01 12:45:32 -08:00
Paul E. McKenney
eaaf055f27 Merge branches 'bug.2018.11.12a', 'consolidate.2018.12.01a', 'doc.2018.11.12a', 'fixes.2018.11.12a', 'initrd.2018.11.08b', 'sil.2018.11.12a' and 'srcu.2018.11.27a' into HEAD
bug.2018.11.12a:  Get rid of BUG_ON() and friends
consolidate.2018.12.01a:  Continued RCU flavor-consolidation cleanup
doc.2018.11.12a:  Documentation updates
fixes.2018.11.12a:  Miscellaneous fixes
initrd.2018.11.08b:  Automate creation of rcutorture initrd
sil.2018.11.12a:  Remove more spin_unlock_wait() calls
2018-12-01 12:43:16 -08:00
Paul E. McKenney
6932689e41 livepatch: Replace synchronize_sched() with synchronize_rcu()
Now that synchronize_rcu() waits for preempt-disable regions of code
as well as RCU read-side critical sections, synchronize_sched() can be
replaced by synchronize_rcu().  This commit therefore makes this change,
even though it is but a comment.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-12-01 12:38:50 -08:00
Paul E. McKenney
2af3024cd7 cgroups: Replace synchronize_sched() with synchronize_rcu()
Now that synchronize_rcu() waits for preempt-disable regions of code
as well as RCU read-side critical sections, synchronize_sched() can be
replaced by synchronize_rcu().  This commit therefore makes this change,
even though it is but a comment.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Dennis Zhou (Facebook)" <dennisszhou@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
2018-12-01 12:38:49 -08:00
Linus Torvalds
4b78317679 Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull STIBP fallout fixes from Thomas Gleixner:
 "The performance destruction department finally got it's act together
  and came up with a cure for the STIPB regression:

   - Provide a command line option to control the spectre v2 user space
     mitigations. Default is either seccomp or prctl (if seccomp is
     disabled in Kconfig). prctl allows mitigation opt-in, seccomp
     enables the migitation for sandboxed processes.

   - Rework the code to handle the conditional STIBP/IBPB control and
     remove the now unused ptrace_may_access_sched() optimization
     attempt

   - Disable STIBP automatically when SMT is disabled

   - Optimize the switch_to() logic to avoid MSR writes and invocations
     of __switch_to_xtra().

   - Make the asynchronous speculation TIF updates synchronous to
     prevent stale mitigation state.

  As a general cleanup this also makes retpoline directly depend on
  compiler support and removes the 'minimal retpoline' option which just
  pretended to provide some form of security while providing none"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
  x86/speculation: Provide IBPB always command line options
  x86/speculation: Add seccomp Spectre v2 user space protection mode
  x86/speculation: Enable prctl mode for spectre_v2_user
  x86/speculation: Add prctl() control for indirect branch speculation
  x86/speculation: Prepare arch_smt_update() for PRCTL mode
  x86/speculation: Prevent stale SPEC_CTRL msr content
  x86/speculation: Split out TIF update
  ptrace: Remove unused ptrace_may_access_sched() and MODE_IBRS
  x86/speculation: Prepare for conditional IBPB in switch_mm()
  x86/speculation: Avoid __switch_to_xtra() calls
  x86/process: Consolidate and simplify switch_to_xtra() code
  x86/speculation: Prepare for per task indirect branch speculation control
  x86/speculation: Add command line control for indirect branch speculation
  x86/speculation: Unify conditional spectre v2 print functions
  x86/speculataion: Mark command line parser data __initdata
  x86/speculation: Mark string arrays const correctly
  x86/speculation: Reorder the spec_v2 code
  x86/l1tf: Show actual SMT state
  x86/speculation: Rework SMT state change
  sched/smt: Expose sched_smt_present static key
  ...
2018-12-01 12:35:48 -08:00
David Miller
e9ee9efc0d bpf: Add BPF_F_ANY_ALIGNMENT.
Often we want to write tests cases that check things like bad context
offset accesses.  And one way to do this is to use an odd offset on,
for example, a 32-bit load.

This unfortunately triggers the alignment checks first on platforms
that do not set CONFIG_EFFICIENT_UNALIGNED_ACCESS.  So the test
case see the alignment failure rather than what it was testing for.

It is often not completely possible to respect the original intention
of the test, or even test the same exact thing, while solving the
alignment issue.

Another option could have been to check the alignment after the
context and other validations are performed by the verifier, but
that is a non-trivial change to the verifier.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-30 21:38:48 -08:00
Linus Torvalds
d8f190ee83 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "31 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (31 commits)
  ocfs2: fix potential use after free
  mm/khugepaged: fix the xas_create_range() error path
  mm/khugepaged: collapse_shmem() do not crash on Compound
  mm/khugepaged: collapse_shmem() without freezing new_page
  mm/khugepaged: minor reorderings in collapse_shmem()
  mm/khugepaged: collapse_shmem() remember to clear holes
  mm/khugepaged: fix crashes due to misaccounted holes
  mm/khugepaged: collapse_shmem() stop if punched or truncated
  mm/huge_memory: fix lockdep complaint on 32-bit i_size_read()
  mm/huge_memory: splitting set mapping+index before unfreeze
  mm/huge_memory: rename freeze_page() to unmap_page()
  initramfs: clean old path before creating a hardlink
  kernel/kcov.c: mark funcs in __sanitizer_cov_trace_pc() as notrace
  psi: make disabling/enabling easier for vendor kernels
  proc: fixup map_files test on arm
  debugobjects: avoid recursive calls with kmemleak
  userfaultfd: shmem: UFFDIO_COPY: set the page dirty if VM_WRITE is not set
  userfaultfd: shmem: add i_size checks
  userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas
  userfaultfd: shmem: allocate anonymous memory for MAP_PRIVATE shmem
  ...
2018-11-30 18:45:49 -08:00
Linus Torvalds
1f817429b2 stackleak plugin fix
- Fix crash by not allowing kprobing of stackleak_erase() (Alexander Popov)
 -----BEGIN PGP SIGNATURE-----
 Comment: Kees Cook <kees@outflux.net>
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAlwBcL4WHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJjEiD/9WErdSde1/jka9VX9SPdMFush9
 MCb1y0TpS+xlhF7Jyhxz9AJIRVwY0hXSO40RinYPJ4zgxXg+PjVJU/q2HhtGC4u1
 ZdUAblkw1WLEmYjKLuwyLcZmCAlodBpm+hMlYubAD2mfArtX4vLElcMd4livTz7x
 E4lJ/5WKVQzfadJ/onpltPxgQDVrov7u+uatrL7uFfb2geiCNV4lgeBqoJSx2QJd
 i8B7a28uifLS+ph5TJDx39nw7GJ4Iy4nnJofdX23kgKzfei5wef1mwdoaC6x93QL
 rrmhp3T4Mghnxi+txx+u8Bw7VxMiUnmlAoHQQcBQ0oLDMlTY+CQW3o32kvk+oHip
 5onLfQpHgn4kjd+Ns/qFFxr2oHYU0ODbEQ4taVXBu/f0jj0blQdEgLPWVwNpWS7l
 DoFezQ6qNjy1UJlbtg8t7dq1vUF2DTkaRj9JMPCqCBDX+zJrUZpOlCBoB+l4lC6e
 wJSh7m6QCNVh0fmIzctMenkio37rf2rEIp+l0/BMxFz0KsvlEBthb1x5dAlY6RK9
 Wq52oOLzduWTw3683lBnRYBFbsrDo9Y6LmY5wwkFFqx3TGZtcnomUtIqep8AE7YJ
 SH5QtrUUZKPLbPfHTr/k2G75daaJu8TIZ8cgmRmNe+o/LzH2u3IEnfYVteCxeeyj
 Bekcq9lWU5AVm4jNuQ==
 =OU+C
 -----END PGP SIGNATURE-----

Merge tag 'gcc-plugins-v4.20-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull stackleak plugin fix from Kees Cook:
 "Fix crash by not allowing kprobing of stackleak_erase() (Alexander
  Popov)"

* tag 'gcc-plugins-v4.20-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  stackleak: Disable function tracing and kprobes for stackleak_erase()
2018-11-30 18:36:30 -08:00
Anders Roxell
903e8ff867 kernel/kcov.c: mark funcs in __sanitizer_cov_trace_pc() as notrace
Since __sanitizer_cov_trace_pc() is marked as notrace, function calls in
__sanitizer_cov_trace_pc() shouldn't be traced either.
ftrace_graph_caller() gets called for each function that isn't marked
'notrace', like canonicalize_ip().  This is the call trace from a run:

[  139.644550]  ftrace_graph_caller+0x1c/0x24
[  139.648352]  canonicalize_ip+0x18/0x28
[  139.652313]  __sanitizer_cov_trace_pc+0x14/0x58
[  139.656184]  sched_clock+0x34/0x1e8
[  139.659759]  trace_clock_local+0x40/0x88
[  139.663722]  ftrace_push_return_trace+0x8c/0x1f0
[  139.667767]  prepare_ftrace_return+0xa8/0x100
[  139.671709]  ftrace_graph_caller+0x1c/0x24

Rework so that check_kcov_mode() and canonicalize_ip() that are called
from __sanitizer_cov_trace_pc() are also marked as notrace.

Link: http://lkml.kernel.org/r/20181128081239.18317-1-anders.roxell@linaro.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signen-off-by: Anders Roxell <anders.roxell@linaro.org>
Co-developed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30 14:56:14 -08:00
Johannes Weiner
e0c274472d psi: make disabling/enabling easier for vendor kernels
Mel Gorman reports a hackbench regression with psi that would prohibit
shipping the suse kernel with it default-enabled, but he'd still like
users to be able to opt in at little to no cost to others.

With the current combination of CONFIG_PSI and the psi_disabled bool set
from the commandline, this is a challenge.  Do the following things to
make it easier:

1. Add a config option CONFIG_PSI_DEFAULT_DISABLED that allows distros
   to enable CONFIG_PSI in their kernel but leave the feature disabled
   unless a user requests it at boot-time.

   To avoid double negatives, rename psi_disabled= to psi=.

2. Make psi_disabled a static branch to eliminate any branch costs
   when the feature is disabled.

In terms of numbers before and after this patch, Mel says:

: The following is a comparision using CONFIG_PSI=n as a baseline against
: your patch and a vanilla kernel
:
:                          4.20.0-rc4             4.20.0-rc4             4.20.0-rc4
:                 kconfigdisable-v1r1                vanilla        psidisable-v1r1
: Amean     1       1.3100 (   0.00%)      1.3923 (  -6.28%)      1.3427 (  -2.49%)
: Amean     3       3.8860 (   0.00%)      4.1230 *  -6.10%*      3.8860 (  -0.00%)
: Amean     5       6.8847 (   0.00%)      8.0390 * -16.77%*      6.7727 (   1.63%)
: Amean     7       9.9310 (   0.00%)     10.8367 *  -9.12%*      9.9910 (  -0.60%)
: Amean     12     16.6577 (   0.00%)     18.2363 *  -9.48%*     17.1083 (  -2.71%)
: Amean     18     26.5133 (   0.00%)     27.8833 *  -5.17%*     25.7663 (   2.82%)
: Amean     24     34.3003 (   0.00%)     34.6830 (  -1.12%)     32.0450 (   6.58%)
: Amean     30     40.0063 (   0.00%)     40.5800 (  -1.43%)     41.5087 (  -3.76%)
: Amean     32     40.1407 (   0.00%)     41.2273 (  -2.71%)     39.9417 (   0.50%)
:
: It's showing that the vanilla kernel takes a hit (as the bisection
: indicated it would) and that disabling PSI by default is reasonably
: close in terms of performance for this particular workload on this
: particular machine so;

Link: http://lkml.kernel.org/r/20181127165329.GA29728@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30 14:56:14 -08:00
Linus Torvalds
a1b3cf6d94 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Misc fixes:

   - counter freezing related regression fix

   - uprobes race fix

   - Intel PMU unusual event combination fix

   - .. and diverse tooling fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  uprobes: Fix handle_swbp() vs. unregister() + register() race once more
  perf/x86/intel: Disallow precise_ip on BTS events
  perf/x86/intel: Add generic branch tracing check to intel_pmu_has_bts()
  perf/x86/intel: Move branch tracing setup to the Intel-specific source file
  perf/x86/intel: Fix regression by default disabling perfmon v4 interrupt handling
  perf tools beauty ioctl: Support new ISO7816 commands
  tools uapi asm-generic: Synchronize ioctls.h
  tools arch x86: Update tools's copy of cpufeatures.h
  tools headers uapi: Synchronize i915_drm.h
  perf tools: Restore proper cwd on return from mnt namespace
  tools build feature: Check if get_current_dir_name() is available
  perf tools: Fix crash on synthesizing the unit
2018-11-30 11:31:48 -08:00
Linus Torvalds
49afe66143 This includes two more fixes:
- Change idx variable in DO_TRACE macro to __idx to avoid name conflicts.
    A kvm event had "idx" as a parameter and it confused the macro.
 
  - Fix a race where interrupts would be traced when set_graph_function was set.
    The previous patch set increased a race window that tricked the function graph
    tracer to think it should trace interrupts when it really should not have.
    The bug has been there before, but was seldom hit. Only the last patch series
    made it more common.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXACrFhQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qgUuAP9zZ0+PjDIXrkZATogEsMJ1OZKy5AiK
 mwT7S85/ouOeCQEAi5JUcSfwB0caq2nYB1GKOKNfDH0ffeVR4wNykcYjngM=
 =gjwC
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull more tracing fixes from Steven Rostedt:
 "Two more fixes:

   - Change idx variable in DO_TRACE macro to __idx to avoid name
     conflicts. A kvm event had "idx" as a parameter and it confused the
     macro.

   - Fix a race where interrupts would be traced when set_graph_function
     was set. The previous patch set increased a race window that
     tricked the function graph tracer to think it should trace
     interrupts when it really should not have.

     The bug has been there before, but was seldom hit. Only the last
     patch series made it more common"

* tag 'trace-v4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing/fgraph: Fix set_graph_function from showing interrupts
  tracepoint: Use __idx instead of idx in DO_TRACE macro to make it unique
2018-11-30 10:40:11 -08:00
Linus Torvalds
0f1f692375 While rewriting the function graph tracer, I discovered a design flaw that
was introduced by a patch that tried to fix one bug, but by doing so created
 another bug. As both bugs corrupt the output (but they do not crash the
 kernel), I decided to fix the design such that it could have both bugs
 fixed. The original fix, fixed time reporting of the function graph tracer
 when doing a max_depth of one. This was code that can test how much the
 kernel interferes with userspace. But in doing so, it could corrupt the time
 keeping of the function profiler.
 
 The issue is that the curr_ret_stack variable was being used for two
 different meanings. One was to keep track of the stack pointer on the
 ret_stack (shadow stack used by the function graph tracer), and the other
 use case was the graph call depth.  Although, the two may be closely
 related, where they got updated was the issue that lead to the two different
 bugs that required the two use cases to be updated differently.
 
 The big issue with this fix is that it requires changing each architecture.
 The good news is, I was able to remove a lot of code that was duplicated
 within the architectures and place it into a single location. Then I could
 make the fix in one place.
 
 I pushed this code into linux-next to let it settle over a week, and before
 doing so, I cross compiled all the affected architectures to make sure that
 they built fine.
 
 In the mean time, I also pulled in a patch that fixes the sched_switch
 previous tasks state output, that was not actually correct.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCW/4NPhQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qnWAAQCyUIRLgYImr81eTl52lxNRsULk+aiI
 U29kRFWWU0c40AEA1X9sDF0MgOItbRGfZtnHTZEousXRDaDf4Fge2kF7Egg=
 =liQ0
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.20-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "While rewriting the function graph tracer, I discovered a design flaw
  that was introduced by a patch that tried to fix one bug, but by doing
  so created another bug.

  As both bugs corrupt the output (but they do not crash the kernel), I
  decided to fix the design such that it could have both bugs fixed. The
  original fix, fixed time reporting of the function graph tracer when
  doing a max_depth of one. This was code that can test how much the
  kernel interferes with userspace. But in doing so, it could corrupt
  the time keeping of the function profiler.

  The issue is that the curr_ret_stack variable was being used for two
  different meanings. One was to keep track of the stack pointer on the
  ret_stack (shadow stack used by the function graph tracer), and the
  other use case was the graph call depth. Although, the two may be
  closely related, where they got updated was the issue that lead to the
  two different bugs that required the two use cases to be updated
  differently.

  The big issue with this fix is that it requires changing each
  architecture. The good news is, I was able to remove a lot of code
  that was duplicated within the architectures and place it into a
  single location. Then I could make the fix in one place.

  I pushed this code into linux-next to let it settle over a week, and
  before doing so, I cross compiled all the affected architectures to
  make sure that they built fine.

  In the mean time, I also pulled in a patch that fixes the sched_switch
  previous tasks state output, that was not actually correct"

* tag 'trace-v4.20-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  sched, trace: Fix prev_state output in sched_switch tracepoint
  function_graph: Have profiler use curr_ret_stack and not depth
  function_graph: Reverse the order of pushing the ret_stack and the callback
  function_graph: Move return callback before update of curr_ret_stack
  function_graph: Use new curr_ret_depth to manage depth instead of curr_ret_stack
  function_graph: Make ftrace_push_return_trace() static
  sparc/function_graph: Simplify with function_graph_enter()
  sh/function_graph: Simplify with function_graph_enter()
  s390/function_graph: Simplify with function_graph_enter()
  riscv/function_graph: Simplify with function_graph_enter()
  powerpc/function_graph: Simplify with function_graph_enter()
  parisc: function_graph: Simplify with function_graph_enter()
  nds32: function_graph: Simplify with function_graph_enter()
  MIPS: function_graph: Simplify with function_graph_enter()
  microblaze: function_graph: Simplify with function_graph_enter()
  arm64: function_graph: Simplify with function_graph_enter()
  ARM: function_graph: Simplify with function_graph_enter()
  x86/function_graph: Simplify with function_graph_enter()
  function_graph: Create function_graph_enter() to consolidate architecture code
2018-11-30 09:32:34 -08:00
Alexander Popov
ef1a840934 stackleak: Disable function tracing and kprobes for stackleak_erase()
The stackleak_erase() function is called on the trampoline stack at the
end of syscall. This stack is not big enough for ftrace and kprobes
operations, e.g. it can be exhausted if we use kprobe_events for
stackleak_erase().

So let's disable function tracing and kprobes of stackleak_erase().

Reported-by: kernel test robot <lkp@intel.com>
Fixes: 10e9ae9fab ("gcc-plugins: Add STACKLEAK plugin for tracking the kernel stack")
Signed-off-by: Alexander Popov <alex.popov@linux.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
2018-11-30 09:05:07 -08:00
Steven Rostedt (VMware)
5cf99a0f31 tracing/fgraph: Fix set_graph_function from showing interrupts
The tracefs file set_graph_function is used to only function graph functions
that are listed in that file (or all functions if the file is empty). The
way this is implemented is that the function graph tracer looks at every
function, and if the current depth is zero and the function matches
something in the file then it will trace that function. When other functions
are called, the depth will be greater than zero (because the original
function will be at depth zero), and all functions will be traced where the
depth is greater than zero.

The issue is that when a function is first entered, and the handler that
checks this logic is called, the depth is set to zero. If an interrupt comes
in and a function in the interrupt handler is traced, its depth will be
greater than zero and it will automatically be traced, even if the original
function was not. But because the logic only looks at depth it may trace
interrupts when it should not be.

The recent design change of the function graph tracer to fix other bugs
caused the depth to be zero while the function graph callback handler is
being called for a longer time, widening the race of this happening. This
bug was actually there for a longer time, but because the race window was so
small it seldom happened. The Fixes tag below is for the commit that widen
the race window, because that commit belongs to a series that will also help
fix the original bug.

Cc: stable@kernel.org
Fixes: 39eb456dac ("function_graph: Use new curr_ret_depth to manage depth instead of curr_ret_stack")
Reported-by: Joe Lawrence <joe.lawrence@redhat.com>
Tested-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-11-29 22:09:00 -05:00
David S. Miller
93029d7d40 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
bpf-next 2018-11-30

The following pull-request contains BPF updates for your *net-next* tree.

(Getting out bit earlier this time to pull in a dependency from bpf.)

The main changes are:

1) Add libbpf ABI versioning and document API naming conventions
   as well as ABI versioning process, from Andrey.

2) Add a new sk_msg_pop_data() helper for sk_msg based BPF
   programs that is used in conjunction with sk_msg_push_data()
   for adding / removing meta data to the msg data, from John.

3) Optimize convert_bpf_ld_abs() for 0 offset and fix various
   lib and testsuite build failures on 32 bit, from David.

4) Make BPF prog dump for !JIT identical to how we dump subprogs
   when JIT is in use, from Yonghong.

5) Rename btf_get_from_id() to make it more conform with libbpf
   API naming conventions, from Martin.

6) Add a missing BPF kselftest config item, from Naresh.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-29 18:15:07 -08:00
Yangtao Li
96c6935212 PM / QoS: Change to use DEFINE_SHOW_ATTRIBUTE macro
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-11-29 22:28:11 +01:00
Jessica Yu
2d25bc5523 module: make it clearer when we're handling kallsyms symbols vs exported symbols
The module loader internally works with both exported symbols
represented as struct kernel_symbol, as well as Elf symbols from a
module's symbol table. It's hard to distinguish sometimes which type of
symbol we're handling given that some helper function names are not
consistent or helpful. Take get_ksymbol() for instance - are we
looking for an exported symbol or a kallsyms symbol here? Or symname()
and kernel_symbol_name() - which function handles an exported symbol and
which one an Elf symbol?

Clean up and unify the function naming scheme a bit to make it clear
which kind of symbol we're handling. This change only affects static
functions internal to the module loader.

Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
2018-11-29 14:24:13 +01:00
Tetsuo Handa
9adcfaffc3 printk: Make printk_emit() local function.
printk_emit() is called from only devkmsg_write() in the same file.
Save object size by making it a local function.

Link: http://lkml.kernel.org/r/5cc99d2c-c408-34f7-d1fc-e1cd2a9e31da@i-love.sakura.ne.jp
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2018-11-29 10:59:40 +01:00
David S. Miller
e561bb29b6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Trivial conflict in net/core/filter.c, a locally computed
'sdif' is now an argument to the function.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-28 22:10:54 -08:00
Yonghong Song
eb04bbb608 bpf: btf: check name validity for various types
This patch added name checking for the following types:
 . BTF_KIND_PTR, BTF_KIND_ARRAY, BTF_KIND_VOLATILE,
   BTF_KIND_CONST, BTF_KIND_RESTRICT:
     the name must be null
 . BTF_KIND_STRUCT, BTF_KIND_UNION: the struct/member name
     is either null or a valid identifier
 . BTF_KIND_ENUM: the enum type name is either null or a valid
     identifier; the enumerator name must be a valid identifier.
 . BTF_KIND_FWD: the name must be a valid identifier
 . BTF_KIND_TYPEDEF: the name must be a valid identifier

For those places a valid name is required, the name must be
a valid C identifier. This can be relaxed later if we found
use cases for a different (non-C) frontend.

Fixes: 69b693f0ae ("bpf: btf: Introduce BPF Type Format (BTF)")
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-28 16:03:04 -08:00
Yonghong Song
cdbb096add bpf: btf: implement btf_name_valid_identifier()
Function btf_name_valid_identifier() have been implemented in
bpf-next commit 2667a2626f ("bpf: btf: Add BTF_KIND_FUNC and
BTF_KIND_FUNC_PROTO"). Backport this function so later patch
can use it.

Fixes: 69b693f0ae ("bpf: btf: Introduce BPF Type Format (BTF)")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-28 16:03:04 -08:00
Linus Torvalds
60b548237f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) ARM64 JIT fixes for subprog handling from Daniel Borkmann.

 2) Various sparc64 JIT bug fixes (fused branch convergance, frame
    pointer usage detection logic, PSEODU call argument handling).

 3) Fix to use BH locking in nf_conncount, from Taehee Yoo.

 4) Fix race of TX skb freeing in ipheth driver, from Bernd Eckstein.

 5) Handle return value of TX NAPI completion properly in lan743x
    driver, from Bryan Whitehead.

 6) MAC filter deletion in i40e driver clears wrong state bit, from
    Lihong Yang.

 7) Fix use after free in rionet driver, from Pan Bian.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (53 commits)
  s390/qeth: fix length check in SNMP processing
  net: hisilicon: remove unexpected free_netdev
  rapidio/rionet: do not free skb before reading its length
  i40e: fix kerneldoc for xsk methods
  ixgbe: recognize 1000BaseLX SFP modules as 1Gbps
  i40e: Fix deletion of MAC filters
  igb: fix uninitialized variables
  netfilter: nf_tables: deactivate expressions in rule replecement routine
  lan743x: Enable driver to work with LAN7431
  tipc: fix lockdep warning during node delete
  lan743x: fix return value for lan743x_tx_napi_poll
  net: via: via-velocity: fix spelling mistake "alignement" -> "alignment"
  qed: fix spelling mistake "attnetion" -> "attention"
  net: thunderx: fix NULL pointer dereference in nic_remove
  sctp: increase sk_wmem_alloc when head->truesize is increased
  firestream: fix spelling mistake: "Inititing" -> "Initializing"
  net: phy: add workaround for issue where PHY driver doesn't bind to the device
  usbnet: ipheth: fix potential recvmsg bug and recvmsg bug 2
  sparc: Adjust bpf JIT prologue for PSEUDO calls.
  bpf, doc: add entries of who looks over which jits
  ...
2018-11-28 12:53:48 -08:00
Thomas Gleixner
46f7ecb1e7 ptrace: Remove unused ptrace_may_access_sched() and MODE_IBRS
The IBPB control code in x86 removed the usage. Remove the functionality
which was introduced for this.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Casey Schaufler <casey.schaufler@intel.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Waiman Long <longman9394@gmail.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Dave Stewart <david.c.stewart@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181125185005.559149393@linutronix.de
2018-11-28 11:57:11 +01:00
Thomas Gleixner
a74cfffb03 x86/speculation: Rework SMT state change
arch_smt_update() is only called when the sysfs SMT control knob is
changed. This means that when SMT is enabled in the sysfs control knob the
system is considered to have SMT active even if all siblings are offline.

To allow finegrained control of the speculation mitigations, the actual SMT
state is more interesting than the fact that siblings could be enabled.

Rework the code, so arch_smt_update() is invoked from each individual CPU
hotplug function, and simplify the update function while at it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Casey Schaufler <casey.schaufler@intel.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Waiman Long <longman9394@gmail.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Dave Stewart <david.c.stewart@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181125185004.521974984@linutronix.de
2018-11-28 11:57:07 +01:00
Thomas Gleixner
321a874a7e sched/smt: Expose sched_smt_present static key
Make the scheduler's 'sched_smt_present' static key globaly available, so
it can be used in the x86 speculation control code.

Provide a query function and a stub for the CONFIG_SMP=n case.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Casey Schaufler <casey.schaufler@intel.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Waiman Long <longman9394@gmail.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Dave Stewart <david.c.stewart@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181125185004.430168326@linutronix.de
2018-11-28 11:57:07 +01:00
Peter Zijlstra (Intel)
c5511d03ec sched/smt: Make sched_smt_present track topology
Currently the 'sched_smt_present' static key is enabled when at CPU bringup
SMT topology is observed, but it is never disabled. However there is demand
to also disable the key when the topology changes such that there is no SMT
present anymore.

Implement this by making the key count the number of cores that have SMT
enabled.

In particular, the SMT topology bits are set before interrrupts are enabled
and similarly, are cleared after interrupts are disabled for the last time
and the CPU dies.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Casey Schaufler <casey.schaufler@intel.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Waiman Long <longman9394@gmail.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Dave Stewart <david.c.stewart@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181125185004.246110444@linutronix.de
2018-11-28 11:57:06 +01:00
Steven Rostedt (VMware)
b1b35f2e21 function_graph: Have profiler use curr_ret_stack and not depth
The profiler uses trace->depth to find its entry on the ret_stack, but the
depth may not match the actual location of where its entry is (if an
interrupt were to preempt the processing of the profiler for another
function, the depth and the curr_ret_stack will be different).

Have it use the curr_ret_stack as the index to find its ret_stack entry
instead of using the depth variable, as that is no longer guaranteed to be
the same.

Cc: stable@kernel.org
Fixes: 03274a3ffb ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-11-27 20:31:55 -05:00
Steven Rostedt (VMware)
7c6ea35ef5 function_graph: Reverse the order of pushing the ret_stack and the callback
The function graph profiler uses the ret_stack to store the "subtime" and
reuse it by nested functions and also on the return. But the current logic
has the profiler callback called before the ret_stack is updated, and it is
just modifying the ret_stack that will later be allocated (it's just lucky
that the "subtime" is not touched when it is allocated).

This could also cause a crash if we are at the end of the ret_stack when
this happens.

By reversing the order of the allocating the ret_stack and then calling the
callbacks attached to a function being traced, the ret_stack entry is no
longer used before it is allocated.

Cc: stable@kernel.org
Fixes: 03274a3ffb ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-11-27 20:31:54 -05:00
Steven Rostedt (VMware)
552701dd0f function_graph: Move return callback before update of curr_ret_stack
In the past, curr_ret_stack had two functions. One was to denote the depth
of the call graph, the other is to keep track of where on the ret_stack the
data is used. Although they may be slightly related, there are two cases
where they need to be used differently.

The one case is that it keeps the ret_stack data from being corrupted by an
interrupt coming in and overwriting the data still in use. The other is just
to know where the depth of the stack currently is.

The function profiler uses the ret_stack to save a "subtime" variable that
is part of the data on the ret_stack. If curr_ret_stack is modified too
early, then this variable can be corrupted.

The "max_depth" option, when set to 1, will record the first functions going
into the kernel. To see all top functions (when dealing with timings), the
depth variable needs to be lowered before calling the return hook. But by
lowering the curr_ret_stack, it makes the data on the ret_stack still being
used by the return hook susceptible to being overwritten.

Now that there's two variables to handle both cases (curr_ret_depth), we can
move them to the locations where they can handle both cases.

Cc: stable@kernel.org
Fixes: 03274a3ffb ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-11-27 20:31:54 -05:00
Steven Rostedt (VMware)
39eb456dac function_graph: Use new curr_ret_depth to manage depth instead of curr_ret_stack
Currently, the depth of the ret_stack is determined by curr_ret_stack index.
The issue is that there's a race between setting of the curr_ret_stack and
calling of the callback attached to the return of the function.

Commit 03274a3ffb ("tracing/fgraph: Adjust fgraph depth before calling
trace return callback") moved the calling of the callback to after the
setting of the curr_ret_stack, even stating that it was safe to do so, when
in fact, it was the reason there was a barrier() there (yes, I should have
commented that barrier()).

Not only does the curr_ret_stack keep track of the current call graph depth,
it also keeps the ret_stack content from being overwritten by new data.

The function profiler, uses the "subtime" variable of ret_stack structure
and by moving the curr_ret_stack, it allows for interrupts to use the same
structure it was using, corrupting the data, and breaking the profiler.

To fix this, there needs to be two variables to handle the call stack depth
and the pointer to where the ret_stack is being used, as they need to change
at two different locations.

Cc: stable@kernel.org
Fixes: 03274a3ffb ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-11-27 20:31:54 -05:00
Steven Rostedt (VMware)
d125f3f866 function_graph: Make ftrace_push_return_trace() static
As all architectures now call function_graph_enter() to do the entry work,
no architecture should ever call ftrace_push_return_trace(). Make it static.

This is needed to prepare for a fix of a design bug on how the curr_ret_stack
is used.

Cc: stable@kernel.org
Fixes: 03274a3ffb ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-11-27 20:31:54 -05:00
Paul E. McKenney
aacb5d91ab srcu: Use "ssp" instead of "sp" for srcu_struct pointer
In RCU, the distinction between "rsp", "rnp", and "rdp" has served well
for a great many years, but in SRCU, "sp" vs. "sdp" has proven confusing.
This commit therefore renames SRCU's "sp" pointers to "ssp", so that there
is "ssp" for srcu_struct pointer, "snp" for srcu_node pointer, and "sdp"
for srcu_data pointer.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-11-27 09:24:17 -08:00
Dennis Krein
eb4c238227 srcu: Lock srcu_data structure in srcu_gp_start()
The srcu_gp_start() function is called with the srcu_struct structure's
->lock held, but not with the srcu_data structure's ->lock.  This is
problematic because this function accesses and updates the srcu_data
structure's ->srcu_cblist, which is protected by that lock.  Failing to
hold this lock can result in corruption of the SRCU callback lists,
which in turn can result in arbitrarily bad results.

This commit therefore makes srcu_gp_start() acquire the srcu_data
structure's ->lock across the calls to rcu_segcblist_advance() and
rcu_segcblist_accelerate(), thus preventing this corruption.

Reported-by: Bart Van Assche <bvanassche@acm.org>
Reported-by: Christoph Hellwig <hch@infradead.org>
Reported-by: Sebastian Kuzminsky <seb.kuzminsky@gmail.com>
Signed-off-by: Dennis Krein <Dennis.Krein@netapp.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Tested-by: Dennis Krein <Dennis.Krein@netapp.com>
Cc: <stable@vger.kernel.org> # 4.16.x
2018-11-27 09:23:57 -08:00
Paul E. McKenney
0809d95451 events: Replace synchronize_sched() with synchronize_rcu()
Now that synchronize_rcu() waits for preempt-disable regions of code
as well as RCU read-side critical sections, synchronize_sched() can be
replaced by synchronize_rcu().  This commit therefore makes this change.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
2018-11-27 09:21:44 -08:00
Paul E. McKenney
25b0077511 workqueue: Replace call_rcu_sched() with call_rcu()
Now that call_rcu()'s callback is not invoked until after all
preempt-disable regions of code have completed (in addition to explicitly
marked RCU read-side critical sections), call_rcu() can be used in place
of call_rcu_sched().  This commit therefore makes that change.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
2018-11-27 09:21:44 -08:00
Paul E. McKenney
cb2f55369d modules: Replace synchronize_sched() and call_rcu_sched()
Now that synchronize_rcu() waits for preempt-disable regions of code
as well as RCU read-side critical sections, synchronize_sched() can
be replaced by synchronize_rcu().  Similarly, call_rcu_sched() can be
replaced by call_rcu().  This commit therefore makes these changes.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Acked-by: Jessica Yu <jeyu@kernel.org>
2018-11-27 09:21:43 -08:00
Paul E. McKenney
c9a863bbb1 sched/membarrier: synchronize_sched() with synchronize_rcu()
Now that synchronize_rcu() waits for preempt-disable regions of code
as well as RCU read-side critical sections, synchronize_sched() can be
replaced by synchronize_rcu().  This commit therefore makes this change.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
2018-11-27 09:21:43 -08:00
Paul E. McKenney
51959d85f3 lockdep: Replace synchronize_sched() with synchronize_rcu()
Now that synchronize_rcu() waits for preempt-disable regions of code
as well as RCU read-side critical sections, synchronize_sched() can be
replaced by synchronize_rcu().  This commit therefore makes this change.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
2018-11-27 09:21:42 -08:00
Paul E. McKenney
ae8b7ce764 kprobes: Replace synchronize_sched() with synchronize_rcu()
Now that synchronize_rcu() waits for preempt-disable regions of code
as well as RCU read-side critical sections, synchronize_sched() can be
replaced by synchronize_rcu().  This commit therefore makes this change.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
2018-11-27 09:21:42 -08:00
Paul E. McKenney
7440172974 tracing: Replace synchronize_sched() and call_rcu_sched()
Now that synchronize_rcu() waits for preempt-disable regions of code
as well as RCU read-side critical sections, synchronize_sched() can
be replaced by synchronize_rcu().  Similarly, call_rcu_sched() can be
replaced by call_rcu().  This commit therefore makes these changes.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: <linux-kernel@vger.kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-11-27 09:21:41 -08:00
Yonghong Song
ba64e7d852 bpf: btf: support proper non-jit func info
Commit 838e96904f ("bpf: Introduce bpf_func_info")
added bpf func info support. The userspace is able
to get better ksym's for bpf programs with jit, and
is able to print out func prototypes.

For a program containing func-to-func calls, the existing
implementation returns user specified number of function
calls and BTF types if jit is enabled. If the jit is not
enabled, it only returns the type for the main function.

This is undesirable. Interpreter may still be used
and we should keep feature identical regardless of
whether jit is enabled or not.
This patch fixed this discrepancy.

Fixes: 838e96904f ("bpf: Introduce bpf_func_info")
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-26 17:57:10 -08:00
Daniel Borkmann
e2c95a6165 bpf, ppc64: generalize fetching subprog into bpf_jit_get_func_addr
Make fetching of the BPF call address from ppc64 JIT generic. ppc64
was using a slightly different variant rather than through the insns'
imm field encoding as the target address would not fit into that space.
Therefore, the target subprog number was encoded into the insns' offset
and fetched through fp->aux->func[off]->bpf_func instead. Given there
are other JITs with this issue and the mechanism of fetching the address
is JIT-generic, move it into the core as a helper instead. On the JIT
side, we get information on whether the retrieved address is a fixed
one, that is, not changing through JIT passes, or a dynamic one. For
the former, JITs can optimize their imm emission because this doesn't
change jump offsets throughout JIT process.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Sandipan Das <sandipan@linux.ibm.com>
Tested-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-26 17:34:24 -08:00
Paul Moore
2a1fe215e7 audit: use current whenever possible
There are many places, notably audit_log_task_info() and
audit_log_exit(), that take task_struct pointers but in reality they
are always working on the current task.  This patch eliminates the
task_struct arguments and uses current directly which allows a number
of cleanups as well.

Acked-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2018-11-26 18:41:21 -05:00
Paul Moore
d0a3f18a70 audit: minimize our use of audit_log_format()
There are some cases where we are making multiple audit_log_format()
calls in a row, for no apparent reason.  Squash these down to a
single audit_log_format() call whenever possible.

Acked-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2018-11-26 18:40:00 -05:00
Steven Rostedt (VMware)
8114865ff8 function_graph: Create function_graph_enter() to consolidate architecture code
Currently all the architectures do basically the same thing in preparing the
function graph tracer on entry to a function. This code can be pulled into a
generic location and then this will allow the function graph tracer to be
fixed, as well as extended.

Create a new function graph helper function_graph_enter() that will call the
hook function (ftrace_graph_entry) and the shadow stack operation
(ftrace_push_return_trace), and remove the need of the architecture code to
manage the shadow stack.

This is needed to prepare for a fix of a design bug on how the curr_ret_stack
is used.

Cc: stable@kernel.org
Fixes: 03274a3ffb ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-11-26 16:18:04 -05:00
David S. Miller
4afe60a97b Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2018-11-26

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Extend BTF to support function call types and improve the BPF
   symbol handling with this info for kallsyms and bpftool program
   dump to make debugging easier, from Martin and Yonghong.

2) Optimize LPM lookups by making longest_prefix_match() handle
   multiple bytes at a time, from Eric.

3) Adds support for loading and attaching flow dissector BPF progs
   from bpftool, from Stanislav.

4) Extend the sk_lookup() helper to be supported from XDP, from Nitin.

5) Enable verifier to support narrow context loads with offset > 0
   to adapt to LLVM code generation (currently only offset of 0 was
   supported). Add test cases as well, from Andrey.

6) Simplify passing device functions for offloaded BPF progs by
   adding callbacks to bpf_prog_offload_ops instead of ndo_bpf.
   Also convert nfp and netdevsim to make use of them, from Quentin.

7) Add support for sock_ops based BPF programs to send events to
   the perf ring-buffer through perf_event_output helper, from
   Sowmini and Daniel.

8) Add read / write support for skb->tstamp from tc BPF and cg BPF
   programs to allow for supporting rate-limiting in EDT qdiscs
   like fq from BPF side, from Vlad.

9) Extend libbpf API to support map in map types and add test cases
   for it as well to BPF kselftests, from Nikita.

10) Account the maximum packet offset accessed by a BPF program in
    the verifier and use it for optimizing nfp JIT, from Jiong.

11) Fix error handling regarding kprobe_events in BPF sample loader,
    from Daniel T.

12) Add support for queue and stack map type in bpftool, from David.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-26 13:08:17 -08:00
David S. Miller
6950012742 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2018-11-25

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix an off-by-one bug when adjusting subprog start offsets after
   patching, from Edward.

2) Fix several bugs such as overflow in size allocation in queue /
   stack map creation, from Alexei.

3) Fix wrong IPv6 destination port byte order in bpf_sk_lookup_udp
   helper, from Andrey.

4) Fix several bugs in bpftool such as preventing an infinite loop
   in get_fdinfo, error handling and man page references, from Quentin.

5) Fix a warning in bpf_trace_printk() that wasn't catching an
   invalid format string, from Martynas.

6) Fix a bug in BPF cgroup local storage where non-atomic allocation
   was used in atomic context, from Roman.

7) Fix a NULL pointer dereference bug in bpftool from reallocarray()
   error handling, from Jakub and Wen.

8) Add a copy of pkt_cls.h and tc_bpf.h uapi headers to the tools
   include infrastructure so that bpftool compiles on older RHEL7-like
   user space which does not ship these headers, from Yonghong.

9) Fix BPF kselftests for user space where to get ping test working
   with ping6 and ping -6, from Li.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-25 20:04:58 -08:00
Colin Ian King
311fe1a813 bpf: btf: fix spelling mistake "Memmber" -> "Member"
There is a spelling mistake in a btf_verifier_log_member message,
fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-26 01:28:16 +01:00
Rustam Kovhaev
cf0dd411e8 bpf, tags: Fix DEFINE_PER_CPU expansion
Building tags produces warning:

  ctags: Warning: kernel/bpf/local_storage.c:10: null expansion of name pattern "\1"

Let's use the same fix as in commit 25528213fe ("tags: Fix DEFINE_PER_CPU
expansions"), even though it violates the usual code style.

Signed-off-by: Rustam Kovhaev <rkovhaev@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-26 01:21:39 +01:00
Martynas Pumputis
1efb6ee3ed bpf: fix check of allowed specifiers in bpf_trace_printk
A format string consisting of "%p" or "%s" followed by an invalid
specifier (e.g. "%p%\n" or "%s%") could pass the check which
would make format_decode (lib/vsprintf.c) to warn.

Fixes: 9c959c863f ("tracing: Allow BPF programs to call bpf_trace_printk()")
Reported-by: syzbot+1ec5c5ec949c4adaa0c4@syzkaller.appspotmail.com
Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-23 21:54:14 +01:00
Thomas Gleixner
0141de741e posix-timers: Remove license boilerplate
The SPDX identifier defines the license of the file already. No need for
the boilerplate.

Remove also the completely outdated Montavista snail mail address.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: John Stultz <john.stultz@linaro.org>
Acked-by: Corey Minyard <cminyard@mvista.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: David Riley <davidriley@chromium.org>
Cc: Colin Cross <ccross@android.com>
Cc: Mark Brown <broonie@kernel.org>
Link: https://lkml.kernel.org/r/20181031182253.479792883@linutronix.de
2018-11-23 11:51:22 +01:00
Thomas Gleixner
c804efeb58 posix-clocks: Remove license boiler plate
The SPDX identifier defines the license of the file already. No need for
the boilerplate.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Manfred Rudigier <manfred.rudigier@omicronenergy.com>
Acked-by: John Stultz <john.stultz@linaro.org>
Acked-by: Corey Minyard <cminyard@mvista.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: David Riley <davidriley@chromium.org>
Cc: Colin Cross <ccross@android.com>
Cc: Mark Brown <broonie@kernel.org>
Link: https://lkml.kernel.org/r/20181031182253.385909804@linutronix.de
2018-11-23 11:51:22 +01:00
Thomas Gleixner
2fa6d420c2 sched/clock: Remove license boilerplate
The SPDX identifier defines the license of the file already. No need for
the boilerplate.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: John Stultz <john.stultz@linaro.org>
Acked-by: Corey Minyard <cminyard@mvista.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: David Riley <davidriley@chromium.org>
Cc: Colin Cross <ccross@android.com>
Cc: Mark Brown <broonie@kernel.org>
Link: https://lkml.kernel.org/r/20181031182253.300140921@linutronix.de
2018-11-23 11:51:22 +01:00
Thomas Gleixner
3c8f2515ac posix-timers/stubs: Remove license boilerplate
The SPDX identifier defines the license of the file already. No need for
the boilerplate.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: John Stultz <john.stultz@linaro.org>
Acked-by: Corey Minyard <cminyard@mvista.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: David Riley <davidriley@chromium.org>
Cc: Colin Cross <ccross@android.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Link: https://lkml.kernel.org/r/20181031182253.215825217@linutronix.de
2018-11-23 11:51:21 +01:00
Thomas Gleixner
6c7811c628 time: Remove license boilerplate
The SPDX identifier defines the license of the files already. No need for
the boilerplates.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: John Stultz <john.stultz@linaro.org>
Acked-by: Corey Minyard <cminyard@mvista.com>
Acked-by: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: David Riley <davidriley@chromium.org>
Cc: Colin Cross <ccross@android.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: https://lkml.kernel.org/r/20181031182253.132458951@linutronix.de
2018-11-23 11:51:21 +01:00
Thomas Gleixner
9281a7857b time/debug: Remove license boilerplate
The SPDX identifier is enough. Remove the license boilerplate.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: John Stultz <john.stultz@linaro.org>
Acked-by: Corey Minyard <cminyard@mvista.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: David Riley <davidriley@chromium.org>
Cc: Colin Cross <ccross@android.com>
Cc: Mark Brown <broonie@kernel.org>
Link: https://lkml.kernel.org/r/20181031182253.047449481@linutronix.de
2018-11-23 11:51:21 +01:00
Thomas Gleixner
f49c174b5f hrtimers/tick/clockevents: Remove sloppy license references
"For licencing details see kernel-base/COPYING" and similar license
references have no value over the SPDX identifier. Remove them.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: John Stultz <john.stultz@linaro.org>
Acked-by: Corey Minyard <cminyard@mvista.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: David Riley <davidriley@chromium.org>
Cc: Colin Cross <ccross@android.com>
Cc: Mark Brown <broonie@kernel.org>
Link: https://lkml.kernel.org/r/20181031182252.963632760@linutronix.de
2018-11-23 11:51:21 +01:00
Thomas Gleixner
35728b8209 time: Add SPDX license identifiers
Update the time(r) core files files with the correct SPDX license
identifier based on the license text in the file itself. The SPDX
identifier is a legally binding shorthand, which can be used instead of the
full boiler plate text.

This work is based on a script and data from Philippe Ombredanne, Kate
Stewart and myself. The data has been created with two independent license
scanners and manual inspection.

The following files do not contain any direct license information and have
been omitted from the big initial SPDX changes:

  timeconst.bc: The .bc files were not touched
  time.c, timer.c, timekeeping.c: Licence was deduced from EXPORT_SYMBOL_GPL

As those files do not contain direct license references they fall under the
project license, i.e. GPL V2 only.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: John Stultz <john.stultz@linaro.org>
Acked-by: Corey Minyard <cminyard@mvista.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: David Riley <davidriley@chromium.org>
Cc: Colin Cross <ccross@android.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: https://lkml.kernel.org/r/20181031182252.879109557@linutronix.de
2018-11-23 11:51:20 +01:00
Thomas Gleixner
58c5fc2b96 time: Remove useless filenames in top level comments
Remove the pointless filenames in the top level comments. They have no
value at all and just occupy space. While at it tidy up some of the
comments and remove a stale one.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: John Stultz <john.stultz@linaro.org>
Acked-by: Corey Minyard <cminyard@mvista.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: David Riley <davidriley@chromium.org>
Cc: Colin Cross <ccross@android.com>
Cc: Mark Brown <broonie@kernel.org>
Link: https://lkml.kernel.org/r/20181031182252.794898238@linutronix.de
2018-11-23 11:51:20 +01:00
Andrea Parri
09d3f015d1 uprobes: Fix handle_swbp() vs. unregister() + register() race once more
Commit:

  142b18ddc8 ("uprobes: Fix handle_swbp() vs unregister() + register() race")

added the UPROBE_COPY_INSN flag, and corresponding smp_wmb() and smp_rmb()
memory barriers, to ensure that handle_swbp() uses fully-initialized
uprobes only.

However, the smp_rmb() is mis-placed: this barrier should be placed
after handle_swbp() has tested for the flag, thus guaranteeing that
(program-order) subsequent loads from the uprobe can see the initial
stores performed by prepare_uprobe().

Move the smp_rmb() accordingly.  Also amend the comments associated
to the two memory barriers to indicate their actual locations.

Signed-off-by: Andrea Parri <andrea.parri@amarulasolutions.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: stable@kernel.org
Fixes: 142b18ddc8 ("uprobes: Fix handle_swbp() vs unregister() + register() race")
Link: http://lkml.kernel.org/r/20181122161031.15179-1-andrea.parri@amarulasolutions.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-23 08:31:19 +01:00
Alexei Starovoitov
813961de3e bpf: fix integer overflow in queue_stack_map
Fix the following issues:

- allow queue_stack_map for root only
- fix u32 max_entries overflow
- disallow value_size == 0

Fixes: f1a2e44a3a ("bpf: add queue and stack maps")
Reported-by: Wei Wu <ww9210@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Mauricio Vasquez B <mauricio.vasquez@polito.it>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-22 21:29:40 +01:00
Sergey Senozhatsky
c7c3f05e34 panic: avoid deadlocks in re-entrant console drivers
From printk()/serial console point of view panic() is special, because
it may force CPU to re-enter printk() or/and serial console driver.
Therefore, some of serial consoles drivers are re-entrant. E.g. 8250:

serial8250_console_write()
{
	if (port->sysrq)
		locked = 0;
	else if (oops_in_progress)
		locked = spin_trylock_irqsave(&port->lock, flags);
	else
		spin_lock_irqsave(&port->lock, flags);
	...
}

panic() does set oops_in_progress via bust_spinlocks(1), so in theory
we should be able to re-enter serial console driver from panic():

	CPU0
	<NMI>
	uart_console_write()
	serial8250_console_write()		// if (oops_in_progress)
						//    spin_trylock_irqsave()
	call_console_drivers()
	console_unlock()
	console_flush_on_panic()
	bust_spinlocks(1)			// oops_in_progress++
	panic()
	<NMI/>
	spin_lock_irqsave(&port->lock, flags)   // spin_lock_irqsave()
	serial8250_console_write()
	call_console_drivers()
	console_unlock()
	printk()
	...

However, this does not happen and we deadlock in serial console on
port->lock spinlock. And the problem is that console_flush_on_panic()
called after bust_spinlocks(0):

void panic(const char *fmt, ...)
{
	bust_spinlocks(1);
	...
	bust_spinlocks(0);
	console_flush_on_panic();
	...
}

bust_spinlocks(0) decrements oops_in_progress, so oops_in_progress
can go back to zero. Thus even re-entrant console drivers will simply
spin on port->lock spinlock. Given that port->lock may already be
locked either by a stopped CPU, or by the very same CPU we execute
panic() on (for instance, NMI panic() on printing CPU) the system
deadlocks and does not reboot.

Fix this by removing bust_spinlocks(0), so oops_in_progress is always
set in panic() now and, thus, re-entrant console drivers will trylock
the port->lock instead of spinning on it forever, when we call them
from console_flush_on_panic().

Link: http://lkml.kernel.org/r/20181025101036.6823-1-sergey.senozhatsky@gmail.com
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Daniel Wang <wonderfly@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: linux-serial@vger.kernel.org
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2018-11-22 14:07:43 +01:00
Eric Dumazet
8d75839b84 bpf, lpm: make longest_prefix_match() faster
At LPC 2018 in Vancouver, Vlad Dumitrescu mentioned that longest_prefix_match()
has a high cost [1].

One reason for that cost is a loop handling one byte at a time.

We can handle more bytes at a time, if enough attention is paid
to endianness.

I was able to remove ~55 % of longest_prefix_match() cpu costs.

[1] https://linuxplumbersconf.org/event/2/contributions/88/attachments/76/87/lpc-bpf-2018-shaping.pdf

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Vlad Dumitrescu <vladum@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-22 10:35:13 +01:00
Robin Murphy
cb216b84d6 swiotlb: Skip cache maintenance on map error
If swiotlb_bounce_page() failed, calling arch_sync_dma_for_device() may
lead to such delights as performing cache maintenance on whatever
address phys_to_virt(SWIOTLB_MAP_ERROR) looks like, which is typically
outside the kernel memory map and goes about as well as expected.

Don't do that.

Fixes: a4a4330db4 ("swiotlb: add support for non-coherent DMA")
Tested-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-11-21 18:47:58 +01:00
Yonghong Song
838e96904f bpf: Introduce bpf_func_info
This patch added interface to load a program with the following
additional information:
   . prog_btf_fd
   . func_info, func_info_rec_size and func_info_cnt
where func_info will provide function range and type_id
corresponding to each function.

The func_info_rec_size is introduced in the UAPI to specify
struct bpf_func_info size passed from user space. This
intends to make bpf_func_info structure growable in the future.
If the kernel gets a different bpf_func_info size from userspace,
it will try to handle user request with part of bpf_func_info
it can understand. In this patch, kernel can understand
  struct bpf_func_info {
       __u32   insn_offset;
       __u32   type_id;
  };
If user passed a bpf func_info record size of 16 bytes, the
kernel can still handle part of records with the above definition.

If verifier agrees with function range provided by the user,
the bpf_prog ksym for each function will use the func name
provided in the type_id, which is supposed to provide better
encoding as it is not limited by 16 bytes program name
limitation and this is better for bpf program which contains
multiple subprograms.

The bpf_prog_info interface is also extended to
return btf_id, func_info, func_info_rec_size and func_info_cnt
to userspace, so userspace can print out the function prototype
for each xlated function. The insn_offset in the returned
func_info corresponds to the insn offset for xlated functions.
With other jit related fields in bpf_prog_info, userspace can also
print out function prototypes for each jited function.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-20 10:54:39 -08:00
Martin KaFai Lau
2667a2626f bpf: btf: Add BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO
This patch adds BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO
to support the function debug info.

BTF_KIND_FUNC_PROTO must not have a name (i.e. !t->name_off)
and it is followed by >= 0 'struct bpf_param' objects to
describe the function arguments.

The BTF_KIND_FUNC must have a valid name and it must
refer back to a BTF_KIND_FUNC_PROTO.

The above is the conclusion after the discussion between
Edward Cree, Alexei, Daniel, Yonghong and Martin.

By combining BTF_KIND_FUNC and BTF_LIND_FUNC_PROTO,
a complete function signature can be obtained.  It will be
used in the later patches to learn the function signature of
a running bpf program.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-20 10:54:38 -08:00
Martin KaFai Lau
b47a0bd23e bpf: btf: Break up btf_type_is_void()
This patch breaks up btf_type_is_void() into
btf_type_is_void() and btf_type_is_fwd().

It also adds btf_type_nosize() to better describe it is
testing a type has nosize info.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-20 10:54:38 -08:00
Lorenz Bauer
96b3b6c909 bpf: allow zero-initializing hash map seed
Add a new flag BPF_F_ZERO_SEED, which forces a hash map
to initialize the seed to zero. This is useful when doing
performance analysis both on individual BPF programs, as
well as the kernel's hash table implementation.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-20 00:53:39 +01:00
Richard Guy Briggs
c8fc5d49c3 audit: remove WATCH and TREE config options
Remove the CONFIG_AUDIT_WATCH and CONFIG_AUDIT_TREE config options since
they are both dependent on CONFIG_AUDITSYSCALL and force
CONFIG_FSNOTIFY.

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2018-11-19 16:29:50 -05:00
Richard Guy Briggs
a2c97da11c audit: use session_info helper
There are still a couple of places (mark and watch config changes) that
open code auid and ses fields in sequence in records instead of using
the audit_log_session_info() helper.  Use the helper.  Adjust the helper
to accommodate being the first fields.  Passes audit-testsuite.

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
[PM: fixed misspellings in the description]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2018-11-19 12:31:42 -05:00
Richard Guy Briggs
0fe3c7fceb audit: localize audit_log_session_info prototype
The audit_log_session_info() function is only used in kernel/audit*, so
move its prototype to kernel/audit.h

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2018-11-19 12:26:27 -05:00
Linus Torvalds
c67a98c00e Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "16 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm/memblock.c: fix a typo in __next_mem_pfn_range() comments
  mm, page_alloc: check for max order in hot path
  scripts/spdxcheck.py: make python3 compliant
  tmpfs: make lseek(SEEK_DATA/SEK_HOLE) return ENXIO with a negative offset
  lib/ubsan.c: don't mark __ubsan_handle_builtin_unreachable as noreturn
  mm/vmstat.c: fix NUMA statistics updates
  mm/gup.c: fix follow_page_mask() kerneldoc comment
  ocfs2: free up write context when direct IO failed
  scripts/faddr2line: fix location of start_kernel in comment
  mm: don't reclaim inodes with many attached pages
  mm, memory_hotplug: check zone_movable in has_unmovable_pages
  mm/swapfile.c: use kvzalloc for swap_info_struct allocation
  MAINTAINERS: update OMAP MMC entry
  hugetlbfs: fix kernel BUG at fs/hugetlbfs/inode.c:444!
  kernel/sched/psi.c: simplify cgroup_move_task()
  z3fold: fix possible reclaim races
2018-11-18 11:31:26 -08:00
Linus Torvalds
03582f338e Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
 "Fix an exec() related scalability/performance regression, which was
  caused by incorrectly calculating load and migrating tasks on exec()
  when they shouldn't be"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/fair: Fix cpu_util_wake() for 'execl' type workloads
2018-11-18 10:58:20 -08:00
Olof Johansson
8fcb2312d1 kernel/sched/psi.c: simplify cgroup_move_task()
The existing code triggered an invalid warning about 'rq' possibly being
used uninitialized.  Instead of doing the silly warning suppression by
initializa it to NULL, refactor the code to bail out early instead.

Warning was:

  kernel/sched/psi.c: In function `cgroup_move_task':
  kernel/sched/psi.c:639:13: warning: `rq' may be used uninitialized in this function [-Wmaybe-uninitialized]

Link: http://lkml.kernel.org/r/20181103183339.8669-1-olof@lixom.net
Fixes: 2ce7135adc ("psi: cgroup support")
Signed-off-by: Olof Johansson <olof@lixom.net>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-18 10:15:09 -08:00
Roman Gushchin
569a933b03 bpf: allocate local storage buffers using GFP_ATOMIC
Naresh reported an issue with the non-atomic memory allocation of
cgroup local storage buffers:

[   73.047526] BUG: sleeping function called from invalid context at
/srv/oe/build/tmp-rpb-glibc/work-shared/intel-corei7-64/kernel-source/mm/slab.h:421
[   73.060915] in_atomic(): 1, irqs_disabled(): 0, pid: 3157, name: test_cgroup_sto
[   73.068342] INFO: lockdep is turned off.
[   73.072293] CPU: 2 PID: 3157 Comm: test_cgroup_sto Not tainted
4.20.0-rc2-next-20181113 #1
[   73.080548] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
2.0b 07/27/2017
[   73.088018] Call Trace:
[   73.090463]  dump_stack+0x70/0xa5
[   73.093783]  ___might_sleep+0x152/0x240
[   73.097619]  __might_sleep+0x4a/0x80
[   73.101191]  __kmalloc_node+0x1cf/0x2f0
[   73.105031]  ? cgroup_storage_update_elem+0x46/0x90
[   73.109909]  cgroup_storage_update_elem+0x46/0x90

cgroup_storage_update_elem() (as well as other update map update
callbacks) is called with disabled preemption, so GFP_ATOMIC
allocation should be used: e.g. alloc_htab_elem() in hashtab.c.

Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-16 21:17:49 -08:00
Edward Cree
afd5942408 bpf: fix off-by-one error in adjust_subprog_starts
When patching in a new sequence for the first insn of a subprog, the start
 of that subprog does not change (it's the first insn of the sequence), so
 adjust_subprog_starts should check start <= off (rather than < off).
Also added a test to test_verifier.c (it's essentially the syz reproducer).

Fixes: cc8b0b92a1 ("bpf: introduce function calls (function boundaries)")
Reported-by: syzbot+4fc427c7af994b0948be@syzkaller.appspotmail.com
Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-16 21:10:00 -08:00
Colin Ian King
592ee43faf bpf: fix null pointer dereference on pointer offload
Pointer offload is being null checked however the following statement
dereferences the potentially null pointer offload when assigning
offload->dev_state.  Fix this by only assigning it if offload is not
null.

Detected by CoverityScan, CID#1475437 ("Dereference after null check")

Fixes: 00db12c3d1 ("bpf: call verifier_prep from its callback in struct bpf_offload_dev")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-16 20:48:27 -08:00
Colin Ian King
8ddab42873 padata: clean an indentation issue, remove extraneous space
Trivial fix to clean up an indentation issue

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-11-16 14:11:04 +08:00
Gustavo A. R. Silva
646558ff16 kdb: kdb_support: mark expected switch fall-throughs
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Notice that in this particular case, I replaced the code comments with
a proper "fall through" annotation, which is what GCC is expecting
to find.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2018-11-13 20:38:50 +00:00
Gustavo A. R. Silva
01cb37351b kdb: kdb_keyboard: mark expected switch fall-throughs
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Notice that in this particular case, I replaced the code comments with
a proper "fall through" annotation, which is what GCC is expecting
to find.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2018-11-13 20:38:50 +00:00
Gustavo A. R. Silva
9eb62f0e1b kdb: kdb_main: refactor code in kdb_md_line
Replace the whole switch statement with a for loop.  This makes the
code clearer and easy to read.

This also addresses the following Coverity warnings:

Addresses-Coverity-ID: 115090 ("Missing break in switch")
Addresses-Coverity-ID: 115091 ("Missing break in switch")
Addresses-Coverity-ID: 114700 ("Missing break in switch")

Suggested-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>
[daniel.thompson@linaro.org: Tiny grammar change in description]
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2018-11-13 20:37:53 +00:00
Prarit Bhargava
c2b94c72d9 kdb: Use strscpy with destination buffer size
gcc 8.1.0 warns with:

kernel/debug/kdb/kdb_support.c: In function ‘kallsyms_symbol_next’:
kernel/debug/kdb/kdb_support.c:239:4: warning: ‘strncpy’ specified bound depends on the length of the source argument [-Wstringop-overflow=]
     strncpy(prefix_name, name, strlen(name)+1);
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
kernel/debug/kdb/kdb_support.c:239:31: note: length computed here

Use strscpy() with the destination buffer size, and use ellipses when
displaying truncated symbols.

v2: Use strscpy()

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Jonathan Toppins <jtoppins@redhat.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: kgdb-bugreport@lists.sourceforge.net
Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2018-11-13 20:27:53 +00:00
Christophe Leroy
568fb6f42a kdb: print real address of pointers instead of hashed addresses
Since commit ad67b74d24 ("printk: hash addresses printed with %p"),
all pointers printed with %p are printed with hashed addresses
instead of real addresses in order to avoid leaking addresses in
dmesg and syslog. But this applies to kdb too, with is unfortunate:

    Entering kdb (current=0x(ptrval), pid 329) due to Keyboard Entry
    kdb> ps
    15 sleeping system daemon (state M) processes suppressed,
    use 'ps A' to see all.
    Task Addr       Pid   Parent [*] cpu State Thread     Command
    0x(ptrval)      329      328  1    0   R  0x(ptrval) *sh

    0x(ptrval)        1        0  0    0   S  0x(ptrval)  init
    0x(ptrval)        3        2  0    0   D  0x(ptrval)  rcu_gp
    0x(ptrval)        4        2  0    0   D  0x(ptrval)  rcu_par_gp
    0x(ptrval)        5        2  0    0   D  0x(ptrval)  kworker/0:0
    0x(ptrval)        6        2  0    0   D  0x(ptrval)  kworker/0:0H
    0x(ptrval)        7        2  0    0   D  0x(ptrval)  kworker/u2:0
    0x(ptrval)        8        2  0    0   D  0x(ptrval)  mm_percpu_wq
    0x(ptrval)       10        2  0    0   D  0x(ptrval)  rcu_preempt

The whole purpose of kdb is to debug, and for debugging real addresses
need to be known. In addition, data displayed by kdb doesn't go into
dmesg.

This patch replaces all %p by %px in kdb in order to display real
addresses.

Fixes: ad67b74d24 ("printk: hash addresses printed with %p")
Cc: <stable@vger.kernel.org>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2018-11-13 20:27:37 +00:00
Christophe Leroy
dded2e1592 kdb: use correct pointer when 'btc' calls 'btt'
On a powerpc 8xx, 'btc' fails as follows:

Entering kdb (current=0x(ptrval), pid 282) due to Keyboard Entry
kdb> btc
btc: cpu status: Currently on cpu 0
Available cpus: 0
kdb_getarea: Bad address 0x0

when booting the kernel with 'debug_boot_weak_hash', it fails as well

Entering kdb (current=0xba99ad80, pid 284) due to Keyboard Entry
kdb> btc
btc: cpu status: Currently on cpu 0
Available cpus: 0
kdb_getarea: Bad address 0xba99ad80

On other platforms, Oopses have been observed too, see
https://github.com/linuxppc/linux/issues/139

This is due to btc calling 'btt' with %p pointer as an argument.

This patch replaces %p by %px to get the real pointer value as
expected by 'btt'

Fixes: ad67b74d24 ("printk: hash addresses printed with %p")
Cc: <stable@vger.kernel.org>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2018-11-13 20:27:16 +00:00
Lance Roy
04547728b7 locking/mutex: Replace spin_is_locked() with lockdep
lockdep_assert_held() is better suited to checking locking requirements,
since it only checks if the current thread holds the lock regardless of
whether someone else does. This is also a step towards possibly removing
spin_is_locked().

Signed-off-by: Lance Roy <ldr709@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-11-12 09:06:22 -08:00
Paul E. McKenney
5f1a6ef374 rcu: Avoid signed integer overflow in rcu_preempt_deferred_qs()
Subtracting INT_MIN can be interpreted as unconditional signed integer
overflow, which according to the C standard is undefined behavior.
Therefore, kernel build arguments notwithstanding, it would be good to
future-proof the code.  This commit therefore substitutes INT_MAX for
INT_MIN in order to avoid undefined behavior.

While in the neighborhood, this commit also creates some meaningful names
for INT_MAX and friends in order to improve readability, as suggested
by Joel Fernandes.

Reported-by: Ran Rozenstein <ranro@mellanox.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-11-12 09:03:59 -08:00
Paul E. McKenney
117f683c6e rcu: Replace this_cpu_ptr() with __this_cpu_read()
Because __this_cpu_read() can be lighter weight than equivalent uses of
this_cpu_ptr(), this commit replaces the latter with the former.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-11-12 09:03:59 -08:00
Paul E. McKenney
05f415715c rcu: Speed up expedited GPs when interrupting RCU reader
In PREEMPT kernels, an expedited grace period might send an IPI to a
CPU that is executing an RCU read-side critical section.  In that case,
it would be nice if the rcu_read_unlock() directly interacted with the
RCU core code to immediately report the quiescent state.  And this does
happen in the case where the reader has been preempted.  But it would
also be a nice performance optimization if immediate reporting also
happened in the preemption-free case.

This commit therefore adds an ->exp_hint field to the task_struct structure's
->rcu_read_unlock_special field.  The IPI handler sets this hint when
it has interrupted an RCU read-side critical section, and this causes
the outermost rcu_read_unlock() call to invoke rcu_read_unlock_special(),
which, if preemption is enabled, reports the quiescent state immediately.
If preemption is disabled, then the report is required to be deferred
until preemption (or bottom halves or interrupts or whatever) is re-enabled.

Because this is a hint, it does nothing for more complicated cases.  For
example, if the IPI interrupts an RCU reader, but interrupts are disabled
across the rcu_read_unlock(), but another rcu_read_lock() is executed
before interrupts are re-enabled, the hint will already have been cleared.
If you do crazy things like this, reporting will be deferred until some
later RCU_SOFTIRQ handler, context switch, cond_resched(), or similar.

Reported-by: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
2018-11-12 09:03:59 -08:00
Paul E. McKenney
0a89e5a402 rcu: Trace end of grace period before end of grace period
Currently, rcu_gp_cleanup() traces the end of the old grace period after
the old grace period has officially ended.  This might make intuitive
sense, but it also makes for confusing event-trace output because the
"end" trace displays not the old but instead the new grace-period number.
This commit therefore traces the end of an old grace period just before
that grace period officially ends.

Reported-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-11-12 09:03:59 -08:00
Zhouyi Zhou
2320bda26d rcu: Adjust the comment of function rcu_is_watching
Because RCU avoids interrupting idle CPUs, rcu_is_watching() is used to
test whether or not it is currently legal to run RCU read-side critical
sections on this CPU.  However, the first sentence and last sentences
of current comment for rcu_is_watching have opposite meaning of what
is expected.  This commit therefore fixes this header comment.

Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-11-12 09:03:59 -08:00
Paul E. McKenney
c669c014d1 rcu: Add jiffies-since-GP-activity to show_rcu_gp_kthreads()
This commit adds a printout of the number of jiffies since the last time
that the RCU grace-period kthread did any processing.  This can be useful
when tracking down forward-progress issues.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-11-12 09:03:59 -08:00
Paul E. McKenney
691960197e rcu: Add state name to show_rcu_gp_kthreads() output
This commit adds the name of the RCU grace-period state to
the show_rcu_gp_kthreads() output in order to ease debugging.
This commit also moves gp_state_getname() up in the code so that
show_rcu_gp_kthreads() can use it.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-11-12 09:03:59 -08:00
Paul E. McKenney
791416c471 rcu: Parameterize rcu_check_gp_start_stall()
In order to debug forward-progress stalls, it is necessary to check
for excessively delayed grace-period starts.  This is currently done
for RCU CPU stall warnings by rcu_check_gp_start_stall(), which checks
to see if the start of a requested grace period has been delayed by an
RCU CPU stall warning period.  Because rcutorture will need to check
for the time consumed by an RCU forward-progress delay, this commit
promotes gpssdelay from a local variable to a formal parameter.  It is
not necessary to export rcu_check_gp_start_stall() because rcutorture
will access it via a wrapper function.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-11-12 09:03:59 -08:00
Paul E. McKenney
b3c1d9ec7c rcu: Avoid double multiply by HZ
The rcu_check_gp_start_stall() function multiplies the return value
from rcu_jiffies_till_stall_check() by HZ, but the units are already
in jiffies.  This commit therefore avoids the need for introduction of
a jiffies-squared unit by removing the extraneous multiplication.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-11-12 09:03:59 -08:00
Paul E. McKenney
f0ad56e876 rcu: Eliminate BUG_ON() for kernel/rcu/update.c
The update.c file has a number of calls to BUG_ON(), which panics the
kernel, which is not a good strategy for devices (like embedded) that
don't have a way to capture console output.  This commit therefore
converts these BUG_ON() calls to WARN_ON_ONCE() and WARN_ONCE().

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-11-12 08:15:59 -08:00
Paul E. McKenney
9213784b48 rcu: Eliminate BUG_ON() for kernel/rcu/tree_plugin.h
The tree_plugin.h file has a number of calls to BUG_ON(), which panics
the kernel, which is not a good strategy for devices (like embedded)
that don't have a way to capture console output.  This commit therefore
converts these BUG_ON() calls to WARN_ON_ONCE() and WARN_ONCE().

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
[ paulmck: Fix typo: s/rcuo/rcub/. ]
2018-11-12 08:15:16 -08:00
Jan Kara
f905c2fc39 audit: Use 'mark' name for fsnotify_mark variables
Variables pointing to fsnotify_mark are sometimes called 'entry' and
sometimes 'mark'. Use 'mark' in all places.

Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
[PM: minor merge fuzz due to updated patches previously in the series]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2018-11-12 09:55:16 -05:00
Jan Kara
83d23bc8ae audit: Replace chunk attached to mark instead of replacing mark
Audit tree code currently associates new fsnotify mark with each new
chunk. As chunk attached to an inode is replaced when new tag is added /
removed, we also need to remove old fsnotify mark and add a new one on
such occasion.  This is cumbersome and makes locking rules somewhat
difficult to follow.

Fix these problems by allocating fsnotify mark independently of chunk
and keeping it all the time while there is some chunk attached to an
inode. Also add documentation about the locking rules so that things are
easier to follow.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
[PM: minor merge fuzz due to updated patches previously in the series]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2018-11-12 09:55:16 -05:00
Jan Kara
8432c70062 audit: Simplify locking around untag_chunk()
untag_chunk() has to be called with hash_lock, it drops it and
reacquires it when returning. The unlocking of hash_lock is thus hidden
from the callers of untag_chunk() with is rather error prone. Reorganize
the code so that untag_chunk() is called without hash_lock, only with
mark reference preventing the chunk from going away.

Since this requires some more code in the caller of untag_chunk() to
assure forward progress, factor out loop pruning tree from all chunks
into a common helper function.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2018-11-12 09:54:56 -05:00
Jan Kara
c22fcde775 audit: Drop all unused chunk nodes during deletion
When deleting chunk from a tree, drop all unused nodes in a chunk
instead of just the one used by the tree. This gets rid of possibly
lingering unused nodes (created due to fallback path in untag_chunk())
and also removes some special cases and will allow us to simplify
locking in untag_chunk().

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2018-11-12 09:54:49 -05:00
Jan Kara
49a4ee7d98 audit: Guarantee forward progress of chunk untagging
When removing chunk from a tree, we do shrink the chunk. This can fail
for various reasons (due to races, ENOMEM, etc.) and in some cases we
just bail from untag_chunk() relying on someone else to cleanup.
Although this currently works, later we will need to add new failure
situation which would break. Also this simplifies the code and will
allow us to make locking around untag_chunk() less awkward.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2018-11-12 09:54:49 -05:00
Jan Kara
5f5161300d audit: Allocate fsnotify mark independently of chunk
Allocate fsnotify mark independently instead of embedding it inside
chunk. This will allow us to just replace chunk attached to mark when
growing / shrinking chunk instead of replacing mark attached to inode
which is a more complex operation.

Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2018-11-12 09:54:49 -05:00
Jan Kara
a8375713fb audit: Provide helper for dropping mark's chunk reference
Provide a helper function audit_mark_put_chunk() for dropping mark's
reference (which has to happen only after RCU grace period expires).
Currently that happens only from a single place but in later patches we
introduce more callers.

Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2018-11-12 09:54:49 -05:00
Jan Kara
8cd0feb523 audit: Remove pointless check in insert_hash()
The audit_tree_group->mark_mutex is held all the time while we create
the fsnotify mark, add it to the inode, and insert chunk into the hash.
Hence mark cannot get detached during this time and so the check whether
the mark is attached in insert_hash() is pointless.

Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2018-11-12 09:54:48 -05:00
Jan Kara
d31b326d3c audit: Factor out chunk replacement code
Chunk replacement code is very similar for the cases where we grow or
shrink chunk. Factor the code out into a common helper function.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2018-11-12 09:54:48 -05:00
Jan Kara
1635e57223 audit: Make hash table insertion safe against concurrent lookups
Currently, the audit tree code does not make sure that when a chunk is
inserted into the hash table, it is fully initialized. So in theory a
user of RCU lookup could see uninitialized structure in the hash table
and crash. Add appropriate barriers between initialization of the
structure and its insertion into hash table.

Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2018-11-12 09:54:48 -05:00
Jan Kara
8d20d6e930 audit: Embed key into chunk
Currently chunk hash key (which is in fact pointer to the inode) is
derived as chunk->mark.conn->obj. It is tricky to make this dereference
reliable for hash table lookups only under RCU as mark can get detached
from the connector and connector gets freed independently of the
running lookup. Thus there is a possible use after free / NULL ptr
dereference issue:

CPU1					CPU2
					untag_chunk()
					  ...
audit_tree_lookup()
  list_for_each_entry_rcu(p, list, hash) {
					  list_del_rcu(&chunk->hash);
					  fsnotify_destroy_mark(entry);
					  fsnotify_put_mark(entry)
    chunk_to_key(p)
      if (!chunk->mark.connector)
					    ...
					    hlist_del_init_rcu(&mark->obj_list);
					    if (hlist_empty(&conn->list)) {
					      inode = fsnotify_detach_connector_from_object(conn);
					    mark->connector = NULL;
					    ...
					    frees connector from workqueue
      chunk->mark.connector->obj

This race is probably impossible to hit in practice as the race window
on CPU1 is very narrow and CPU2 has a lot of code to execute. Still it's
better to have this fixed. Since the inode the chunk is attached to is
constant during chunk's lifetime it is easy to cache the key in the
chunk itself and thus avoid these issues.

Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2018-11-12 09:54:48 -05:00
Jan Kara
b1e4603b92 audit: Fix possible tagging failures
Audit tree code is replacing marks attached to inodes in non-atomic way.
Thus fsnotify_find_mark() in tag_chunk() may find a mark that belongs to
a chunk that is no longer valid one and will soon be destroyed. Tags
added to such chunk will be simply lost.

Fix the problem by making sure old mark is marked as going away (through
fsnotify_detach_mark()) before dropping mark_mutex and thus in an atomic
way wrt tag_chunk(). Note that this does not fix the problem completely
as if tag_chunk() finds a mark that is going away, it fails with
-ENOENT. But at least the failure is not silent and currently there's no
way to search for another fsnotify mark attached to the inode. We'll fix
this problem in later patch.

Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2018-11-12 09:54:48 -05:00
Jan Kara
a5789b07b3 audit: Fix possible spurious -ENOSPC error
When an inode is tagged with a tree, tag_chunk() checks whether there is
audit_tree_group mark attached to the inode and adds one if not. However
nothing protects another tag_chunk() to add the mark between we've
checked and try to add the fsnotify mark thus resulting in an error from
fsnotify_add_mark() and consequently an ENOSPC error from tag_chunk().

Fix the problem by holding mark_mutex over the whole check-insert code
sequence.

Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2018-11-12 09:54:48 -05:00
Jan Kara
9f16d2e624 audit_tree: Remove mark->lock locking
Currently, audit_tree code uses mark->lock to protect against detaching
of mark from an inode. In most places it however also uses
mark->group->mark_mutex (as we need to atomically replace attached
marks) and this provides protection against mark detaching as well. So
just remove protection with mark->lock from audit tree code and replace
it with mark->group->mark_mutex protection in all the places. It
simplifies the code and gets rid of some ugly catches like calling
fsnotify_add_mark_locked() with mark->lock held (which cannot sleep only
because we hold a reference to another mark attached to the same inode).

Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2018-11-12 09:54:29 -05:00
Viresh Kumar
3e18450108 sched/core: Clean up the #ifdef block in add_nr_running()
There is no point in keeping the conditional statement of the #if block
outside of the #ifdef block, while all of its body is contained within
the #ifdef block.

Move the conditional statement under the #ifdef block as well.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Link: http://lkml.kernel.org/r/78cbd78a615d6f9fdcd3327f1ead68470f92593e.1541482935.git.viresh.kumar@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-12 11:18:06 +01:00
Muchun Song
ed8885a144 sched/fair: Make some variables static
The variables are local to the source and do not
need to be in global scope, so make them static.

Signed-off-by: Muchun Song <smuchun@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20181110075202.61172-1-smuchun@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-12 06:18:15 +01:00
Viresh Kumar
1da1843f9f sched/core: Create task_has_idle_policy() helper
We already have task_has_rt_policy() and task_has_dl_policy() helpers,
create task_has_idle_policy() as well and update sched core to start
using it.

While at it, use task_has_dl_policy() at one more place.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Link: http://lkml.kernel.org/r/ce3915d5b490fc81af926a3b6bfb775e7188e005.1541416894.git.viresh.kumar@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-12 06:17:52 +01:00
Patrick Bellasi
b5c0ce7bd1 sched/fair: Add lsub_positive() and use it consistently
The following pattern:

   var -= min_t(typeof(var), var, val);

is used multiple times in fair.c.

The existing sub_positive() already captures that pattern, but it also
adds an explicit load-store to properly support lockless observations.
In other cases the pattern above is used to update local, and/or not
concurrently accessed, variables.

Let's add a simpler version of sub_positive(), targeted at local variables
updates, which gives the same readability benefits at calling sites,
without enforcing {READ,WRITE}_ONCE() barriers.

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/lkml/20181031184527.GA3178@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-12 06:17:52 +01:00
Patrick Bellasi
92a801e5d5 sched/fair: Mask UTIL_AVG_UNCHANGED usages
The _task_util_est() is mainly used to add/remove the task contribution
to/from the rq's estimated utilization at task enqueue/dequeue time.
In both cases we ensure the UTIL_AVG_UNCHANGED flag is set to keep
consistency between enqueue and dequeue time while still being
transparent to update_load_avg calls which will eventually reset the
flag.

Let's move the flag forcing within _task_util_est() itself so that we
can simplify calling code by hiding that estimated utilization
implementation detail into one of its internal functions.

This will affect also the "public" API task_util_est() but we know that
the flag will (eventually) impact just on the LSB of the estimated
utilization, thus it's certainly acceptable.

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Link: http://lkml.kernel.org/r/20181105145400.935-3-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-12 06:17:52 +01:00
Ingo Molnar
59e1678c29 Merge branch 'sched/urgent' into sched/core, to pick up dependent fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-12 05:15:33 +01:00
Patrick Bellasi
c469933e77 sched/fair: Fix cpu_util_wake() for 'execl' type workloads
A ~10% regression has been reported for UnixBench's execl throughput
test by Aaron Lu and Ye Xiaolong:

  https://lkml.org/lkml/2018/10/30/765

That test is pretty simple, it does a "recursive" execve() syscall on the
same binary. Starting from the syscall, this sequence is possible:

   do_execve()
     do_execveat_common()
       __do_execve_file()
         sched_exec()
           select_task_rq_fair()          <==| Task already enqueued
             find_idlest_cpu()
               find_idlest_group()
                 capacity_spare_wake()    <==| Functions not called from
		   cpu_util_wake()           | the wakeup path

which means we can end up calling cpu_util_wake() not only from the
"wakeup path", as its name would suggest. Indeed, the task doing an
execve() syscall is already enqueued on the CPU we want to get the
cpu_util_wake() for.

The estimated utilization for a CPU computed in cpu_util_wake() was
written under the assumption that function can be called only from the
wakeup path. If instead the task is already enqueued, we end up with a
utilization which does not remove the current task's contribution from
the estimated utilization of the CPU.
This will wrongly assume a reduced spare capacity on the current CPU and
increase the chances to migrate the task on execve.

The regression is tracked down to:

 commit d519329f72 ("sched/fair: Update util_est only on util_avg updates")

because in that patch we turn on by default the UTIL_EST sched feature.
However, the real issue is introduced by:

 commit f9be3e5961 ("sched/fair: Use util_est in LB and WU paths")

Let's fix this by ensuring to always discount the task estimated
utilization from the CPU's estimated utilization when the task is also
the current one. The same benchmark of the bug report, executed on a
dual socket 40 CPUs Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz machine,
reports these "Execl Throughput" figures (higher the better):

   mainline     : 48136.5 lps
   mainline+fix : 55376.5 lps

which correspond to a 15% speedup.

Moreover, since {cpu_util,capacity_spare}_wake() are not really only
used from the wakeup path, let's remove this ambiguity by using a better
matching name: {cpu_util,capacity_spare}_without().

Since we are at that, let's also improve the existing documentation.

Reported-by: Aaron Lu <aaron.lu@intel.com>
Reported-by: Ye Xiaolong <xiaolong.ye@intel.com>
Tested-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Fixes: f9be3e5961 (sched/fair: Use util_est in LB and WU paths)
Link: https://lore.kernel.org/lkml/20181025093100.GB13236@e110439-lin/
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-12 05:00:46 +01:00
Linus Torvalds
08b5278650 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
 "Just the removal of a redundant call into the sched deadline overrun
  check"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  posix-cpu-timers: Remove useless call to check_dl_overrun()
2018-11-11 16:37:41 -06:00
Linus Torvalds
024d4d4c0c Merge branch 'sched/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:
 "Two small scheduler fixes:

   - Take hotplug lock in sched_init_smp(). Technically not really
     required, but lockdep will complain other.

   - Trivial comment fix in sched/fair"

* 'sched/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/fair: Fix a comment in task_numa_fault()
  sched/core: Take the hotplug lock in sched_init_smp()
2018-11-11 16:33:00 -06:00
Linus Torvalds
0b002cdd50 Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core fixes from Thomas Gleixner:
 "A couple of fixlets for the core:

   - Kernel doc function documentation fixes

   - Missing prototypes for weak watchdog functions"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  resource/docs: Complete kernel-doc style function documentation
  watchdog/core: Add missing prototypes for weak functions
  resource/docs: Fix new kernel-doc warnings
2018-11-11 16:14:05 -06:00
Paul E. McKenney
9cac83a57e rcu: Stop expedited grace periods from relying on stop-machine
The CPU-selection code in sync_rcu_exp_select_cpus() disables preemption
to prevent the cpu_online_mask from changing.  However, this relies on
the stop-machine mechanism in the CPU-hotplug offline code, which is not
desirable (it would be good to someday remove the stop-machine mechanism).

This commit therefore instead uses the relevant leaf rcu_node structure's
->ffmask, which has a bit set for all CPUs that are fully functional.
A given CPU's bit is cleared very early during offline processing by
rcutree_offline_cpu() and set very late during online processing by
rcutree_online_cpu().  Therefore, if a CPU's bit is set in this mask, and
preemption is disabled, we have to be before the synchronize_sched() in
the CPU-hotplug offline code, which means that the CPU is guaranteed to be
workqueue-ready throughout the duration of the enclosing preempt_disable()
region of code.

This also has the side-effect of using WORK_CPU_UNBOUND if all the CPUs for
this leaf rcu_node structure are offline, which is an acceptable difference
in behavior.

Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-11-11 11:23:01 -08:00
Andrey Ignatov
46f53a65d2 bpf: Allow narrow loads with offset > 0
Currently BPF verifier allows narrow loads for a context field only with
offset zero. E.g. if there is a __u32 field then only the following
loads are permitted:
  * off=0, size=1 (narrow);
  * off=0, size=2 (narrow);
  * off=0, size=4 (full).

On the other hand LLVM can generate a load with offset different than
zero that make sense from program logic point of view, but verifier
doesn't accept it.

E.g. tools/testing/selftests/bpf/sendmsg4_prog.c has code:

  #define DST_IP4			0xC0A801FEU /* 192.168.1.254 */
  ...
  	if ((ctx->user_ip4 >> 24) == (bpf_htonl(DST_IP4) >> 24) &&

where ctx is struct bpf_sock_addr.

Some versions of LLVM can produce the following byte code for it:

       8:       71 12 07 00 00 00 00 00         r2 = *(u8 *)(r1 + 7)
       9:       67 02 00 00 18 00 00 00         r2 <<= 24
      10:       18 03 00 00 00 00 00 fe 00 00 00 00 00 00 00 00         r3 = 4261412864 ll
      12:       5d 32 07 00 00 00 00 00         if r2 != r3 goto +7 <LBB0_6>

where `*(u8 *)(r1 + 7)` means narrow load for ctx->user_ip4 with size=1
and offset=3 (7 - sizeof(ctx->user_family) = 3). This load is currently
rejected by verifier.

Verifier code that rejects such loads is in bpf_ctx_narrow_access_ok()
what means any is_valid_access implementation, that uses the function,
works this way, e.g. bpf_skb_is_valid_access() for __sk_buff or
sock_addr_is_valid_access() for bpf_sock_addr.

The patch makes such loads supported. Offset can be in [0; size_default)
but has to be multiple of load size. E.g. for __u32 field the following
loads are supported now:
  * off=0, size=1 (narrow);
  * off=1, size=1 (narrow);
  * off=2, size=1 (narrow);
  * off=3, size=1 (narrow);
  * off=0, size=2 (narrow);
  * off=2, size=2 (narrow);
  * off=0, size=4 (full).

Reported-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10 22:29:59 -08:00
Quentin Monnet
16a8cb5cff bpf: do not pass netdev to translate() and prepare() offload callbacks
The kernel functions to prepare verifier and translate for offloaded
program retrieve "offload" from "prog", and "netdev" from "offload".
Then both "prog" and "netdev" are passed to the callbacks.

Simplify this by letting the drivers retrieve the net device themselves
from the offload object attached to prog - if they need it at all. There
is currently no need to pass the netdev as an argument to those
functions.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10 15:39:54 -08:00
Quentin Monnet
a40a26322a bpf: pass prog instead of env to bpf_prog_offload_verifier_prep()
Function bpf_prog_offload_verifier_prep(), called from the kernel BPF
verifier to run a driver-specific callback for preparing for the
verification step for offloaded programs, takes a pointer to a struct
bpf_verifier_env object. However, no driver callback needs the whole
structure at this time: the two drivers supporting this, nfp and
netdevsim, only need a pointer to the struct bpf_prog instance held by
env.

Update the callback accordingly, on kernel side and in these two
drivers.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10 15:39:54 -08:00
Quentin Monnet
eb9119471e bpf: pass destroy() as a callback and remove its ndo_bpf subcommand
As part of the transition from ndo_bpf() to callbacks attached to struct
bpf_offload_dev for some of the eBPF offload operations, move the
functions related to program destruction to the struct and remove the
subcommand that was used to call them through the NDO.

Remove function __bpf_offload_ndo(), which is no longer used.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10 15:39:54 -08:00
Quentin Monnet
b07ade27e9 bpf: pass translate() as a callback and remove its ndo_bpf subcommand
As part of the transition from ndo_bpf() to callbacks attached to struct
bpf_offload_dev for some of the eBPF offload operations, move the
functions related to code translation to the struct and remove the
subcommand that was used to call them through the NDO.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10 15:39:54 -08:00
Quentin Monnet
00db12c3d1 bpf: call verifier_prep from its callback in struct bpf_offload_dev
In a way similar to the change previously brought to the verify_insn
hook and to the finalize callback, switch to the newly added ops in
struct bpf_prog_offload for calling the functions used to prepare driver
verifiers.

Since the dev_ops pointer in struct bpf_prog_offload is no longer used
by any callback, we can now remove it from struct bpf_prog_offload.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10 15:39:54 -08:00
Quentin Monnet
6dc18fa6f4 bpf: call finalize() from its callback in struct bpf_offload_dev
In a way similar to the change previously brought to the verify_insn
hook, switch to the newly added ops in struct bpf_prog_offload for
calling the functions used to perform final verification steps for
offloaded programs.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10 15:39:53 -08:00
Quentin Monnet
341b3e7b7b bpf: call verify_insn from its callback in struct bpf_offload_dev
We intend to remove the dev_ops in struct bpf_prog_offload, and to only
keep the ops in struct bpf_offload_dev instead, which is accessible from
more locations for passing function pointers.

But dev_ops is used for calling the verify_insn hook. Switch to the
newly added ops in struct bpf_prog_offload instead.

To avoid table lookups for each eBPF instruction to verify, we remember
the offdev attached to a netdev and modify bpf_offload_find_netdev() to
avoid performing more than once a lookup for a given offload object.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10 15:39:53 -08:00
Quentin Monnet
1385d755cf bpf: pass a struct with offload callbacks to bpf_offload_dev_create()
For passing device functions for offloaded eBPF programs, there used to
be no place where to store the pointer without making the non-offloaded
programs pay a memory price.

As a consequence, three functions were called with ndo_bpf() through
specific commands. Now that we have struct bpf_offload_dev, and since
none of those operations rely on RTNL, we can turn these three commands
into hooks inside the struct bpf_prog_offload_ops, and pass them as part
of bpf_offload_dev_create().

This commit effectively passes a pointer to the struct to
bpf_offload_dev_create(). We temporarily have two struct
bpf_prog_offload_ops instances, one under offdev->ops and one under
offload->dev_ops. The next patches will make the transition towards the
former, so that offload->dev_ops can be removed, and callbacks relying
on ndo_bpf() added to offdev->ops as well.

While at it, rename "nfp_bpf_analyzer_ops" as "nfp_bpf_dev_ops" (and
similarly for netdevsim).

Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10 15:39:53 -08:00
Linus Torvalds
1de4f2ef21 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace fixes from Eric Biederman:
 "I believe all of these are simple obviously correct bug fixes. These
  fall into two groups:

   - Fixing the implementation of MNT_LOCKED which prevents lesser
     privileged users from seeing unders mounts created by more
     privileged users.

   - Fixing the extended uid and group mapping in user namespaces.

  As well as ensuring the code looks correct I have spot tested these
  changes as well and in my testing the fixes are working"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  mount: Prevent MNT_DETACH from disconnecting locked mounts
  mount: Don't allow copying MNT_UNBINDABLE|MNT_LOCKED mounts
  mount: Retest MNT_LOCKED in do_umount
  userns: also map extents in the reverse map to kernel IDs
2018-11-10 13:27:58 -06:00
Jiong Wang
e647815a4d bpf: let verifier to calculate and record max_pkt_offset
In check_packet_access, update max_pkt_offset after the offset has passed
__check_packet_access.

It should be safe to use u32 for max_pkt_offset as explained in code
comment.

Also, when there is tail call, the max_pkt_offset of the called program is
unknown, so conservatively set max_pkt_offset to MAX_PACKET_OFF for such
case.

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-09 09:16:31 +01:00
Paul E. McKenney
0607ba8403 srcu: Prevent __call_srcu() counter wrap with read-side critical section
Ever since cdf7abc461 ("srcu: Allow use of Tiny/Tree SRCU from
both process and interrupt context"), it has been permissible
to use SRCU read-side critical sections in interrupt context.
This allows __call_srcu() to use SRCU read-side critical sections to
prevent a new SRCU grace period from ending before the call to either
srcu_funnel_gp_start() or srcu_funnel_exp_start completes, thus preventing
SRCU grace-period counter overflow during that time.

Note that this does not permit removal of the counter-wrap checks in
srcu_gp_end().  These check are necessary to handle the case where
a given CPU does not interact at all with SRCU for an extended time
period.

This commit therefore adds an SRCU read-side critical section to
__call_srcu() in order to prevent grace period counter wrap during
the funnel-locking process.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-11-08 21:54:14 -08:00
Paul E. McKenney
78d125d338 sched/membarrier: Replace synchronize_sched() with synchronize_rcu()
Now that synchronize_rcu() waits for preempt-disable regions of code
as well as RCU read-side critical sections, the synchronize_sched()
in sys_membarrier() can be replaced by synchronize_rcu().  This commit
therefore makes this change.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
2018-11-08 21:43:20 -08:00
Paul E. McKenney
d3ff3891b2 rcu: Consolidate the RCU update functions invoked by sync.c
This commit retains all the various gp_ops[] entries, but makes their
update functions all be synchronize_rcu(), call_rcu() and rcu_barrier().
The read-side checks remain consistent with the various RCU flavors,
which still exist on the read side.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
2018-11-08 21:43:20 -08:00
Paul E. McKenney
309ba859b9 rcu: Eliminate synchronize_rcu_mult()
Now that synchronize_rcu() waits for both RCU read-side critical
sections and preempt-disabled regions of code, the sole caller of
synchronize_rcu_mult() can be replaced by synchronize_rcu().
This patch makes this change and removes synchronize_rcu_mult().
Note that _wait_rcu_gp() still supports synchronize_rcu_mult(),
and thus might be simplified in the future to take only take
a single call_rcu() function rather than the current list of them.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-11-08 21:43:20 -08:00
Joel Fernandes (Google)
adbccddb4a rcu: Fix rcu_{node,data} comments about gp_seq_needed
Recent changes have removed the old ->gp_seq_needed field from the
rcu_state structure, which in turn obsoleted a couple of comments in
the rcu_node and rcu_data structures.  This commit therefore updates
these comments accordingly.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: <kernel-team@android.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-11-08 21:43:20 -08:00
Joel Fernandes (Google)
75a8f72245 rcu: Remove unused rcu_state externs
The rcu_bh_state and rcu_sched_state variables were removed during the
RCU flavor consolidations, but external declarations remain in tree.h.
This commit therefore removes these obsolete declarations.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: <kernel-team@android.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-11-08 21:43:20 -08:00
Paul E. McKenney
08543bda42 rcu: Eliminate BUG_ON() for kernel/rcu/tree.c
The tree.c file has a number of calls to BUG_ON(), which panics the
kernel, which is not a good strategy for devices (like embedded) that
don't have a way to capture console output.  This commit therefore
converts these BUG_ON() calls to WARN_ON_ONCE() and WARN_ONCE().

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-11-08 21:41:57 -08:00
Paul E. McKenney
042d4c70a2 rcu: Eliminate BUG_ON() for sync.c
The sync.c file has a number of calls to BUG_ON(), which panics the
kernel, which is not a good strategy for devices (like embedded) that
don't have a way to capture console output.  This commit therefore
changes these BUG_ON() calls to WARN_ON_ONCE(), but does so quite naively.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
2018-11-08 21:41:57 -08:00
Juri Lelli
e6a2d72c10 posix-cpu-timers: Remove useless call to check_dl_overrun()
check_dl_overrun() is used to send a SIGXCPU to users that asked to be
informed when a SCHED_DEADLINE runtime overruns occur.

The function is called by check_thread_timers() already, so the call in
check_process_timers() is redundant/wrong (even though harmless).

Remove it.

Fixes: 34be39305a ("sched/deadline: Implement "runtime overrun signal" support")
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: linux-rt-users@vger.kernel.org
Cc: mtk.manpages@gmail.com
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Luca Abeni <luca.abeni@santannapisa.it>
Cc: Claudio Scordino <claudio@evidence.eu.com>
Link: https://lkml.kernel.org/r/20181107111032.32291-1-juri.lelli@redhat.com
2018-11-08 07:43:35 +01:00
Jann Horn
d2f007dbe7 userns: also map extents in the reverse map to kernel IDs
The current logic first clones the extent array and sorts both copies, then
maps the lower IDs of the forward mapping into the lower namespace, but
doesn't map the lower IDs of the reverse mapping.

This means that code in a nested user namespace with >5 extents will see
incorrect IDs. It also breaks some access checks, like
inode_owner_or_capable() and privileged_wrt_inode_uidgid(), so a process
can incorrectly appear to be capable relative to an inode.

To fix it, we have to make sure that the "lower_first" members of extents
in both arrays are translated; and we have to make sure that the reverse
map is sorted *after* the translation (since otherwise the translation can
break the sorting).

This is CVE-2018-18955.

Fixes: 6397fac491 ("userns: bump idmap limits to 340")
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
Tested-by: Eric W. Biederman <ebiederm@xmission.com>
Reviewed-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2018-11-07 23:51:16 -06:00
Borislav Petkov
f26621e60b resource/docs: Complete kernel-doc style function documentation
Add the missing kernel-doc style function parameters documentation.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: linux-tip-commits@vger.kernel.org
Cc: rdunlap@infradead.org
Fixes: b69c2e20f6 ("resource: Clean it up a bit")
Link: http://lkml.kernel.org/r/20181105093307.GA12445@zn.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-07 16:47:47 +01:00
Ke Wu
e84cd7ee63 modsign: use all trusted keys to verify module signature
Make mod_verify_sig to use all trusted keys. This allows keys in
secondary_trusted_keys to be used to verify PKCS#7 signature on a
kernel module.

Signed-off-by: Ke Wu <mikewu@google.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
2018-11-07 14:41:41 +01:00
Long Li
e8da8794a7 genirq/matrix: Improve target CPU selection for managed interrupts.
On large systems with multiple devices of the same class (e.g. NVMe disks,
using managed interrupts), the kernel can affinitize these interrupts to a
small subset of CPUs instead of spreading them out evenly.

irq_matrix_alloc_managed() tries to select the CPU in the supplied cpumask
of possible target CPUs which has the lowest number of interrupt vectors
allocated.

This is done by searching the CPU with the highest number of available
vectors. While this is correct for non-managed CPUs it can select the wrong
CPU for managed interrupts. Under certain constellations this results in
affinitizing the managed interrupts of several devices to a single CPU in
a set.

The book keeping of available vectors works the following way:

 1) Non-managed interrupts:

    available is decremented when the interrupt is actually requested by
    the device driver and a vector is assigned. It's incremented when the
    interrupt and the vector are freed.

 2) Managed interrupts:

    Managed interrupts guarantee vector reservation when the MSI/MSI-X
    functionality of a device is enabled, which is achieved by reserving
    vectors in the bitmaps of the possible target CPUs. This reservation
    decrements the available count on each possible target CPU.

    When the interrupt is requested by the device driver then a vector is
    allocated from the reserved region. The operation is reversed when the
    interrupt is freed by the device driver. Neither of these operations
    affect the available count.

    The reservation persist up to the point where the MSI/MSI-X
    functionality is disabled and only this operation increments the
    available count again.

For non-managed interrupts the available count is the correct selection
criterion because the guaranteed reservations need to be taken into
account. Using the allocated counter could lead to a failing allocation in
the following situation (total vector space of 10 assumed):

		 CPU0	CPU1
 available:	    2	   0
 allocated:	    5	   3   <--- CPU1 is selected, but available space = 0
 managed reserved:  3	   7

 while available yields the correct result.

For managed interrupts the available count is not the appropriate
selection criterion because as explained above the available count is not
affected by the actual vector allocation.

The following example illustrates that. Total vector space of 10
assumed. The starting point is:

		 CPU0	CPU1
 available:	    5	   4
 allocated:	    2	   3
 managed reserved:  3	   3

 Allocating vectors for three non-managed interrupts will result in
 affinitizing the first two to CPU0 and the third one to CPU1 because the
 available count is adjusted with each allocation:

		  CPU0	CPU1
 available:	     5	   4	<- Select CPU0 for 1st allocation
 --> allocated:	     3	   3

 available:	     4	   4	<- Select CPU0 for 2nd allocation
 --> allocated:	     4	   3

 available:	     3	   4	<- Select CPU1 for 3rd allocation
 --> allocated:	     4	   4

 But the allocation of three managed interrupts starting from the same
 point will affinitize all of them to CPU0 because the available count is
 not affected by the allocation (see above). So the end result is:

		  CPU0	CPU1
 available:	     5	   4
 allocated:	     5	   3

Introduce a "managed_allocated" field in struct cpumap to track the vector
allocation for managed interrupts separately. Use this information to
select the target CPU when a vector is allocated for a managed interrupt,
which results in more evenly distributed vector assignments. The above
example results in the following allocations:

		 CPU0	CPU1
 managed_allocated: 0	   0	<- Select CPU0 for 1st allocation
 --> allocated:	    3	   3

 managed_allocated: 1	   0	<- Select CPU1 for 2nd allocation
 --> allocated:	    3	   4

 managed_allocated: 1	   1	<- Select CPU0 for 3rd allocation
 --> allocated:	    4	   4

The allocation of non-managed interrupts is not affected by this change and
is still evaluating the available count.

The overall distribution of interrupt vectors for both types of interrupts
might still not be perfectly even depending on the number of non-managed
and managed interrupts in a system, but due to the reservation guarantee
for managed interrupts this cannot be avoided.

Expose the new field in debugfs as well.

[ tglx: Clarified the background of the problem in the changelog and
  	described it independent of NVME ]

Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Michael Kelley <mikelley@microsoft.com>
Link: https://lkml.kernel.org/r/20181106040000.27316-1-longli@linuxonhyperv.com
2018-11-06 23:20:13 +01:00
Linus Torvalds
8053e5b93e Masami found a slight bug in his code where he transposed the arguments of a
call to strpbrk.
 
 The reason this wasn't detected in our tests is that the only way this would
 transpire is when a kprobe event with a symbol offset is attached to a
 function that belongs to a module that isn't loaded yet. When the kprobe
 trace event is added, the offset would be truncated after it was parsed,
 and when the module is loaded, it would use the symbol without the offset
 (as the nul character added by the parsing would not be replaced with the
 original character).
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCW+Dv7BQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qoa6AQDHmraHc2A2Q2sKKaLa7HcJvz7y1dez
 K73QtEJx/C0sUwEA9bALVV+TSO/C468/VjrdA5qMNUn6RpUR4HV7aWTHiQg=
 =Tn/+
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:
 "Masami found a slight bug in his code where he transposed the
  arguments of a call to strpbrk.

  The reason this wasn't detected in our tests is that the only way this
  would transpire is when a kprobe event with a symbol offset is
  attached to a function that belongs to a module that isn't loaded yet.
  When the kprobe trace event is added, the offset would be truncated
  after it was parsed, and when the module is loaded, it would use the
  symbol without the offset (as the nul character added by the parsing
  would not be replaced with the original character)"

* tag 'trace-v4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing/kprobes: Fix strpbrk() argument order
2018-11-06 08:12:10 -08:00
Linus Torvalds
a13511dfa8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Handle errors mid-stream of an all dump, from Alexey Kodanev.

 2) Fix build of openvswitch with certain combinations of netfilter
    options, from Arnd Bergmann.

 3) Fix interactions between GSO and BQL, from Eric Dumazet.

 4) Don't put a '/' in RTL8201F's sysfs file name, from Holger
    Hoffstätte.

 5) S390 qeth driver fixes from Julian Wiedmann.

 6) Allow ipv6 link local addresses for netconsole when both source and
    destination are link local, from Matwey V. Kornilov.

 7) Fix the BPF program address seen in /proc/kallsyms, from Song Liu.

 8) Initialize mutex before use in dsa microchip driver, from Tristram
    Ha.

 9) Out-of-bounds access in hns3, from Yunsheng Lin.

10) Various netfilter fixes from Stefano Brivio, Jozsef Kadlecsik, Jiri
    Slaby, Florian Westphal, Eric Westbrook, Andrey Ryabinin, and Pablo
    Neira Ayuso.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (50 commits)
  net: alx: make alx_drv_name static
  net: bpfilter: fix iptables failure if bpfilter_umh is disabled
  sock_diag: fix autoloading of the raw_diag module
  net: core: netpoll: Enable netconsole IPv6 link local address
  ipv6: properly check return value in inet6_dump_all()
  rtnetlink: restore handling of dumpit return value in rtnl_dump_all()
  net/ipv6: Move anycast init/cleanup functions out of CONFIG_PROC_FS
  bonding/802.3ad: fix link_failure_count tracking
  net: phy: realtek: fix RTL8201F sysfs name
  sctp: define SCTP_SS_DEFAULT for Stream schedulers
  sctp: fix strchange_flags name for Stream Change Event
  mlxsw: spectrum: Fix IP2ME CPU policer configuration
  openvswitch: fix linking without CONFIG_NF_CONNTRACK_LABELS
  qed: fix link config error handling
  net: hns3: Fix for out-of-bounds access when setting pfc back pressure
  net/mlx4_en: use __netdev_tx_sent_queue()
  net: do not abort bulk send on BQL status
  net: bql: add __netdev_tx_sent_queue()
  s390/qeth: report 25Gbit link speed
  s390/qeth: sanitize ARP requests
  ...
2018-11-06 07:44:04 -08:00
Richard Guy Briggs
ea956d8be9 audit: print empty EXECVE args
Empty executable arguments were being skipped when printing out the list
of arguments in an EXECVE record, making it appear they were somehow
lost.  Include empty arguments as an itemized empty string.

Reproducer:
	autrace /bin/ls "" "/etc"
	ausearch --start recent -m execve -i | grep EXECVE
	type=EXECVE msg=audit(10/03/2018 13:04:03.208:1391) : argc=3 a0=/bin/ls a2=/etc

With fix:
	type=EXECVE msg=audit(10/03/2018 21:51:38.290:194) : argc=3 a0=/bin/ls a1= a2=/etc
	type=EXECVE msg=audit(1538617898.290:194): argc=3 a0="/bin/ls" a1="" a2="/etc"

Passes audit-testsuite.  GH issue tracker at
https://github.com/linux-audit/audit-kernel/issues/99

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
[PM: cleaned up the commit metadata]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2018-11-05 16:41:49 -05:00
Masami Hiramatsu
ee474b81fe tracing/kprobes: Fix strpbrk() argument order
Fix strpbrk()'s argument order, it must pass acceptable string
in 2nd argument. Note that this can cause a kernel panic where
it recovers backup character to code->data.

Link: http://lkml.kernel.org/r/154108256792.2604.1816052586385217811.stgit@devbox

Fixes: a6682814f3 ("tracing/kprobes: Allow kprobe-events to record module symbol")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-11-05 09:47:14 -05:00
Yangtao Li
7d9df98be6 clockevents: Remove unnecessary unlikely()
WARN_ON() and WARN_ON_ONCE() already contains an unlikely(), so it's not
necessary to use unlikely.

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20181104023104.2572-1-tiny.windzz@gmail.com
2018-11-05 14:22:54 +01:00
Jens Axboe
6da4b3ab9a genirq/affinity: Add support for allocating interrupt sets
A driver may have a need to allocate multiple sets of MSI/MSI-X interrupts,
and have them appropriately affinitized.

Add support for defining a number of sets in the irq_affinity structure, of
varying sizes, and get each set affinitized correctly across the machine.

[ tglx: Minor changelog tweaks ]

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Cc: linux-block@vger.kernel.org
Link: https://lkml.kernel.org/r/20181102145951.31979-5-ming.lei@redhat.com
2018-11-05 12:16:27 +01:00
Ming Lei
060746d9e3 genirq/affinity: Pass first vector to __irq_build_affinity_masks()
No functional change.

Prepares for support of allocating and affinitizing sets of interrupts, in
which each set of interrupts needs a full two stage spreading. The first
vector argument is necessary for this so the affinitizing starts from the
first vector of each set.

[ tglx: Minor changelog tweaks ]

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Cc: Hannes Reinecke <hare@suse.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Link: https://lkml.kernel.org/r/20181102145951.31979-4-ming.lei@redhat.com
2018-11-05 12:16:26 +01:00
Ming Lei
5c903e108d genirq/affinity: Move two stage affinity spreading into a helper function
No functional change. Prepares for supporting allocating and affinitizing
interrupt sets.

[ tglx: Minor changelog tweaks ]

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Cc: Hannes Reinecke <hare@suse.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Link: https://lkml.kernel.org/r/20181102145951.31979-3-ming.lei@redhat.com
2018-11-05 12:16:26 +01:00
Long Li
b825921990 genirq/affinity: Spread IRQs to all available NUMA nodes
If the number of NUMA nodes exceeds the number of MSI/MSI-X interrupts
which are allocated for a device, the interrupt affinity spreading code
fails to spread them across all nodes.

The reason is, that the spreading code starts from node 0 and continues up
to the number of interrupts requested for allocation. This leaves the nodes
past the last interrupt unused.

This results in interrupt concentration on the first nodes which violates
the assumption of the block layer that all nodes are covered evenly. As a
consequence the NUMA nodes above the number of interrupts are all assigned
to hardware queue 0 and therefore NUMA node 0, which results in bad
performance and has CPU hotplug implications, because queue 0 gets shut
down when the last CPU of node 0 is offlined.

Go over all NUMA nodes and assign them round-robin to all requested
interrupts to solve this.

[ tglx: Massaged changelog ]

Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Cc: Michael Kelley <mikelley@microsoft.com>
Link: https://lkml.kernel.org/r/20181102180248.13583-1-longli@linuxonhyperv.com
2018-11-05 12:16:26 +01:00
Randy Dunlap
f75d651587 resource/docs: Fix new kernel-doc warnings
The first group of warnings is caused by a "/**" kernel-doc notation
marker but the function comments are not in kernel-doc format.
Also add another error return value here.

  ../kernel/resource.c:337: warning: Function parameter or member 'start' not described in 'find_next_iomem_res'
  ../kernel/resource.c:337: warning: Function parameter or member 'end' not described in 'find_next_iomem_res'
  ../kernel/resource.c:337: warning: Function parameter or member 'flags' not described in 'find_next_iomem_res'
  ../kernel/resource.c:337: warning: Function parameter or member 'desc' not described in 'find_next_iomem_res'
  ../kernel/resource.c:337: warning: Function parameter or member 'first_lvl' not described in 'find_next_iomem_res'
  ../kernel/resource.c:337: warning: Function parameter or member 'res' not described in 'find_next_iomem_res'

Add the missing function parameter documentation for the other warnings:

  ../kernel/resource.c:409: warning: Function parameter or member 'arg' not described in 'walk_iomem_res_desc'
  ../kernel/resource.c:409: warning: Function parameter or member 'func' not described in 'walk_iomem_res_desc'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: b69c2e20f6 ("resource: Clean it up a bit")
Link: http://lkml.kernel.org/r/dda2e4d8-bedd-3167-20fe-8c7d2d35b354@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-05 07:05:04 +01:00
Yi Wang
e1ff516a56 sched/fair: Fix a comment in task_numa_fault()
Duplicated 'case it'.

Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Reviewed-by: Xi Xu <xu.xi8@zte.com.cn>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: zhong.weidong@zte.com.cn
Link: http://lkml.kernel.org/r/1541379013-11352-1-git-send-email-wang.yi59@zte.com.cn
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-05 07:03:59 +01:00
Linus Torvalds
71e5602817 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "A memory (under-)allocation fix and a comment fix"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/topology: Fix off by one bug
  sched/rt: Update comment in pick_next_task_rt()
2018-11-03 18:37:09 -07:00
Linus Torvalds
601a88077c Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "A number of fixes and some late updates:

   - make in_compat_syscall() behavior on x86-32 similar to other
     platforms, this touches a number of generic files but is not
     intended to impact non-x86 platforms.

   - objtool fixes

   - PAT preemption fix

   - paravirt fixes/cleanups

   - cpufeatures updates for new instructions

   - earlyprintk quirk

   - make microcode version in sysfs world-readable (it is already
     world-readable in procfs)

   - minor cleanups and fixes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  compat: Cleanup in_compat_syscall() callers
  x86/compat: Adjust in_compat_syscall() to generic code under !COMPAT
  objtool: Support GCC 9 cold subfunction naming scheme
  x86/numa_emulation: Fix uniform-split numa emulation
  x86/paravirt: Remove unused _paravirt_ident_32
  x86/mm/pat: Disable preemption around __flush_tlb_all()
  x86/paravirt: Remove GPL from pv_ops export
  x86/traps: Use format string with panic() call
  x86: Clean up 'sizeof x' => 'sizeof(x)'
  x86/cpufeatures: Enumerate MOVDIR64B instruction
  x86/cpufeatures: Enumerate MOVDIRI instruction
  x86/earlyprintk: Add a force option for pciserial device
  objtool: Support per-function rodata sections
  x86/microcode: Make revision and processor flags world-readable
2018-11-03 18:25:17 -07:00
Linus Torvalds
01897f3e05 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates and fixes from Ingo Molnar:
 "These are almost all tooling updates: 'perf top', 'perf trace' and
  'perf script' fixes and updates, an UAPI header sync with the merge
  window versions, license marker updates, much improved Sparc support
  from David Miller, and a number of fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (66 commits)
  perf intel-pt/bts: Calculate cpumode for synthesized samples
  perf intel-pt: Insert callchain context into synthesized callchains
  perf tools: Don't clone maps from parent when synthesizing forks
  perf top: Start display thread earlier
  tools headers uapi: Update linux/if_link.h header copy
  tools headers uapi: Update linux/netlink.h header copy
  tools headers: Sync the various kvm.h header copies
  tools include uapi: Update linux/mmap.h copy
  perf trace beauty: Use the mmap flags table generated from headers
  perf beauty: Wire up the mmap flags table generator to the Makefile
  perf beauty: Add a generator for MAP_ mmap's flag constants
  tools include uapi: Update asound.h copy
  tools arch uapi: Update asm-generic/unistd.h and arm64 unistd.h copies
  tools include uapi: Update linux/fs.h copy
  perf callchain: Honour the ordering of PERF_CONTEXT_{USER,KERNEL,etc}
  perf cs-etm: Correct CPU mode for samples
  perf unwind: Take pgoff into account when reporting elf to libdwfl
  perf top: Do not use overwrite mode by default
  perf top: Allow disabling the overwrite mode
  perf trace: Beautify mount's first pathname arg
  ...
2018-11-03 18:13:43 -07:00
Linus Torvalds
e9ebc2151f Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Ingo Molnar:
 "An irqchip driver fix and a memory (over-)allocation fix"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/irq-mvebu-sei: Fix a NULL vs IS_ERR() bug in probe function
  irq/matrix: Fix memory overallocation
2018-11-03 18:12:09 -07:00
Muchun Song
ff1cdc94de sched/core: Introduce set_next_task() helper for better code readability
When we pick the next task, we will do the following for the task:

  1) p->se.exec_start = rq_clock_task(rq);
  2) dequeue_pushable(_dl)_task(rq, p);

When we call set_curr_task(), we also need to do the same thing
above. In rt.c, the code at 1) is in the _pick_next_task_rt()
and the code at 2) is in the pick_next_task_rt(). If we put two
operations in one function, maybe better. So, we introduce a new
function set_next_task(), which is responsible for doing the above.

By introducing the function we can get rid of calling the
dequeue_pushable(_dl)_task() directly(We can call set_next_task())
in pick_next_task() and have better code readability and reuse.
In set_curr_task_rt(), we also can call set_next_task().

Do this things such that we end up with:

  static struct task_struct *pick_next_task(struct rq *rq,
  					    struct task_struct *prev,
  					    struct rq_flags *rf)
  {
  	/* do something else ... */

  	put_prev_task(rq, prev);

  	/* pick next task p */

  	set_next_task(rq, p);

  	/* do something else ... */
  }

put_prev_task() can match set_next_task(), which can make the
code more readable.

Signed-off-by: Muchun Song <smuchun@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20181026131743.21786-1-smuchun@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-04 00:59:24 +01:00
Valentin Schneider
3f130a37c4 sched/fair: Don't increase sd->balance_interval on newidle balance
When load_balance() fails to move some load because of task affinity,
we end up increasing sd->balance_interval to delay the next periodic
balance in the hopes that next time we look, that annoying pinned
task(s) will be gone.

However, idle_balance() pays no attention to sd->balance_interval, yet
it will still lead to an increase in balance_interval in case of
pinned tasks.

If we're going through several newidle balances (e.g. we have a
periodic task), this can lead to a huge increase of the
balance_interval in a very small amount of time.

To prevent that, don't increase the balance interval when going
through a newidle balance.

This is a similar approach to what is done in commit 58b26c4c02
("sched: Increment cache_nice_tries only on periodic lb"), where we
disregard newidle balance and rely on periodic balance for more stable
results.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dietmar.Eggemann@arm.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: patrick.bellasi@arm.com
Cc: vincent.guittot@linaro.org
Link: http://lkml.kernel.org/r/1537974727-30788-2-git-send-email-valentin.schneider@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-04 00:59:23 +01:00
Valentin Schneider
47b7aee14f sched/fair: Clean up load_balance() condition
The alignment of the condition is off, clean that up.

Also, logical operators have lower precedence than bitwise/relational
operators, so remove one layer of parentheses to make the condition a
bit simpler to follow.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dietmar.Eggemann@arm.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: patrick.bellasi@arm.com
Cc: vincent.guittot@linaro.org
Link: http://lkml.kernel.org/r/1537974727-30788-1-git-send-email-valentin.schneider@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-04 00:59:22 +01:00
Valentin Schneider
40fa3780ba sched/core: Take the hotplug lock in sched_init_smp()
When running on linux-next (8c60c36d0b8c ("Add linux-next specific files
for 20181019")) + CONFIG_PROVE_LOCKING=y on a big.LITTLE system (e.g.
Juno or HiKey960), we get the following report:

 [    0.748225] Call trace:
 [    0.750685]  lockdep_assert_cpus_held+0x30/0x40
 [    0.755236]  static_key_enable_cpuslocked+0x20/0xc8
 [    0.760137]  build_sched_domains+0x1034/0x1108
 [    0.764601]  sched_init_domains+0x68/0x90
 [    0.768628]  sched_init_smp+0x30/0x80
 [    0.772309]  kernel_init_freeable+0x278/0x51c
 [    0.776685]  kernel_init+0x10/0x108
 [    0.780190]  ret_from_fork+0x10/0x18

The static_key in question is 'sched_asym_cpucapacity' introduced by
commit:

  df054e8445 ("sched/topology: Add static_key for asymmetric CPU capacity optimizations")

In this particular case, we enable it because smp_prepare_cpus() will
end up fetching the capacity-dmips-mhz entry from the devicetree,
so we already have some asymmetry detected when entering sched_init_smp().

This didn't get detected in tip/sched/core because we were missing:

  commit cb538267ea ("jump_label/lockdep: Assert we hold the hotplug lock for _cpuslocked() operations")

Calls to build_sched_domains() post sched_init_smp() will hold the
hotplug lock, it just so happens that this very first call is a
special case. As stated by a comment in sched_init_smp(), "There's no
userspace yet to cause hotplug operations" so this is a harmless
warning.

However, to both respect the semantics of underlying
callees and make lockdep happy, take the hotplug lock in
sched_init_smp(). This also satisfies the comment atop
sched_init_domains() that says "Callers must hold the hotplug lock".

Reported-by: Sudeep Holla <sudeep.holla@arm.com>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dietmar.Eggemann@arm.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: morten.rasmussen@arm.com
Cc: quentin.perret@arm.com
Link: http://lkml.kernel.org/r/1540301851-3048-1-git-send-email-valentin.schneider@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-04 00:57:44 +01:00
Peter Zijlstra
993f0b0510 sched/topology: Fix off by one bug
With the addition of the NUMA identity level, we increased @level by
one and will run off the end of the array in the distance sort loop.

Fixed: 051f3ca02e ("sched/topology: Introduce NUMA identity node sched domain")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-04 00:40:03 +01:00
Ingo Molnar
23a12ddee1 Merge branch 'core/urgent' into x86/urgent, to pick up objtool fix
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-03 23:42:16 +01:00
Michael Schupikov
6f0483d1f9 kernel/sysctl.c: remove duplicated include
Remove one include of <linux/pipe_fs_i.h>.
No functional changes.

Link: http://lkml.kernel.org/r/20181004134223.17735-1-michael@schupikov.de
Signed-off-by: Michael Schupikov <michael@schupikov.de>
Reviewed-by: Richard Weinberger <richard@nod.at>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-03 10:09:37 -07:00
zhong jiang
3383b36040 kernel/kexec_file.c: remove some duplicated includes
We include kexec.h and slab.h twice in kexec_file.c. It's unnecessary.
hence just remove them.

Link: http://lkml.kernel.org/r/1537498098-19171-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-03 10:09:37 -07:00
Daniel Borkmann
28c2fae726 bpf: fix bpf_prog_get_info_by_fd to return 0 func_lens for unpriv
While dbecd73884 ("bpf: get kernel symbol addresses via syscall")
zeroed info.nr_jited_ksyms in bpf_prog_get_info_by_fd() for queries
from unprivileged users, commit 815581c11c ("bpf: get JITed image
lengths of functions via syscall") forgot about doing so and therefore
returns the #elems of the user set up buffer which is incorrect. It
also needs to indicate a info.nr_jited_func_lens of zero.

Fixes: 815581c11c ("bpf: get JITed image lengths of functions via syscall")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Sandipan Das <sandipan@linux.vnet.ibm.com>
Cc: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-02 13:51:15 -07:00
Song Liu
ff1889fc53 bpf: show main program address and length in bpf_prog_info
Currently, when there is no subprog (prog->aux->func_cnt == 0),
bpf_prog_info does not return any jited_ksyms or jited_func_lens. This
patch adds main program address (prog->bpf_func) and main program
length (prog->jited_len) to bpf_prog_info.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-02 21:39:01 +01:00
Song Liu
de57e99ceb bpf: show real jited address in bpf_prog_info->jited_ksyms
Currently, jited_ksyms in bpf_prog_info shows page addresses of jited
bpf program. The main reason here is to not expose randomized start
address. However, this is not ideal for detailed profiling (find hot
instructions from stack traces). This patch replaces the page address
with real prog start address.

This change is OK because bpf_prog_get_info_by_fd() is only available
to root.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-02 21:39:01 +01:00
Song Liu
df0734702a bpf: show real jited prog address in /proc/kallsyms
Currently, /proc/kallsyms shows page address of jited bpf program. The
main reason here is to not expose randomized start address. However,
This is not ideal for detailed profiling (find hot instructions from
stack traces). This patch replaces the page address with real prog start
address.

This change is OK because these addresses are still protected by sysctl
kptr_restrict (see kallsyms_show_value()), and only programs loaded by
root are added to kallsyms (see bpf_prog_kallsyms_add()).

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-02 21:39:01 +01:00
Linus Torvalds
5f21585384 for-linus-20181102
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlvchGgQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpj/1D/4kEQx4ncnFoZk8QshHV1L++rH3BbcLjQDd
 Wbh9ZSIQdI/gHTzS6bE7x3YfcbpMWPMO3+jFawdfRiFTEjlF8vQ+mnJ+Btb3z4D6
 mGEeFGVhHExlp2a0x/Ma8YWVNlMB7BE8Tq73bZEVMY+9lbpmDW/vp7Sfa87LBDKQ
 ZmY+My+VdHN7qLtQ7t3W/HtpbU+kcXMMd3ICjK4i+ofXy6mynk4+oQ2jwyXc5L86
 UCJCsTsSRr3CgbnkW/uprHo0XHk8i7O/4C3oR+x4pAIxCCa9g+vmw0EO9fvi/2iQ
 qe8jKdm7Y09xu/TiPBa7iz45tdh0cNMJKo3OezmSF9Np+r69KL5C/U4GRPKN3Iwm
 keoqn14ScABkYMSe4ys1AdEgKD6bNUaW3r/lJxTH2oUR23mjnCLp7c4WD/G+MlbB
 CzoakQyCHTZmDFLr2Kc8bkjmpil2T2UFfmLIDAu30LWIYeSGpiIO/V+g1foJMF2f
 06ERltNvgX1BJjoh4NSWySLEf1ZtkUU60NeATRol6gwhnIyLrHsgfm6OEhqlW/7x
 Xc1BWyzX7K6c3Dskk/u5aSRyXOyRC9KkMt3/2XexeDNHkte9yMH0IgSvopPBuER8
 +iPvPjNp7ychTKZB3zpSnlqGgePTjbufIEBtO3OyUmDZKjUqxahtxkQfmPhoclu+
 XdR4ArcqNg==
 =0zM4
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20181102' of git://git.kernel.dk/linux-block

Pull block layer fixes from Jens Axboe:
 "The biggest part of this pull request is the revert of the blkcg
  cleanup series. It had one fix earlier for a stacked device issue, but
  another one was reported. Rather than play whack-a-mole with this,
  revert the entire series and try again for the next kernel release.

  Apart from that, only small fixes/changes.

  Summary:

   - Indentation fixup for mtip32xx (Colin Ian King)

   - The blkcg cleanup series revert (Dennis Zhou)

   - Two NVMe fixes. One fixing a regression in the nvme request
     initialization in this merge window, causing nvme-fc to not work.
     The other is a suspend/resume p2p resource issue (James, Keith)

   - Fix sg discard merge, allowing us to merge in cases where we didn't
     before (Jianchao Wang)

   - Call rq_qos_exit() after the queue is frozen, preventing a hang
     (Ming)

   - Fix brd queue setup, fixing an oops if we fail setting up all
     devices (Ming)"

* tag 'for-linus-20181102' of git://git.kernel.dk/linux-block:
  nvme-pci: fix conflicting p2p resource adds
  nvme-fc: fix request private initialization
  blkcg: revert blkcg cleanups series
  block: brd: associate with queue until adding disk
  block: call rq_qos_exit() after queue is frozen
  mtip32xx: clean an indentation issue, remove extraneous tabs
  block: fix the DISCARD request merge
2018-11-02 11:25:48 -07:00
Dennis Zhou
b5f2954d30 blkcg: revert blkcg cleanups series
This reverts a series committed earlier due to null pointer exception
bug report in [1]. It seems there are edge case interactions that I did
not consider and will need some time to understand what causes the
adverse interactions.

The original series can be found in [2] with a follow up series in [3].

[1] https://www.spinics.net/lists/cgroups/msg20719.html
[2] https://lore.kernel.org/lkml/20180911184137.35897-1-dennisszhou@gmail.com/
[3] https://lore.kernel.org/lkml/20181020185612.51587-1-dennis@kernel.org/

This reverts the following commits:
d459d853c2, b2c3fa5467, 101246ec02, b3b9f24f5f, e2b0989954,
f0fcb3ec89, c839e7a03f, bdc2491708, 74b7c02a9b, 5bf9a1f3b4,
a7b39b4e96, 07b05bcc32, 49f4c2dc2b, 27e6fa996c

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-01 19:59:53 -06:00
Linus Torvalds
b5b1de3537 virtio, vhost: fixes, tweaks
virtio balloon page hinting support
 vhost scsi control queue
 
 misc fixes.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJb222AAAoJECgfDbjSjVRpfHMH/1uh87pIj6Qbh9LRZm0dRVHs
 iEuIa0TECb+9AmqHRBliEcqWjihWnlQnSwE6d/ZTk9cH9zEEXsIu0B7nDzjGwJ5b
 m7wq679JqrtGpCdQhRH85sk8P3fs5ldatdbc4/0fsPwwkoXHcqTOpSZWyhtQHGc4
 EkzzxHXVhVXgRBjzLXVCMoOAQ+8QfMZFrKIgKuOB0I4OVughFrAGf0Hemm18f4CL
 5+YwsJQjleDkm+Udf+FTQS2oZ57DJsOLm2bwoKgqCkBaDfPlR92uWjgTa50WXggo
 RaokpFQkJKpz11yenezslzrVWUJApnWnUhfRd71t1ttvujrrcD9zUvWEVMURbuU=
 =2i/5
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio/vhost updates from Michael Tsirkin:
 "Fixes and tweaks:

   - virtio balloon page hinting support

   - vhost scsi control queue

   - misc fixes"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  MAINTAINERS: remove reference to bogus vsock file
  vhost/scsi: Use common handling code in request queue handler
  vhost/scsi: Extract common handling code from control queue handler
  vhost/scsi: Respond to control queue operations
  vhost/scsi: truncate T10 PI iov_iter to prot_bytes
  virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON
  mm/page_poison: expose page_poisoning_enabled to kernel modules
  virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  kvm_config: add CONFIG_VIRTIO_MENU
2018-11-01 14:42:49 -07:00
Linus Torvalds
2d6bb6adb7 New gcc plugin: stackleak
- Introduces the stackleak gcc plugin ported from grsecurity by Alexander
   Popov, with x86 and arm64 support.
 -----BEGIN PGP SIGNATURE-----
 Comment: Kees Cook <kees@outflux.net>
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAlvQvn4WHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJpSfD/sErFreuPT1beSw994Lr9Zx4k9v
 ERsuXxWBENaJOJXbOOHMfVEcEeG/1uhPSp7hlw/dpHfh0anATTrcYqm8RNKbfK+k
 o06+JK14OJfpm5Ghq/7OizhdNLCMT8wMU3XZtWfy65VSJGjEFx8Y48vMeQtpWtUK
 ylSzi9JV6j2iUBF9oibtiT53+yqsqAtX80X1G7HRCgv9kxuKMhZr+Q5oGV6+ViyQ
 Azj8mNn06iRnhHKd17WxDJr0GjSibzz4weS/9XgP3t3EcNWJo1EgBlD2KV3tOfP5
 nzmqfqTqrcjxs/tyjdh6vVCSlYucNtyCQGn63qyShQYSg6mZwclR2fY8YSTw6PWw
 GfYWFOWru9z+qyQmwFkQ9bSQS2R+JIT0oBCj9VmtF9XmPCy7K2neJsQclzSPBiCW
 wPgXVQS4IA4684O5CmDOVMwmDpGvhdBNUR6cqSzGLxQOHY1csyXubMNUsqU3g9xk
 Ob4pEy/xrrIw4WpwHcLHSEW5gV1/OLhsT0fGRJJiC947L3cN5s9EZp7FLbIS0zlk
 qzaXUcLmn6AgcfkYwg5cI3RMLaN2V0eDCMVTWZJ1wbrmUV9chAaOnTPTjNqLOTht
 v3b1TTxXG4iCpMmOFf59F8pqgAwbBDlfyNSbySZ/Pq5QH69udz3Z9pIUlYQnSJHk
 u6q++2ReDpJXF81rBw==
 =Ks6B
 -----END PGP SIGNATURE-----

Merge tag 'stackleak-v4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull stackleak gcc plugin from Kees Cook:
 "Please pull this new GCC plugin, stackleak, for v4.20-rc1. This plugin
  was ported from grsecurity by Alexander Popov. It provides efficient
  stack content poisoning at syscall exit. This creates a defense
  against at least two classes of flaws:

   - Uninitialized stack usage. (We continue to work on improving the
     compiler to do this in other ways: e.g. unconditional zero init was
     proposed to GCC and Clang, and more plugin work has started too).

   - Stack content exposure. By greatly reducing the lifetime of valid
     stack contents, exposures via either direct read bugs or unknown
     cache side-channels become much more difficult to exploit. This
     complements the existing buddy and heap poisoning options, but
     provides the coverage for stacks.

  The x86 hooks are included in this series (which have been reviewed by
  Ingo, Dave Hansen, and Thomas Gleixner). The arm64 hooks have already
  been merged through the arm64 tree (written by Laura Abbott and
  reviewed by Mark Rutland and Will Deacon).

  With VLAs having been removed this release, there is no need for
  alloca() protection, so it has been removed from the plugin"

* tag 'stackleak-v4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  arm64: Drop unneeded stackleak_check_alloca()
  stackleak: Allow runtime disabling of kernel stack erasing
  doc: self-protection: Add information about STACKLEAK feature
  fs/proc: Show STACKLEAK metrics in the /proc file system
  lkdtm: Add a test for STACKLEAK
  gcc-plugins: Add STACKLEAK plugin for tracking the kernel stack
  x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls
2018-11-01 11:46:27 -07:00
Linus Torvalds
82aa467151 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) BPF verifier fixes from Daniel Borkmann.

 2) HNS driver fixes from Huazhong Tan.

 3) FDB only works for ethernet devices, reject attempts to install FDB
    rules for others. From Ido Schimmel.

 4) Fix spectre V1 in vhost, from Jason Wang.

 5) Don't pass on-stack object to irq_set_affinity_hint() in mvpp2
    driver, from Marc Zyngier.

 6) Fix mlx5e checksum handling when RXFCS is enabled, from Eric
    Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (49 commits)
  openvswitch: Fix push/pop ethernet validation
  net: stmmac: Fix stmmac_mdio_reset() when building stmmac as modules
  bpf: test make sure to run unpriv test cases in test_verifier
  bpf: add various test cases to test_verifier
  bpf: don't set id on after map lookup with ptr_to_map_val return
  bpf: fix partial copy of map_ptr when dst is scalar
  libbpf: Fix compile error in libbpf_attach_type_by_name
  kselftests/bpf: use ping6 as the default ipv6 ping binary if it exists
  selftests: mlxsw: qos_mc_aware: Add a test for UC awareness
  selftests: mlxsw: qos_mc_aware: Tweak for min shaper
  mlxsw: spectrum: Set minimum shaper on MC TCs
  mlxsw: reg: QEEC: Add minimum shaper fields
  net: hns3: bugfix for rtnl_lock's range in the hclgevf_reset()
  net: hns3: bugfix for rtnl_lock's range in the hclge_reset()
  net: hns3: bugfix for handling mailbox while the command queue reinitialized
  net: hns3: fix incorrect return value/type of some functions
  net: hns3: bugfix for hclge_mdio_write and hclge_mdio_read
  net: hns3: bugfix for is_valid_csq_clean_head()
  net: hns3: remove unnecessary queue reset in the hns3_uninit_all_ring()
  net: hns3: bugfix for the initialization of command queue's spin lock
  ...
2018-11-01 09:16:01 -07:00
Linus Torvalds
eb7046e9bf platform-drivers-x86 for v4.20-1
Move the Dell dcdbas and dell_rbu drivers into platform/drivers/x86 as
 they are closely coupled with other drivers in this location.
 
 Improve _init* usage for acerhdf and fix some usage issues with messages
 and module parameters.
 
 Simplify asus-wmi by calling ACPI/WMI methods directly, eliminating
 workqueue overhead, eliminate double reporting of keyboard backlight.
 
 Fix wake from USB failure on Bay Trail devices (intel_int0002_vgpio).
 
 Notify intel_telemetry users when IPC1 device is not enabled.
 
 Update various drivers with new laptop model IDs.
 
 Update several intel drivers to use SPDX identifers and order headers
 alphabetically.
 
 The following is an automated git shortlog grouped by driver:
 
 Add Intel AtomISP2 dummy / power-management driver:
  - Add Intel AtomISP2 dummy / power-management driver
 
 lg-laptop:
  - Add LG Gram laptop special features driver
 
 HID:
  -  asus: only support backlight when it's not driven by WMI
 
 MAINTAINERS:
  -  intel_telemetry: Update maintainers info
  -  intel_pmc_core: Update MAINTAINERS
  -  Update maintainer for dcdbas and dell_rbu
  -  Use my infradead account exclusively for PDx86 work
 
 acerhdf:
  -  restructure to allow large BIOS table be __initconst
  -  mark appropriate content with __init prefix
  -  Add BIOS entry for Gateway LT31 v1.3307
  -  Remove cut-and-paste trap from instructions
  -  Enable ability to list supported systems
  -  clarify modinfo messages for BIOS override
 
 asus-wmi:
  -  export function for evaluating WMI methods
  -  Only notify kbd LED hw_change by fn-key pressed
  -  Simplify the keyboard brightness updating process
 
 firmware:
  -  dcdbas: include linux/io.h
  -  dcdbas: Move dcdbas to drivers/platform/x86
  -  dell_rbu: Move dell_rbu to drivers/platform/x86
  -  dcdbas: Add support for WSMT ACPI table
  -  dell_rbu: Make payload memory uncachable
 
 ideapad-laptop:
  -  Add Y530-15ICH to no_hw_rfkill
  -  Use __func__ instead of read_ec_cmd in pr_err
 
 intel-hid:
  -  Convert to use SPDX identifier
 
 intel-ips:
  -  Convert to use SPDX identifier
 
 intel-rst:
  -  Convert to use SPDX identifier
  -  Sort headers alphabetically
 
 intel-smartconnect:
  -  Convert to use SPDX identifier
  -  Sort headers alphabetically
 
 intel-wmi-thunderbolt:
  -  Add dynamic debugging
  -  Convert to use SPDX identifier
 
 intel_bxtwc_tmu:
  -  Convert to use SPDX identifier
 
 intel_cht_int33fe:
  -  Convert to use SPDX identifier
 
 intel_chtdc_ti_pwrbtn:
  -  Add SPDX identifier
 
 intel_int0002_vgpio:
  -  Convert to use SPDX identifier
  -  Implement irq_set_wake
  -  Enable the driver on Bay Trail platforms
 
 intel_menlow:
  -  Convert to use SPDX identifier
  -  Sort headers alphabetically
 
 intel_mid_powerbtn:
  -  Convert to use SPDX identifier
  -  Remove unnecessary init.h inclusion
  -  Get rid of custom ICPU() macro
 
 intel_mid_thermal:
  -  Convert to use SPDX identifier
  -  Sort headers alphabetically
 
 intel_oaktrail:
  -  Convert to use SPDX identifier
  -  Sort headers alphabetically
 
 intel_pmc:
  -  Convert to use SPDX identifier
  -  Sort headers alphabetically
 
 intel_punit_ipc:
  -  Convert to use SPDX identifier
  -  Sort headers alphabetically
 
 intel_scu_ipc:
  -  Convert to use SPDX identifier
  -  Sort headers alphabetically
 
 intel_telemetry:
  -  Get rid of custom macro
  -  report debugfs failure
  -  Convert to use SPDX identifier
 
 intel_turbo_max_3:
  -  Convert to use SPDX identifier
  -  Sort headers alphabetically
 
 mlx-platform:
  -  Properly use mlxplat_mlxcpld_msn201x_items
 
 touchscreen_dmi:
  -  Add min-x and min-y settings for various models
  -  Add info for the Onda V80 Plus v3 tablet
  -  Add info for the Trekstor Primetab T13B tablet
  -  Add info for the Trekstor Primebook C11 convertible
 
 tracing:
  -  Trivia spelling fix containerof() -> container_of()
 
 wmi:
  -  declare device_type structure as constant
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJb2mDCAAoJEKbMaAwKp364KdkH/Rar8cenw7uFeVzT5o4fGzTn
 0Oo1Pnwh+hPgmQwmf8O/mHsa/1Zb9nX4nNLU/ysDUjAAZ9wZLoB57HlImQquPA74
 c8lXTz5MVF140WB1f1Ck5PmeonJR2Xve7HTYCoBmGo71jOxHxCgJbp1Giiho0Imd
 3NXzzahEnAiy/L5PhqqYTmIfTeAHQ2Fo7GkHCjMnS20FP2fBtvPrc7db03F+FZ4y
 QTfbT0oR42NdeW21AbCRGY3P47axB3xDJX4LOYrb4uo4CIzd1yatlHk407NRRyv8
 pab6aeJR23FouA/tcvv+T36D3vqRw78e6BWdoyowWOd7CUOgjtSC3lKGzfT6hSY=
 =0BDZ
 -----END PGP SIGNATURE-----

Merge tag 'platform-drivers-x86-v4.20-1' of git://git.infradead.org/linux-platform-drivers-x86

Pull x86 platform driver updates from Darren Hart:

 - Move the Dell dcdbas and dell_rbu drivers into platform/drivers/x86
   as they are closely coupled with other drivers in this location.

 - Improve _init* usage for acerhdf and fix some usage issues with
   messages and module parameters.

 - Simplify asus-wmi by calling ACPI/WMI methods directly, eliminating
   workqueue overhead, eliminate double reporting of keyboard backlight.

 - Fix wake from USB failure on Bay Trail devices (intel_int0002_vgpio).

 - Notify intel_telemetry users when IPC1 device is not enabled.

 - Update various drivers with new laptop model IDs.

 - Update several intel drivers to use SPDX identifers and order headers
   alphabetically.

* tag 'platform-drivers-x86-v4.20-1' of git://git.infradead.org/linux-platform-drivers-x86: (64 commits)
  HID: asus: only support backlight when it's not driven by WMI
  platform/x86: asus-wmi: export function for evaluating WMI methods
  platform/x86: asus-wmi: Only notify kbd LED hw_change by fn-key pressed
  platform/x86: wmi: declare device_type structure as constant
  platform/x86: ideapad: Add Y530-15ICH to no_hw_rfkill
  platform/x86: Add Intel AtomISP2 dummy / power-management driver
  platform/x86: touchscreen_dmi: Add min-x and min-y settings for various models
  platform/x86: touchscreen_dmi: Add info for the Onda V80 Plus v3 tablet
  platform/x86: touchscreen_dmi: Add info for the Trekstor Primetab T13B tablet
  platform/x86: intel_telemetry: Get rid of custom macro
  platform/x86: intel_telemetry: report debugfs failure
  MAINTAINERS: intel_telemetry: Update maintainers info
  platform/x86: Add LG Gram laptop special features driver
  platform/x86: asus-wmi: Simplify the keyboard brightness updating process
  platform/x86: touchscreen_dmi: Add info for the Trekstor Primebook C11 convertible
  platform/x86: mlx-platform: Properly use mlxplat_mlxcpld_msn201x_items
  MAINTAINERS: intel_pmc_core: Update MAINTAINERS
  firmware: dcdbas: include linux/io.h
  platform/x86: intel-wmi-thunderbolt: Add dynamic debugging
  platform/x86: intel-wmi-thunderbolt: Convert to use SPDX identifier
  ...
2018-11-01 08:42:21 -07:00
Dmitry Safonov
98f76206b3 compat: Cleanup in_compat_syscall() callers
Now that in_compat_syscall() is consistent on all architectures and does
not longer report true on native i686, the workarounds (ifdeffery and
helpers) can be removed.

Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-efi@vger.kernel.org
Cc: netdev@vger.kernel.org
Link: https://lkml.kernel.org/r/20181012134253.23266-3-dima@arista.com
2018-11-01 13:02:21 +01:00
Michael Kelley
57f01796f1 irq/matrix: Fix memory overallocation
IRQ_MATRIX_SIZE is the number of longs needed for a bitmap, multiplied by
the size of a long, yielding a byte count. But it is used to size an array
of longs, which is way more memory than is needed.

Change IRQ_MATRIX_SIZE so it is just the number of longs needed and the
arrays come out the correct size.

Fixes: 2f75d9e1c9 ("genirq: Implement bitmap matrix allocator")
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: KY Srinivasan <kys@microsoft.com>
Link: https://lkml.kernel.org/r/1541032428-10392-1-git-send-email-mikelley@microsoft.com
2018-11-01 10:00:38 +01:00
Daniel Borkmann
4d31f30148 bpf: don't set id on after map lookup with ptr_to_map_val return
In the verifier there is no such semantics where registers with
PTR_TO_MAP_VALUE type have an id assigned to them. This is only
used in PTR_TO_MAP_VALUE_OR_NULL and later on nullified once the
test against NULL has been pattern matched and type transformed
into PTR_TO_MAP_VALUE.

Fixes: 3e6a4b3e02 ("bpf/verifier: introduce BPF_PTR_TO_MAP_VALUE")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Roman Gushchin <guro@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-31 16:53:17 -07:00
Daniel Borkmann
0962590e55 bpf: fix partial copy of map_ptr when dst is scalar
ALU operations on pointers such as scalar_reg += map_value_ptr are
handled in adjust_ptr_min_max_vals(). Problem is however that map_ptr
and range in the register state share a union, so transferring state
through dst_reg->range = ptr_reg->range is just buggy as any new
map_ptr in the dst_reg is then truncated (or null) for subsequent
checks. Fix this by adding a raw member and use it for copying state
over to dst_reg.

Fixes: f1174f77b5 ("bpf/verifier: rework value tracking")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Edward Cree <ecree@solarflare.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-31 16:53:17 -07:00
Mike Rapoport
7e1c4e2792 memblock: stop using implicit alignment to SMP_CACHE_BYTES
When a memblock allocation APIs are called with align = 0, the alignment
is implicitly set to SMP_CACHE_BYTES.

Implicit alignment is done deep in the memblock allocator and it can
come as a surprise.  Not that such an alignment would be wrong even
when used incorrectly but it is better to be explicit for the sake of
clarity and the prinicple of the least surprise.

Replace all such uses of memblock APIs with the 'align' parameter
explicitly set to SMP_CACHE_BYTES and stop implicit alignment assignment
in the memblock internal allocation functions.

For the case when memblock APIs are used via helper functions, e.g.  like
iommu_arena_new_node() in Alpha, the helper functions were detected with
Coccinelle's help and then manually examined and updated where
appropriate.

The direct memblock APIs users were updated using the semantic patch below:

@@
expression size, min_addr, max_addr, nid;
@@
(
|
- memblock_alloc_try_nid_raw(size, 0, min_addr, max_addr, nid)
+ memblock_alloc_try_nid_raw(size, SMP_CACHE_BYTES, min_addr, max_addr,
nid)
|
- memblock_alloc_try_nid_nopanic(size, 0, min_addr, max_addr, nid)
+ memblock_alloc_try_nid_nopanic(size, SMP_CACHE_BYTES, min_addr, max_addr,
nid)
|
- memblock_alloc_try_nid(size, 0, min_addr, max_addr, nid)
+ memblock_alloc_try_nid(size, SMP_CACHE_BYTES, min_addr, max_addr, nid)
|
- memblock_alloc(size, 0)
+ memblock_alloc(size, SMP_CACHE_BYTES)
|
- memblock_alloc_raw(size, 0)
+ memblock_alloc_raw(size, SMP_CACHE_BYTES)
|
- memblock_alloc_from(size, 0, min_addr)
+ memblock_alloc_from(size, SMP_CACHE_BYTES, min_addr)
|
- memblock_alloc_nopanic(size, 0)
+ memblock_alloc_nopanic(size, SMP_CACHE_BYTES)
|
- memblock_alloc_low(size, 0)
+ memblock_alloc_low(size, SMP_CACHE_BYTES)
|
- memblock_alloc_low_nopanic(size, 0)
+ memblock_alloc_low_nopanic(size, SMP_CACHE_BYTES)
|
- memblock_alloc_from_nopanic(size, 0, min_addr)
+ memblock_alloc_from_nopanic(size, SMP_CACHE_BYTES, min_addr)
|
- memblock_alloc_node(size, 0, nid)
+ memblock_alloc_node(size, SMP_CACHE_BYTES, nid)
)

[mhocko@suse.com: changelog update]
[akpm@linux-foundation.org: coding-style fixes]
[rppt@linux.ibm.com: fix missed uses of implicit alignment]
  Link: http://lkml.kernel.org/r/20181016133656.GA10925@rapoport-lnx
Link: http://lkml.kernel.org/r/1538687224-17535-1-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Paul Burton <paul.burton@mips.com>	[MIPS]
Acked-by: Michael Ellerman <mpe@ellerman.id.au>	[powerpc]
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-31 08:54:16 -07:00
Mike Rapoport
57c8a661d9 mm: remove include/linux/bootmem.h
Move remaining definitions and declarations from include/linux/bootmem.h
into include/linux/memblock.h and remove the redundant header.

The includes were replaced with the semantic patch below and then
semi-automated removal of duplicated '#include <linux/memblock.h>

@@
@@
- #include <linux/bootmem.h>
+ #include <linux/memblock.h>

[sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h]
  Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au
[sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h]
  Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au
[sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal]
  Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au
Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Serge Semin <fancer.lancer@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-31 08:54:16 -07:00
Mike Rapoport
eb31d559f1 memblock: remove _virt from APIs returning virtual address
The conversion is done using

sed -i 's@memblock_virt_alloc@memblock_alloc@g' \
	$(git grep -l memblock_virt_alloc)

Link: http://lkml.kernel.org/r/1536927045-23536-8-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Serge Semin <fancer.lancer@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-31 08:54:15 -07:00
Arnd Bergmann
6a32c2469c kbuild: fix kernel/bounds.c 'W=1' warning
Building any configuration with 'make W=1' produces a warning:

kernel/bounds.c:16:6: warning: no previous prototype for 'foo' [-Wmissing-prototypes]

When also passing -Werror, this prevents us from building any other files.
Nobody ever calls the function, but we can't make it 'static' either
since we want the compiler output.

Calling it 'main' instead however avoids the warning, because gcc
does not insist on having a declaration for main.

Link: http://lkml.kernel.org/r/20181005083313.2088252-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-31 08:54:14 -07:00
Borislav Petkov
b49dec1cf8 kernel/panic.c: filter out a potential trailing newline
If a call to panic() terminates the string with a \n , the result puts the
closing brace ']---' on a newline because panic() itself adds \n too.

Now, if one goes and removes the newline chars from all panic()
invocations - and the stats right now look like this:

~300 calls with a \n
~500 calls without a \n

one is destined to a neverending game of whack-a-mole because the usual
thing to do is add a newline at the end of a string a function is supposed
to print.

Therefore, simply zap any \n at the end of the panic string to avoid
touching so many places in the kernel.

Link: http://lkml.kernel.org/r/20181009205019.2786-1-bp@alien8.de
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-31 08:54:14 -07:00
Borislav Petkov
95c4fb78fb kernel/panic.c: do not append newline to the stack protector panic string
... because panic() itself already does this. Otherwise you have
line-broken trailer:

  [    1.836965] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: pgd_alloc+0x29e/0x2a0
  [    1.836965]  ]---

Link: http://lkml.kernel.org/r/20181008202901.7894-1-bp@alien8.de
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-31 08:54:14 -07:00
Weikang Shi
2e58f57d85 kernel/signal.c: fix a comment error
Because get_signal_to_deliver() was renamed to get_signal() the
comment should be fixed.

Link: http://lkml.kernel.org/r/1539179128-45709-1-git-send-email-swkhack@gmail.com
Signed-off-by: Weikang Shi <swkhack@gmail.com>
Reported-by: Christian Brauner <christian@brauner.io>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-31 08:54:14 -07:00
zhong jiang
ea6f650465 kernel/fail_function.c: remove meaningless null pointer check before debugfs_remove_recursive
debugfs_remove_recursive() has taken the null pointer into account.  just
remove the null check before debugfs_remove_recursive().

Link: http://lkml.kernel.org/r/1537494404-16473-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-31 08:54:12 -07:00
Linus Torvalds
343a9f3540 The biggest change here is the updates to kprobes
Back in January I posted patches to create function based events. These were
 the events that you suggested I make to allow developers to easily create
 events in code where no trace event exists. After posting those changes for
 review, it was suggested that we implement this instead with kprobes.
 
 The problem with kprobes is that the interface is too complex and needs to
 be simplified. Masami Hiramatsu posted patches in March and I've been
 playing with them a bit. There's been a bit of clean up in the kprobe code
 that was inspired by the function based event patches, and a couple of
 enhancements to the kprobe event interface.
 
  - If the arch supports it (we added support for x86), you can place a
    kprobe event at the start of a function and use $arg1, $arg2, etc
    to reference the arguments of a function. (Before you needed to know
    what register or where on the stack the argument was).
 
  - The second is a way to see array of events. For example, if you reference
    a mac address, you can add:
 
    echo 'p:mac ip_rcv perm_addr=+574($arg2):x8[6]' > kprobe_events
 
    And this will produce:
 
    mac: (ip_rcv+0x0/0x140) perm_addr={0x52,0x54,0x0,0xc0,0x76,0xec}
 
 Other changes include
 
  - Exporting trace_dump_stack to modules
 
  - Have the stack tracer trace the entire stack (stop trying to remove
    tracing itself, as we keep removing too much).
 
  - Added support for SDT in uprobes
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCW9hdjxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qmtbAP9GS/o2WSvsYLSIw4+mF94eCL06lUxp
 rRrktkEofm/PagEAl2JNmvHrAJN+LIrajqXTbwlZ7Ckk1rZhCW41Am7qnQs=
 =sTUM
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "The biggest change here is the updates to kprobes

  Back in January I posted patches to create function based events.
  These were the events that you suggested I make to allow developers to
  easily create events in code where no trace event exists. After
  posting those changes for review, it was suggested that we implement
  this instead with kprobes.

  The problem with kprobes is that the interface is too complex and
  needs to be simplified. Masami Hiramatsu posted patches in March and
  I've been playing with them a bit. There's been a bit of clean up in
  the kprobe code that was inspired by the function based event patches,
  and a couple of enhancements to the kprobe event interface.

   - If the arch supports it (we added support for x86), you can place a
     kprobe event at the start of a function and use $arg1, $arg2, etc
     to reference the arguments of a function. (Before you needed to
     know what register or where on the stack the argument was).

   - The second is a way to see array of events. For example, if you
     reference a mac address, you can add:

	echo 'p:mac ip_rcv perm_addr=+574($arg2):x8[6]' > kprobe_events

     And this will produce:

	mac: (ip_rcv+0x0/0x140) perm_addr={0x52,0x54,0x0,0xc0,0x76,0xec}

  Other changes include

   - Exporting trace_dump_stack to modules

   - Have the stack tracer trace the entire stack (stop trying to remove
     tracing itself, as we keep removing too much).

   - Added support for SDT in uprobes"

[ SDT - "Statically Defined Tracing" are userspace markers for tracing.
  Let's not use random TLA's in explanations unless they are fairly
  well-established as generic (at least for kernel people) - Linus ]

* tag 'trace-v4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (24 commits)
  tracing: Have stack tracer trace full stack
  tracing: Export trace_dump_stack to modules
  tracing: probeevent: Fix uninitialized used of offset in parse args
  tracing/kprobes: Allow kprobe-events to record module symbol
  tracing/kprobes: Check the probe on unloaded module correctly
  tracing/uprobes: Fix to return -EFAULT if copy_from_user failed
  tracing: probeevent: Add $argN for accessing function args
  x86: ptrace: Add function argument access API
  tracing: probeevent: Add array type support
  tracing: probeevent: Add symbol type
  tracing: probeevent: Unify fetch_insn processing common part
  tracing: probeevent: Append traceprobe_ for exported function
  tracing: probeevent: Return consumed bytes of dynamic area
  tracing: probeevent: Unify fetch type tables
  tracing: probeevent: Introduce new argument fetching code
  tracing: probeevent: Remove NOKPROBE_SYMBOL from print functions
  tracing: probeevent: Cleanup argument field definition
  tracing: probeevent: Cleanup print argument functions
  trace_uprobe: support reference counter in fd-based uprobe
  perf probe: Support SDT markers having reference counter (semaphore)
  ...
2018-10-30 09:49:56 -07:00
Linus Torvalds
f4267b3604 Masami had a couple more fixes to the synthetic events. One was a proper
error return value, and the other is for the self tests.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCW9hYsBQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qiWPAQCARhYpeiWNHYGAirpI1vxaNHzutZvP
 j6eaqRXu0JDfsQD7Bb19KxSHUEaPSpZIo68N5OJkoTv9UxD2KYuRysu4VQ8=
 =y1Gc
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.19-rc8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "Masami had a couple more fixes to the synthetic events. One was a
  proper error return value, and the other is for the self tests"

* tag 'trace-v4.19-rc8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  selftests/ftrace: Fix synthetic event test to delete event correctly
  tracing: Return -ENOENT if there is no target synthetic event
2018-10-30 09:47:28 -07:00
Linus Torvalds
6ef746769e More power management updates for 4.20-rc1
- Fix build regression in the intel_pstate driver that doesn't
    build without CONFIG_ACPI after recent changes (Dominik Brodowski).
 
  - One of the heuristics in the menu cpuidle governor is based on a
    function returning 0 most of the time, so drop it and clean up
    the scheduler code related to it (Daniel Lezcano).
 
  - Prevent the arm_big_little cpufreq driver from being used on ARM64
    which is not suitable for it and drop the arm_big_little_dt driver
    that is not used any more (Sudeep Holla).
 
  - Prevent the hung task watchdog from triggering during resume from
    system-wide sleep states by disabling it before freezing tasks and
    enabling it again after they have been thawed (Vitaly Kuznetsov).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJb2BJ7AAoJEILEb/54YlRx/kwP/iD7tUUZ6mT84OI0FTbEj8A/
 fM+uHrwy25PmqyWGGtbHpaWU9OxVxUReSicsBCt+2LZmX3sFYpbSb243mv3pmxqb
 A0kLflG4lWCKJNIfa/a3OMDTUw26mxSTCidE3jJXkd8HkWrzeAWvMair+UCuzMf3
 A4Omu0IkNL8C0MKtUOb3PlUk3dnLYMxuairNhozBPhi+P+0tLW9/9XvgPJBVhnbZ
 CKn/aFsDoc08tAfxC8N32cgKwE7nbeIgTJTBFyu2lQmInsd4TTuoM50vSC5i+x88
 AmBOoH9IX0fhXJ6hgm+VMW8+x9S+H7jAVy/3C2xoUBeCclzlxX6eUCtjV5YNZqqn
 1nXQfGeAwgzX6Tyu6HjM7vjbfObk59ZwpmDRPJEUEhLDEBMS+iDStlp9zmKTedNm
 G4iSTzS6qJCNPtx4y5wkLp/FvzTofIuWqVFJSJC4+EoVKkbbw9xwaY+JKXUt1Uwx
 j+U6EtRhzL/kVX0nq+iQXXeANxCFNzI56Ov5O7mxjF1m/hDE/Gb2QEeIb6nRZC2A
 H3I2so2J3h1yTgadpGFFvJWaqfHkgcBTsm06tSgHVb86quiTANJIQ9mqfFyOzDDJ
 KaZ82MROt7UuCMI6X9n+oIBDZWLHmADge6RdHCD1wB+zrUmusCtNEHUZACXd0mPf
 s8MUK4bWVhViVXGS5bMP
 =/bnR
 -----END PGP SIGNATURE-----

Merge tag 'pm-4.20-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more power management updates from Rafael Wysocki:
 "These remove a questionable heuristic from the menu cpuidle governor,
  fix a recent build regression in the intel_pstate driver, clean up ARM
  big-Little support in cpufreq and fix up hung task watchdog's
  interaction with system-wide power management transitions.

  Specifics:

   - Fix build regression in the intel_pstate driver that doesn't build
     without CONFIG_ACPI after recent changes (Dominik Brodowski).

   - One of the heuristics in the menu cpuidle governor is based on a
     function returning 0 most of the time, so drop it and clean up the
     scheduler code related to it (Daniel Lezcano).

   - Prevent the arm_big_little cpufreq driver from being used on ARM64
     which is not suitable for it and drop the arm_big_little_dt driver
     that is not used any more (Sudeep Holla).

   - Prevent the hung task watchdog from triggering during resume from
     system-wide sleep states by disabling it before freezing tasks and
     enabling it again after they have been thawed (Vitaly Kuznetsov)"

* tag 'pm-4.20-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  kernel: hung_task.c: disable on suspend
  cpufreq: remove unused arm_big_little_dt driver
  cpufreq: drop ARM_BIG_LITTLE_CPUFREQ support for ARM64
  cpufreq: intel_pstate: Fix compilation for !CONFIG_ACPI
  cpuidle: menu: Remove get_loadavg() from the performance multiplier
  sched: Factor out nr_iowait and nr_iowait_cpu
2018-10-30 09:08:07 -07:00
Colin Ian King
28fa741c27 perf/core: Clean up inconsisent indentation
Replace a bunch of spaces with tab, cleans up indentation

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-janitors@vger.kernel.org
Link: http://lkml.kernel.org/r/20181029233211.21475-1-colin.king@canonical.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-30 09:51:58 +01:00
Rafael J. Wysocki
c4ac688993 Merge branches 'pm-cpuidle' and 'pm-cpufreq'
* pm-cpuidle:
  cpuidle: menu: Remove get_loadavg() from the performance multiplier
  sched: Factor out nr_iowait and nr_iowait_cpu

* pm-cpufreq:
  cpufreq: remove unused arm_big_little_dt driver
  cpufreq: drop ARM_BIG_LITTLE_CPUFREQ support for ARM64
  cpufreq: intel_pstate: Fix compilation for !CONFIG_ACPI
2018-10-30 08:47:14 +01:00
Muchun Song
a68d75081a sched/rt: Update comment in pick_next_task_rt()
Commit:

  f4ebcbc0d7 ("sched/rt: Substract number of tasks of throttled queues from rq->nr_running")

added a new rt_rq->rt_queued field, which is used to indicate the status of
rq->rt enqueue or dequeue. So, the ->rt_nr_running check was removed and we
now check ->rt_queued instead.

Fix the comment in pick_next_task_rt() as well, which was still referencing
the old logic.

Signed-off-by: Muchun Song <smuchun@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20181027030517.23292-1-smuchun@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-29 07:18:04 +01:00