Gratian managed to trigger the BUG_ON(!newowner) in fixup_pi_state_owner().
This is one possible chain of events leading to this:
Task Prio Operation
T1 120 lock(F)
T2 120 lock(F) -> blocks (top waiter)
T3 50 (RT) lock(F) -> boosts T1 and blocks (new top waiter)
XX timeout/ -> wakes T2
signal
T1 50 unlock(F) -> wakes T3 (rtmutex->owner == NULL, waiter bit is set)
T2 120 cleanup -> try_to_take_mutex() fails because T3 is the top waiter
and the lower priority T2 cannot steal the lock.
-> fixup_pi_state_owner() sees newowner == NULL -> BUG_ON()
The comment states that this is invalid and rt_mutex_real_owner() must
return a non NULL owner when the trylock failed, but in case of a queued
and woken up waiter rt_mutex_real_owner() == NULL is a valid transient
state. The higher priority waiter has simply not yet managed to take over
the rtmutex.
The BUG_ON() is therefore wrong and this is just another retry condition in
fixup_pi_state_owner().
Drop the locks, so that T3 can make progress, and then try the fixup again.
Gratian provided a great analysis, traces and a reproducer. The analysis is
to the point, but it confused the hell out of that tglx dude who had to
page in all the futex horrors again. Condensed version is above.
[ tglx: Wrote comment and changelog ]
Fixes: c1e2f0eaf0 ("futex: Avoid violating the 10th rule of futex")
Reported-by: Gratian Crisan <gratian.crisan@ni.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87a6w6x7bb.fsf@ni.com
Link: https://lore.kernel.org/r/87sg9pkvf7.fsf@nanos.tec.linutronix.de
Conflicts:
include/asm-generic/atomic-instrumented.h
kernel/kprobes.c
Use the upstream atomic-instrumented.h checksum, and pick
the kprobes version of kernel/kprobes.c, which effectively
reverts this upstream workaround:
645f224e7b: ("kprobes: Tell lockdep about kprobe nesting")
Since the new code *should* be fine without nesting.
Knock on wood ...
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As shown through runtime testing, the "filename" allocation is not
always freed in perf_event_parse_addr_filter().
There are three possible ways that this could happen:
- It could be allocated twice on subsequent iterations through the loop,
- or leaked on the success path,
- or on the failure path.
Clean up the code flow to make it obvious that 'filename' is always
freed in the reallocation path and in the two return paths as well.
We rely on the fact that kfree(NULL) is NOP and filename is initialized
with NULL.
This fixes the leak. No other side effects expected.
[ Dan Carpenter: cleaned up the code flow & added a changelog. ]
[ Ingo Molnar: updated the changelog some more. ]
Fixes: 375637bc52 ("perf/core: Introduce address range filtering")
Signed-off-by: "kiyin(尹亮)" <kiyin@tencent.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: "Srivatsa S. Bhat" <srivatsa@csail.mit.edu>
Cc: Anthony Liguori <aliguori@amazon.com>
--
kernel/events/core.c | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)
Alexei Starovoitov says:
====================
pull-request: bpf 2020-11-06
1) Pre-allocated per-cpu hashmap needs to zero-fill reused element, from David.
2) Tighten bpf_lsm function check, from KP.
3) Fix bpftool attaching to flow dissector, from Lorenz.
4) Use -fno-gcse for the whole kernel/bpf/core.c instead of function attribute, from Ard.
* git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Update verification logic for LSM programs
bpf: Zero-fill re-used per-cpu map element
bpf: BPF_PRELOAD depends on BPF_SYSCALL
tools/bpftool: Fix attaching flow dissector
libbpf: Fix possible use after free in xsk_socket__delete
libbpf: Fix null dereference in xsk_socket__delete
libbpf, hashmap: Fix undefined behavior in hash_bits
bpf: Don't rely on GCC __attribute__((optimize)) to disable GCSE
tools, bpftool: Remove two unused variables.
tools, bpftool: Avoid array index warnings.
xsk: Fix possible memory leak at socket close
bpf: Add struct bpf_redir_neigh forward declaration to BPF helper defs
samples/bpf: Set rlimit for memlock to infinity in all samples
bpf: Fix -Wshadow warnings
selftest/bpf: Fix profiler test using CO-RE relocation for enums
====================
Link: https://lore.kernel.org/r/20201106221759.24143-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The watchpoint encoding masks for size and address were off-by-one bit
each, with the size mask using 1 unnecessary bit and the address mask
missing 1 bit. However, due to the way the size is shifted into the
encoded watchpoint, we were effectively wasting and never using the
extra bit.
For example, on x86 with PAGE_SIZE==4K, we have 1 bit for the is-write
bit, 14 bits for the size bits, and then 49 bits left for the address.
Prior to this fix we would end up with this usage:
[ write<1> | size<14> | wasted<1> | address<48> ]
Fix it by subtracting 1 bit from the GENMASK() end and start ranges of
size and address respectively. The added static_assert()s verify that
the masks are as expected. With the fixed version, we get the expected
usage:
[ write<1> | size<14> | address<49> ]
Functionally no change is expected, since that extra address bit is
insignificant for enabled architectures.
Acked-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Currently, the units of ->init_fract are milliseconds while those of
->gp_sleep are jiffies. For consistency with each other and with the
argument of schedule_timeout_idle(), this commit changes the units of
->init_fract to jiffies.
This change does affect the backoff algorithm, but only on systems where
HZ is not 1000, and even there the change makes more sense, given that the
current setup would "back off" to the same number of jiffies repeatedly.
In contrast, with this change, the number of jiffies waited increases
on each pass through the loop in the rcu_tasks_wait_gp() function.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
If cur_ops->sync is NULL, rcu_torture_fwd_prog_nr() will nevertheless
attempt to call through it. This commit therefore flags cases where
neither need_resched() nor call_rcu() forward-progress testing
can be performed due to NULL function pointers, and also causes
rcu_torture_fwd_prog_nr() to take an early exit if cur_ops->sync()
is NULL.
Reported-by: Tom Rix <trix@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
When executing the LOCK06 locktorture scenario featuring percpu-rwsem,
the RCU callback rcu_sync_func() may still be pending after locktorture
module is removed. This can in turn lead to the following Oops:
BUG: unable to handle page fault for address: ffffffffc00eb920
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 6500a067 P4D 6500a067 PUD 6500c067 PMD 13a36c067 PTE 800000013691c163
Oops: 0000 [#1] PREEMPT SMP
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.9.0-rc5+ #4
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
RIP: 0010:rcu_cblist_dequeue+0x12/0x30
Call Trace:
<IRQ>
rcu_core+0x1b1/0x860
__do_softirq+0xfe/0x326
asm_call_on_stack+0x12/0x20
</IRQ>
do_softirq_own_stack+0x5f/0x80
irq_exit_rcu+0xaf/0xc0
sysvec_apic_timer_interrupt+0x2e/0xb0
asm_sysvec_apic_timer_interrupt+0x12/0x20
This commit avoids tis problem by adding an exit hook in lock_torture_ops
and using it to call percpu_free_rwsem() for percpu rwsem torture during
the module-cleanup function, thus ensuring that rcu_sync_func() completes
before module exits.
It is also necessary to call the exit hook if lock_torture_init()
fails half-way, so this commit also adds an ->init_called field in
lock_torture_cxt to indicate that exit hook, if present, must be called.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
In virtual environments on systems with hardware assist, inter-processor
interrupts must do very different things based on whether the target
vCPU is running or not. This commit therefore enables torture-test
stuttering to better test these running/not-running transitions.
Suggested-by: Chris Mason <clm@fb.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The rcu_torture_cleanup() function fails to NULL out the reader_tasks
pointer after freeing it and its fakewriter_tasks loop has redundant
braces. This commit therefore cleans these up.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Currently, stutter_wait() will happily spin waiting for the stutter
interval to end even if the caller is running at a real-time priority
level. This could starve normal-priority tasks for no good reason. This
commit therefore drops the calling task's priority to SCHED_OTHER MAX_NICE
if stutter_wait() needs to wait. But when it waits, stutter_wait()
returns true, which allows the caller to restore the priority if needed.
Callers that were already running at SCHED_OTHER MAX_NICE obviously
do not need any changes, but this commit also restores priority for
higher-priority callers.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
If an rcutorture torture-test run is given a bad kvm.sh argument, the
test will complain to the console, which is good. What is bad is that
from the user's perspective, it will just hang for the time specified
by the --duration argument. This commit therefore forces an immediate
kernel shutdown if a rcu_torture_init()-time error occurs, thus avoiding
the appearance of a hang. It also forces a console splat in this case
to clearly indicate the presence of an error.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
If an locktorture torture-test run is given a bad kvm.sh argument, the
test will complain to the console, which is good. What is bad is that
from the user's perspective, it will just hang for the time specified
by the --duration argument. This commit therefore forces an immediate
kernel shutdown if a lock_torture_init()-time error occurs, thus avoiding
the appearance of a hang. It also forces a console splat in this case
to clearly indicate the presence of an error.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Exclusive locks do not have readlock support, which means that a
locktorture run with the following module parameters will do nothing:
torture_type=mutex_lock nwriters_stress=0 nreaders_stress=1
This commit therefore rejects this combination for exclusive locks by
returning -EINVAL during module init.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
If an refscale torture-test run is given a bad kvm.sh argument, the
test will complain to the console, which is good. What is bad is that
from the user's perspective, it will just hang for the time specified
by the --duration argument. This commit therefore forces an immediate
kernel shutdown if a ref_scale_init()-time error occurs, thus avoiding
the appearance of a hang. It also forces a console splat in this case
to clearly indicate the presence of an error.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
If an rcuscale torture-test run is given a bad kvm.sh argument, the
test will complain to the console, which is good. What is bad is that
from the user's perspective, it will just hang for the time specified
by the --duration argument. This commit therefore forces an immediate
kernel shutdown if a rcu_scale_init()-time error occurs, thus avoiding
the appearance of a hang. It also forces a console splat in this case
to clearly indicate the presence of an error.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
This commit adds the ability to test performance and scalability of RCU
Tasks Trace updaters.
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The scftorture tests currently use only smp_call_function() and
friends, which means that these tests cannot locate bugs caused by
interactions between different IPI vectors. This commit therefore adds
the rescheduling IPI to the mix.
Note that this commit permits resched_cpus() only when scftorture is
built in. This is a workaround. Longer term, this will use real wakeups
rather than resched_cpu().
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The torture_stutter() function uses schedule_timeout_interruptible()
to time the stutter duration, but this can miss race conditions due to
its being time-synchronized with everything else that is based on the
timer wheels. This commit therefore converts torture_stutter() to use
the high-resolution timers via schedule_hrtimeout(), and also to fuzz
the stutter interval. While in the area, this commit also limits the
spin-loop portion of the stutter_wait() function's wait loop to two
jiffies, down from about one second.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Running locktorture scenario LOCK05 results in hangs:
tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --torture lock --duration 3 --configs LOCK05
The lock_torture_writer() kthreads set themselves to MAX_NICE while
running SCHED_OTHER. Other locktorture kthreads run at default niceness,
also SCHED_OTHER. This results in these other locktorture kthreads
indefinitely preempting the lock_torture_writer() kthreads. Note that
the cond_resched() in the stutter_wait() function's loop is ineffective
because this scenario is built with CONFIG_PREEMPT=y.
It is not clear that such indefinite preemption is supposed to happen, but
in the meantime this commit prevents kthreads running in stutter_wait()
from being completely CPU-bound, thus allowing the other threads to get
some CPU in a timely fashion. This commit also uses hrtimers to provide
very short sleeps to avoid degrading the sudden-on testing that stutter
is supposed to provide.
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
This commit adds a last_lock_release variable that tracks the time of
the last ->writeunlock() call, which allows easier diagnosing of lock
hangs when using a kernel debugger.
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Currently, accessing /proc/cpuinfo sends IPIs to idle CPUs in order to
learn their clock frequency. Which is a bit strange, given that waking
them from idle likely significantly changes their clock frequency.
This commit therefore avoids sending /proc/cpuinfo-induced IPIs to
idle CPUs.
[ paulmck: Also check for idle in arch_freq_prepare_all(). ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <x86@kernel.org>
The current logic checks if the name of the BTF type passed in
attach_btf_id starts with "bpf_lsm_", this is not sufficient as it also
allows attachment to non-LSM hooks like the very function that performs
this check, i.e. bpf_lsm_verify_prog.
In order to ensure that this verification logic allows attachment to
only LSM hooks, the LSM_HOOK definitions in lsm_hook_defs.h are used to
generate a BTF_ID set. Upon verification, the attach_btf_id of the
program being attached is checked for presence in this set.
Fixes: 9e4e01dfd3 ("bpf: lsm: Implement attach, detach and execution")
Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201105230651.2621917-1-kpsingh@chromium.org
The currently available bpf_get_current_task returns an unsigned integer
which can be used along with BPF_CORE_READ to read data from
the task_struct but still cannot be used as an input argument to a
helper that accepts an ARG_PTR_TO_BTF_ID of type task_struct.
In order to implement this helper a new return type, RET_PTR_TO_BTF_ID,
is added. This is similar to RET_PTR_TO_BTF_ID_OR_NULL but does not
require checking the nullness of returned pointer.
Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20201106103747.2780972-6-kpsingh@chromium.org
Similar to bpf_local_storage for sockets and inodes add local storage
for task_struct.
The life-cycle of storage is managed with the life-cycle of the
task_struct. i.e. the storage is destroyed along with the owning task
with a callback to the bpf_task_storage_free from the task_free LSM
hook.
The BPF LSM allocates an __rcu pointer to the bpf_local_storage in
the security blob which are now stackable and can co-exist with other
LSMs.
The userspace map operations can be done by using a pid fd as a key
passed to the lookup, update and delete operations.
Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20201106103747.2780972-3-kpsingh@chromium.org
Currently, if a callback is registered to a ftrace function and its
ftrace_ops does not have the RECURSION flag set, it is encapsulated in a
helper function that does the recursion for it.
Really, all the callbacks should have their own recursion protection for
performance reasons. But they should not all implement their own. Move the
recursion helpers to global headers, so that all callbacks can use them.
Link: https://lkml.kernel.org/r/20201028115612.460535535@goodmis.org
Link: https://lkml.kernel.org/r/20201106023546.166456258@goodmis.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
make clang-analyzer on x86_64 defconfig caught my attention with:
kernel/printk/printk_ringbuffer.c:885:3: warning:
Value stored to 'desc' is never read [clang-analyzer-deadcode.DeadStores]
desc = to_desc(desc_ring, head_id);
^
Commit b6cf8b3f33 ("printk: add lockless ringbuffer") introduced
desc_reserve() with this unneeded dead-store assignment.
As discussed with John Ogness privately, this is probably just some minor
left-over from previous iterations of the ringbuffer implementation. So,
simply remove this unneeded dead assignment to make clang-analyzer happy.
As compilers will detect this unneeded assignment and optimize this anyway,
the resulting object code is identical before and after this change.
No functional change. No change to object code.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20201106034005.18822-1-lukas.bulwahn@gmail.com
Currently key_size of hashtab is limited to MAX_BPF_STACK.
As the key of hashtab can also be a value from a per cpu map it can be
larger than MAX_BPF_STACK.
The use-case for this patch originates to implement allow/disallow
lists for files and file paths. The maximum length of file paths is
defined by PATH_MAX with 4096 chars including nul.
This limit exceeds MAX_BPF_STACK.
Changelog:
v5:
- Fix cast overflow
v4:
- Utilize BPF skeleton in tests
- Rebase
v3:
- Rebase
v2:
- Add a test for bpf side
Signed-off-by: Florian Lehner <dev@der-flo.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201029201442.596690-1-dev@der-flo.net
Zero-fill element values for all other cpus than current, just as
when not using prealloc. This is the only way the bpf program can
ensure known initial values for all cpus ('onallcpus' cannot be
set when coming from the bpf program).
The scenario is: bpf program inserts some elements in a per-cpu
map, then deletes some (or userspace does). When later adding
new elements using bpf_map_update_elem(), the bpf program can
only set the value of the new elements for the current cpu.
When prealloc is enabled, previously deleted elements are re-used.
Without the fix, values for other cpus remain whatever they were
when the re-used entry was previously freed.
A selftest is added to validate correct operation in above
scenario as well as in case of LRU per-cpu map element re-use.
Fixes: 6c90598174 ("bpf: pre-allocate hash map elements")
Signed-off-by: David Verbeiren <david.verbeiren@tessares.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201104112332.15191-1-david.verbeiren@tessares.net
Pull tracing fixes from Steven Rostedt:
- Fix off-by-one error in retrieving the context buffer for
trace_printk()
- Fix off-by-one error in stack nesting limit
- Fix recursion to not make all NMI code false positive as recursing
- Stop losing events in function tracing when transitioning between irq
context
- Stop losing events in ring buffer when transitioning between irq
context
- Fix return code of error pointer in parse_synth_field() to prevent
NULL pointer dereference.
- Fix false positive of NMI recursion in kprobe event handling
* tag 'trace-v5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
kprobes: Tell lockdep about kprobe nesting
tracing: Make -ENOMEM the default error for parse_synth_field()
ring-buffer: Fix recursion protection transitions between interrupt context
tracing: Fix the checking of stackidx in __ftrace_trace_stack
ftrace: Handle tracing when switching between context
ftrace: Fix recursion check for NMI test
tracing: Fix out of bounds write in get_trace_buf
Pull power management fixes from Rafael Wysocki:
"These fix the device links support in runtime PM, correct mistakes in
the cpuidle documentation, fix the handling of policy limits changes
in the schedutil cpufreq governor, fix assorted issues in the OPP
(operating performance points) framework and make one janitorial
change.
Specifics:
- Unify the handling of managed and stateless device links in the
runtime PM framework and prevent runtime PM references to devices
from being leaked after device link removal (Rafael Wysocki).
- Fix two mistakes in the cpuidle documentation (Julia Lawall).
- Prevent the schedutil cpufreq governor from missing policy limits
updates in some cases (Viresh Kumar).
- Prevent static OPPs from being dropped by mistake (Viresh Kumar).
- Prevent helper function in the OPP framework from returning
prematurely (Viresh Kumar).
- Prevent opp_table_lock from being held too long during removal of
OPP tables with no more active references (Viresh Kumar).
- Drop redundant semicolon from the Intel RAPL power capping driver
(Tom Rix)"
* tag 'pm-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: runtime: Resume the device earlier in __device_release_driver()
PM: runtime: Drop pm_runtime_clean_up_links()
PM: runtime: Drop runtime PM references to supplier on link removal
powercap/intel_rapl: remove unneeded semicolon
Documentation: PM: cpuidle: correct path name
Documentation: PM: cpuidle: correct typo
cpufreq: schedutil: Don't skip freq update if need_freq_update is set
opp: Reduce the size of critical section in _opp_table_kref_release()
opp: Fix early exit from dev_pm_opp_register_set_opp_helper()
opp: Don't always remove static OPPs in _of_add_opp_table_v1()
Lockdep state handling on NMI enter and exit is nothing specific to X86. It's
not any different on other architectures. Also the extra state type is not
necessary, irqentry_state_t can carry the necessary information as well.
Move it to common code and extend irqentry_state_t to carry lockdep state.
[ Ira: Make exit_rcu and lockdep a union as they are mutually exclusive
between the IRQ and NMI exceptions, and add kernel documentation for
struct irqentry_state_t ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201102205320.1458656-7-ira.weiny@intel.com
When an exception/interrupt hits kernel space and the kernel is not
currently in the idle task then RCU must be watching.
irqentry_enter() validates this via rcu_irq_enter_check_tick(), which in
turn invokes lockdep when taking a lock. But at that point lockdep does not
yet know about the fact that interrupts have been disabled by the CPU,
which triggers a lockdep splat complaining about inconsistent state.
Invoking trace_hardirqs_off() before rcu_irq_enter_check_tick() defeats the
point of rcu_irq_enter_check_tick() because trace_hardirqs_off() uses RCU.
So use the same sequence as for the idle case and tell lockdep about the
irq state change first, invoke the RCU check and then do the lockdep and
tracer update.
Fixes: a5497bab5f ("entry: Provide generic interrupt entry/exit code")
Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87y2jhl19s.fsf@nanos.tec.linutronix.de
Since the kprobe handlers have protection that prohibits other handlers from
executing in other contexts (like if an NMI comes in while processing a
kprobe, and executes the same kprobe, it will get fail with a "busy"
return). Lockdep is unaware of this protection. Use lockdep's nesting api to
differentiate between locks taken in INT3 context and other context to
suppress the false warnings.
Link: https://lore.kernel.org/r/20201102160234.fa0ae70915ad9e2b21c08b85@kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Let's handle the successful call of mod_verify_sig() right after that call,
making the *switch* statement only handle the real errors, and then move
the comment from the first *case* before *switch* itself and the comment
before *default* after it. Fix the comment style, add article/comma/dash,
spell out "nomem" as "lack of memory" in these comments, while at it...
Suggested-by: Joe Perches <joe@perches.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Sergey Shtylyov <s.shtylyov@omprussia.ru>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Let's move the common handling of the non-fatal errors after the *switch*
statement -- this avoids *goto*s inside that *switch*...
Suggested-by: Joe Perches <joe@perches.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Sergey Shtylyov <s.shtylyov@omprussia.ru>
Signed-off-by: Jessica Yu <jeyu@kernel.org>