It turns out allowing non-contiguous allocations here was a rather bad
idea, as we'll now need to define ways to get the pages for mmapping
or dma_buf sharing. Revert this change and stick to the original
concept. A different API for the use case of non-contiguous allocations
will be added later.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
Tested-by: Ricardo Ribalda <ribalda@chromium.org>
Pull tracing fix from Steven Rostedt:
"Fix output of top level event tracing 'enable' file.
When writing a tool for enabling events in the tracing system, an
anomaly was discovered. The top level event 'enable' file would never
show '1' when all events were enabled.
The system and event 'enable' files worked as expected.
The reason was that the top level event 'enable' file included the
'ftrace' tracer events, which are not controlled by the 'enable' file
and would cause the output to be wrong. This appears to have been a
bug since the file was created"
* tag 'trace-v5.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Do not count ftrace events in top level enable output
Currently kdb uses in_interrupt() to determine whether its library
code has been called from the kgdb trap handler or from a saner calling
context such as driver init. This approach is broken because
in_interrupt() alone cannot detect that the kgdb trap handler was entered
from normal task context. Such entry happens during normal use of basic
features such as breakpoints and can also be trivially reproduced using:
echo g > /proc/sysrq-trigger
We can improve this by adding a check for in_dbg_master() instead, which
explicitly determines whether we are running in debugger context.
Cc: stable@vger.kernel.org
Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
Link: https://lore.kernel.org/r/1611313556-4004-1-git-send-email-sumit.garg@linaro.org
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
EXPORT_UNUSED_SYMBOL* is not actually used anywhere. Remove the
unused functionality as we generally just remove unused code anyway.
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
As far as I can tell this has never been used at all, and certainly
not any time recently.
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
struct symsearch is only used inside of module.c, so move the definition
out of module.h.
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Simplify the calling convention by passing the find_symbol_args structure
to find_symbol instead of initializing it inside the function.
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
each_symbol_section is only called by find_symbol, so merge the two
functions.
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
each_symbol_in_section just contains a trivial loop over its arguments.
Just open code the loop in the two callers.
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Except for two lockdep asserts module_mutex is only used in module.c.
Remove the two asserts given that the functions they are in are not
exported and just called from the module code, and mark module_mutex
static.
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
kallsyms_on_each_symbol and module_kallsyms_on_each_symbol are only used
by the livepatching code, so don't build them if livepatching is not
enabled.
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Require an explicit call to module_kallsyms_on_each_symbol to look
for symbols in modules instead of the call from kallsyms_on_each_symbol,
and acquire module_mutex inside of module_kallsyms_on_each_symbol instead
of leaving that up to the caller. Note that this slightly changes the
behavior for the livepatch code in that the symbols from vmlinux are not
iterated anymore if objname is set, but that actually is the desired
behavior in this case.
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Allow for an RCU-sched critical section around find_module, following
the lower level find_module_all helper, and switch the two callers
outside of module.c to use such an RCU-sched critical section instead
of module_mutex.
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
find_module is not used by modular code any more, and random driver code
has no business calling it to start with.
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Pull dma-mapping fix from Christoph Hellwig:
"Fix a 32 vs 64-bit padding issue in the new benchmark code (Barry
Song)"
* tag 'dma-mapping-5.11-2' of git://git.infradead.org/users/hch/dma-mapping:
dma-mapping: benchmark: use u8 for reserved field in uAPI structure
Pull irq fixes from Borislav Petkov:
- Prevent device managed IRQ allocation helpers from returning IRQ 0
- A fix for MSI activation of PCI endpoints with multiple MSIs
* tag 'irq_urgent_for_v5.11_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Prevent [devm_]irq_alloc_desc from returning irq 0
genirq/msi: Activate Multi-MSI early when MSI_FLAG_ACTIVATE_EARLY is set
Pull syscall entry fixes from Borislav Petkov:
- For syscall user dispatch, separate prctl operation from syscall
redirection range specification before the API has been made official
in 5.11.
- Ensure tasks using the generic syscall code do trap after returning
from a syscall when single-stepping is requested.
* tag 'core_urgent_for_v5.11_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
entry: Use different define for selector variable in SUD
entry: Ensure trap after single-step on system call return
Pull timer fixes from Borislav Petkov:
"Two more timers-related fixes for v5.11:
- Use a freezable workqueue for RTC sync because the sync can happen
at any time and trigger suspend assertion checks in the i2c
subsystem.
- Correct a previous RTC validation change to check only bit 6 in
register D because some Intel machines use bits 0-5"
* tag 'timers_urgent_for_v5.11_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
ntp: Use freezable workqueue for RTC synchronization
rtc: mc146818: Dont test for bit 0-5 in Register D
Michael Kerrisk suggested that, from an API perspective, it is a bad
idea to share the PR_SYS_DISPATCH_ defines between the prctl operation
and the selector variable.
Therefore, define two new constants to be used by SUD's selector variable
and update the corresponding documentation and test cases.
While this changes the API, syscall user dispatch has never been part of a
Linux release; it will show up for the first time in 5.11.
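As a rough userspace illustration of the split: the prctl operation argument only ever takes PR_SYS_DISPATCH_* values, while the selector byte only ever holds SYSCALL_DISPATCH_FILTER_* values. The fallback defines below mirror the 5.11 uAPI values for builds against older headers. The helper name sud_toggle() and the static selector variable are made up for this sketch, and the call is expected to fail on kernels or architectures without syscall user dispatch.

```c
#include <assert.h>
#include <errno.h>
#include <sys/prctl.h>

/* Fallbacks for older userspace headers; values as in the 5.11 uAPI. */
#ifndef PR_SET_SYSCALL_USER_DISPATCH
#define PR_SET_SYSCALL_USER_DISPATCH 59
#endif
#ifndef PR_SYS_DISPATCH_OFF
#define PR_SYS_DISPATCH_OFF 0
#endif
#ifndef PR_SYS_DISPATCH_ON
#define PR_SYS_DISPATCH_ON 1
#endif
#ifndef SYSCALL_DISPATCH_FILTER_ALLOW
#define SYSCALL_DISPATCH_FILTER_ALLOW 0
#endif
#ifndef SYSCALL_DISPATCH_FILTER_BLOCK
#define SYSCALL_DISPATCH_FILTER_BLOCK 1
#endif

/* Selector byte consulted by the kernel; it must outlive the prctl call. */
static volatile char sud_selector = SYSCALL_DISPATCH_FILTER_ALLOW;

/*
 * Hypothetical helper: turn dispatch on with an empty always-allowed
 * region, then turn it off again. Returns 0 on success or -errno
 * (for example -EINVAL where syscall user dispatch is unavailable).
 */
static int sud_toggle(void)
{
	/* Keep the selector at ALLOW so no syscall is redirected here. */
	assert(sud_selector == SYSCALL_DISPATCH_FILTER_ALLOW);

	/* The operation argument uses the PR_SYS_DISPATCH_* constants. */
	if (prctl(PR_SET_SYSCALL_USER_DISPATCH, PR_SYS_DISPATCH_ON,
		  0UL, 0UL, &sud_selector) != 0)
		return -errno;

	/* Switching off requires offset, length and selector to be zero. */
	if (prctl(PR_SET_SYSCALL_USER_DISPATCH, PR_SYS_DISPATCH_OFF,
		  0UL, 0UL, 0UL) != 0)
		return -errno;
	return 0;
}
```

A real user would install a SIGSYS handler and flip the selector to SYSCALL_DISPATCH_FILTER_BLOCK around the code whose syscalls should be intercepted; that part is left out here.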
Suggested-by: Michael Kerrisk (man-pages) <mtk.manpages@gmail.com>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210205184321.2062251-1-krisman@collabora.com
Commit 2991552447 ("entry: Drop usage of TIF flags in the generic syscall
code") introduced a bug on architectures using the generic syscall entry
code, in which processes stopped by PTRACE_SYSCALL do not trap on syscall
return after receiving a TIF_SINGLESTEP.
The reason is that the meaning of the TIF_SINGLESTEP flag is overloaded to
cause the trap after a system call is executed, but since the above commit,
the syscall handler only checks for the SYSCALL_WORK flags on the exit
work.
Split the meaning of TIF_SINGLESTEP such that it only means single-step
mode, and create a new type of SYSCALL_WORK to request a trap immediately
after a syscall in single-step mode. In the current implementation, the
SYSCALL_WORK flag shadows the TIF_SINGLESTEP flag for simplicity.
Update x86 to flip this bit when a tracer enables single stepping.
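The split can be pictured with a toy model, which is not the kernel code. All names below (toy_task, TOY_TIF_SINGLESTEP, TOY_WORK_EXIT_TRAP) are invented for illustration, and the only assumption carried over from the description is that the exit path looks at the syscall work word and nothing else.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; the real kernel definitions differ. */
#define TOY_TIF_SINGLESTEP	(1u << 0)	/* thread flag: single-step mode */
#define TOY_WORK_EXIT_TRAP	(1u << 0)	/* syscall work: trap after syscall */

struct toy_task {
	unsigned int tif_flags;		/* generic thread flags */
	unsigned int syscall_work;	/* the only word the exit path reads */
};

/*
 * Tracer enables single stepping. With set_shadow the syscall work bit
 * is set alongside the mode bit, modelling the fixed behaviour.
 */
static void toy_enable_single_step(struct toy_task *t, bool set_shadow)
{
	assert(t);
	t->tif_flags |= TOY_TIF_SINGLESTEP;
	if (set_shadow)
		t->syscall_work |= TOY_WORK_EXIT_TRAP;
}

/* Exit path: consults syscall_work only, never tif_flags. */
static bool toy_traps_on_syscall_exit(const struct toy_task *t)
{
	return (t->syscall_work & TOY_WORK_EXIT_TRAP) != 0;
}
```

Without the shadow bit the model reproduces the regression: single-step mode is set, yet the exit path never sees a reason to trap.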
Fixes: 2991552447 ("entry: Drop usage of TIF flags in the generic syscall code")
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Kyle Huey <me@kylehuey.com>
Link: https://lore.kernel.org/r/87h7mtc9pr.fsf_-_@collabora.com
The ftrace event subsystem is only created for showing the format files of
events created by the ftrace tracers; these are not trace events. The ftrace
subsystem currently has both the "enable" and "filter" files that in other
subsystems are used to enable/disable all events within the subsystem or set
a filter for all the subsystem events.
As ftrace subsystem events do not use enable or filter operations, these
files are useless in the ftrace subsystem. Remove them.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The file /sys/kernel/tracing/events/enable is used to enable all events by
echoing in "1", or to disable all events by echoing in "0". To tell whether
all events are enabled, disabled, or only some of them are enabled,
catting the file should show either "1" (all enabled), "0" (all disabled), or
"X" (some enabled but not all of them). This works the same as the "enable"
files in the individual system directories (like tracing/events/sched/enable).
But when all events are enabled, the top level "enable" file shows "X". The
reason is that it's checking the "ftrace" events, which are special events
that only exist for their format files. These include the format for the
function tracer events, which are enabled when the function tracer is
enabled, but not by the "enable" file. The check includes these events,
which will always be disabled, so even though all true events are enabled,
the top level "enable" file will show "X" instead of "1".
To fix this, have the check test the event's flags to see if it has the
"IGNORE_ENABLE" flag set, and if so, skip it.
Cc: stable@vger.kernel.org
Fixes: 553552ce17 ("tracing: Combine event filter_active and enable into single flags field")
Reported-by: "Yordan Karadzhov (VMware)" <y.karadz@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
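For the top level "enable" fix above, the check can be sketched in userspace as follows. This is a toy model under stated assumptions, not the tracing code: the toy_event structure and both TOY_EVENT_* flags are invented stand-ins, and returning '?' when no event was counted is a choice made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_EVENT_ENABLED	(1u << 0)
#define TOY_EVENT_IGNORE_ENABLE	(1u << 1)	/* stand-in for "IGNORE_ENABLE" */

struct toy_event {
	const char *name;
	unsigned int flags;
};

/* Returns '0', '1' or 'X', as a read of an "enable" file would show. */
static char toy_enable_status(const struct toy_event *ev, size_t n)
{
	static const char set_to_char[4] = { '?', '0', '1', 'X' };
	unsigned int set = 0;	/* bit 0: saw disabled, bit 1: saw enabled */

	assert(ev != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		/* Events not controlled by "enable" must not be counted. */
		if (ev[i].flags & TOY_EVENT_IGNORE_ENABLE)
			continue;
		set |= 1u << !!(ev[i].flags & TOY_EVENT_ENABLED);
	}
	return set_to_char[set];
}
```

Dropping the IGNORE_ENABLE test makes an always-disabled event count as "disabled", which is exactly what turned the expected "1" into "X".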
On ARCH=um, loading a module doesn't result in its constructors getting
called, which breaks module gcov since the debugfs files are never
registered. On the other hand, in-kernel constructors have already been
called by the dynamic linker, so we can't call them again.
Get out of this conundrum by allowing CONFIG_CONSTRUCTORS to be
selected, but avoiding the in-kernel constructor calls.
Also remove the "if !UML" from GCOV selecting CONSTRUCTORS now, since we
really do want CONSTRUCTORS, just not kernel binary ones.
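The "already called" problem can be seen in an ordinary userspace program, which is what a UML kernel binary is from the host's point of view. The sketch below relies on the GCC/Clang constructor attribute; the function names and the counter are invented for the example.

```c
#include <assert.h>

static int ctor_runs;	/* zero-initialized before any constructor runs */

/* GCC/Clang extension: run by the startup code before main(). */
__attribute__((constructor))
static void demo_constructor(void)
{
	ctor_runs++;
}

/* Analogue of walking the constructor table a second time. */
static void call_constructors_again(void)
{
	demo_constructor();
}

/* Number of times the constructor has run so far. */
static int constructor_run_count(void)
{
	assert(ctor_runs >= 0);
	return ctor_runs;
}
```

By the time main() starts the count is already 1, so an explicit second pass would run the constructor twice. Constructors of code loaded later by the program itself, as with kernel modules, get no such help and have to be called explicitly.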
Link: https://lkml.kernel.org/r/20210120172041.c246a2cac2fb.I1358f584b76f1898373adfed77f4462c8705b736@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The bug fixed by commit e3fab2f3de ("ntp: Fix RTC synchronization on
32-bit platforms") revealed an underlying issue: RTC synchronization may
happen anytime, even while the system is partially suspended.
On systems where the RTC is connected to an I2C bus, the I2C bus controller
may already or still be suspended, triggering a WARNING during suspend or
resume from s2ram:
WARNING: CPU: 0 PID: 124 at drivers/i2c/i2c-core.h:54 __i2c_transfer+0x634/0x680
i2c i2c-6: Transfer while suspended
[...]
Workqueue: events_power_efficient sync_hw_clock
[...]
(__i2c_transfer)
(i2c_transfer)
(regmap_i2c_read)
...
(da9063_rtc_set_time)
(rtc_set_time)
(sync_hw_clock)
(process_one_work)
Fix this race condition by using the freezable instead of the normal
power-efficient workqueue.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/20210125143039.1051912-1-geert+renesas@glider.be
In a real DMA mapping use case, data is transmitted after dma_map is
done. Thus, in a multi-threaded scenario, IOMMU contention
should not be that severe. For example, if users enable multiple
threads to send network packets through a 1G/10G/100Gbps NIC, the
steps will usually be: map -> transmission -> unmap. The transmission delay
reduces contention on the IOMMU.
Here a delay is added to simulate the transmission between map and unmap
so that the tested result could be more accurate for TX and simple RX.
A typical TX transmission for a NIC would be: map -> TX -> unmap,
since the socket buffers come from the OS. A simple RX model, e.g. a disk
driver, is also map -> RX -> unmap, but the real RX model in a NIC could be
more complicated, considering that packets can arrive spontaneously and many
drivers use pools of pre-mapped buffers. This is on the TBD list.
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The original code put five u32 fields before a u64 expansion[10] array.
Since five is odd, the offset of the array depends on how the ABI aligns
u64, which differs between 32-bit and 64-bit builds and will cause trouble
when the structure is extended with new features. This patch uses u8 for
the reserved field to avoid future alignment risk.
It also clears the memory of struct map_benchmark in the tools;
otherwise, if users run an old version of the tool on a newer kernel, the
random expansion values would cause side effects on the newer kernel.
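The alignment point can be checked with a small layout experiment. The two structures below are hypothetical stand-ins, not the uAPI definition: the field names and the reserved size of 84 bytes are assumptions, chosen so that the byte-array variant keeps the total size that the u64 variant has on an ABI with 8-byte u64 alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before" layout: five 32-bit fields, then a u64 array. */
struct toy_bench_old {
	uint64_t result[4];
	uint32_t param[5];
	uint64_t expansion[10];	/* offset depends on the ABI's u64 alignment */
};

/* Hypothetical "after" layout: the reserved area is plain bytes. */
struct toy_bench_new {
	uint64_t result[4];
	uint32_t param[5];
	uint8_t expansion[84];	/* a byte array needs no alignment padding */
};

/* Bytes of compiler-inserted padding in front of the old reserved array. */
static size_t toy_old_padding(void)
{
	size_t end_of_params = offsetof(struct toy_bench_old, param) +
			       sizeof(((struct toy_bench_old *)0)->param);

	assert(offsetof(struct toy_bench_old, expansion) >= end_of_params);
	return offsetof(struct toy_bench_old, expansion) - end_of_params;
}
```

On an ABI that aligns u64 to 8 bytes the old layout carries 4 hidden padding bytes, while an ABI that aligns it to 4 bytes has none, so the same structure differs between the two. The byte-array variant starts its reserved area at the same offset everywhere.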
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The BPF ringbuffer map is pre-allocated and the implementation logic
does not rely on disabling preemption or per-cpu data structures. Using
the BPF ringbuffer from sleepable LSM and tracing programs does not trigger
any warnings with DEBUG_ATOMIC_SLEEP, DEBUG_PREEMPT,
PROVE_RCU, PROVE_LOCKING and LOCKDEP enabled.
This allows helpers like bpf_copy_from_user and bpf_ima_inode_hash to
write to the BPF ring buffer from sleepable BPF programs.
Signed-off-by: KP Singh <kpsingh@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210204193622.3367275-2-kpsingh@kernel.org
There are several common patterns.
0:
    kdb_printf("...",...);
which is the normal one.
1:
    kdb_printf("%s: "...,__func__,...)
We could improve '1' to this:
    #define kdb_func_printf(format, args...) \
        kdb_printf("%s: " format, __func__, ## args)
2:
    if(KDB_DEBUG(AR))
        kdb_printf("%s "...,__func__,...);
We could improve '2' to this:
    #define kdb_dbg_printf(mask, format, args...) \
        do { \
            if (KDB_DEBUG(mask)) \
                kdb_func_printf(format, ## args); \
        } while (0)
In addition, we changed the format code of size_t to %zu.
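To see how the two helpers expand, they can be exercised outside the kernel with stand-ins. In the sketch below only the two macros are taken from the description; kdb_printf() is replaced by a stub that appends to a buffer, and KDB_DEBUG(), KDB_DEBUG_AR, kdb_debug_mask and report_size() are simplified inventions for the example.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Stand-ins for this sketch only: captured output and a debug mask. */
static char kdb_out[256];
static unsigned int kdb_debug_mask;
#define KDB_DEBUG_AR	(1u << 0)
#define KDB_DEBUG(flag)	(kdb_debug_mask & KDB_DEBUG_##flag)

/* Stub that appends formatted text to kdb_out instead of the console. */
static int kdb_printf(const char *fmt, ...)
{
	size_t used = strlen(kdb_out);
	va_list ap;
	int n;

	assert(used < sizeof(kdb_out));
	va_start(ap, fmt);
	n = vsnprintf(kdb_out + used, sizeof(kdb_out) - used, fmt, ap);
	va_end(ap);
	return n;
}

/* The two helpers as described above (GNU C named variadic macros). */
#define kdb_func_printf(format, args...) \
	kdb_printf("%s: " format, __func__, ## args)

#define kdb_dbg_printf(mask, format, args...) \
	do { \
		if (KDB_DEBUG(mask)) \
			kdb_func_printf(format, ## args); \
	} while (0)

/* Hypothetical caller; __func__ expands to "report_size" in here. */
static void report_size(size_t len)
{
	kdb_func_printf("len=%zu\n", len);		/* always printed */
	kdb_dbg_printf(AR, "debug len=%zu\n", len);	/* only with AR set */
}
```

Because __func__ is evaluated where the macro is expanded, the prefix names the calling function, not the helper. %zu is the portable conversion for size_t.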
Signed-off-by: Stephen Zhang <stephenzhangzsd@gmail.com>
Link: https://lore.kernel.org/r/1612440429-6391-1-git-send-email-stephenzhangzsd@gmail.com
Reviewed-by: Douglas Anderson <dianders@chromium.org>
[daniel.thompson@linaro.org: Minor typo and line length fixes in the
patch description]
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Pull tracing fixes from Steven Rostedt:
- Initialize tracing-graph-pause at task creation, not start of
function tracing, to avoid corrupting the pause counter.
- Set "pause-on-trace" for latency tracers as that option breaks their
output (regression).
- Fix the wrong error return for setting kretprobes on future modules
(before they are loaded).
- Fix re-registering the same kretprobe.
- Add missing value check for added RCU variable reload.
* tag 'trace-v5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracepoint: Fix race between tracing and removing tracepoint
kretprobe: Avoid re-registration of the same kretprobe earlier
tracing/kprobe: Fix to support kretprobe events on unloaded modules
tracing: Use pause-on-trace with the latency tracers
fgraph: Initialize tracing_graph_pause at task creation
Commit 0d00449c7a ("x86: Replace ist_enter() with nmi_enter()")
converted the do_int3 handler to be "NMI-like".
That made the old if (in_nmi()) check abort execution of BPF programs
attached to a kprobe when the kprobe fires via int3
(for example, when the kprobe is placed in the middle of a function).
Remove the check to restore the user-visible behavior.
Fixes: 0d00449c7a ("x86: Replace ist_enter() with nmi_enter()")
Reported-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/bpf/20210203070636.70926-1-alexei.starovoitov@gmail.com
When BPF_FETCH is set, atomic instructions load a value from memory
into a register. The current verifier code first checks via
check_mem_access whether we can access the memory, and then checks
via check_reg_arg whether we can write into the register.
For loads, check_reg_arg has the side effect of marking the
register's value as unknown, and check_mem_access has the side effect
of propagating bounds from memory to the register. This currently only
takes effect for stack memory.
Therefore with the current order, bounds information is thrown away,
but by simply reversing the order of check_reg_arg
vs. check_mem_access, we can instead propagate bounds smartly.
A simple test is added with an infinite loop that can only be proved
unreachable if this propagation is present. This is implemented both
with C and directly in test_verifier using assembly.
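Why the order matters can be shown with a deliberately tiny model. It is not verifier code: toy_reg, toy_mark_written() and toy_load_from_slot() are invented stand-ins for the two side effects described above, and bounds are reduced to a single min/max pair.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_reg {
	bool known_bounds;
	int64_t min, max;
};

/* Stand-in for the register-write check: the old value is forgotten. */
static void toy_mark_written(struct toy_reg *reg)
{
	reg->known_bounds = false;
	reg->min = INT64_MIN;
	reg->max = INT64_MAX;
}

/* Stand-in for the memory check on a load: copy the stack slot's bounds. */
static void toy_load_from_slot(struct toy_reg *dst, const struct toy_reg *slot)
{
	*dst = *slot;
}

/* Destination register state after a fetching atomic operation. */
static struct toy_reg toy_check_fetch(const struct toy_reg *slot, bool mem_first)
{
	struct toy_reg dst = { false, INT64_MIN, INT64_MAX };

	assert(slot);
	if (mem_first) {	/* old order: bounds copied, then discarded */
		toy_load_from_slot(&dst, slot);
		toy_mark_written(&dst);
	} else {		/* new order: discard first, then copy bounds */
		toy_mark_written(&dst);
		toy_load_from_slot(&dst, slot);
	}
	return dst;
}
```

With the memory check first the copied bounds are wiped immediately, so a later branch on the fetched value cannot be pruned. With the register check first they survive.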
Suggested-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210202135002.4024825-1-jackmanb@google.com
The kernel thread executing the test can run on any CPU, which might be
a different CPU from the one the latency tracer is running on. As a result,
the big latency caused by the preemptirq delay test can't be detected.
Therefore, add a cpu_affinity argument that can be passed to the test to
ensure it runs on the same CPU as the latency tracer.
e.g.
cyclictest -p 90 -m -c 0 -i 1000 -a 3
modprobe preemptirq_delay_test test_mode=preempt delay=500 \
burst_size=3 cpu_affinity=3
Link: https://lkml.kernel.org/r/1611797713-20965-1-git-send-email-chensong_2000@189.cn
Signed-off-by: Song Chen <chensong_2000@189.cn>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
PREEMPT_RT does not report "serving softirq" because the tracing core
looks at the preemption counter while PREEMPT_RT does not update it
while processing softirqs in order to remain preemptible. The
information is stored somewhere else.
The in_serving_softirq() macro and the SOFTIRQ_OFFSET define are still
working but not on the preempt-counter.
Use the in_serving_softirq() macro, which works on PREEMPT_RT. On !PREEMPT_RT
the compiler (gcc-10 / clang-11) is smart enough to optimize the
in_serving_softirq() related read of the preemption counter away.
The only difference I noticed by using in_serving_softirq() on
!PREEMPT_RT is that gcc-10 implemented tracing_gen_ctx_flags() as
reading FLAG, jmp _tracing_gen_ctx_flags(). Without in_serving_softirq()
it inlined _tracing_gen_ctx_flags() into tracing_gen_ctx_flags().
Link: https://lkml.kernel.org/r/20210125194511.3924915-4-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>