Commit fa2c3254d7 (sched/tracing: Don't re-read p->state when emitting
sched_switch event, 2022-01-20) added a new prev_state argument to the
sched_switch tracepoint, before the prev task_struct pointer.
This reordering of arguments broke BPF programs that use the raw
tracepoint (e.g. tp_btf programs). The type of the second argument has
changed and existing programs that assume a task_struct* argument
(e.g. for bpf_task_storage access) will now fail to verify.
If we instead append the new argument to the end, all existing programs
would continue to work and can conditionally extract the prev_state
argument on supported kernel versions.
Fixes: fa2c3254d7 (sched/tracing: Don't re-read p->state when emitting sched_switch event, 2022-01-20)
Signed-off-by: Delyan Kratunov <delyank@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/c8a6930dfdd58a4a5755fc01732675472979732b.camel@fb.com
- New user_events interface. User space can register an event with the kernel
describing the format of the event. Then it will receive a byte in a page
mapping that it can check against. A privileged task can then enable that
event like any other event, which will change the mapped byte to true,
telling the user space application to start writing the event to the
tracing buffer.
- Add new "ftrace_boot_snapshot" kernel command line parameter. When set,
the tracing buffer will be saved in the snapshot buffer at boot up when
the kernel hands things over to user space. This will keep the traces that
happened at boot up available even if user space boot up has tracing as
well.
- Have TRACE_EVENT_ENUM() also update trace event field type descriptions.
Thus if a static array defines its size with an enum, the user space trace
event parsers can still know how to parse that array.
- Add new TRACE_CUSTOM_EVENT() macro. This acts the same as the
TRACE_EVENT() macro, but will attach to an existing tracepoint. This will
make one tracepoint be able to trace different content and not be stuck at
only what the original TRACE_EVENT() macro exports.
- Fixes to tracing error logging.
- Better saving of cmdlines to PIDs when tracing (use the wakeup events for
mapping).
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYjiO3RQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qhQzAQDtek5p80p/zkMGymm14wSH6qq0NdgN
Kv7fTBwEewUa0gD/UCOVLw4Oj+JtHQhCa3sCGZopmRv0BT1+4UQANqosKQY=
=Au08
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
- New user_events interface. User space can register an event with the
kernel describing the format of the event. Then it will receive a
byte in a page mapping that it can check against. A privileged task
can then enable that event like any other event, which will change
the mapped byte to true, telling the user space application to start
writing the event to the tracing buffer.
- Add new "ftrace_boot_snapshot" kernel command line parameter. When
set, the tracing buffer will be saved in the snapshot buffer at boot
up when the kernel hands things over to user space. This will keep
the traces that happened at boot up available even if user space boot
up has tracing as well.
- Have TRACE_EVENT_ENUM() also update trace event field type
descriptions. Thus if a static array defines its size with an enum,
the user space trace event parsers can still know how to parse that
array.
- Add new TRACE_CUSTOM_EVENT() macro. This acts the same as the
TRACE_EVENT() macro, but will attach to an existing tracepoint. This
will make one tracepoint be able to trace different content and not
be stuck at only what the original TRACE_EVENT() macro exports.
- Fixes to tracing error logging.
- Better saving of cmdlines to PIDs when tracing (use the wakeup events
for mapping).
* tag 'trace-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (30 commits)
tracing: Have type enum modifications copy the strings
user_events: Add trace event call as root for low permission cases
tracing/user_events: Use alloc_pages instead of kzalloc() for register pages
tracing: Add snapshot at end of kernel boot up
tracing: Have TRACE_DEFINE_ENUM affect trace event types as well
tracing: Fix strncpy warning in trace_events_synth.c
user_events: Prevent dyn_event delete racing with ioctl add/delete
tracing: Add TRACE_CUSTOM_EVENT() macro
tracing: Move the defines to create TRACE_EVENTS into their own files
tracing: Add sample code for custom trace events
tracing: Allow custom events to be added to the tracefs directory
tracing: Fix last_cmd_set() string management in histogram code
user_events: Fix potential uninitialized pointer while parsing field
tracing: Fix allocation of last_cmd in last_cmd_set()
user_events: Add documentation file
user_events: Add sample code for typical usage
user_events: Add self-test for validator boundaries
user_events: Add self-test for perf_event integration
user_events: Add self-test for dynamic_events integration
user_events: Add self-test for ftrace integration
...
- Cleanups for SCHED_DEADLINE
- Tracing updates/fixes
- CPU Accounting fixes
- First wave of changes to optimize the overhead of the scheduler build,
from the fast-headers tree - including placeholder *_api.h headers for
later header split-ups.
- Preempt-dynamic using static_branch() for ARM64
- Isolation housekeeping mask rework; preperatory for further changes
- NUMA-balancing: deal with CPU-less nodes
- NUMA-balancing: tune systems that have multiple LLC cache domains per node (eg. AMD)
- Updates to RSEQ UAPI in preparation for glibc usage
- Lots of RSEQ/selftests, for same
- Add Suren as PSI co-maintainer
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmI5rg8RHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1hGrw/+M3QOk6fH7G48wjlNnBvcOife6ls+Ni4k
ixOAcF4JKoixO8HieU5vv0A7yf/83tAa6fpeXeMf1hkCGc0NSlmLtuIux+WOmoAL
LzCyDEYfiP8KnVh0A1Tui/lK0+AkGo21O6ADhQE2gh8o2LpslOHQMzvtyekSzeeb
mVxMYQN+QH0m518xdO2D8IQv9ctOYK0eGjmkqdNfntOlytypPZHeNel/tCzwklP/
dElJUjNiSKDlUgTBPtL3DfpoLOI/0mHF2p6NEXvNyULxSOqJTu8pv9Z2ADb2kKo1
0D56iXBDngMi9MHIJLgvzsA8gKzHLFSuPbpODDqkTZCa28vaMB9NYGhJ643NtEie
IXTJEvF1rmNkcLcZlZxo0yjL0fjvPkczjw4Vj27gbrUQeEBfb4mfuI4BRmij63Ep
qEkgQTJhduCqqrQP1rVyhwWZRk1JNcVug+F6N42qWW3fg1xhj0YSrLai2c9nPez6
3Zt98H8YGS1Z/JQomSw48iGXVqfTp/ETI7uU7jqHK8QcjzQ4lFK5H4GZpwuqGBZi
NJJ1l97XMEas+rPHiwMEN7Z1DVhzJLCp8omEj12QU+tGLofxxwAuuOVat3CQWLRk
f80Oya3TLEgd22hGIKDRmHa22vdWnNQyS0S15wJotawBzQf+n3auS9Q3/rh979+t
ES/qvlGxTIs=
=Z8uT
-----END PGP SIGNATURE-----
Merge tag 'sched-core-2022-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
- Cleanups for SCHED_DEADLINE
- Tracing updates/fixes
- CPU Accounting fixes
- First wave of changes to optimize the overhead of the scheduler
build, from the fast-headers tree - including placeholder *_api.h
headers for later header split-ups.
- Preempt-dynamic using static_branch() for ARM64
- Isolation housekeeping mask rework; preperatory for further changes
- NUMA-balancing: deal with CPU-less nodes
- NUMA-balancing: tune systems that have multiple LLC cache domains per
node (eg. AMD)
- Updates to RSEQ UAPI in preparation for glibc usage
- Lots of RSEQ/selftests, for same
- Add Suren as PSI co-maintainer
* tag 'sched-core-2022-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (81 commits)
sched/headers: ARM needs asm/paravirt_api_clock.h too
sched/numa: Fix boot crash on arm64 systems
headers/prep: Fix header to build standalone: <linux/psi.h>
sched/headers: Only include <linux/entry-common.h> when CONFIG_GENERIC_ENTRY=y
cgroup: Fix suspicious rcu_dereference_check() usage warning
sched/preempt: Tell about PREEMPT_DYNAMIC on kernel headers
sched/topology: Remove redundant variable and fix incorrect type in build_sched_domains
sched/deadline,rt: Remove unused parameter from pick_next_[rt|dl]_entity()
sched/deadline,rt: Remove unused functions for !CONFIG_SMP
sched/deadline: Use __node_2_[pdl|dle]() and rb_first_cached() consistently
sched/deadline: Merge dl_task_can_attach() and dl_cpu_busy()
sched/deadline: Move bandwidth mgmt and reclaim functions into sched class source file
sched/deadline: Remove unused def_dl_bandwidth
sched/tracing: Report TASK_RTLOCK_WAIT tasks as TASK_UNINTERRUPTIBLE
sched/tracing: Don't re-read p->state when emitting sched_switch event
sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race
sched/cpuacct: Remove redundant RCU read lock
sched/cpuacct: Optimize away RCU read lock
sched/cpuacct: Fix charge percpu cpuusage
sched/headers: Reorganize, clean up and optimize kernel/sched/sched.h dependencies
...
When an enum is used in the visible parts of a trace event that is
exported to user space, the user space applications like perf and
trace-cmd do not have a way to know what the value of the enum is. To
solve this, at boot up (or module load) the printk formats are modified to
replace the enum with their numeric value in the string output.
Array fields of the event are defined by [<nr-elements>] in the type
portion of the format file so that the user space parsers can correctly
parse the array into the appropriate size chunks. But in some trace
events, an enum is used in defining the size of the array, which once
again breaks the parsing of user space tooling.
This was solved the same way as the print formats were, but it modified
the type strings of the trace event. This caused crashes in some
architectures because, as supposed to the print string, is a const string
value. This was not detected on x86, as it appears that const strings are
still writable (at least in boot up), but other architectures this is not
the case, and writing to a const string will cause a kernel fault.
To fix this, use kstrdup() to copy the type before modifying it. If the
trace event is for the core kernel there's no need to free it because the
string will be in use for the life of the machine being on line. For
modules, create a link list to store all the strings being allocated for
modules and when the module is removed, free them.
Link: https://lore.kernel.org/all/yt9dr1706b4i.fsf@linux.ibm.com/
Link: https://lkml.kernel.org/r/20220318153432.3984b871@gandalf.local.home
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Sven Schnelle <svens@linux.ibm.com>
Reported-by: Sven Schnelle <svens@linux.ibm.com>
Fixes: b3bc8547d3 ("tracing: Have TRACE_DEFINE_ENUM affect trace event types as well")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The macro TRACE_DEFINE_ENUM is used to convert enums in the kernel to
their actual value when they are exported to user space via the trace
event format file.
Currently only the enums in the "print fmt" (TP_printk in the TRACE_EVENT
macro) have the enums converted. But the enums can be used to denote array
size:
field:unsigned int fc_ineligible_rc[EXT4_FC_REASON_MAX]; offset:12; size:36; signed:0;
The EXT4_FC_REASON_MAX has no meaning to userspace but it needs to know
that information to know how to parse the array.
Have the array indexes also be parsed as well.
Link: https://lore.kernel.org/all/cover.1646922487.git.riteshh@linux.ibm.com/
Reported-by: Ritesh Harjani <riteshh@linux.ibm.com>
Tested-by: Ritesh Harjani <riteshh@linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Allow custom events to be added to the events directory in the tracefs
file system. For example, a module could be installed that attaches to an
event and wants to be enabled and disabled via the tracefs file system. It
would use trace_add_event_call() to add the event to the tracefs
directory, and trace_remove_event_call() to remove it.
Make both those functions EXPORT_SYMBOL_GPL().
Link: https://lkml.kernel.org/r/20220303220625.186988045@goodmis.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
As of commit
c6e7bd7afa ("sched/core: Optimize ttwu() spinning on p->on_cpu")
the following sequence becomes possible:
p->__state = TASK_INTERRUPTIBLE;
__schedule()
deactivate_task(p);
ttwu()
READ !p->on_rq
p->__state=TASK_WAKING
trace_sched_switch()
__trace_sched_switch_state()
task_state_index()
return 0;
TASK_WAKING isn't in TASK_REPORT, so the task appears as TASK_RUNNING in
the trace event.
Prevent this by pushing the value read from __schedule() down the trace
event.
Reported-by: Abhijeet Dharmapurikar <adharmap@quicinc.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20220120162520.570782-2-valentin.schneider@arm.com
A helper macro was added to make reading socket addresses easier in trace
events. It pairs %pISpc with __get_sockaddr() that reads the socket
address from the ring buffer into a human readable format.
The boot up check that makes sure that trace events do not reference
pointers to memory that can later be freed when the trace event is read,
incorrectly flagged this as a delayed reference. Update the check to
handle "__get_sockaddr" and not report an error on it.
Link: https://lore.kernel.org/all/20220125160505.068dbb52@canb.auug.org.au/
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Since the same warning message is already printed in the
trace_create_file() function, there is no need to print it again.
Link: https://lkml.kernel.org/r/20220109162232.361747-1-ytcoode@gmail.com
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
When creating a new event (via a module, kprobe, eprobe, etc), the
descriptors that are created must add flags for pid filtering if an
instance has pid filtering enabled, as the flags are used at the time the
event is executed to know if pid filtering should be done or not.
The "Only trace this pid" case was added, but a cut and paste error made
that case checked twice, instead of checking the "Trace all but this pid"
case.
Link: https://lore.kernel.org/all/202111280401.qC0z99JB-lkp@intel.com/
Fixes: 6cb206508b ("tracing: Check pid filtering when creating events")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
When pid filtering is activated in an instance, all of the events trace
files for that instance has the PID_FILTER flag set. This determines
whether or not pid filtering needs to be done on the event, otherwise the
event is executed as normal.
If pid filtering is enabled when an event is created (via a dynamic event
or modules), its flag is not updated to reflect the current state, and the
events are not filtered properly.
Cc: stable@vger.kernel.org
Fixes: 3fdaf80f4a ("tracing: Implement event pid filtering")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
When building the files in the tracefs file system, do not by default set
any permissions for OTH (other). This will make it easier for admins who
want to define a group for accessing tracefs and not having to first
disable all the permission bits for "other" in the file system.
As tracing can leak sensitive information, it should never by default
allowing all users access. An admin can still set the permission bits for
others to have access, which may be useful for creating a honeypot and
seeing who takes advantage of it and roots the machine.
Link: https://lkml.kernel.org/r/20210818153038.864149276@goodmis.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Instead of having the logic that does trace_pid_list open coded, wrap it in
abstract functions. This will allow a rewrite of the logic that implements
the trace_pid_list without affecting the users.
Note, this causes a change in behavior. Every time a pid is written into
the set_*_pid file, it creates a new list and uses RCU to update it. If
pid_max is lowered, but there was a pid currently in the list that was
higher than pid_max, those pids will now be removed on updating the list.
The old behavior kept that from happening.
The rewrite of the pid_list logic will no longer depend on pid_max,
and will return the old behavior.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
migrate_disable() forbids task migration to another CPU. It is available
since v5.11 and has already users such as highmem or BPF. It is useful
to observe this task state in tracing which already has other states
like the preemption counter.
Instead of adding the migrate disable counter as a new entry to struct
trace_entry, which would extend the whole struct by four bytes, it is
squashed into the preempt-disable counter. The lower four bits represent
the preemption counter, the upper four bits represent the migrate
disable counter. Both counter shouldn't exceed 15 but if they do, there
is a safety net which caps the value at 15.
Add the migrate-disable counter to the trace entry so it shows up in the
trace. Due to the users mentioned above, it is already possible to
observe it:
| bash-1108 [000] ...21 73.950578: rss_stat: mm_id=2213312838 curr=0 type=MM_ANONPAGES size=8192B
| bash-1108 [000] d..31 73.951222: irq_disable: caller=flush_tlb_mm_range+0x115/0x130 parent=ptep_clear_flush+0x42/0x50
| bash-1108 [000] d..31 73.951222: tlb_flush: pages:1 reason:local mm shootdown (3)
The last value is the migrate-disable counter.
Things that popped up:
- trace_print_lat_context() does not print the migrate counter. Not sure
if it should. It is used in "verbose" mode and uses 8 digits and I'm
not sure ther is something processing the value.
- trace_define_common_fields() now defines a different variable. This
probably breaks things. No ide what to do in order to preserve the old
behaviour. Since this is used as a filter it should be split somehow
to be able to match both nibbles here.
Link: https://lkml.kernel.org/r/20210810132625.ylssabmsrkygokuv@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: patch description.]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
[ SDR: Removed change to common_preempt_count field name ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
As dynamic events are not created by modules, if something is attached to
one, calling "try_module_get()" on its "mod" field, is not going to keep
the dynamic event from going away.
Since dynamic events do not need the "mod" pointer of the event structure,
make a union out of it in order to save memory (there's one structure for
each of the thousand+ events in the kernel), and have any event with the
DYNAMIC flag set to use a ref counter instead.
Link: https://lore.kernel.org/linux-trace-devel/20210813004448.51c7de69ce432d338f4d226b@kernel.org/
Link: https://lkml.kernel.org/r/20210817035027.174869074@goodmis.org
Suggested-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Trace events record data into the ring buffer at the time of the event. The
trace event has a printf logic to display the recorded data at a much later
time when the user reads the trace file. This makes using dereferencing
pointers unsafe if the dereferenced pointer points to the original source.
The safe way to handle this is to create an array within the trace event and
copy the source into the array. Then the dereference pointer may point to
that array.
As this is a easy mistake to make, a check is added to examine all trace
event print fmts to make sure that they are safe to use. This only checks
the various %p* dereferenced pointers like %pB, %pR, etc. It does not handle
dereferencing of strings, as there are some use cases that are OK to
dereference the source. That will be dealt with differently.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
- Update to the way irqs and preemption is tracked via the trace event PC field
- Fix handling of unregistering event failing due to allocate memory.
This is only triggered by failure injection, as it is pretty much guaranteed
to have less than a page allocation succeed.
- Do not show the useless "filter" or "enable" files for the "ftrace" trace
system, as they have no effect on doing anything.
- Add a warning if kprobes are registered more than once.
- Synthetic events now have their fields parsed by semicolons.
Old formats without semicolons will still work, but new features will
require them.
- New option to allow trace events to show %p without hashing in trace file.
The trace file can only be read by root, and reading the raw event buffer
did not have any pointers hashed, so this does not expose anything new.
- New directory in tools called tools/tracing, where a new tool that reads
sequential latency reports from the ftrace latency tracers.
- Other minor fixes and cleanups.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYDL2wBQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qti6AP0RUcSU5U1onx8DcwPQLC5Xr3CPqJkm
RvKeJDdgFP+sVgEAiMTFsy2UMc0gmlHZMFd5nZLSiJCu1I2hHmhS5yKbHgY=
=fD9+
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
- Update to the way irqs and preemption is tracked via the trace event
PC field
- Fix handling of unregistering event failing due to allocate memory.
This is only triggered by failure injection, as it is pretty much
guaranteed to have less than a page allocation succeed.
- Do not show the useless "filter" or "enable" files for the "ftrace"
trace system, as they have no effect on doing anything.
- Add a warning if kprobes are registered more than once.
- Synthetic events now have their fields parsed by semicolons. Old
formats without semicolons will still work, but new features will
require them.
- New option to allow trace events to show %p without hashing in trace
file. The trace file can only be read by root, and reading the raw
event buffer did not have any pointers hashed, so this does not
expose anything new.
- New directory in tools called tools/tracing, where a new tool that
reads sequential latency reports from the ftrace latency tracers.
- Other minor fixes and cleanups.
* tag 'trace-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits)
kprobes: Fix to delay the kprobes jump optimization
tracing/tools: Add the latency-collector to tools directory
tracing: Make hash-ptr option default
tracing: Add ptr-hash option to show the hashed pointer value
tracing: Update the stage 3 of trace event macro comment
tracing: Show real address for trace event arguments
selftests/ftrace: Add '!event' synthetic event syntax check
selftests/ftrace: Update synthetic event syntax errors
tracing: Add a backward-compatibility check for synthetic event creation
tracing: Update synth command errors
tracing: Rework synthetic event command parsing
tracing/dynevent: Delegate parsing to create function
kprobes: Warn if the kprobe is reregistered
ftrace: Remove unused ftrace_force_update()
tracepoints: Code clean up
tracepoints: Do not punish non static call users
tracepoints: Remove unnecessary "data_args" macro parameter
tracing: Do not create "enable" or "filter" files for ftrace event subsystem
kernel: trace: preemptirq_delay_test: add cpu affinity
tracepoint: Do not fail unregistering a probe due to memory failure
...
The ftrace event subsystem is only created for showing the format files of
events created by the ftrace tracers, and are not trace events. The ftrace
subsystem currently has both the "enable" and "filter" files that in other
subsystems are used to enable/disable all events within the subsystem or set
a filter for all the subsystem events.
As ftrace subsystem events do not use enable or filter operations, these
files are useless in the ftrace subsystem. Remove them.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The file /sys/kernel/tracing/events/enable is used to enable all events by
echoing in "1", or disabling all events when echoing in "0". To know if all
events are enabled, disabled, or some are enabled but not all of them,
cating the file should show either "1" (all enabled), "0" (all disabled), or
"X" (some enabled but not all of them). This works the same as the "enable"
files in the individule system directories (like tracing/events/sched/enable).
But when all events are enabled, the top level "enable" file shows "X". The
reason is that its checking the "ftrace" events, which are special events
that only exist for their format files. These include the format for the
function tracer events, that are enabled when the function tracer is
enabled, but not by the "enable" file. The check includes these events,
which will always be disabled, and even though all true events are enabled,
the top level "enable" file will show "X" instead of "1".
To fix this, have the check test the event's flags to see if it has the
"IGNORE_ENABLE" flag set, and if so, not test it.
Cc: stable@vger.kernel.org
Fixes: 553552ce17 ("tracing: Combine event filter_active and enable into single flags field")
Reported-by: "Yordan Karadzhov (VMware)" <y.karadz@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The state of the interrupts (irqflags) and the preemption counter are
both passed down to tracing_generic_entry_update(). Only one bit of
irqflags is actually required: The on/off state. The complete 32bit
of the preemption counter isn't needed. Just whether of the upper bits
(softirq, hardirq and NMI) are set and the preemption depth is needed.
The irqflags and the preemption counter could be evaluated early and the
information stored in an integer `trace_ctx'.
tracing_generic_entry_update() would use the upper bits as the
TRACE_FLAG_* and the lower 8bit as the disabled-preemption depth
(considering that one must be substracted from the counter in one
special cases).
The actual preemption value is not used except for the tracing record.
The `irqflags' variable is mostly used only for the tracing record. An
exception here is for instance wakeup_tracer_call() or
probe_wakeup_sched_switch() which explicilty disable interrupts and use
that `irqflags' to save (and restore) the IRQ state and to record the
state.
Struct trace_event_buffer has also the `pc' and flags' members which can
be replaced with `trace_ctx' since their actual value is not used
outside of trace recording.
This will reduce tracing_generic_entry_update() to simply assign values
to struct trace_entry. The evaluation of the TRACE_FLAG_* bits is moved
to _tracing_gen_ctx_flags() which replaces preempt_count() and
local_save_flags() invocations.
As an example, ftrace_syscall_enter() may invoke:
- trace_buffer_lock_reserve() -> … -> tracing_generic_entry_update()
- event_trigger_unlock_commit()
-> ftrace_trace_stack() -> … -> tracing_generic_entry_update()
-> ftrace_trace_userstack() -> … -> tracing_generic_entry_update()
In this case the TRACE_FLAG_* bits were evaluated three times. By using
the `trace_ctx' they are evaluated once and assigned three times.
A build with all tracers enabled on x86-64 with and without the patch:
text data bss dec hex filename
21970669 17084168 7639260 46694097 2c87ed1 vmlinux.old
21970293 17084168 7639260 46693721 2c87d59 vmlinux.new
text shrank by 379 bytes, data remained constant.
Link: https://lkml.kernel.org/r/20210125194511.3924915-2-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The major update to this release is that there's a new arch config option called:
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS. Currently, only x86_64 enables it.
All the ftrace callbacks now take a struct ftrace_regs instead of a struct
pt_regs. If the architecture has HAVE_DYNAMIC_FTRACE_WITH_ARGS enabled, then
the ftrace_regs will have enough information to read the arguments of the
function being traced, as well as access to the stack pointer. This way, if
a user (like live kernel patching) only cares about the arguments, then it
can avoid using the heavier weight "regs" callback, that puts in enough
information in the struct ftrace_regs to simulate a breakpoint exception
(needed for kprobes).
New config option that audits the timestamps of the ftrace ring buffer at
most every event recorded. The "check_buffer()" calls will conflict with
mainline, because I purposely added the check without including the fix that
it caught, which is in mainline. Running a kernel built from the commit of
the added check will trigger it.
Ftrace recursion protection has been cleaned up to move the protection to
the callback itself (this saves on an extra function call for those
callbacks).
Perf now handles its own RCU protection and does not depend on ftrace to do
it for it (saving on that extra function call).
New debug option to add "recursed_functions" file to tracefs that lists all
the places that triggered the recursion protection of the function tracer.
This will show where things need to be fixed as recursion slows down the
function tracer.
The eval enum mapping updates done at boot up are now offloaded to a work
queue, as it caused a noticeable pause on slow embedded boards.
Various clean ups and last minute fixes.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCX9uq8xQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qtrwAQCHevqWMjKc1Q76bnCgwB0AbFKB6vqy
5b6g/co5+ihv8wD/eJPWlZMAt97zTVW7bdp5qj/GTiCDbAsODMZ597LsxA0=
=rZEz
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"The major update to this release is that there's a new arch config
option called CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS.
Currently, only x86_64 enables it. All the ftrace callbacks now take a
struct ftrace_regs instead of a struct pt_regs. If the architecture
has HAVE_DYNAMIC_FTRACE_WITH_ARGS enabled, then the ftrace_regs will
have enough information to read the arguments of the function being
traced, as well as access to the stack pointer.
This way, if a user (like live kernel patching) only cares about the
arguments, then it can avoid using the heavier weight "regs" callback,
that puts in enough information in the struct ftrace_regs to simulate
a breakpoint exception (needed for kprobes).
A new config option that audits the timestamps of the ftrace ring
buffer at most every event recorded.
Ftrace recursion protection has been cleaned up to move the protection
to the callback itself (this saves on an extra function call for those
callbacks).
Perf now handles its own RCU protection and does not depend on ftrace
to do it for it (saving on that extra function call).
New debug option to add "recursed_functions" file to tracefs that
lists all the places that triggered the recursion protection of the
function tracer. This will show where things need to be fixed as
recursion slows down the function tracer.
The eval enum mapping updates done at boot up are now offloaded to a
work queue, as it caused a noticeable pause on slow embedded boards.
Various clean ups and last minute fixes"
* tag 'trace-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits)
tracing: Offload eval map updates to a work queue
Revert: "ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS"
ring-buffer: Add rb_check_bpage in __rb_allocate_pages
ring-buffer: Fix two typos in comments
tracing: Drop unneeded assignment in ring_buffer_resize()
tracing: Disable ftrace selftests when any tracer is running
seq_buf: Avoid type mismatch for seq_buf_init
ring-buffer: Fix a typo in function description
ring-buffer: Remove obsolete rb_event_is_commit()
ring-buffer: Add test to validate the time stamp deltas
ftrace/documentation: Fix RST C code blocks
tracing: Clean up after filter logic rewriting
tracing: Remove the useless value assignment in test_create_synth_event()
livepatch: Use the default ftrace_ops instead of REGS when ARGS is available
ftrace/x86: Allow for arguments to be passed in to ftrace_regs by default
ftrace: Have the callbacks receive a struct ftrace_regs instead of pt_regs
MAINTAINERS: assign ./fs/tracefs to TRACING
tracing: Fix some typos in comments
ftrace: Remove unused varible 'ret'
ring-buffer: Add recording of ring buffer recursion into recursed_functions
...
On architectures using the generic syscall entry code the architecture
independent syscall work is moved to flags in thread_info::syscall_work.
This removes architecture dependencies and frees up TIF bits.
Define SYSCALL_WORK_SYSCALL_TRACEPOINT, use it in the generic entry code
and convert the code which uses the TIF specific helper functions to use
the new *_syscall_work() helpers which either resolve to the new mode for
users of the generic entry code or to the TIF based functions for the other
architectures.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20201116174206.2639648-6-krisman@collabora.com
In preparation to have arguments of a function passed to callbacks attached
to functions as default, change the default callback prototype to receive a
struct ftrace_regs as the forth parameter instead of a pt_regs.
For callbacks that set the FL_SAVE_REGS flag in their ftrace_ops flags, they
will now need to get the pt_regs via a ftrace_get_regs() helper call. If
this is called by a callback that their ftrace_ops did not have a
FL_SAVE_REGS flag set, it that helper function will return NULL.
This will allow the ftrace_regs to hold enough just to get the parameters
and stack pointer, but without the worry that callbacks may have a pt_regs
that is not completely filled.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Now that all callbacks are recursion safe, reverse the meaning of the
RECURSION flag and rename it from RECURSION_SAFE to simply RECURSION.
Now only callbacks that request to have recursion protecting it will
have the added trampoline to do so.
Also remove the outdated comment about "PER_CPU" when determining to
use the ftrace_ops_assist_func.
Link: https://lkml.kernel.org/r/20201028115613.742454631@goodmis.org
Link: https://lkml.kernel.org/r/20201106023547.904270143@goodmis.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
- Fixes the issue of a mismatch section that was missed due to gcc
inlining the offending function, while clang did not (and reported
the issue).
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCX4oAlhQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qkyBAQDnmsIHN1BPMdWbSFJeRYobSYFKoSzh
qwcqy0prvNeFvgD9GeC7wZeJaUWCt4zRbpZhuclIq2BNqoiA/llE67zXQwQ=
=W1ca
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"Fix mismatch section of adding early trace events.
Fixes the issue of a mismatch section that was missed due to gcc
inlining the offending function, while clang did not (and reported the
issue)"
* tag 'trace-v5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Remove __init from __trace_early_add_new_event()
The commit 720dee53ad ("tracing/boot: Initialize per-instance event
list in early boot") removes __init from __trace_early_add_events()
but __trace_early_add_new_event() still has __init and will cause a
section mismatch.
Remove __init from __trace_early_add_new_event() as same as
__trace_early_add_events().
Link: https://lore.kernel.org/lkml/CAHk-=wjU86UhovK4XuwvCqTOfc+nvtpAuaN2PJBz15z=w=u0Xg@mail.gmail.com/
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
- Add support for "bool" type in synthetic events
- Add per instance tracing for bootconfig
- Support perf-style return probe ("SYMBOL%return") in kprobes and uprobes
- Allow for kprobes to be enabled earlier in boot up
- Added tracepoint helper function to allow testing if tracepoints are
enabled in headers
- Synthetic events can now have dynamic strings (variable length)
- Various fixes and cleanups
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCX4iMDRQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qrMPAP0UAfOeQcYxBAw9y8oX7oJnBBylLFTR
CICOVEhBYC/xIQD/edVPEUt77ozM/Bplwv8BiO4QxFjgZFqtpZI8mskIfAo=
=sbny
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"Updates for tracing and bootconfig:
- Add support for "bool" type in synthetic events
- Add per instance tracing for bootconfig
- Support perf-style return probe ("SYMBOL%return") in kprobes and
uprobes
- Allow for kprobes to be enabled earlier in boot up
- Added tracepoint helper function to allow testing if tracepoints
are enabled in headers
- Synthetic events can now have dynamic strings (variable length)
- Various fixes and cleanups"
* tag 'trace-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (58 commits)
tracing: support "bool" type in synthetic trace events
selftests/ftrace: Add test case for synthetic event syntax errors
tracing: Handle synthetic event array field type checking correctly
selftests/ftrace: Change synthetic event name for inter-event-combined test
tracing: Add synthetic event error logging
tracing: Check that the synthetic event and field names are legal
tracing: Move is_good_name() from trace_probe.h to trace.h
tracing: Don't show dynamic string internals in synthetic event description
tracing: Fix some typos in comments
tracing/boot: Add ftrace.instance.*.alloc_snapshot option
tracing: Fix race in trace_open and buffer resize call
tracing: Check return value of __create_val_fields() before using its result
tracing: Fix synthetic print fmt check for use of __get_str()
tracing: Remove a pointless assignment
ftrace: ftrace_global_list is renamed to ftrace_ops_list
ftrace: Format variable declarations of ftrace_allocate_records
ftrace: Simplify the calculation of page number for ftrace_page->records
ftrace: Simplify the dyn_ftrace->flags macro
ftrace: Simplify the hash calculation
ftrace: Use fls() to get the bits for dup_hash()
...
Initialize per-instance event list in early boot time (before
initializing instance directory on tracefs). This fixes boot-time
tracing to correctly handle the boot-time per-instance settings.
Link: https://lkml.kernel.org/r/160096560826.182763.17110991546046128881.stgit@devnote2
Fixes: 4114fbfd02 ("tracing: Enable creating new instance early boot")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Split the event fields initialization from creating new event directory.
This allows the boot-time tracing to define dynamic events before
initializing events directory on tracefs.
Link: https://lkml.kernel.org/r/159974153790.478751.3475515065034825374.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Define event fields at early stage so that boot-time tracing can
access the event fields (like per-event filter setting).
Link: https://lkml.kernel.org/r/159974152862.478751.2023768466808361350.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Currently we have following call flow:
tracer_init_tracefs()
tracing_init_dentry()
event_trace_init()
tracing_init_dentry()
This shows tracing_init_dentry() is called twice in this flow and this
is not necessary.
Let's remove the second one when it is for sure be properly initialized.
Link: https://lkml.kernel.org/r/20200712011036.70948-4-richard.weiyang@linux.alibaba.com
Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
While auditing all module notifiers I noticed a whole bunch of fail
wrt the return value. Notifiers have a 'special' return semantics.
As is; NOTIFY_DONE vs NOTIFY_OK is a bit vague; but
notifier_from_errno(0) results in NOTIFY_OK and NOTIFY_DONE has a
comment that says "Don't care".
From this I've used NOTIFY_DONE when the function completely ignores
the callback and notifier_to_error() isn't used.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Robert Richter <rric@kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20200818135804.385360407@infradead.org
On exit, if a process is preempted after the trace_sched_process_exit()
tracepoint but before the process is done exiting, then when it gets
scheduled in, the function tracers will not filter it properly against the
function tracing pid filters.
That is because the function tracing pid filters hooks to the
sched_process_exit() tracepoint to remove the exiting task's pid from the
filter list. Because the filtering happens at the sched_switch tracepoint,
when the exiting task schedules back in to finish up the exit, it will no
longer be in the function pid filtering tables.
This was noticeable in the notrace self tests on a preemptable kernel, as
the tests would fail as it exits and preempted after being taken off the
notrace filter table and on scheduling back in it would not be in the
notrace list, and then the ending of the exit function would trace. The test
detected this and would fail.
Cc: stable@vger.kernel.org
Cc: Namhyung Kim <namhyung@kernel.org>
Fixes: 1e10486ffe ("ftrace: Add 'function-fork' trace option")
Fixes: c37775d578 ("tracing: Add infrastructure to allow set_event_pid to follow children"
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Add a new "hist_debug" file for each trace event, which when read will
dump out a bunch of internal details about the hist triggers defined
on that event.
This is normally off but can be enabled by saying 'y' to the new
CONFIG_HIST_TRIGGERS_DEBUG config option.
This is in support of the new Documentation file describing histogram
internals, Documentation/trace/histogram-design.rst, which was
requested by developers trying to understand the internals when
extending or making use of the hist triggers for higher-level tools.
The histogram-design.rst documentation refers to the hist_debug files
and demonstrates their use with output in the test examples.
Link: http://lkml.kernel.org/r/77914c22b0ba493d9783c53bbfbc6087d6a7e1b1.1585941485.git.zanussi@kernel.org
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
There's currently a way to select a task that should only have its events
traced, but there's no way to select a task not to have itsevents traced.
Add a set_event_notrace_pid file that acts the same as set_event_pid (and is
also affected by event-fork), but the task pids in this file will not be
traced even if they are listed in the set_event_pid file. This makes it easy
for tools like trace-cmd to "hide" itself from beint traced by events when
it is recording other tasks.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
There's currently a way to select a task that should only be traced by
functions, but there's no way to select a task not to be traced by the
function tracer. Add a set_ftrace_notrace_pid file that acts the same as
set_ftrace_pid (and is also affected by function-fork), but the task pids in
this file will not be traced even if they are listed in the set_ftrace_pid
file. This makes it easy for tools like trace-cmd to "hide" itself from the
function tracer when it is recording other tasks.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
- Added new "bootconfig".
Looks for a file appended to initrd to add boot config options.
This has been discussed thoroughly at Linux Plumbers.
Very useful for adding kprobes at bootup.
Only enabled if "bootconfig" is on the real kernel command line.
- Created dynamic event creation.
Merges common code between creating synthetic events and
kprobe events.
- Rename perf "ring_buffer" structure to "perf_buffer"
- Rename ftrace "ring_buffer" structure to "trace_buffer"
Had to rename existing "trace_buffer" to "array_buffer"
- Allow trace_printk() to work withing (some) tracing code.
- Sort of tracing configs to be a little better organized
- Fixed bug where ftrace_graph hash was not being protected properly
- Various other small fixes and clean ups
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXjtAURQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qshOAQDzopQmvAVrrI6oogghr8JQA30Z2yqT
i+Ld7vPWL2MV9wEA1S+zLGDSYrj8f/vsCq6BxRYT1ApO+YtmY6LTXiUejwg=
=WNds
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
- Added new "bootconfig".
This looks for a file appended to initrd to add boot config options,
and has been discussed thoroughly at Linux Plumbers.
Very useful for adding kprobes at bootup.
Only enabled if "bootconfig" is on the real kernel command line.
- Created dynamic event creation.
Merges common code between creating synthetic events and kprobe
events.
- Rename perf "ring_buffer" structure to "perf_buffer"
- Rename ftrace "ring_buffer" structure to "trace_buffer"
Had to rename existing "trace_buffer" to "array_buffer"
- Allow trace_printk() to work withing (some) tracing code.
- Sort of tracing configs to be a little better organized
- Fixed bug where ftrace_graph hash was not being protected properly
- Various other small fixes and clean ups
* tag 'trace-v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (88 commits)
bootconfig: Show the number of nodes on boot message
tools/bootconfig: Show the number of bootconfig nodes
bootconfig: Add more parse error messages
bootconfig: Use bootconfig instead of boot config
ftrace: Protect ftrace_graph_hash with ftrace_sync
ftrace: Add comment to why rcu_dereference_sched() is open coded
tracing: Annotate ftrace_graph_notrace_hash pointer with __rcu
tracing: Annotate ftrace_graph_hash pointer with __rcu
bootconfig: Only load bootconfig if "bootconfig" is on the kernel cmdline
tracing: Use seq_buf for building dynevent_cmd string
tracing: Remove useless code in dynevent_arg_pair_add()
tracing: Remove check_arg() callbacks from dynevent args
tracing: Consolidate some synth_event_trace code
tracing: Fix now invalid var_ref_vals assumption in trace action
tracing: Change trace_boot to use synth_event interface
tracing: Move tracing selftests to bottom of menu
tracing: Move mmio tracer config up with the other tracers
tracing: Move tracing test module configs together
tracing: Move all function tracing configs together
tracing: Documentation for in-kernel synthetic event API
...
Pull vfs recursive removal updates from Al Viro:
"We have quite a few places where synthetic filesystems do an
equivalent of 'rm -rf', with varying amounts of code duplication,
wrong locking, etc. That really ought to be a library helper.
Only debugfs (and very similar tracefs) are converted here - I have
more conversions, but they'd never been in -next, so they'll have to
wait"
* 'work.recursive_removal' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
simple_recursive_removal(): kernel-side rm -rf for ramfs-style filesystems
Add a function to get an event file and prevent it from going away on
module or instance removal.
trace_get_event_file() will find an event file in a given instance (if
instance is NULL, it assumes the top trace array) and return it,
pinning the instance's trace array as well as the event's module, if
applicable, so they won't go away while in use.
trace_put_event_file() does the matching release.
Link: http://lkml.kernel.org/r/bb31ac4bdda168d5ed3c4b5f5a4c8f633e8d9118.1580323897.git.zanussi@kernel.org
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
[ Moved trace_array_put() to end of trace_put_event_file() ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Since kprobe-events use event_trigger_unlock_commit_regs() directly,
that events doesn't show up in printk buffer if "tp_printk" is set.
Use trace_event_buffer_commit() in kprobe events so that it can
invoke output_printk() as same as other trace events.
Link: http://lkml.kernel.org/r/157867233085.17873.5210928676787339604.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
[ Adjusted data var declaration placement in __kretprobe_trace_func() ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
As there's two struct ring_buffers in the kernel, it causes some confusion.
The other one being the perf ring buffer. It was agreed upon that as neither
of the ring buffers are generic enough to be used globally, they should be
renamed as:
perf's ring_buffer -> perf_buffer
ftrace's ring_buffer -> trace_buffer
This implements the changes to the ring buffer that ftrace uses.
Link: https://lore.kernel.org/r/20191213140531.116b3200@gandalf.local.home
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
As we are working to remove the generic "ring_buffer" name that is used by
both tracing and perf, the ring_buffer name for tracing will be renamed to
trace_buffer, and perf's ring buffer will be renamed to perf_buffer.
As there already exists a trace_buffer that is used by the trace_arrays, it
needs to be first renamed to array_buffer.
Link: https://lore.kernel.org/r/20191213153553.GE20583@krava
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Task T2 Task T3
trace_options_core_write() subsystem_open()
mutex_lock(trace_types_lock) mutex_lock(event_mutex)
set_tracer_flag()
trace_event_enable_tgid_record() mutex_lock(trace_types_lock)
mutex_lock(event_mutex)
This gives a circular dependency deadlock between trace_types_lock and
event_mutex. To fix this invert the usage of trace_types_lock and
event_mutex in trace_options_core_write(). This keeps the sequence of
lock usage consistent.
Link: http://lkml.kernel.org/r/0101016eef175e38-8ca71caf-a4eb-480d-a1e6-6f0bbc015495-000000@us-west-2.amazonses.com
Cc: stable@vger.kernel.org
Fixes: d914ba37d7 ("tracing: Add support for recording tgid of tasks")
Signed-off-by: Prateek Sood <prsood@codeaurora.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
We have been trying to use rasdaemon to monitor hardware errors like
correctable memory errors. rasdaemon uses trace events to monitor
various hardware errors. In order to test it, we have to inject some
hardware errors, unfortunately not all of them provide error
injections. MCE does provide a way to inject MCE errors, but errors
like PCI error and devlink error don't, it is not easy to add error
injection to each of them. Instead, it is relatively easier to just
allow users to inject trace events in a generic way so that all trace
events can be injected.
This patch introduces trace event injection, where a new 'inject' is
added to each tracepoint directory. Users could write into this file
with key=value pairs to specify the value of each fields of the trace
event, all unspecified fields are set to zero values by default.
For example, for the net/net_dev_queue tracepoint, we can inject:
INJECT=/sys/kernel/debug/tracing/events/net/net_dev_queue/inject
echo "" > $INJECT
echo "name='test'" > $INJECT
echo "name='test' len=1024" > $INJECT
cat /sys/kernel/debug/tracing/trace
...
<...>-614 [000] .... 36.571483: net_dev_queue: dev= skbaddr=00000000fbf338c2 len=0
<...>-614 [001] .... 136.588252: net_dev_queue: dev=test skbaddr=00000000fbf338c2 len=0
<...>-614 [001] .N.. 208.431878: net_dev_queue: dev=test skbaddr=00000000fbf338c2 len=1024
Triggers could be triggered as usual too:
echo "stacktrace if len == 1025" > /sys/kernel/debug/tracing/events/net/net_dev_queue/trigger
echo "len=1025" > $INJECT
cat /sys/kernel/debug/tracing/trace
...
bash-614 [000] .... 36.571483: net_dev_queue: dev= skbaddr=00000000fbf338c2 len=0
bash-614 [001] .... 136.588252: net_dev_queue: dev=test skbaddr=00000000fbf338c2 len=0
bash-614 [001] .N.. 208.431878: net_dev_queue: dev=test skbaddr=00000000fbf338c2 len=1024
bash-614 [001] .N.1 284.236349: <stack trace>
=> event_inject_write
=> vfs_write
=> ksys_write
=> do_syscall_64
=> entry_SYSCALL_64_after_hwframe
The only thing that can't be injected is string pointers as they
require constant string pointers, this can't be done at run time.
Link: http://lkml.kernel.org/r/20191130045218.18979-1-xiyou.wangcong@gmail.com
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>