Commit Graph

5151 Commits

Author SHA1 Message Date
Kalesh Singh
6a6e5ef2b2 tracing/histogram: Document hist trigger variables
Update the tracefs README to describe how hist trigger variables
can be created.

Link: https://lkml.kernel.org/r/20211029183339.3216491-4-kaleshsingh@google.com

Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-01 20:46:52 -04:00
Kalesh Singh
8b5d46fd7a tracing/histogram: Optimize division by constants
If the divisor is a constant use specific division functions to
avoid extra branches when the trigger is hit.

If the divisor constant but not a power of 2, the division can be
replaced with a multiplication and shift in the following case:

Let X = dividend and Y = divisor.

Choose Z = some power of 2. If Y <= Z, then:
    X / Y = (X * (Z / Y)) / Z

(Z / Y) is a constant (mult) which is calculated at parse time, so:
    X / Y = (X * mult) / Z

The division by Z can be replaced by a shift since Z is a power of 2:
    X / Y = (X * mult) >> shift

As long, as X < Z the results will not be off by more than 1.

Link: https://lkml.kernel.org/r/20211029232410.3494196-1-kaleshsingh@google.com

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-01 20:45:11 -04:00
Daniel Bristot de Oliveira
01e181c776 tracing/osnoise: Remove PREEMPT_RT ifdefs from inside functions
Remove CONFIG_PREEMPT_RT from inside functions, avoiding
compilation problems in the future.

Link: https://lkml.kernel.org/r/37ee0881b033cdc513efc84ebea26cf77880c8c2.1635702894.git.bristot@kernel.org

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: linux-rt-users@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-01 18:10:37 -04:00
Daniel Bristot de Oliveira
b14f4568d3 tracing/osnoise: Remove STACKTRACE ifdefs from inside functions
Remove CONFIG_STACKTRACE from inside functions, avoiding
compilation problems in the future.

Link: https://lkml.kernel.org/r/3465cca2f28e1ba602a1fc8bdb28d12950b5226e.1635702894.git.bristot@kernel.org

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: linux-rt-users@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-01 18:10:26 -04:00
Daniel Bristot de Oliveira
2fac8d6486 tracing/osnoise: Allow multiple instances of the same tracer
Currently, the user can start only one instance of timerlat/osnoise
tracers and the tracers cannot run in parallel.

As starting point to add more flexibility, let's allow the same tracer to
run on different trace instances. The workload will start when the first
trace_array (instance) is registered and stop when the last instance
is unregistered.

So, while this patch allows the same tracer to run in multiple
instances (e.g., two instances running osnoise), it still does not allow
instances of timerlat and osnoise in parallel (e.g., one timerlat and
osnoise). That is because the osnoise: events have different behavior
depending on which tracer is enabled (osnoise or timerlat). Enabling
the parallel usage of these two tracers is in my TODO list.

Link: https://lkml.kernel.org/r/38c8f14b613492a4f3f938d9d3bf0b063b72f0f0.1635702894.git.bristot@kernel.org

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: linux-rt-users@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-01 18:10:19 -04:00
Daniel Bristot de Oliveira
ccb6754495 tracing/osnoise: Remove TIMERLAT ifdefs from inside functions
Remove CONFIG_TIMERLAT_TRACER from inside functions, avoiding
compilation problems in the future.

Link: https://lkml.kernel.org/r/8245abb5a112d249f5da6c1df499244ad9e647bc.1635702894.git.bristot@kernel.org

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: linux-rt-users@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-01 18:10:15 -04:00
Daniel Bristot de Oliveira
dae181349f tracing/osnoise: Support a list of trace_array *tr
osnoise/timerlat were built to run a single instance, and for this,
a single variable is enough to store the current struct trace_array
*tr with information about the tracing instance. This is done via
the *osnoise_trace variable. A trace_array represents a trace instance.

In preparation to support multiple instances, replace the
*osnoise_trace variable with an RCU protected list of instances.

The operations that refer to an instance now propagate to all
elements of the list (all instances).

Also, replace the osnoise_busy variable with a check if the list
has elements (busy).

No functional change is expected with this patch, i.e., only one
instance is allowed yet.

Link: https://lkml.kernel.org/r/91d006e889b9a5d1ff258fe6077f021ae3f26372.1635702894.git.bristot@kernel.org

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: linux-rt-users@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-01 18:10:09 -04:00
Daniel Bristot de Oliveira
2bd1bdf01f tracing/osnoise: Use start/stop_per_cpu_kthreads() on osnoise_cpus_write()
When writing a new CPU mask via osnoise/cpus, if the tracer is running,
the workload is restarted to follow the new cpumask. The restart is
currently done using osnoise_workload_start/stop(), which disables the
workload *and* the instrumentation. However, disabling the
instrumentation is not necessary.

Calling start/stop_per_cpu_kthreads() is enough to apply the new
osnoise/cpus config.

Link: https://lkml.kernel.org/r/ee633e82867c5b88851aa6040522a799c0034486.1635702894.git.bristot@kernel.org

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: linux-rt-users@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-01 18:10:05 -04:00
Daniel Bristot de Oliveira
15ca4bdb03 tracing/osnoise: Split workload start from the tracer start
In preparation from supporting multiple trace instances, create
workload start/stop specific functions.

No functional change.

Link: https://lkml.kernel.org/r/74b090971e9acdd13625be1c28ef3270d2275e77.1635702894.git.bristot@kernel.org

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: linux-rt-users@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-01 18:10:00 -04:00
Daniel Bristot de Oliveira
c3b6343c0d tracing/osnoise: Improve comments about barrier need for NMI callbacks
trace_osnoise_callback_enabled is used by ftrace_nmi_enter/exit()
to know when to call the NMI callback. The barrier is used to
avoid having callbacks enabled before the resetting date during
the start or to touch the values after stopping the tracer.

Link: https://lkml.kernel.org/r/a413b8f14aa9312fbd1ba99f96225a8aed831053.1635702894.git.bristot@kernel.org

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: linux-rt-users@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-01 18:09:52 -04:00
Daniel Bristot de Oliveira
66df27f19f tracing/osnoise: Do not follow tracing_cpumask
In preparation to support multiple instances, decouple the
osnoise/timelat workload from instance-specific tracing_cpumask.

Different instances can have conflicting cpumasks, making osnoise
workload management needlessly complex. Osnoise already has its
global cpumask.

I also thought about using the first instance mask, but the
"first" instance could be removed before the others.

This also fixes the problem that changing the tracing_mask was not
re-starting the trace.

Link: https://lkml.kernel.org/r/169a71bcc919ce3ab53ae6f9ca5cde57fffaf9c6.1635702894.git.bristot@kernel.org

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: linux-rt-users@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-01 18:09:31 -04:00
Hou Tao
35346ab641 bpf: Factor out helpers for ctx access checking
Factor out two helpers to check the read access of ctx for raw tp
and BTF function. bpf_tracing_ctx_access() is used to check
the read access to argument is valid, and bpf_tracing_btf_ctx_access()
checks whether the btf type of argument is valid besides the checking
of argument read. bpf_tracing_btf_ctx_access() will be used by the
following patch.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20211025064025.2567443-3-houtao1@huawei.com
2021-11-01 14:10:00 -07:00
Linus Torvalds
33c8846c81 Merge tag 'for-5.16/block-2021-10-29' of git://git.kernel.dk/linux-block
Pull block updates from Jens Axboe:

 - mq-deadline accounting improvements (Bart)

 - blk-wbt timer fix (Andrea)

 - Untangle the block layer includes (Christoph)

 - Rework the poll support to be bio based, which will enable adding
   support for polling for bio based drivers (Christoph)

 - Block layer core support for multi-actuator drives (Damien)

 - blk-crypto improvements (Eric)

 - Batched tag allocation support (me)

 - Request completion batching support (me)

 - Plugging improvements (me)

 - Shared tag set improvements (John)

 - Concurrent queue quiesce support (Ming)

 - Cache bdev in ->private_data for block devices (Pavel)

 - bdev dio improvements (Pavel)

 - Block device invalidation and block size improvements (Xie)

 - Various cleanups, fixes, and improvements (Christoph, Jackie,
   Masahira, Tejun, Yu, Pavel, Zheng, me)

* tag 'for-5.16/block-2021-10-29' of git://git.kernel.dk/linux-block: (174 commits)
  blk-mq-debugfs: Show active requests per queue for shared tags
  block: improve readability of blk_mq_end_request_batch()
  virtio-blk: Use blk_validate_block_size() to validate block size
  loop: Use blk_validate_block_size() to validate block size
  nbd: Use blk_validate_block_size() to validate block size
  block: Add a helper to validate the block size
  block: re-flow blk_mq_rq_ctx_init()
  block: prefetch request to be initialized
  block: pass in blk_mq_tags to blk_mq_rq_ctx_init()
  block: add rq_flags to struct blk_mq_alloc_data
  block: add async version of bio_set_polled
  block: kill DIO_MULTI_BIO
  block: kill unused polling bits in __blkdev_direct_IO()
  block: avoid extra iter advance with async iocb
  block: Add independent access ranges support
  blk-mq: don't issue request directly in case that current is to be blocked
  sbitmap: silence data race warning
  blk-cgroup: synchronize blkg creation against policy deactivation
  block: refactor bio_iov_bvec_set()
  block: add single bio async direct IO helper
  ...
2021-11-01 09:19:50 -07:00
kernel test robot
feea69ec12 tracing/histogram: Fix semicolon.cocci warnings
kernel/trace/trace_events_hist.c:6039:2-3: Unneeded semicolon

 Remove unneeded semicolon.

Generated by: scripts/coccinelle/misc/semicolon.cocci

Link: https://lkml.kernel.org/r/20211030005615.GA41257@3074f0d39c61

Fixes: c5eac6ee8b ("tracing/histogram: Simplify handling of .sym-offset in expressions")
CC: Kalesh Singh <kaleshsingh@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: kernel test robot <lkp@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-29 22:12:38 -04:00
Steven Rostedt (VMware)
ddcf906fe5 tracing: Fix misspelling of "missing"
My snake instinct was on and I wrote "misssing" instead of "missing".

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-29 09:54:14 -04:00
Steven Rostedt (VMware)
6130722f11 ftrace: Fix kernel-doc formatting issues
Some functions had kernel-doc that used a comma instead of a hash to
separate the function name from the one line description.

Also, the "ftrace_is_dead()" had an incomplete description.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-29 09:52:23 -04:00
Jakub Kicinski
7df621a3ee Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
include/net/sock.h
  7b50ecfcc6 ("net: Rename ->stream_memory_read to ->sock_is_readable")
  4c1e34c0db ("vsock: Enable y2038 safe timeval for timeout")

drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c
  0daa55d033 ("octeontx2-af: cn10k: debugfs for dumping LMTST map table")
  e77bcdd1f6 ("octeontx2-af: Display all enabled PF VF rsrc_alloc entries.")

Adjacent code addition in both cases, keep both.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-28 10:43:58 -07:00
Steven Rostedt (VMware)
7fa598f970 tracing: Do not warn when connecting eprobe to non existing event
When the syscall trace points are not configured in, the kselftests for
ftrace will try to attach an event probe (eprobe) to one of the system
call trace points. This triggered a WARNING, because the failure only
expects to see memory issues. But this is not the only failure. The user
may attempt to attach to a non existent event, and the kernel must not
warn about it.

Link: https://lkml.kernel.org/r/20211027120854.0680aa0f@gandalf.local.home

Fixes: 7491e2c442 ("tracing: Add a probe that attaches to trace events")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-27 21:47:55 -04:00
Robin H. Johnson
a90afe8d02 tracing: Show size of requested perf buffer
If the perf buffer isn't large enough, provide a hint about how large it
needs to be for whatever is running.

Link: https://lkml.kernel.org/r/20210831043723.13481-1-robbat2@gentoo.org

Signed-off-by: Robin H. Johnson <robbat2@gentoo.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-27 12:25:09 -04:00
王贇
d33cc65737 ftrace: do CPU checking after preemption disabled
With CONFIG_DEBUG_PREEMPT we observed reports like:

  BUG: using smp_processor_id() in preemptible
  caller is perf_ftrace_function_call+0x6f/0x2e0
  CPU: 1 PID: 680 Comm: a.out Not tainted
  Call Trace:
   <TASK>
   dump_stack_lvl+0x8d/0xcf
   check_preemption_disabled+0x104/0x110
   ? optimize_nops.isra.7+0x230/0x230
   ? text_poke_bp_batch+0x9f/0x310
   perf_ftrace_function_call+0x6f/0x2e0
   ...
   __text_poke+0x5/0x620
   text_poke_bp_batch+0x9f/0x310

This telling us the CPU could be changed after task is preempted, and
the checking on CPU before preemption will be invalid.

Since now ftrace_test_recursion_trylock() will help to disable the
preemption, this patch just do the checking after trylock() to address
the issue.

Link: https://lkml.kernel.org/r/54880691-5fe2-33e7-d12f-1fa6136f5183@linux.alibaba.com

CC: Steven Rostedt <rostedt@goodmis.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Jisheng Zhang <jszhang@kernel.org>
Reported-by: Abaci <abaci@linux.alibaba.com>
Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-27 11:22:09 -04:00
王贇
ce5e48036c ftrace: disable preemption when recursion locked
As the documentation explained, ftrace_test_recursion_trylock()
and ftrace_test_recursion_unlock() were supposed to disable and
enable preemption properly, however currently this work is done
outside of the function, which could be missing by mistake.

And since the internal using of trace_test_and_set_recursion()
and trace_clear_recursion() also require preemption disabled, we
can just merge the logical.

This patch will make sure the preemption has been disabled when
trace_test_and_set_recursion() return bit >= 0, and
trace_clear_recursion() will enable the preemption if previously
enabled.

Link: https://lkml.kernel.org/r/13bde807-779c-aa4c-0672-20515ae365ea@linux.alibaba.com

CC: Petr Mladek <pmladek@suse.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Jisheng Zhang <jszhang@kernel.org>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Miroslav Benes <mbenes@suse.cz>
Reported-by: Abaci <abaci@linux.alibaba.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
[ Removed extra line in comment - SDR ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-27 11:21:49 -04:00
Kalesh Singh
722eddaa40 tracing/histogram: Optimize division by a power of 2
The division is a slow operation. If the divisor is a power of 2, use a
shift instead.

Results were obtained using Android's version of perf (simpleperf[1]) as
described below:

1. hist_field_div() is modified to call 2 test functions:
   test_hist_field_div_[not]_optimized(); passing them the
   same args. Use noinline and volatile to ensure these are
   not optimized out by the compiler.
2. Create a hist event trigger that uses division:
      events/kmem/rss_stat$ echo 'hist:keys=common_pid:x=size/<divisor>'
         >> trigger
      events/kmem/rss_stat$ echo 'hist:keys=common_pid:vals=$x'
         >> trigger
3. Run Android's lmkd_test[2] to generate rss_stat events, and
   record CPU samples with Android's simpleperf:
      simpleperf record -a --exclude-perf --post-unwind=yes -m 16384 -g
         -f 2000 -o perf.data

== Results ==

Divisor is a power of 2 (divisor == 32):

   test_hist_field_div_not_optimized  | 8,717,091 cpu-cycles
   test_hist_field_div_optimized      | 1,643,137 cpu-cycles

If the divisor is a power of 2, the optimized version is ~5.3x faster.

Divisor is not a power of 2 (divisor == 33):

   test_hist_field_div_not_optimized  | 4,444,324 cpu-cycles
   test_hist_field_div_optimized      | 5,497,958 cpu-cycles

If the divisor is not a power of 2, as expected, the optimized version is
slightly slower (~24% slower).

[1] https://android.googlesource.com/platform/system/extras/+/master/simpleperf/doc/README.md
[2] https://cs.android.com/android/platform/superproject/+/master:system/memory/lmkd/tests/lmkd_test.cpp

Link: https://lkml.kernel.org/r/20211025200852.3002369-7-kaleshsingh@google.com

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26 20:27:20 -04:00
Kalesh Singh
f47716b7a9 tracing/histogram: Covert expr to const if both operands are constants
If both operands of a hist trigger expression are constants, convert the
expression to a constant. This optimization avoids having to perform the
same calculation multiple times and also saves on memory since the
merged constants are represented by a single struct hist_field instead
or multiple.

Link: https://lkml.kernel.org/r/20211025200852.3002369-6-kaleshsingh@google.com

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26 20:27:19 -04:00
Kalesh Singh
c5eac6ee8b tracing/histogram: Simplify handling of .sym-offset in expressions
The '-' in .sym-offset can confuse the hist trigger arithmetic
expression parsing. Simplify the handling of this by replacing the
'sym-offset' with 'symXoffset'. This allows us to correctly evaluate
expressions where the user may have inadvertently added a .sym-offset
modifier to one of the operands in an expression, instead of bailing
out. In this case the .sym-offset has no effect on the evaluation of the
expression. The only valid use of the .sym-offset is as a hist key
modifier.

Link: https://lkml.kernel.org/r/20211025200852.3002369-5-kaleshsingh@google.com

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26 20:27:19 -04:00
Kalesh Singh
9710b2f341 tracing: Fix operator precedence for hist triggers expression
The current histogram expression evaluation logic evaluates the
expression from right to left. This can lead to incorrect results
if the operations are not associative (as is the case for subtraction
and, the now added, division operators).
	e.g. 16-8-4-2 should be 2 not 10 --> 16-8-4-2 = ((16-8)-4)-2
	     64/8/4/2 should be 1 not 16 --> 64/8/4/2 = ((64/8)/4)/2

Division and multiplication are currently limited to single operation
expression due to operator precedence support not yet implemented.

Rework the expression parsing to support the correct evaluation of
expressions containing operators of different precedences; and fix
the associativity error by evaluating expressions with operators of
the same precedence from left to right.

Examples:
        (1) echo 'hist:keys=common_pid:a=8,b=4,c=2,d=1,w=$a-$b-$c-$d' \
                  >> event/trigger
        (2) echo 'hist:keys=common_pid:x=$a/$b/3/2' >> event/trigger
        (3) echo 'hist:keys=common_pid:y=$a+10/$c*1024' >> event/trigger
        (4) echo 'hist:keys=common_pid:z=$a/$b+$c*$d' >> event/trigger

Link: https://lkml.kernel.org/r/20211025200852.3002369-4-kaleshsingh@google.com

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26 20:27:19 -04:00
Kalesh Singh
bcef044150 tracing: Add division and multiplication support for hist triggers
Adds basic support for division and multiplication operations for
hist trigger variable expressions.

For simplicity this patch only supports, division and multiplication
for a single operation expression (e.g. x=$a/$b), as currently
expressions are always evaluated right to left. This can lead to some
incorrect results:

	e.g. echo 'hist:keys=common_pid:x=8-4-2' >> event/trigger

	     8-4-2 should evaluate to 2 i.e. (8-4)-2
	     but currently x evaluate to  6 i.e. 8-(4-2)

Multiplication and division in sub-expressions will work correctly, once
correct operator precedence support is added (See next patch in this
series).

For the undefined case of division by 0, the histogram expression
evaluates to (u64)(-1). Since this cannot be detected when the
expression is created, it is the responsibility of the user to be
aware and account for this possibility.

Examples:
	echo 'hist:keys=common_pid:a=8,b=4,x=$a/$b' \
                   >> event/trigger

	echo 'hist:keys=common_pid:y=5*$b' \
                   >> event/trigger

Link: https://lkml.kernel.org/r/20211025200852.3002369-3-kaleshsingh@google.com

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26 20:27:19 -04:00
Kalesh Singh
52cfb37353 tracing: Add support for creating hist trigger variables from literal
Currently hist trigger expressions don't support the use of numeric
literals:
	e.g. echo 'hist:keys=common_pid:x=$y-1234'
		--> is not valid expression syntax

Having the ability to use numeric constants in hist triggers supports
a wider range of expressions for creating variables.

Add support for creating trace event histogram variables from numeric
literals.

	e.g. echo 'hist:keys=common_pid:x=1234,y=size-1024' >> event/trigger

A negative numeric constant is created, using unary minus operator
(parentheses are required).

	e.g. echo 'hist:keys=common_pid:z=-(2)' >> event/trigger

Constants can be used with division/multiplication (added in the
next patch in this series) to implement granularity filters for frequent
trace events. For instance we can limit emitting the rss_stat
trace event to when there is a 512KB cross over in the rss size:

  # Create a synthetic event to monitor instead of the high frequency
  # rss_stat event
  echo 'rss_stat_throttled unsigned int mm_id; unsigned int curr;
	int member; long size' >> tracing/synthetic_events

  # Create a hist trigger that emits the synthetic rss_stat_throttled
  # event only when the rss size crosses a 512KB boundary.
  echo 'hist:keys=keys=mm_id,member:bucket=size/0x80000:onchange($bucket)
      .rss_stat_throttled(mm_id,curr,member,size)'
        >> events/kmem/rss_stat/trigger

A use case for using constants with addition/subtraction is not yet
known, but for completeness the use of constants are supported for all
operators.

Link: https://lkml.kernel.org/r/20211025200852.3002369-2-kaleshsingh@google.com

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26 20:27:19 -04:00
Wang ShaoBo
1d62889142 tracing/hwlat: Make some internal symbols static
The sparse tool complains as follows:

kernel/trace/trace_hwlat.c:82:27: warning: symbol 'hwlat_single_cpu_data' was not declared. Should it be static?
kernel/trace/trace_hwlat.c:83:1: warning: symbol '__pcpu_scope_hwlat_per_cpu_data' was not declared. Should it be static?

This symbol is not used outside of trace_hwlat.c, so this commit
marks it static.

Link: https://lkml.kernel.org/r/20211021035225.1050685-1-bobo.shaobowang@huawei.com

Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26 09:18:28 -04:00
Mathieu Desnoyers
3c20bd3af5 tracing: Fix missing trace_boot_init_histograms kstrdup NULL checks
trace_boot_init_histograms misses NULL pointer checks for kstrdup
failure.

Link: https://lkml.kernel.org/r/20211015195550.22742-1-mathieu.desnoyers@efficios.com

Fixes: 64dc7f6958 ("tracing/boot: Show correct histogram error command")
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26 09:18:10 -04:00
Daniel Bristot de Oliveira
aeafcb82d9 trace/timerlat: Add migrate-disabled field to the timerlat header
Since "54357f0c9149 tracing: Add migrate-disabled counter to tracing
output," the migrate disabled field is also printed in the !PREEMPR_RT
kernel config. While this information was added to the vast majority of
tracers, osnoise and timerlat were not updated (because they are new
tracers).

Fix timerlat header by adding the information about migrate disabled.

Link: https://lkml.kernel.org/r/bc0c234ab49946cdd63effa6584e1d5e8662cb44.1634308385.git.bristot@kernel.org

Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: x86@kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Fixes: 54357f0c91 ("tracing: Add migrate-disabled counter to tracing output.")
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-25 23:02:36 -04:00
Daniel Bristot de Oliveira
e0f3b18be7 trace/osnoise: Add migrate-disabled field to the osnoise header
Since "54357f0c9149 tracing: Add migrate-disabled counter to tracing
output," the migrate disabled field is also printed in the !PREEMPR_RT
kernel config. While this information was added to the vast majority of
tracers, osnoise and timerlat were not updated (because they are new
tracers).

Fix osnoise header by adding the information about migrate disabled.

Link: https://lkml.kernel.org/r/9cb3d54e29e0588dbba12e81486bd8a09adcd8ca.1634308385.git.bristot@kernel.org

Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: x86@kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Fixes: 54357f0c91 ("tracing: Add migrate-disabled counter to tracing output.")
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-25 23:02:36 -04:00
chongjiapeng
172f7ba977 ftrace: Make ftrace_profile_pages_init static
This symbol is not used outside of ftrace.c, so marks it static.

Fixes the following sparse warning:

kernel/trace/ftrace.c:579:5: warning: symbol 'ftrace_profile_pages_init'
was not declared. Should it be static?

Link: https://lkml.kernel.org/r/1634640534-18280-1-git-send-email-jiapeng.chong@linux.alibaba.com

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Fixes: cafb168a1c ("tracing: make the function profiler per cpu")
Signed-off-by: chongjiapeng <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-25 22:27:07 -04:00
David S. Miller
bdfa75ad70 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Lots of simnple overlapping additions.

With a build fix from Stephen Rothwell.

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-22 11:41:16 +01:00
Hengqi Chen
9eeb3aa33a bpf: Add bpf_skc_to_unix_sock() helper
The helper is used in tracing programs to cast a socket
pointer to a unix_sock pointer.
The return value could be NULL if the casting is illegal.

Suggested-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20211021134752.1223426-2-hengqi.chen@gmail.com
2021-10-21 15:11:06 -07:00
Arnd Bergmann
8720aeecc2 tracing: use %ps format string to print symbols
clang started warning about excessive stack usage in
hist_trigger_print_key()

kernel/trace/trace_events_hist.c:4723:13: error: stack frame size (1336) exceeds limit (1024) in function 'hist_trigger_print_key' [-Werror,-Wframe-larger-than]

The problem is that there are two 512-byte arrays on the stack if
hist_trigger_stacktrace_print() gets inlined. I don't think this has
changed in the past five years, but something probably changed the
inlining decisions made by the compiler, so the problem is now made
more obvious.

Rather than printing the symbol names into separate buffers, it
seems we can simply use the special %ps format string modifier
to print the pointers symbolically and get rid of both buffers.

Marking hist_trigger_stacktrace_print() would be a simpler
way of avoiding the warning, but that would not address the
excessive stack usage.

Link: https://lkml.kernel.org/r/20211019153337.294790-1-arnd@kernel.org

Fixes: 69a0200c2e ("tracing: Add hist trigger support for stacktraces as keys")
Link: https://lore.kernel.org/all/20211015095704.49a99859@gandalf.local.home/
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Tested-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-21 14:19:01 -04:00
Steven Rostedt (VMware)
ed29271894 ftrace/direct: Do not disable when switching direct callers
Currently to switch a set of "multi" direct trampolines from one
trampoline to another, a full shutdown of the current set needs to be
done, followed by an update to what trampoline the direct callers would
call, and then re-enabling the callers. This leaves a time when the
functions will not be calling anything, and events may be missed.

Instead, use a trick to allow all the functions with direct trampolines
attached will always call either the new or old trampoline while the
switch is happening. To do this, first attach a "dummy" callback via
ftrace to all the functions that the current direct trampoline is attached
to. This will cause the functions to call the "list func" instead of the
direct trampoline. The list function will call the direct trampoline
"helper" that will set the function it should call as it returns back to
the ftrace trampoline.

At this moment, the direct caller descriptor can safely update the direct
call trampoline. The list function will pick either the new or old
function (depending on the memory coherency model of the architecture).

Now removing the dummy function from each of the locations of the direct
trampoline caller, will put back the direct call, but now to the new
trampoline.

A better visual is:

[ Changing direct call from my_direct_1 to my_direct_2 ]

  <traced_func>:
     call my_direct_1

 ||||||||||||||||||||
 vvvvvvvvvvvvvvvvvvvv

  <traced_func>:
     call ftrace_caller

  <ftrace_caller>:
    [..]
    call ftrace_ops_list_func

	ftrace_ops_list_func()
	{
		ops->func() -> direct_helper -> set rax to my_direct_1 or my_direct_2
	}

   call rax (to either my_direct_1 or my_direct_2

 ||||||||||||||||||||
 vvvvvvvvvvvvvvvvvvvv

  <traced_func>:
     call my_direct_2

Link: https://lore.kernel.org/all/20211014162819.5c85618b@gandalf.local.home/

Acked-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-21 14:19:00 -04:00
Jiri Olsa
ccf5a89efd ftrace: Add multi direct modify interface
Adding interface to modify registered direct function
for ftrace_ops. Adding following function:

   modify_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr)

The function changes the currently registered direct
function for all attached functions.

Link: https://lkml.kernel.org/r/20211008091336.33616-8-jolsa@kernel.org

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-21 14:19:00 -04:00
Jiri Olsa
f64dd4627e ftrace: Add multi direct register/unregister interface
Adding interface to register multiple direct functions
within single call. Adding following functions:

  register_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr)
  unregister_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr)

The register_ftrace_direct_multi registers direct function (addr)
with all functions in ops filter. The ops filter can be updated
before with ftrace_set_filter_ip calls.

All requested functions must not have direct function currently
registered, otherwise register_ftrace_direct_multi will fail.

The unregister_ftrace_direct_multi unregisters ops related direct
functions.

Link: https://lkml.kernel.org/r/20211008091336.33616-7-jolsa@kernel.org

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-21 14:19:00 -04:00
Jiri Olsa
1904a81445 ftrace: Add ftrace_add_rec_direct function
Factor out the code that adds (ip, addr) tuple to direct_functions
hash in new ftrace_add_rec_direct function. It will be used in
following patches.

Link: https://lkml.kernel.org/r/20211008091336.33616-6-jolsa@kernel.org

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-21 14:19:00 -04:00
Steven Rostedt (VMware)
4e341cad6b tracing: Fix selftest config check for function graph start up test
There's a new test in trace_selftest_startup_function_graph() that
requires the use of ftrace args being supported as well does some tricks
with dynamic tracing. Although this code checks HAVE_DYNAMIC_FTRACE_WITH_ARGS
it fails to check DYNAMIC_FTRACE, and the kernel fails to build due to
that dependency.

Also only define the prototype of trace_direct_tramp() if it is used.

Link: https://lkml.kernel.org/r/20211021134357.7f48e173@gandalf.local.home

Acked-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-21 14:18:48 -04:00
Jiri Olsa
130c080658 tracing: Add trampoline/graph selftest
Adding selftest for checking that direct trampoline can
co-exist together with graph tracer on same function.

This is supported for CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS
config option, which is defined only for x86_64 for now.

Link: https://lkml.kernel.org/r/20211008091336.33616-5-jolsa@kernel.org

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-20 23:44:44 -04:00
Steven Rostedt (VMware)
0c0593b45c x86/ftrace: Make function graph use ftrace directly
We don't need special hook for graph tracer entry point,
but instead we can use graph_ops::func function to install
the return_hooker.

This moves the graph tracing setup _before_ the direct
trampoline prepares the stack, so the return_hooker will
be called when the direct trampoline is finished.

This simplifies the code, because we don't need to take into
account the direct trampoline setup when preparing the graph
tracer hooker and we can allow function graph tracer on entries
registered with direct trampoline.

Link: https://lkml.kernel.org/r/20211008091336.33616-4-jolsa@kernel.org

[fixed compile error reported by kernel test robot <lkp@intel.com>]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-20 23:44:43 -04:00
Steven Rostedt (VMware)
91ebe8bcbf tracing/perf: Add interrupt_context_level() helper
Now that there are three different instances of doing the addition trick
to the preempt_count() and NMI_MASK, HARDIRQ_MASK and SOFTIRQ_OFFSET
macros, it deserves a helper function defined in the preempt.h header.

Add the interrupt_context_level() helper and replace the three instances
that do that logic with it.

Link: https://lore.kernel.org/all/20211015142541.4badd8a9@gandalf.local.home/

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-19 20:33:20 -04:00
Steven Rostedt (VMware)
9b84fadc44 tracing: Reuse logic from perf's get_recursion_context()
Instead of having branches that adds noise to the branch prediction, use
the addition logic to set the bit for the level of interrupt context that
the state is currently in. This copies the logic from perf's
get_recursion_context() function.

Link: https://lore.kernel.org/all/20211015161702.GF174703@worktop.programming.kicks-ass.net/

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-19 20:33:20 -04:00
Kalesh Singh
7ce1bb83a1 tracing/cfi: Fix cmp_entries_* functions signature mismatch
If CONFIG_CFI_CLANG=y, attempting to read an event histogram will cause
the kernel to panic due to failed CFI check.

    1. echo 'hist:keys=common_pid' >> events/sched/sched_switch/trigger
    2. cat events/sched/sched_switch/hist
    3. kernel panics on attempting to read hist

This happens because the sort() function expects a generic
int (*)(const void *, const void *) pointer for the compare function.
To prevent this CFI failure, change tracing map cmp_entries_* function
signatures to match this.

Also, fix the build error reported by the kernel test robot [1].

[1] https://lore.kernel.org/r/202110141140.zzi4dRh4-lkp@intel.com/

Link: https://lkml.kernel.org/r/20211014045217.3265162-1-kaleshsingh@google.com

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-19 20:33:20 -04:00
Steven Rostedt (VMware)
34cdd18b8d tracing: Use linker magic instead of recasting ftrace_ops_list_func()
In an effort to enable -Wcast-function-type in the top-level Makefile to
support Control Flow Integrity builds, all function casts need to be
removed.

This means that ftrace_ops_list_func() can no longer be defined as
ftrace_ops_no_ops(). The reason for ftrace_ops_no_ops() is to use that when
an architecture calls ftrace_ops_list_func() with only two parameters
(called from assembly). And to make sure there's no C side-effects, those
archs call ftrace_ops_no_ops() which only has two parameters, as
ftrace_ops_list_func() has four parameters.

Instead of a typecast, use vmlinux.lds.h to define ftrace_ops_list_func() to
arch_ftrace_ops_list_func() that will define the proper set of parameters.

Link: https://lore.kernel.org/r/20200614070154.6039-1-oscar.carter@gmx.com
Link: https://lkml.kernel.org/r/20200617165616.52241bde@oasis.local.home
Link: https://lore.kernel.org/all/20211005053922.GA702049@embeddedor/

Requested-by: Oscar Carter <oscar.carter@gmx.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-19 20:33:12 -04:00
Christoph Hellwig
8a7d267b4a block: don't call blk_status_to_errno in blk_update_request
We only need to call it to resolve the blk_status_t -> errno mapping for
tracing, so move the conversion into the tracepoints that are not called
at all when tracing isn't enabled.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 05:54:57 -06:00
Steven Rostedt (VMware)
ed65df63a3 tracing: Have all levels of checks prevent recursion
While writing an email explaining the "bit = 0" logic for a discussion on
making ftrace_test_recursion_trylock() disable preemption, I discovered a
path that makes the "not do the logic if bit is zero" unsafe.

The recursion logic is done in hot paths like the function tracer. Thus,
any code executed causes noticeable overhead. Thus, tricks are done to try
to limit the amount of code executed. This included the recursion testing
logic.

Having recursion testing is important, as there are many paths that can
end up in an infinite recursion cycle when tracing every function in the
kernel. Thus protection is needed to prevent that from happening.

Because it is OK to recurse due to different running context levels (e.g.
an interrupt preempts a trace, and then a trace occurs in the interrupt
handler), a set of bits are used to know which context one is in (normal,
softirq, irq and NMI). If a recursion occurs in the same level, it is
prevented*.

Then there are infrastructure levels of recursion as well. When more than
one callback is attached to the same function to trace, it calls a loop
function to iterate over all the callbacks. Both the callbacks and the
loop function have recursion protection. The callbacks use the
"ftrace_test_recursion_trylock()" which has a "function" set of context
bits to test, and the loop function calls the internal
trace_test_and_set_recursion() directly, with an "internal" set of bits.

If an architecture does not implement all the features supported by ftrace
then the callbacks are never called directly, and the loop function is
called instead, which will implement the features of ftrace.

Since both the loop function and the callbacks do recursion protection, it
was seemed unnecessary to do it in both locations. Thus, a trick was made
to have the internal set of recursion bits at a more significant bit
location than the function bits. Then, if any of the higher bits were set,
the logic of the function bits could be skipped, as any new recursion
would first have to go through the loop function.

This is true for architectures that do not support all the ftrace
features, because all functions being traced must first go through the
loop function before going to the callbacks. But this is not true for
architectures that support all the ftrace features. That's because the
loop function could be called due to two callbacks attached to the same
function, but then a recursion function inside the callback could be
called that does not share any other callback, and it will be called
directly.

i.e.

 traced_function_1: [ more than one callback tracing it ]
   call loop_func

 loop_func:
   trace_recursion set internal bit
   call callback

 callback:
   trace_recursion [ skipped because internal bit is set, return 0 ]
   call traced_function_2

 traced_function_2: [ only traced by above callback ]
   call callback

 callback:
   trace_recursion [ skipped because internal bit is set, return 0 ]
   call traced_function_2

 [ wash, rinse, repeat, BOOM! out of shampoo! ]

Thus, the "bit == 0 skip" trick is not safe, unless the loop function is
call for all functions.

Since we want to encourage architectures to implement all ftrace features,
having them slow down due to this extra logic may encourage the
maintainers to update to the latest ftrace features. And because this
logic is only safe for them, remove it completely.

 [*] There is on layer of recursion that is allowed, and that is to allow
     for the transition between interrupt context (normal -> softirq ->
     irq -> NMI), because a trace may occur before the context update is
     visible to the trace recursion logic.

Link: https://lore.kernel.org/all/609b565a-ed6e-a1da-f025-166691b5d994@linux.alibaba.com/
Link: https://lkml.kernel.org/r/20211018154412.09fcad3c@gandalf.local.home

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@hansenpartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Jisheng Zhang <jszhang@kernel.org>
Cc: =?utf-8?b?546L6LSH?= <yun.wang@linux.alibaba.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: stable@vger.kernel.org
Fixes: edc15cafcb ("tracing: Avoid unnecessary multiple recursion checks")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-18 18:12:09 -04:00
Steven Rostedt (VMware)
7d5fda1c84 tracing: Fix event probe removal from dynamic events
When an event probe is to be removed via the API that created it via the
dynamic events, an -ENOENT error is returned.

This is because the removal of the event probe does not expect to see the
event system and name that the event probe is attached to, even though
that's part of the API to create it. As the removal of probes is to use
the same API as they are created.

In fact, the removal is not consistent with the kprobes and uprobes
removal. Fix that by allowing various ways to remove the eprobe.

The eprobe is created with:

 e:[GROUP/]NAME SYSTEM/EVENT [OPTIONS]

Have it get removed by echoing in the following into dynamic_events:

 # Remove all eprobes with NAME
 echo '-:NAME' >> dynamic_events

 # Remove a specific eprobe
 echo '-:GROUP/NAME' >> dynamic_events
 echo '-:GROUP/NAME SYSTEM/EVENT' >> dynamic_events
 echo '-:NAME SYSTEM/EVENT' >> dynamic_events
 echo '-:GROUP/NAME SYSTEM/EVENT OPTIONS' >> dynamic_events
 echo '-:NAME SYSTEM/EVENT OPTIONS' >> dynamic_events

Link: https://lkml.kernel.org/r/20211012081925.0e19cc4f@gandalf.local.home
Link: https://lkml.kernel.org/r/20211013205533.630722129@goodmis.org

Suggested-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 7491e2c442 ("tracing: Add a probe that attaches to trace events")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-13 19:26:57 -04:00
Changbin Du
affc659246 tracing: in_irq() cleanup
Replace the obsolete and ambiguos macro in_irq() with new
macro in_hardirq().

Link: https://lkml.kernel.org/r/20210930000342.6016-1-changbin.du@gmail.com

Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-13 18:19:41 -04:00