Commit Graph

5433 Commits

Author SHA1 Message Date
Zeng Heng
01c44bf833 rv/monitors: add 'static' qualifier for local symbols
The sparse tool complains as follows:

kernel/trace/rv/monitors/wwnr/wwnr.c:18:19:
warning: symbol 'rv_wwnr' was not declared. Should it be static?

The `rv_wwnr` symbol is not dereferenced by other extern files,
so add static qualifier for it.

So does wip module.

Link: https://lkml.kernel.org/r/20220824034357.2014202-2-zengheng4@huawei.com

Cc: <mingo@redhat.com>
Fixes:	ccc319dcb4 ("rv/monitor: Add the wwnr monitor")
Fixes:	8812d21219 ("rv/monitor: Add the wip monitor skeleton created by dot2k")
Signed-off-by: Zeng Heng <zengheng4@huawei.com>
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-26 13:01:09 -04:00
Masami Hiramatsu (Google)
752be5c5c9 tracing/eprobe: Add eprobe filter support
Add the filter option to the event probe. This is useful if user wants
to derive a new event based on the condition of the original event.

E.g.
 echo 'e:egroup/stat_runtime_4core sched/sched_stat_runtime \
        runtime=$runtime:u32 if cpu < 4' >> ../dynamic_events

Then it can filter the events only on first 4 cores.
Note that the fields used for 'if' must be the fields in the original
events, not eprobe events.

Link: https://lkml.kernel.org/r/165932114513.2850673.2592206685744598080.stgit@devnote2

Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-26 13:01:08 -04:00
Jiri Olsa
05b24ff9b2 bpf: Prevent bpf program recursion for raw tracepoint probes
We got report from sysbot [1] about warnings that were caused by
bpf program attached to contention_begin raw tracepoint triggering
the same tracepoint by using bpf_trace_printk helper that takes
trace_printk_lock lock.

 Call Trace:
  <TASK>
  ? trace_event_raw_event_bpf_trace_printk+0x5f/0x90
  bpf_trace_printk+0x2b/0xe0
  bpf_prog_a9aec6167c091eef_prog+0x1f/0x24
  bpf_trace_run2+0x26/0x90
  native_queued_spin_lock_slowpath+0x1c6/0x2b0
  _raw_spin_lock_irqsave+0x44/0x50
  bpf_trace_printk+0x3f/0xe0
  bpf_prog_a9aec6167c091eef_prog+0x1f/0x24
  bpf_trace_run2+0x26/0x90
  native_queued_spin_lock_slowpath+0x1c6/0x2b0
  _raw_spin_lock_irqsave+0x44/0x50
  bpf_trace_printk+0x3f/0xe0
  bpf_prog_a9aec6167c091eef_prog+0x1f/0x24
  bpf_trace_run2+0x26/0x90
  native_queued_spin_lock_slowpath+0x1c6/0x2b0
  _raw_spin_lock_irqsave+0x44/0x50
  bpf_trace_printk+0x3f/0xe0
  bpf_prog_a9aec6167c091eef_prog+0x1f/0x24
  bpf_trace_run2+0x26/0x90
  native_queued_spin_lock_slowpath+0x1c6/0x2b0
  _raw_spin_lock_irqsave+0x44/0x50
  __unfreeze_partials+0x5b/0x160
  ...

The can be reproduced by attaching bpf program as raw tracepoint on
contention_begin tracepoint. The bpf prog calls bpf_trace_printk
helper. Then by running perf bench the spin lock code is forced to
take slow path and call contention_begin tracepoint.

Fixing this by skipping execution of the bpf program if it's
already running, Using bpf prog 'active' field, which is being
currently used by trampoline programs for the same reason.

Moving bpf_prog_inc_misses_counter to syscall.c because
trampoline.c is compiled in just for CONFIG_BPF_JIT option.

Reviewed-by: Stanislav Fomichev <sdf@google.com>
Reported-by: syzbot+2251879aa068ad9c960d@syzkaller.appspotmail.com
[1] https://lore.kernel.org/bpf/YxhFe3EwqchC%2FfYf@krava/T/#t
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220916071914.7156-1-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-21 18:05:44 -07:00
Roberto Sassu
865b0566d8 bpf: Add bpf_verify_pkcs7_signature() kfunc
Add the bpf_verify_pkcs7_signature() kfunc, to give eBPF security modules
the ability to check the validity of a signature against supplied data, by
using user-provided or system-provided keys as trust anchor.

The new kfunc makes it possible to enforce mandatory policies, as eBPF
programs might be allowed to make security decisions only based on data
sources the system administrator approves.

The caller should provide the data to be verified and the signature as eBPF
dynamic pointers (to minimize the number of parameters) and a bpf_key
structure containing a reference to the keyring with keys trusted for
signature verification, obtained from bpf_lookup_user_key() or
bpf_lookup_system_key().

For bpf_key structures obtained from the former lookup function,
bpf_verify_pkcs7_signature() completes the permission check deferred by
that function by calling key_validate(). key_task_permission() is already
called by the PKCS#7 code.

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Acked-by: KP Singh <kpsingh@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20220920075951.929132-9-roberto.sassu@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-21 17:32:49 -07:00
Roberto Sassu
f3cf4134c5 bpf: Add bpf_lookup_*_key() and bpf_key_put() kfuncs
Add the bpf_lookup_user_key(), bpf_lookup_system_key() and bpf_key_put()
kfuncs, to respectively search a key with a given key handle serial number
and flags, obtain a key from a pre-determined ID defined in
include/linux/verification.h, and cleanup.

Introduce system_keyring_id_check() to validate the keyring ID parameter of
bpf_lookup_system_key().

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20220920075951.929132-8-roberto.sassu@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-21 17:32:49 -07:00
Peter Zijlstra (Intel)
9440155ccb ftrace: Add HAVE_DYNAMIC_FTRACE_NO_PATCHABLE
x86 will shortly start using -fpatchable-function-entry for purposes
other than ftrace, make sure the __patchable_function_entry section
isn't merged in the mcount_loc section.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220903131154.420467-2-jolsa@kernel.org
2022-09-16 22:16:48 +02:00
Xiu Jianfeng
93d71986a6 rv/reactor: add __init/__exit annotations to module init/exit funcs
Add missing __init/__exit annotations to module init/exit funcs.

Link: https://lkml.kernel.org/r/20220906141210.132607-1-xiujianfeng@huawei.com

Fixes: 135b881ea8 ("rv/reactor: Add the printk reactor")
Fixes: e88043c0ac ("rv/reactor: Add the panic reactor")
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-06 22:26:00 -04:00
Masami Hiramatsu (Google)
cecf8e128e tracing: Fix to check event_mutex is held while accessing trigger list
Since the check_user_trigger() is called outside of RCU
read lock, this list_for_each_entry_rcu() caused a suspicious
RCU usage warning.

 # echo hist:keys=pid > events/sched/sched_stat_runtime/trigger
 # cat events/sched/sched_stat_runtime/trigger
[   43.167032]
[   43.167418] =============================
[   43.167992] WARNING: suspicious RCU usage
[   43.168567] 5.19.0-rc5-00029-g19ebe4651abf #59 Not tainted
[   43.169283] -----------------------------
[   43.169863] kernel/trace/trace_events_trigger.c:145 RCU-list traversed in non-reader section!!
...

However, this file->triggers list is safe when it is accessed
under event_mutex is held.
To fix this warning, adds a lockdep_is_held check to the
list_for_each_entry_rcu().

Link: https://lkml.kernel.org/r/166226474977.223837.1992182913048377113.stgit@devnote2

Cc: stable@vger.kernel.org
Fixes: 7491e2c442 ("tracing: Add a probe that attaches to trace events")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-06 22:26:00 -04:00
Yipeng Zou
54c3931957 tracing: hold caller_addr to hardirq_{enable,disable}_ip
Currently, The arguments passing to lockdep_hardirqs_{on,off} was fixed
in CALLER_ADDR0.
The function trace_hardirqs_on_caller should have been intended to use
caller_addr to represent the address that caller wants to be traced.

For example, lockdep log in riscv showing the last {enabled,disabled} at
__trace_hardirqs_{on,off} all the time(if called by):
[   57.853175] hardirqs last  enabled at (2519): __trace_hardirqs_on+0xc/0x14
[   57.853848] hardirqs last disabled at (2520): __trace_hardirqs_off+0xc/0x14

After use trace_hardirqs_xx_caller, we can get more effective information:
[   53.781428] hardirqs last  enabled at (2595): restore_all+0xe/0x66
[   53.782185] hardirqs last disabled at (2596): ret_from_exception+0xa/0x10

Link: https://lkml.kernel.org/r/20220901104515.135162-2-zouyipeng@huawei.com

Cc: stable@vger.kernel.org
Fixes: c3bc8fd637 ("tracing: Centralize preemptirq tracepoints and unify their usage")
Signed-off-by: Yipeng Zou <zouyipeng@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-06 22:26:00 -04:00
Daniel Bristot de Oliveira
baf2c00240 rv/monitors: Make monitor's automata definition static
Monitor's automata definition is only used locally, so make
them static for all existing monitors.

Link: https://lore.kernel.org/all/202208210332.gtHXje45-lkp@intel.com
Link: https://lore.kernel.org/all/202208210358.6HH3OrVs-lkp@intel.com
Link: https://lkml.kernel.org/r/a50e27c3738d6ef809f4201857229fed64799234.1661266564.git.bristot@kernel.org

Fixes: ccc319dcb4 ("rv/monitor: Add the wwnr monitor")
Fixes: 8812d21219 ("rv/monitor: Add the wip monitor skeleton created by dot2k")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-06 22:13:25 -04:00
Wang Jingjin
123d645577 ftrace: Fix build warning for ops_references_rec() not used
The change that made IPMODIFY and DIRECT ops work together needed access
to the ops_references_ip() function, which it pulled out of the module
only code. But now if both CONFIG_MODULES and
CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS is not set, we get the below
warning:

    ‘ops_references_rec’ defined but not used.

Since ops_references_rec() only calls ops_references_ip() replace the
usage of ops_references_rec() with ops_references_ip() and encompass the
function with an #ifdef of DIRECT_CALLS || MODULES being defined.

Link: https://lkml.kernel.org/r/20220801084745.1187987-1-wangjingjin1@huawei.com

Fixes: 53cd885bc5 ("ftrace: Allow IPMODIFY and DIRECT ops on the same function")
Signed-off-by: Wang Jingjin <wangjingjin1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-08-22 09:41:12 -04:00
Linus Torvalds
7fb312d225 Various fixes for tracing:
- Fix a return value of traceprobe_parse_event_name()
 
  - Fix NULL pointer dereference from failed ftrace enabling
 
  - Fix NULL pointer dereference when asking for registers from eprobes
 
  - Make eprobes consistent with kprobes/uprobes, filters and histograms
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYwKRrhQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qosDAP9WySmPxjoMfR0hbjmnepLy2zJtBbIq
 ABWR3LDrjvLlYwD9H/wrD+6ctOZtXh5XJc0Vn5z6XEyNtqrVGSse7Lm+sg4=
 =qb/R
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.0-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "Various fixes for tracing:

   - Fix a return value of traceprobe_parse_event_name()

   - Fix NULL pointer dereference from failed ftrace enabling

   - Fix NULL pointer dereference when asking for registers from eprobes

   - Make eprobes consistent with kprobes/uprobes, filters and
     histograms"

* tag 'trace-v6.0-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Have filter accept "common_cpu" to be consistent
  tracing/probes: Have kprobes and uprobes use $COMM too
  tracing/eprobes: Have event probes be consistent with kprobes and uprobes
  tracing/eprobes: Fix reading of string fields
  tracing/eprobes: Do not hardcode $comm as a string
  tracing/eprobes: Do not allow eprobes to use $stack, or % for regs
  ftrace: Fix NULL pointer dereference in is_ftrace_trampoline when ftrace is dead
  tracing/perf: Fix double put of trace event when init fails
  tracing: React to error return from traceprobe_parse_event_name()
2022-08-21 14:49:42 -07:00
Steven Rostedt (Google)
b2380577d4 tracing: Have filter accept "common_cpu" to be consistent
Make filtering consistent with histograms. As "cpu" can be a field of an
event, allow for "common_cpu" to keep it from being confused with the
"cpu" field of the event.

Link: https://lkml.kernel.org/r/20220820134401.513062765@goodmis.org
Link: https://lore.kernel.org/all/20220820220920.e42fa32b70505b1904f0a0ad@kernel.org/

Cc: stable@vger.kernel.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Fixes: 1e3bac71c5 ("tracing/histogram: Rename "cpu" to "common_cpu"")
Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-08-21 15:56:08 -04:00
Steven Rostedt (Google)
ab8384442e tracing/probes: Have kprobes and uprobes use $COMM too
Both $comm and $COMM can be used to get current->comm in eprobes and the
filtering and histogram logic. Make kprobes and uprobes consistent in this
regard and allow both $comm and $COMM as well. Currently kprobes and
uprobes only handle $comm, which is inconsistent with the other utilities,
and can be confusing to users.

Link: https://lkml.kernel.org/r/20220820134401.317014913@goodmis.org
Link: https://lore.kernel.org/all/20220820220442.776e1ddaf8836e82edb34d01@kernel.org/

Cc: stable@vger.kernel.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Fixes: 533059281e ("tracing: probeevent: Introduce new argument fetching code")
Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-08-21 15:56:08 -04:00
Steven Rostedt (Google)
6a832ec3d6 tracing/eprobes: Have event probes be consistent with kprobes and uprobes
Currently, if a symbol "@" is attempted to be used with an event probe
(eprobes), it will cause a NULL pointer dereference crash.

Both kprobes and uprobes can reference data other than the main registers.
Such as immediate address, symbols and the current task name. Have eprobes
do the same thing.

For "comm", if "comm" is used and the event being attached to does not
have the "comm" field, then make it the "$comm" that kprobes has. This is
consistent to the way histograms and filters work.

Link: https://lkml.kernel.org/r/20220820134401.136924220@goodmis.org

Cc: stable@vger.kernel.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Fixes: 7491e2c442 ("tracing: Add a probe that attaches to trace events")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-08-21 15:56:08 -04:00
Steven Rostedt (Google)
f04dec9346 tracing/eprobes: Fix reading of string fields
Currently when an event probe (eprobe) hooks to a string field, it does
not display it as a string, but instead as a number. This makes the field
rather useless. Handle the different kinds of strings, dynamic, static,
relational/dynamic etc.

Now when a string field is used, the ":string" type can be used to display
it:

  echo "e:sw sched/sched_switch comm=$next_comm:string" > dynamic_events

Link: https://lkml.kernel.org/r/20220820134400.959640191@goodmis.org

Cc: stable@vger.kernel.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Fixes: 7491e2c442 ("tracing: Add a probe that attaches to trace events")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-08-21 15:56:08 -04:00
Steven Rostedt (Google)
02333de90e tracing/eprobes: Do not hardcode $comm as a string
The variable $comm is hard coded as a string, which is true for both
kprobes and uprobes, but for event probes (eprobes) it is a field name. In
most cases the "comm" field would be a string, but there's no guarantee of
that fact.

Do not assume that comm is a string. Not to mention, it currently forces
comm fields to fault, as string processing for event probes is currently
broken.

Link: https://lkml.kernel.org/r/20220820134400.756152112@goodmis.org

Cc: stable@vger.kernel.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Fixes: 7491e2c442 ("tracing: Add a probe that attaches to trace events")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-08-21 15:56:08 -04:00
Steven Rostedt (Google)
2673c60ee6 tracing/eprobes: Do not allow eprobes to use $stack, or % for regs
While playing with event probes (eprobes), I tried to see what would
happen if I attempted to retrieve the instruction pointer (%rip) knowing
that event probes do not use pt_regs. The result was:

 BUG: kernel NULL pointer dereference, address: 0000000000000024
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] PREEMPT SMP PTI
 CPU: 1 PID: 1847 Comm: trace-cmd Not tainted 5.19.0-rc5-test+ #309
 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01
v03.03 07/14/2016
 RIP: 0010:get_event_field.isra.0+0x0/0x50
 Code: ff 48 c7 c7 c0 8f 74 a1 e8 3d 8b f5 ff e8 88 09 f6 ff 4c 89 e7 e8
50 6a 13 00 48 89 ef 5b 5d 41 5c 41 5d e9 42 6a 13 00 66 90 <48> 63 47 24
8b 57 2c 48 01 c6 8b 47 28 83 f8 02 74 0e 83 f8 04 74
 RSP: 0018:ffff916c394bbaf0 EFLAGS: 00010086
 RAX: ffff916c854041d8 RBX: ffff916c8d9fbf50 RCX: ffff916c255d2000
 RDX: 0000000000000000 RSI: ffff916c255d2008 RDI: 0000000000000000
 RBP: 0000000000000000 R08: ffff916c3a2a0c08 R09: ffff916c394bbda8
 R10: 0000000000000000 R11: 0000000000000000 R12: ffff916c854041d8
 R13: ffff916c854041b0 R14: 0000000000000000 R15: 0000000000000000
 FS:  0000000000000000(0000) GS:ffff916c9ea40000(0000)
knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000024 CR3: 000000011b60a002 CR4: 00000000001706e0
 Call Trace:
  <TASK>
  get_eprobe_size+0xb4/0x640
  ? __mod_node_page_state+0x72/0xc0
  __eprobe_trace_func+0x59/0x1a0
  ? __mod_lruvec_page_state+0xaa/0x1b0
  ? page_remove_file_rmap+0x14/0x230
  ? page_remove_rmap+0xda/0x170
  event_triggers_call+0x52/0xe0
  trace_event_buffer_commit+0x18f/0x240
  trace_event_raw_event_sched_wakeup_template+0x7a/0xb0
  try_to_wake_up+0x260/0x4c0
  __wake_up_common+0x80/0x180
  __wake_up_common_lock+0x7c/0xc0
  do_notify_parent+0x1c9/0x2a0
  exit_notify+0x1a9/0x220
  do_exit+0x2ba/0x450
  do_group_exit+0x2d/0x90
  __x64_sys_exit_group+0x14/0x20
  do_syscall_64+0x3b/0x90
  entry_SYSCALL_64_after_hwframe+0x46/0xb0

Obviously this is not the desired result.

Move the testing for TPARG_FL_TPOINT which is only used for event probes
to the top of the "$" variable check, as all the other variables are not
used for event probes. Also add a check in the register parsing "%" to
fail if an event probe is used.

Link: https://lkml.kernel.org/r/20220820134400.564426983@goodmis.org

Cc: stable@vger.kernel.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Fixes: 7491e2c442 ("tracing: Add a probe that attaches to trace events")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-08-21 15:56:08 -04:00
Yang Jihong
c3b0f72e80 ftrace: Fix NULL pointer dereference in is_ftrace_trampoline when ftrace is dead
ftrace_startup does not remove ops from ftrace_ops_list when
ftrace_startup_enable fails:

register_ftrace_function
  ftrace_startup
    __register_ftrace_function
      ...
      add_ftrace_ops(&ftrace_ops_list, ops)
      ...
    ...
    ftrace_startup_enable // if ftrace failed to modify, ftrace_disabled is set to 1
    ...
  return 0 // ops is in the ftrace_ops_list.

When ftrace_disabled = 1, unregister_ftrace_function simply returns without doing anything:
unregister_ftrace_function
  ftrace_shutdown
    if (unlikely(ftrace_disabled))
            return -ENODEV;  // return here, __unregister_ftrace_function is not executed,
                             // as a result, ops is still in the ftrace_ops_list
    __unregister_ftrace_function
    ...

If ops is dynamically allocated, it will be free later, in this case,
is_ftrace_trampoline accesses NULL pointer:

is_ftrace_trampoline
  ftrace_ops_trampoline
    do_for_each_ftrace_op(op, ftrace_ops_list) // OOPS! op may be NULL!

Syzkaller reports as follows:
[ 1203.506103] BUG: kernel NULL pointer dereference, address: 000000000000010b
[ 1203.508039] #PF: supervisor read access in kernel mode
[ 1203.508798] #PF: error_code(0x0000) - not-present page
[ 1203.509558] PGD 800000011660b067 P4D 800000011660b067 PUD 130fb8067 PMD 0
[ 1203.510560] Oops: 0000 [#1] SMP KASAN PTI
[ 1203.511189] CPU: 6 PID: 29532 Comm: syz-executor.2 Tainted: G    B   W         5.10.0 #8
[ 1203.512324] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 1203.513895] RIP: 0010:is_ftrace_trampoline+0x26/0xb0
[ 1203.514644] Code: ff eb d3 90 41 55 41 54 49 89 fc 55 53 e8 f2 00 fd ff 48 8b 1d 3b 35 5d 03 e8 e6 00 fd ff 48 8d bb 90 00 00 00 e8 2a 81 26 00 <48> 8b ab 90 00 00 00 48 85 ed 74 1d e8 c9 00 fd ff 48 8d bb 98 00
[ 1203.518838] RSP: 0018:ffffc900012cf960 EFLAGS: 00010246
[ 1203.520092] RAX: 0000000000000000 RBX: 000000000000007b RCX: ffffffff8a331866
[ 1203.521469] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000000000000010b
[ 1203.522583] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffff8df18b07
[ 1203.523550] R10: fffffbfff1be3160 R11: 0000000000000001 R12: 0000000000478399
[ 1203.524596] R13: 0000000000000000 R14: ffff888145088000 R15: 0000000000000008
[ 1203.525634] FS:  00007f429f5f4700(0000) GS:ffff8881daf00000(0000) knlGS:0000000000000000
[ 1203.526801] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1203.527626] CR2: 000000000000010b CR3: 0000000170e1e001 CR4: 00000000003706e0
[ 1203.528611] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1203.529605] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Therefore, when ftrace_startup_enable fails, we need to rollback registration
process and remove ops from ftrace_ops_list.

Link: https://lkml.kernel.org/r/20220818032659.56209-1-yangjihong1@huawei.com

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-08-21 15:56:07 -04:00
Steven Rostedt (Google)
7249921d94 tracing/perf: Fix double put of trace event when init fails
If in perf_trace_event_init(), the perf_trace_event_open() fails, then it
will call perf_trace_event_unreg() which will not only unregister the perf
trace event, but will also call the put() function of the tp_event.

The problem here is that the trace_event_try_get_ref() is called by the
caller of perf_trace_event_init() and if perf_trace_event_init() returns a
failure, it will then call trace_event_put(). But since the
perf_trace_event_unreg() already called the trace_event_put() function, it
triggers a WARN_ON().

 WARNING: CPU: 1 PID: 30309 at kernel/trace/trace_dynevent.c:46 trace_event_dyn_put_ref+0x15/0x20

If perf_trace_event_reg() does not call the trace_event_try_get_ref() then
the perf_trace_event_unreg() should not be calling trace_event_put(). This
breaks symmetry and causes bugs like these.

Pull out the trace_event_put() from perf_trace_event_unreg() and call it
in the locations that perf_trace_event_unreg() is called. This not only
fixes this bug, but also brings back the proper symmetry of the reg/unreg
vs get/put logic.

Link: https://lore.kernel.org/all/cover.1660347763.git.kjlx@templeofstupid.com/
Link: https://lkml.kernel.org/r/20220816192817.43d5e17f@gandalf.local.home

Cc: stable@vger.kernel.org
Fixes: 1d18538e6a ("tracing: Have dynamic events have a ref counter")
Reported-by: Krister Johansen <kjlx@templeofstupid.com>
Reviewed-by: Krister Johansen <kjlx@templeofstupid.com>
Tested-by: Krister Johansen <kjlx@templeofstupid.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-08-21 15:56:07 -04:00
Lukas Bulwahn
d8a64313c1 tracing: React to error return from traceprobe_parse_event_name()
The function traceprobe_parse_event_name() may set the first two function
arguments to a non-null value and still return -EINVAL to indicate an
unsuccessful completion of the function. Hence, it is not sufficient to
just check the result of the two function arguments for being not null,
but the return value also needs to be checked.

Commit 95c104c378 ("tracing: Auto generate event name when creating a
group of events") changed the error-return-value checking of the second
traceprobe_parse_event_name() invocation in __trace_eprobe_create() and
removed checking the return value to jump to the error handling case.

Reinstate using the return value in the error-return-value checking.

Link: https://lkml.kernel.org/r/20220811071734.20700-1-lukas.bulwahn@gmail.com

Fixes: 95c104c378 ("tracing: Auto generate event name when creating a group of events")
Acked-by: Linyu Yuan <quic_linyyuan@quicinc.com>
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-08-21 15:56:07 -04:00
Linus Torvalds
965a9d75e3 Tracing updates for 5.20 / 6.0
- Runtime verification infrastructure
   This is the biggest change for this pull request. It introduces the
   runtime verification that is necessary for running Linux on safety
   critical systems. It allows for deterministic automata models to be
   inserted into the kernel that will attach to tracepoints, where the
   information on these tracepoints will move the model from state to state.
   If a state is encountered that does not belong to the model, it will then
   activate a given reactor, that could just inform the user or even panic
   the kernel (for which safety critical systems will detect and can recover
   from).
 
 - Two monitor models are also added: Wakeup In Preemptive (WIP - not to be
   confused with "work in progress"), and Wakeup While Not Running (WWNR).
 
 - Added __vstring() helper to the TRACE_EVENT() macro to replace several
   vsnprintf() usages that were all doing it wrong.
 
 - eprobes now can have their event autogenerated when the event name is left
   off.
 
 - The rest is various cleanups and fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYu0yzRQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qj4HAP4tQtV55rjj4DQ5XIXmtI3/64PmyRSJ
 +y4DEXi1UvEUCQD/QAuQfWoT/7gh35ltkfeS4t3ockzy14rrkP5drZigiQA=
 =kEtM
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:

 - Runtime verification infrastructure

   This is the biggest change here. It introduces the runtime
   verification that is necessary for running Linux on safety critical
   systems.

   It allows for deterministic automata models to be inserted into the
   kernel that will attach to tracepoints, where the information on
   these tracepoints will move the model from state to state.

   If a state is encountered that does not belong to the model, it will
   then activate a given reactor, that could just inform the user or
   even panic the kernel (for which safety critical systems will detect
   and can recover from).

 - Two monitor models are also added: Wakeup In Preemptive (WIP - not to
   be confused with "work in progress"), and Wakeup While Not Running
   (WWNR).

 - Added __vstring() helper to the TRACE_EVENT() macro to replace
   several vsnprintf() usages that were all doing it wrong.

 - eprobes now can have their event autogenerated when the event name is
   left off.

 - The rest is various cleanups and fixes.

* tag 'trace-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (50 commits)
  rv: Unlock on error path in rv_unregister_reactor()
  tracing: Use alignof__(struct {type b;}) instead of offsetof()
  tracing/eprobe: Show syntax error logs in error_log file
  scripts/tracing: Fix typo 'the the' in comment
  tracepoints: It is CONFIG_TRACEPOINTS not CONFIG_TRACEPOINT
  tracing: Use free_trace_buffer() in allocate_trace_buffers()
  tracing: Use a struct alignof to determine trace event field alignment
  rv/reactor: Add the panic reactor
  rv/reactor: Add the printk reactor
  rv/monitor: Add the wwnr monitor
  rv/monitor: Add the wip monitor
  rv/monitor: Add the wip monitor skeleton created by dot2k
  Documentation/rv: Add deterministic automata instrumentation documentation
  Documentation/rv: Add deterministic automata monitor synthesis documentation
  tools/rv: Add dot2k
  Documentation/rv: Add deterministic automaton documentation
  tools/rv: Add dot2c
  Documentation/rv: Add a basic documentation
  rv/include: Add instrumentation helper functions
  rv/include: Add deterministic automata monitor definition via C macros
  ...
2022-08-05 09:41:12 -07:00
Dan Carpenter
f1a15b977f rv: Unlock on error path in rv_unregister_reactor()
Unlock the "rv_interface_lock" mutex before returning.

Link: https://lkml.kernel.org/r/YuvYzNfGMgV+PIhd@kili

Fixes: 04acadcb44 ("rv: Add runtime reactors interface")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-08-04 22:49:17 -04:00
Linus Torvalds
f86d1fbbe7 Networking changes for 6.0.
Core
 ----
 
  - Refactor the forward memory allocation to better cope with memory
    pressure with many open sockets, moving from a per socket cache to
    a per-CPU one
 
  - Replace rwlocks with RCU for better fairness in ping, raw sockets
    and IP multicast router.
 
  - Network-side support for IO uring zero-copy send.
 
  - A few skb drop reason improvements, including codegen the source file
    with string mapping instead of using macro magic.
 
  - Rename reference tracking helpers to a more consistent
    netdev_* schema.
 
  - Adapt u64_stats_t type to address load/store tearing issues.
 
  - Refine debug helper usage to reduce the log noise caused by bots.
 
 BPF
 ---
  - Improve socket map performance, avoiding skb cloning on read
    operation.
 
  - Add support for 64 bits enum, to match types exposed by kernel.
 
  - Introduce support for sleepable uprobes program.
 
  - Introduce support for enum textual representation in libbpf.
 
  - New helpers to implement synproxy with eBPF/XDP.
 
  - Improve loop performances, inlining indirect calls when
    possible.
 
  - Removed all the deprecated libbpf APIs.
 
  - Implement new eBPF-based LSM flavor.
 
  - Add type match support, which allow accurate queries to the
    eBPF used types.
 
  - A few TCP congetsion control framework usability improvements.
 
  - Add new infrastructure to manipulate CT entries via eBPF programs.
 
  - Allow for livepatch (KLP) and BPF trampolines to attach to the same
    kernel function.
 
 Protocols
 ---------
 
  - Introduce per network namespace lookup tables for unix sockets,
    increasing scalability and reducing contention.
 
  - Preparation work for Wi-Fi 7 Multi-Link Operation (MLO) support.
 
  - Add support to forciby close TIME_WAIT TCP sockets via user-space
    tools.
 
  - Significant performance improvement for the TLS 1.3 receive path,
    both for zero-copy and not-zero-copy.
 
  - Support for changing the initial MTPCP subflow priority/backup
    status
 
  - Introduce virtually contingus buffers for sockets over RDMA,
    to cope better with memory pressure.
 
  - Extend CAN ethtool support with timestamping capabilities
 
  - Refactor CAN build infrastructure to allow building only the needed
    features.
 
 Driver API
 ----------
 
  - Remove devlink mutex to allow parallel commands on multiple links.
 
  - Add support for pause stats in distributed switch.
 
  - Implement devlink helpers to query and flash line cards.
 
  - New helper for phy mode to register conversion.
 
 New hardware / drivers
 ----------------------
 
  - Ethernet DSA driver for the rockchip mt7531 on BPI-R2 Pro.
 
  - Ethernet DSA driver for the Renesas RZ/N1 A5PSW switch.
 
  - Ethernet DSA driver for the Microchip LAN937x switch.
 
  - Ethernet PHY driver for the Aquantia AQR113C EPHY.
 
  - CAN driver for the OBD-II ELM327 interface.
 
  - CAN driver for RZ/N1 SJA1000 CAN controller.
 
  - Bluetooth: Infineon CYW55572 Wi-Fi plus Bluetooth combo device.
 
 Drivers
 -------
 
  - Intel Ethernet NICs:
    - i40e: add support for vlan pruning
    - i40e: add support for XDP framented packets
    - ice: improved vlan offload support
    - ice: add support for PPPoE offload
 
  - Mellanox Ethernet (mlx5)
    - refactor packet steering offload for performance and scalability
    - extend support for TC offload
    - refactor devlink code to clean-up the locking schema
    - support stacked vlans for bridge offloads
    - use TLS objects pool to improve connection rate
 
  - Netronome Ethernet NICs (nfp):
    - extend support for IPv6 fields mangling offload
    - add support for vepa mode in HW bridge
    - better support for virtio data path acceleration (VDPA)
    - enable TSO by default
 
  - Microsoft vNIC driver (mana)
    - add support for XDP redirect
 
  - Others Ethernet drivers:
    - bonding: add per-port priority support
    - microchip lan743x: extend phy support
    - Fungible funeth: support UDP segmentation offload and XDP xmit
    - Solarflare EF100: add support for virtual function representors
    - MediaTek SoC: add XDP support
 
  - Mellanox Ethernet/IB switch (mlxsw):
    - dropped support for unreleased H/W (XM router).
    - improved stats accuracy
    - unified bridge model coversion improving scalability
      (parts 1-6)
    - support for PTP in Spectrum-2 asics
 
  - Broadcom PHYs
    - add PTP support for BCM54210E
    - add support for the BCM53128 internal PHY
 
  - Marvell Ethernet switches (prestera):
    - implement support for multicast forwarding offload
 
  - Embedded Ethernet switches:
    - refactor OcteonTx MAC filter for better scalability
    - improve TC H/W offload for the Felix driver
    - refactor the Microchip ksz8 and ksz9477 drivers to share
      the probe code (parts 1, 2), add support for phylink
      mac configuration
 
  - Other WiFi:
    - Microchip wilc1000: diable WEP support and enable WPA3
    - Atheros ath10k: encapsulation offload support
 
 Old code removal:
 
  - Neterion vxge ethernet driver: this is untouched since more than
    10 years.
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmLqN+oSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkB9kQAI9VqW0c3SfiTJnkVBEIovZ6Tnh5stD2
 UYFkh1BdchLsYxi7W4XMpVPSzRztiTP87mIx5c/KvIzj+QNeWL1XWRJSPdI9HhTD
 pTAA/tM2OG7bqrbyQiKDNfpQdNl7+kk1RwnYd+f9RFl1QVuIJaYhmjVwrsN5xF/+
 jUsotpROarM2dGFWiFwJbKhP2zMDT+6qEEahM8pEPggKhv8wRLYjany2cZVEe4e0
 WGUpbINAS8gEKm0Ob922WaDfDrcK/N1Z0jNz/kMaENkK18Vvc7F6bCO0DzAawKX9
 QZMMwm6mHp3EThflJAMAzCGIYiIcwLhykgdyj8rrjPhFrWbMD2Sdsbo21HOXU/8j
 u4aAhVl+d+h7emmbgBoJ8sycVJ7BQlXz7lX20sTgADv9xI4/dPhQ17CMRuwX6fXX
 JSrn6P6e1LTV5CEg6vrlSPnKPY6uhFn/cPw47FxCjRwJ9phVnp+8uZWQmf9Pz3yf
 Ok/tcj+juFbsmuOshHy2cbRkuNZNS0oRWlSTBo5795ZwOLSakMonR3L+ev2aOvzz
 DVrFp2Y/iIVwMSFdCbouYdYnhArPRhOAtCmZc2afY8aBN7aaMgrdTy3+mzUoHy3I
 FG3K+VuKpfi0vY4zn6ZoLZDIpyXIoJJ93RcSGltD32t3Dp1RaQMVEI4s45k05PVm
 1nYpXKHA8qML
 =hxEG
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking changes from Paolo Abeni:
 "Core:

   - Refactor the forward memory allocation to better cope with memory
     pressure with many open sockets, moving from a per socket cache to
     a per-CPU one

   - Replace rwlocks with RCU for better fairness in ping, raw sockets
     and IP multicast router.

   - Network-side support for IO uring zero-copy send.

   - A few skb drop reason improvements, including codegen the source
     file with string mapping instead of using macro magic.

   - Rename reference tracking helpers to a more consistent netdev_*
     schema.

   - Adapt u64_stats_t type to address load/store tearing issues.

   - Refine debug helper usage to reduce the log noise caused by bots.

  BPF:

   - Improve socket map performance, avoiding skb cloning on read
     operation.

   - Add support for 64 bits enum, to match types exposed by kernel.

   - Introduce support for sleepable uprobes program.

   - Introduce support for enum textual representation in libbpf.

   - New helpers to implement synproxy with eBPF/XDP.

   - Improve loop performances, inlining indirect calls when possible.

   - Removed all the deprecated libbpf APIs.

   - Implement new eBPF-based LSM flavor.

   - Add type match support, which allow accurate queries to the eBPF
     used types.

   - A few TCP congetsion control framework usability improvements.

   - Add new infrastructure to manipulate CT entries via eBPF programs.

   - Allow for livepatch (KLP) and BPF trampolines to attach to the same
     kernel function.

  Protocols:

   - Introduce per network namespace lookup tables for unix sockets,
     increasing scalability and reducing contention.

   - Preparation work for Wi-Fi 7 Multi-Link Operation (MLO) support.

   - Add support to forciby close TIME_WAIT TCP sockets via user-space
     tools.

   - Significant performance improvement for the TLS 1.3 receive path,
     both for zero-copy and not-zero-copy.

   - Support for changing the initial MTPCP subflow priority/backup
     status

   - Introduce virtually contingus buffers for sockets over RDMA, to
     cope better with memory pressure.

   - Extend CAN ethtool support with timestamping capabilities

   - Refactor CAN build infrastructure to allow building only the needed
     features.

  Driver API:

   - Remove devlink mutex to allow parallel commands on multiple links.

   - Add support for pause stats in distributed switch.

   - Implement devlink helpers to query and flash line cards.

   - New helper for phy mode to register conversion.

  New hardware / drivers:

   - Ethernet DSA driver for the rockchip mt7531 on BPI-R2 Pro.

   - Ethernet DSA driver for the Renesas RZ/N1 A5PSW switch.

   - Ethernet DSA driver for the Microchip LAN937x switch.

   - Ethernet PHY driver for the Aquantia AQR113C EPHY.

   - CAN driver for the OBD-II ELM327 interface.

   - CAN driver for RZ/N1 SJA1000 CAN controller.

   - Bluetooth: Infineon CYW55572 Wi-Fi plus Bluetooth combo device.

  Drivers:

   - Intel Ethernet NICs:
      - i40e: add support for vlan pruning
      - i40e: add support for XDP framented packets
      - ice: improved vlan offload support
      - ice: add support for PPPoE offload

   - Mellanox Ethernet (mlx5)
      - refactor packet steering offload for performance and scalability
      - extend support for TC offload
      - refactor devlink code to clean-up the locking schema
      - support stacked vlans for bridge offloads
      - use TLS objects pool to improve connection rate

   - Netronome Ethernet NICs (nfp):
      - extend support for IPv6 fields mangling offload
      - add support for vepa mode in HW bridge
      - better support for virtio data path acceleration (VDPA)
      - enable TSO by default

   - Microsoft vNIC driver (mana)
      - add support for XDP redirect

   - Others Ethernet drivers:
      - bonding: add per-port priority support
      - microchip lan743x: extend phy support
      - Fungible funeth: support UDP segmentation offload and XDP xmit
      - Solarflare EF100: add support for virtual function representors
      - MediaTek SoC: add XDP support

   - Mellanox Ethernet/IB switch (mlxsw):
      - dropped support for unreleased H/W (XM router).
      - improved stats accuracy
      - unified bridge model coversion improving scalability (parts 1-6)
      - support for PTP in Spectrum-2 asics

   - Broadcom PHYs
      - add PTP support for BCM54210E
      - add support for the BCM53128 internal PHY

   - Marvell Ethernet switches (prestera):
      - implement support for multicast forwarding offload

   - Embedded Ethernet switches:
      - refactor OcteonTx MAC filter for better scalability
      - improve TC H/W offload for the Felix driver
      - refactor the Microchip ksz8 and ksz9477 drivers to share the
        probe code (parts 1, 2), add support for phylink mac
        configuration

   - Other WiFi:
      - Microchip wilc1000: diable WEP support and enable WPA3
      - Atheros ath10k: encapsulation offload support

  Old code removal:

   - Neterion vxge ethernet driver: this is untouched since more than 10 years"

* tag 'net-next-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1890 commits)
  doc: sfp-phylink: Fix a broken reference
  wireguard: selftests: support UML
  wireguard: allowedips: don't corrupt stack when detecting overflow
  wireguard: selftests: update config fragments
  wireguard: ratelimiter: use hrtimer in selftest
  net/mlx5e: xsk: Discard unaligned XSK frames on striding RQ
  net: usb: ax88179_178a: Bind only to vendor-specific interface
  selftests: net: fix IOAM test skip return code
  net: usb: make USB_RTL8153_ECM non user configurable
  net: marvell: prestera: remove reduntant code
  octeontx2-pf: Reduce minimum mtu size to 60
  net: devlink: Fix missing mutex_unlock() call
  net/tls: Remove redundant workqueue flush before destroy
  net: txgbe: Fix an error handling path in txgbe_probe()
  net: dsa: Fix spelling mistakes and cleanup code
  Documentation: devlink: add add devlink-selftests to the table of contents
  dccp: put dccp_qpolicy_full() and dccp_qpolicy_push() in the same lock
  net: ionic: fix error check for vlan flags in ionic_set_nic_features()
  net: ice: fix error NETIF_F_HW_VLAN_CTAG_FILTER check in ice_vsi_sync_fltr()
  nfp: flower: add support for tunnel offload without key ID
  ...
2022-08-03 16:29:08 -07:00
Linus Torvalds
7d9d077c78 RCU pull request for v5.20 (or whatever)
This pull request contains the following branches:
 
 doc.2022.06.21a: Documentation updates.
 
 fixes.2022.07.19a: Miscellaneous fixes.
 
 nocb.2022.07.19a: Callback-offload updates, perhaps most notably a new
 	RCU_NOCB_CPU_DEFAULT_ALL Kconfig option that causes all CPUs to
 	be offloaded at boot time, regardless of kernel boot parameters.
 	This is useful to battery-powered systems such as ChromeOS
 	and Android.  In addition, a new RCU_NOCB_CPU_CB_BOOST kernel
 	boot parameter prevents offloaded callbacks from interfering
 	with real-time workloads and with energy-efficiency mechanisms.
 
 poll.2022.07.21a: Polled grace-period updates, perhaps most notably
 	making these APIs account for both normal and expedited grace
 	periods.
 
 rcu-tasks.2022.06.21a: Tasks RCU updates, perhaps most notably reducing
 	the CPU overhead of RCU tasks trace grace periods by more than
 	a factor of two on a system with 15,000 tasks.	The reduction
 	is expected to increase with the number of tasks, so it seems
 	reasonable to hypothesize that a system with 150,000 tasks might
 	see a 20-fold reduction in CPU overhead.
 
 torture.2022.06.21a: Torture-test updates.
 
 ctxt.2022.07.05a: Updates that merge RCU's dyntick-idle tracking into
 	context tracking, thus reducing the overhead of transitioning to
 	kernel mode from either idle or nohz_full userspace execution
 	for kernels that track context independently of RCU.  This is
 	expected to be helpful primarily for kernels built with
 	CONFIG_NO_HZ_FULL=y.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmLgMcgTHHBhdWxtY2tA
 a2VybmVsLm9yZwAKCRCevxLzctn7jArXD/0fjbCwqpRjHVTzjMY8jN4zDkqZZD6m
 g8Fx27hZ4ToNFwRptyHwNezrNj14skjAJEXfdjaVw32W62ivXvf0HINvSzsTLCSq
 k2kWyBdXLc9CwY5p5W4smnpn5VoAScjg5PoPL59INoZ/Zziji323C7Zepl/1DYJt
 0T6bPCQjo1ZQoDUCyVpSjDmAqxnderWG0MeJVt74GkLqmnYLANg0GH8c7mH4+9LL
 kVGlLp5nlPgNJ4FEoFdMwNU8T/ETmaVld/m2dkiawjkXjJzB2XKtBigU91DDmXz5
 7DIdV4ABrxiy4kGNqtIe/jFgnKyVD7xiDpyfjd6KTeDr/rDS8u2ZH7+1iHsyz3g0
 Np/tS3vcd0KR+gI/d0eXxPbgm5sKlCmKw/nU2eArpW/+4LmVXBUfHTG9Jg+LJmBc
 JrUh6aEdIZJZHgv/nOQBNig7GJW43IG50rjuJxAuzcxiZNEG5lUSS23ysaA9CPCL
 PxRWKSxIEfK3kdmvVO5IIbKTQmIBGWlcWMTcYictFSVfBgcCXpPAksGvqA5JiUkc
 egW+xLFo/7K+E158vSKsVqlWZcEeUbsNJ88QOlpqnRgH++I2Yv/LhK41XfJfpH+Y
 ALxVaDd+mAq6v+qSHNVq9wT3ozXIPy/zK1hDlMIqx40h2YvaEsH4je+521oSoN9r
 vX60+QNxvUBLwA==
 =vUNm
 -----END PGP SIGNATURE-----

Merge tag 'rcu.2022.07.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu

Pull RCU updates from Paul McKenney:

 - Documentation updates

 - Miscellaneous fixes

 - Callback-offload updates, perhaps most notably a new
   RCU_NOCB_CPU_DEFAULT_ALL Kconfig option that causes all CPUs to be
   offloaded at boot time, regardless of kernel boot parameters.

   This is useful to battery-powered systems such as ChromeOS and
   Android. In addition, a new RCU_NOCB_CPU_CB_BOOST kernel boot
   parameter prevents offloaded callbacks from interfering with
   real-time workloads and with energy-efficiency mechanisms

 - Polled grace-period updates, perhaps most notably making these APIs
   account for both normal and expedited grace periods

 - Tasks RCU updates, perhaps most notably reducing the CPU overhead of
   RCU tasks trace grace periods by more than a factor of two on a
   system with 15,000 tasks.

   The reduction is expected to increase with the number of tasks, so it
   seems reasonable to hypothesize that a system with 150,000 tasks
   might see a 20-fold reduction in CPU overhead

 - Torture-test updates

 - Updates that merge RCU's dyntick-idle tracking into context tracking,
   thus reducing the overhead of transitioning to kernel mode from
   either idle or nohz_full userspace execution for kernels that track
   context independently of RCU.

   This is expected to be helpful primarily for kernels built with
   CONFIG_NO_HZ_FULL=y

* tag 'rcu.2022.07.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (98 commits)
  rcu: Add irqs-disabled indicator to expedited RCU CPU stall warnings
  rcu: Diagnose extended sync_rcu_do_polled_gp() loops
  rcu: Put panic_on_rcu_stall() after expedited RCU CPU stall warnings
  rcutorture: Test polled expedited grace-period primitives
  rcu: Add polled expedited grace-period primitives
  rcutorture: Verify that polled GP API sees synchronous grace periods
  rcu: Make Tiny RCU grace periods visible to polled APIs
  rcu: Make polled grace-period API account for expedited grace periods
  rcu: Switch polled grace-period APIs to ->gp_seq_polled
  rcu/nocb: Avoid polling when my_rdp->nocb_head_rdp list is empty
  rcu/nocb: Add option to opt rcuo kthreads out of RT priority
  rcu: Add nocb_cb_kthread check to rcu_is_callbacks_kthread()
  rcu/nocb: Add an option to offload all CPUs on boot
  rcu/nocb: Fix NOCB kthreads spawn failure with rcu_nocb_rdp_deoffload() direct call
  rcu/nocb: Invert rcu_state.barrier_mutex VS hotplug lock locking order
  rcu/nocb: Add/del rdp to iterate from rcuog itself
  rcu/tree: Add comment to describe GP-done condition in fqs loop
  rcu: Initialize first_gp_fqs at declaration in rcu_gp_fqs()
  rcu/kvfree: Remove useless monitor_todo flag
  rcu: Cleanup RCU urgency state for offline CPU
  ...
2022-08-02 19:12:45 -07:00
Linus Torvalds
c013d0af81 for-5.20/block-2022-07-29
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmLko3gQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpmQaD/90NKFj4v8I456TUQyg1jimXEsL+e84E6o2
 ALWVb6JzQvlPVQXNLnK5YKIunMWOTtTMz0nyB8sVRwVJVJO0P5d7QopAkZM8fkyU
 MK5OCzoryENw4DTc2wJS4in6cSbGylIuN74wMzlf7+M67JTImfoZQhbTMcjwzZfn
 b3OlL6sID7zMXwGcuOJPZyUJICCpDhzdSF9JXqKma5PQuG2SBmQyvFxJAcsoFBPc
 YetnoRIOIN6yBvsIZaPaYq7XI9MIvF0e67EQtyCEHj4tHpyVnyDWkeObVFULsISU
 gGEKbkYPvNUzRAU5Q1NBBHh1tTfkf/MaUxTuZwoEwZ/s04IGBGMmrZGyfvdfzYo6
 M7NwSEg/TrUSNfTwn65mQi7uOXu1pGkJrqz84Flm8u9Qid9Vd7LExLG5p/ggnWdH
 5th93MDEmtEg29e9DXpEAuS5d0t3TtSvosflaKpyfNNfr+P0rWCN6GM/uW62VUTK
 ls69SQh/AQJRbg64jU4xper6WhaYtSXK7TKEnxJycoEn9gYNyCcdot2uekth0xRH
 ChHGmRlteiqe/y4uFWn/2dcxWjoleiHbFjTaiRL75WVl8wIDEjw02LGuoZ61Ss9H
 WOV+MT7KqNjBGe6lreUY+O/PO02dzmoR6heJXN19p8zr/pBuLCTGX7UpO7rzgaBR
 4N1HEozvIw==
 =celk
 -----END PGP SIGNATURE-----

Merge tag 'for-5.20/block-2022-07-29' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:

 - Improve the type checking of request flags (Bart)

 - Ensure queue mapping for a single queues always picks the right queue
   (Bart)

 - Sanitize the io priority handling (Jan)

 - rq-qos race fix (Jinke)

 - Reserved tags handling improvements (John)

 - Separate memory alignment from file/disk offset aligment for O_DIRECT
   (Keith)

 - Add new ublk driver, userspace block driver using io_uring for
   communication with the userspace backend (Ming)

 - Use try_cmpxchg() to cleanup the code in various spots (Uros)

 - Finally remove bdevname() (Christoph)

 - Clean up the zoned device handling (Christoph)

 - Clean up independent access range support (Christoph)

 - Clean up and improve block sysfs handling (Christoph)

 - Clean up and improve teardown of block devices.

   This turns the usual two step process into something that is simpler
   to implement and handle in block drivers (Christoph)

 - Clean up chunk size handling (Christoph)

 - Misc cleanups and fixes (Bart, Bo, Dan, GuoYong, Jason, Keith, Liu,
   Ming, Sebastian, Yang, Ying)

* tag 'for-5.20/block-2022-07-29' of git://git.kernel.dk/linux-block: (178 commits)
  ublk_drv: fix double shift bug
  ublk_drv: make sure that correct flags(features) returned to userspace
  ublk_drv: fix error handling of ublk_add_dev
  ublk_drv: fix lockdep warning
  block: remove __blk_get_queue
  block: call blk_mq_exit_queue from disk_release for never added disks
  blk-mq: fix error handling in __blk_mq_alloc_disk
  ublk: defer disk allocation
  ublk: rewrite ublk_ctrl_get_queue_affinity to not rely on hctx->cpumask
  ublk: fold __ublk_create_dev into ublk_ctrl_add_dev
  ublk: cleanup ublk_ctrl_uring_cmd
  ublk: simplify ublk_ch_open and ublk_ch_release
  ublk: remove the empty open and release block device operations
  ublk: remove UBLK_IO_F_PREFLUSH
  ublk: add a MAINTAINERS entry
  block: don't allow the same type rq_qos add more than once
  mmc: fix disk/queue leak in case of adding disk failure
  ublk_drv: fix an IS_ERR() vs NULL check
  ublk: remove UBLK_IO_F_INTEGRITY
  ublk_drv: remove unneeded semicolon
  ...
2022-08-02 13:46:35 -07:00
Masami Hiramatsu (Google)
2f63e5d2e3 tracing/eprobe: Show syntax error logs in error_log file
Show the syntax errors for event probes in error_log file as same as
other dynamic events, so that user can understand what is the problem.

Link: https://lkml.kernel.org/r/165932113556.2850673.3483079297896607612.stgit@devnote2

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-08-02 15:22:55 -04:00
Zhiqiang Liu
59927cbe3f tracing: Use free_trace_buffer() in allocate_trace_buffers()
In allocate_trace_buffers(), if allocating tr->max_buffer
fails, we can directly call free_trace_buffer to free
tr->array_buffer.

Link: https://lkml.kernel.org/r/65f0702d-07f6-08de-2a07-4c50af56a67b@huawei.com

Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-08-02 15:13:19 -04:00
Daniel Bristot de Oliveira
e88043c0ac rv/reactor: Add the panic reactor
Sample reactor that panics the system when an exception is found. This
is useful both to capture a vmcore, or to fail-safe a critical system.

Link: https://lkml.kernel.org/r/729aae3aba95f35738b8f8180e626d747d1d9da2.1659052063.git.bristot@kernel.org

Cc: Wim Van Sebroeck <wim@linux-watchdog.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Gabriele Paoloni <gpaoloni@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Tao Zhou <tao.zhou@linux.dev>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-30 14:01:30 -04:00
Daniel Bristot de Oliveira
135b881ea8 rv/reactor: Add the printk reactor
A reactor that printks the reaction message.

Link: https://lkml.kernel.org/r/b65f18a7fd6dc6659a3008fd7b7392de3465d47b.1659052063.git.bristot@kernel.org

Cc: Wim Van Sebroeck <wim@linux-watchdog.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Gabriele Paoloni <gpaoloni@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Tao Zhou <tao.zhou@linux.dev>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-30 14:01:30 -04:00
Daniel Bristot de Oliveira
ccc319dcb4 rv/monitor: Add the wwnr monitor
Per task wakeup while not running (wwnr) monitor.

This model is broken, the reason is that a task can be running in the
processor without being set as RUNNABLE. Think about a task about to
sleep:

1:      set_current_state(TASK_UNINTERRUPTIBLE);
2:      schedule();

And then imagine an IRQ happening in between the lines one and two,
waking the task up. BOOM, the wakeup will happen while the task is
running.

Q: Why do we need this model, so?
A: To test the reactors.

Link: https://lkml.kernel.org/r/473c0fc39967250fdebcff8b620311c11dccad30.1659052063.git.bristot@kernel.org

Cc: Wim Van Sebroeck <wim@linux-watchdog.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Gabriele Paoloni <gpaoloni@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Tao Zhou <tao.zhou@linux.dev>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-30 14:01:30 -04:00
Daniel Bristot de Oliveira
10bde81c74 rv/monitor: Add the wip monitor
The wakeup in preemptive (wip) monitor verifies if the
wakeup events always take place with preemption disabled:

                     |
                     |
                     v
                   #==================#
                   H    preemptive    H <+
                   #==================#  |
                     |                   |
                     | preempt_disable   | preempt_enable
                     v                   |
    sched_waking   +------------------+  |
  +--------------- |                  |  |
  |                |  non_preemptive  |  |
  +--------------> |                  | -+
                   +------------------+

The wakeup event always takes place with preemption disabled because
of the scheduler synchronization. However, because the preempt_count
and its trace event are not atomic with regard to interrupts, some
inconsistencies might happen.

The documentation illustrates one of these cases.

Link: https://lkml.kernel.org/r/c98ca678df81115fddc04921b3c79720c836b18f.1659052063.git.bristot@kernel.org

Cc: Wim Van Sebroeck <wim@linux-watchdog.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Gabriele Paoloni <gpaoloni@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Tao Zhou <tao.zhou@linux.dev>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-30 14:01:30 -04:00
Daniel Bristot de Oliveira
8812d21219 rv/monitor: Add the wip monitor skeleton created by dot2k
THIS CODE IS NOT LINKED TO THE MAKEFILE.

This model does not compile because it lacks the instrumentation
part, which will be added next. In the typical case, there will be
only one patch, but it was split into two patches for educational
purposes.

This is the direct output this command line:
  $ dot2k -d tools/verification/models/wip.dot -t per_cpu

Link: https://lkml.kernel.org/r/5eb7a9118917e8a814c5e49853a72fc62be0a101.1659052063.git.bristot@kernel.org

Cc: Wim Van Sebroeck <wim@linux-watchdog.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Gabriele Paoloni <gpaoloni@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Tao Zhou <tao.zhou@linux.dev>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-30 14:01:29 -04:00
Daniel Bristot de Oliveira
ff0aaf6712 Documentation/rv: Add a basic documentation
Add the runtime-verification.rst document, explaining the basics of RV
and how to use the interface.

Link: https://lkml.kernel.org/r/4be7d1a88ab1e2eb0767521e1ab52a149a154bc4.1659052063.git.bristot@kernel.org

Cc: Wim Van Sebroeck <wim@linux-watchdog.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Gabriele Paoloni <gpaoloni@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Tao Zhou <tao.zhou@linux.dev>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-30 14:01:29 -04:00
Daniel Bristot de Oliveira
792575348f rv/include: Add deterministic automata monitor definition via C macros
In Linux terms, the runtime verification monitors are encapsulated
inside the "RV monitor" abstraction. The "RV monitor" includes a set
of instances of the monitor (per-cpu monitor, per-task monitor, and
so on), the helper functions that glue the monitor to the system
reference model, and the trace output as a reaction for event parsing
and exceptions, as depicted below:

Linux  +----- RV Monitor ----------------------------------+ Formal
 Realm |                                                   |  Realm
 +-------------------+     +----------------+     +-----------------+
 |   Linux kernel    |     |     Monitor    |     |     Reference   |
 |     Tracing       |  -> |   Instance(s)  | <-  |       Model     |
 | (instrumentation) |     | (verification) |     | (specification) |
 +-------------------+     +----------------+     +-----------------+
        |                          |                       |
        |                          V                       |
        |                     +----------+                 |
        |                     | Reaction |                 |
        |                     +--+--+--+-+                 |
        |                        |  |  |                   |
        |                        |  |  +-> trace output ?  |
        +------------------------|--|----------------------+
                                 |  +----> panic ?
                                 +-------> <user-specified>

Add the rv/da_monitor.h, enabling automatic code generation for the
*Monitor Instance(s)* using C macros, and code to support it.

The benefits of the usage of macro for monitor synthesis are 3-fold as it:

- Reduces the code duplication;
- Facilitates the bug fix/improvement;
- Avoids the case of developers changing the core of the monitor code
  to manipulate the model in a (let's say) non-standard way.

This initial implementation presents three different types of monitor
instances:

- DECLARE_DA_MON_GLOBAL(name, type)
- DECLARE_DA_MON_PER_CPU(name, type)
- DECLARE_DA_MON_PER_TASK(name, type)

The first declares the functions for a global deterministic automata monitor,
the second for monitors with per-cpu instances, and the third with per-task
instances.

Link: https://lkml.kernel.org/r/51b0bf425a281e226dfeba7401d2115d6091f84e.1659052063.git.bristot@kernel.org

Cc: Wim Van Sebroeck <wim@linux-watchdog.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Gabriele Paoloni <gpaoloni@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Tao Zhou <tao.zhou@linux.dev>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-30 14:01:28 -04:00
Daniel Bristot de Oliveira
04acadcb44 rv: Add runtime reactors interface
A runtime monitor can cause a reaction to the detection of an
exception on the model's execution. By default, the monitors have
tracing reactions, printing the monitor output via tracepoints.
But other reactions can be added (on-demand) via this interface.

The user interface resembles the kernel tracing interface and
presents these files:

"available_reactors"
  - Reading shows the available reactors, one per line.

   For example:
     # cat available_reactors
     nop
     panic
     printk

 "reacting_on"
   - It is an on/off general switch for reactors, disabling
   all reactions.

 "monitors/MONITOR/reactors"
   - List available reactors, with the select reaction for the given
   MONITOR inside []. The default one is the nop (no operation)
   reactor.
   - Writing the name of a reactor enables it to the given
   MONITOR.

   For example:
     # cat monitors/wip/reactors
     [nop]
     panic
     printk
     # echo panic > monitors/wip/reactors
     # cat monitors/wip/reactors
     nop
     [panic]
     printk

Link: https://lkml.kernel.org/r/1794eb994637457bdeaa6bad0b8263d2f7eece0c.1659052063.git.bristot@kernel.org

Cc: Wim Van Sebroeck <wim@linux-watchdog.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Gabriele Paoloni <gpaoloni@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Tao Zhou <tao.zhou@linux.dev>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-30 14:01:28 -04:00
Daniel Bristot de Oliveira
102227b970 rv: Add Runtime Verification (RV) interface
RV is a lightweight (yet rigorous) method that complements classical
exhaustive verification techniques (such as model checking and
theorem proving) with a more practical approach to complex systems.

RV works by analyzing the trace of the system's actual execution,
comparing it against a formal specification of the system behavior.
RV can give precise information on the runtime behavior of the
monitored system while enabling the reaction for unexpected
events, avoiding, for example, the propagation of a failure on
safety-critical systems.

The development of this interface roots in the development of the
paper:

De Oliveira, Daniel Bristot; Cucinotta, Tommaso; De Oliveira, Romulo
Silva. Efficient formal verification for the Linux kernel. In:
International Conference on Software Engineering and Formal Methods.
Springer, Cham, 2019. p. 315-332.

And:

De Oliveira, Daniel Bristot. Automata-based formal analysis
and verification of the real-time Linux kernel. PhD Thesis, 2020.

The RV interface resembles the tracing/ interface on purpose. The current
path for the RV interface is /sys/kernel/tracing/rv/.

It presents these files:

 "available_monitors"
   - List the available monitors, one per line.

   For example:
     # cat available_monitors
     wip
     wwnr

 "enabled_monitors"
   - Lists the enabled monitors, one per line;
   - Writing to it enables a given monitor;
   - Writing a monitor name with a '!' prefix disables it;
   - Truncating the file disables all enabled monitors.

   For example:
     # cat enabled_monitors
     # echo wip > enabled_monitors
     # echo wwnr >> enabled_monitors
     # cat enabled_monitors
     wip
     wwnr
     # echo '!wip' >> enabled_monitors
     # cat enabled_monitors
     wwnr
     # echo > enabled_monitors
     # cat enabled_monitors
     #

   Note that more than one monitor can be enabled concurrently.

 "monitoring_on"
   - It is an on/off general switcher for monitoring. Note
   that it does not disable enabled monitors or detach events,
   but stop the per-entity monitors of monitoring the events
   received from the system. It resembles the "tracing_on" switcher.

 "monitors/"
   Each monitor will have its one directory inside "monitors/". There
   the monitor specific files will be presented.
   The "monitors/" directory resembles the "events" directory on
   tracefs.

   For example:
     # cd monitors/wip/
     # ls
     desc  enable
     # cat desc
     wakeup in preemptive per-cpu testing monitor.
     # cat enable
     0

For further information, see the comments in the header of
kernel/trace/rv/rv.c from this patch.

Link: https://lkml.kernel.org/r/a4bfe038f50cb047bfb343ad0e12b0e646ab308b.1659052063.git.bristot@kernel.org

Cc: Wim Van Sebroeck <wim@linux-watchdog.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Gabriele Paoloni <gpaoloni@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Tao Zhou <tao.zhou@linux.dev>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-30 14:01:28 -04:00
Linyu Yuan
95c104c378 tracing: Auto generate event name when creating a group of events
Currently when creating a specific group of trace events,
take kprobe event as example, the user must use the following format:
p:GRP/EVENT [MOD:]KSYM[+OFFS]|KADDR [FETCHARGS],
which means user must enter EVENT name, one example is:

  echo 'p:usb_gadget/config_usb_cfg_link config_usb_cfg_link $arg1' >> kprobe_events

It is not simple if there are too many entries because the event name is
the same as symbol name.

This change allows user to specify no EVENT name, format changed as:

   p:GRP/ [MOD:]KSYM[+OFFS]|KADDR [FETCHARGS]

It will generate event name automatically and one example is:

  echo 'p:usb_gadget/ config_usb_cfg_link $arg1' >> kprobe_events.

Link: https://lore.kernel.org/all/1656296348-16111-4-git-send-email-quic_linyyuan@quicinc.com/

Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Linyu Yuan <quic_linyyuan@quicinc.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-24 19:11:17 -04:00
Linyu Yuan
f360ea5641 tracing: eprobe: Remove duplicate is_good_name() operation
traceprobe_parse_event_name() already validate SYSTEM and EVENT name,
there is no need to call is_good_name() after it.

Link: https://lore.kernel.org/all/1656296348-16111-3-git-send-email-quic_linyyuan@quicinc.com/

Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Linyu Yuan <quic_linyyuan@quicinc.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-24 19:11:17 -04:00
Linyu Yuan
b774926c73 tracing: eprobe: Add missing log index
Add trace_probe_log_set_index(1) to allow report correct error
if user input wrong SYSTEM.EVENT format.

Link: https://lore.kernel.org/all/1656296348-16111-2-git-send-email-quic_linyyuan@quicinc.com/

Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Linyu Yuan <quic_linyyuan@quicinc.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-24 19:11:17 -04:00
Jakub Kicinski
b3fce974d4 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
bpf-next 2022-07-22

We've added 73 non-merge commits during the last 12 day(s) which contain
a total of 88 files changed, 3458 insertions(+), 860 deletions(-).

The main changes are:

1) Implement BPF trampoline for arm64 JIT, from Xu Kuohai.

2) Add ksyscall/kretsyscall section support to libbpf to simplify tracing kernel
   syscalls through kprobe mechanism, from Andrii Nakryiko.

3) Allow for livepatch (KLP) and BPF trampolines to attach to the same kernel
   function, from Song Liu & Jiri Olsa.

4) Add new kfunc infrastructure for netfilter's CT e.g. to insert and change
   entries, from Kumar Kartikeya Dwivedi & Lorenzo Bianconi.

5) Add a ksym BPF iterator to allow for more flexible and efficient interactions
   with kernel symbols, from Alan Maguire.

6) Bug fixes in libbpf e.g. for uprobe binary path resolution, from Dan Carpenter.

7) Fix BPF subprog function names in stack traces, from Alexei Starovoitov.

8) libbpf support for writing custom perf event readers, from Jon Doron.

9) Switch to use SPDX tag for BPF helper man page, from Alejandro Colomar.

10) Fix xsk send-only sockets when in busy poll mode, from Maciej Fijalkowski.

11) Reparent BPF maps and their charging on memcg offlining, from Roman Gushchin.

12) Multiple follow-up fixes around BPF lsm cgroup infra, from Stanislav Fomichev.

13) Use bootstrap version of bpftool where possible to speed up builds, from Pu Lehui.

14) Cleanup BPF verifier's check_func_arg() handling, from Joanne Koong.

15) Make non-prealloced BPF map allocations low priority to play better with
    memcg limits, from Yafang Shao.

16) Fix BPF test runner to reject zero-length data for skbs, from Zhengchao Shao.

17) Various smaller cleanups and improvements all over the place.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (73 commits)
  bpf: Simplify bpf_prog_pack_[size|mask]
  bpf: Support bpf_trampoline on functions with IPMODIFY (e.g. livepatch)
  bpf, x64: Allow to use caller address from stack
  ftrace: Allow IPMODIFY and DIRECT ops on the same function
  ftrace: Add modify_ftrace_direct_multi_nolock
  bpf/selftests: Fix couldn't retrieve pinned program in xdp veth test
  bpf: Fix build error in case of !CONFIG_DEBUG_INFO_BTF
  selftests/bpf: Fix test_verifier failed test in unprivileged mode
  selftests/bpf: Add negative tests for new nf_conntrack kfuncs
  selftests/bpf: Add tests for new nf_conntrack kfuncs
  selftests/bpf: Add verifier tests for trusted kfunc args
  net: netfilter: Add kfuncs to set and change CT status
  net: netfilter: Add kfuncs to set and change CT timeout
  net: netfilter: Add kfuncs to allocate and insert CT
  net: netfilter: Deduplicate code in bpf_{xdp,skb}_ct_lookup
  bpf: Add documentation for kfuncs
  bpf: Add support for forcing kfunc args to be trusted
  bpf: Switch to new kfunc flags infrastructure
  tools/resolve_btfids: Add support for 8-byte BTF sets
  bpf: Introduce 8-byte BTF set
  ...
====================

Link: https://lore.kernel.org/r/20220722221218.29943-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-22 16:55:44 -07:00
Song Liu
53cd885bc5 ftrace: Allow IPMODIFY and DIRECT ops on the same function
IPMODIFY (livepatch) and DIRECT (bpf trampoline) ops are both important
users of ftrace. It is necessary to allow them work on the same function
at the same time.

First, DIRECT ops no longer specify IPMODIFY flag. Instead, DIRECT flag is
handled together with IPMODIFY flag in __ftrace_hash_update_ipmodify().

Then, a callback function, ops_func, is added to ftrace_ops. This is used
by ftrace core code to understand whether the DIRECT ops can share with an
IPMODIFY ops. To share with IPMODIFY ops, the DIRECT ops need to implement
the callback function and adjust the direct trampoline accordingly.

If DIRECT ops is attached before the IPMODIFY ops, ftrace core code calls
ENABLE_SHARE_IPMODIFY_PEER on the DIRECT ops before registering the
IPMODIFY ops.

If IPMODIFY ops is attached before the DIRECT ops, ftrace core code calls
ENABLE_SHARE_IPMODIFY_SELF in __ftrace_hash_update_ipmodify. Owner of the
DIRECT ops may return 0 if the DIRECT trampoline can share with IPMODIFY,
so error code otherwise. The error code is propagated to
register_ftrace_direct_multi so that onwer of the DIRECT trampoline can
handle it properly.

For more details, please refer to comment before enum ftrace_ops_cmd.

Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/all/20220602193706.2607681-2-song@kernel.org/
Link: https://lore.kernel.org/all/20220718055449.3960512-1-song@kernel.org/
Link: https://lore.kernel.org/bpf/20220720002126.803253-3-song@kernel.org
2022-07-22 22:04:30 +02:00
Song Liu
f96f644ab9 ftrace: Add modify_ftrace_direct_multi_nolock
This is similar to modify_ftrace_direct_multi, but does not acquire
direct_mutex. This is useful when direct_mutex is already locked by the
user.

Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/bpf/20220720002126.803253-2-song@kernel.org
2022-07-22 22:04:24 +02:00
Bart Van Assche
020e3618cc blktrace: Fix the blk_fill_rwbs() kernel-doc header
Reflect recent changes in the blk_fill_rwbs() kernel-doc header.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Chaitanya Kulkarni <kch@nvidia.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 919dbca867 ("blktrace: Use the new blk_opf_t type")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220715184735.2326034-3-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-15 13:10:04 -06:00
Jakub Kicinski
816cd16883 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
include/net/sock.h
  310731e2f1 ("net: Fix data-races around sysctl_mem.")
  e70f3c7012 ("Revert "net: set SK_MEM_QUANTUM to 4096"")
https://lore.kernel.org/all/20220711120211.7c8b7cba@canb.auug.org.au/

net/ipv4/fib_semantics.c
  747c143072 ("ip: fix dflt addr selection for connected nexthop")
  d62607c3fe ("net: rename reference+tracking helpers")

net/tls/tls.h
include/net/tls.h
  3d8c51b25a ("net/tls: Check for errors in tls_device_init")
  5879031423 ("tls: create an internal header")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-14 15:27:35 -07:00
Bart Van Assche
919dbca867 blktrace: Use the new blk_opf_t type
Improve static type checking by using the new blk_opf_t type for a function
argument that represents a combination of a request operation and request
flags. Rename that argument from 'op' into 'opf' to make its role more
clear.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220714180729.1065367-12-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14 12:14:30 -06:00
Bart Van Assche
22c80aac88 blktrace: Trace remapped requests correctly
Trace the remapped operation and its flags instead of only the data
direction of remapped operations. This issue was detected by analyzing
the warnings reported by sparse related to the new blk_opf_t type.

Reviewed-by: Jun'ichi Nomura <junichi.nomura@nec.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Cc: Mike Christie <michael.christie@oracle.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Chaitanya Kulkarni <kch@nvidia.com>
Fixes: 1b9a9ab78b ("blktrace: use op accessors")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220714180729.1065367-11-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14 12:14:30 -06:00
Christoph Hellwig
900d156bac block: remove bdevname
Replace the remaining calls of bdevname with snprintf using the %pg
format specifier.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220713055317.1888500-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14 10:27:56 -06:00
Zheng Yejian
fb991f1942 tracing/histograms: Simplify create_hist_fields()
When I look into implements of create_hist_fields(), I think there can be
following two simplifications:
  1. If something wrong happened in parse_var_defs(), free_var_defs() would
     have been called in it, so no need goto free again after calling it;
  2. After calling create_key_fields(), regardless of the value of 'ret', it
     then always runs into 'out: ', so the judge of 'ret' is redundant.

Link: https://lkml.kernel.org/r/20220630013152.164871-1-zhengyejian1@huawei.com

Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Reviewed-by: Tom Rix <trix@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-12 17:35:11 -04:00
Xiang wangx
94c255ac67 tracing/user_events: Fix syntax errors in comments
Delete the redundant word 'have'.

Link: https://lkml.kernel.org/r/20220606023007.23377-1-wangxiang@cdjrlc.com

Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-12 17:35:11 -04:00
Steven Rostedt (Google)
0a6d7d4541 ftrace: Be more specific about arch impact when function tracer is enabled
It was brought up that on ARMv7, that because the FUNCTION_TRACER does not
use nops to keep function tracing disabled because of the use of a link
register, it does have some performance impact.

The start of functions when -pg is used to compile the kernel is:

	push    {lr}
	bl      8010e7c0 <__gnu_mcount_nc>

When function tracing is tuned off, it becomes:

	push    {lr}
	add   sp, sp, #4

Which just puts the stack back to its normal location. But these two
instructions at the start of every function does incur some overhead.

Be more honest in the Kconfig FUNCTION_TRACER description and specify that
the overhead being in the noise was x86 specific, but other architectures
may vary.

Link: https://lore.kernel.org/all/20220705105416.GE5208@pengutronix.de/
Link: https://lkml.kernel.org/r/20220706161231.085a83da@gandalf.local.home

Reported-by: Sascha Hauer <sha@pengutronix.de>
Acked-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-12 16:36:34 -04:00
Douglas Anderson
495fcec864 tracing: Fix sleeping while atomic in kdb ftdump
If you drop into kdb and type "ftdump" you'll get a sleeping while
atomic warning from memory allocation in trace_find_next_entry().

This appears to have been caused by commit ff895103a8 ("tracing:
Save off entry when peeking at next entry"), which added the
allocation in that path. The problematic commit was already fixed by
commit 8e99cf91b9 ("tracing: Do not allocate buffer in
trace_find_next_entry() in atomic") but that fix missed the kdb case.

The fix here is easy: just move the assignment of the static buffer to
the place where it should have been to begin with:
trace_init_global_iter(). That function is called in two places, once
is right before the assignment of the static buffer added by the
previous fix and once is in kdb.

Note that it appears that there's a second static buffer that we need
to assign that was added in commit efbbdaa22b ("tracing: Show real
address for trace event arguments"), so we'll move that too.

Link: https://lkml.kernel.org/r/20220708170919.1.I75844e5038d9425add2ad853a608cb44bb39df40@changeid

Fixes: ff895103a8 ("tracing: Save off entry when peeking at next entry")
Fixes: efbbdaa22b ("tracing: Show real address for trace event arguments")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-12 16:35:58 -04:00
Zheng Yejian
7edc3945bd tracing/histograms: Fix memory leak problem
This reverts commit 46bbe5c671.

As commit 46bbe5c671 ("tracing: fix double free") said, the
"double free" problem reported by clang static analyzer is:
  > In parse_var_defs() if there is a problem allocating
  > var_defs.expr, the earlier var_defs.name is freed.
  > This free is duplicated by free_var_defs() which frees
  > the rest of the list.

However, if there is a problem allocating N-th var_defs.expr:
  + in parse_var_defs(), the freed 'earlier var_defs.name' is
    actually the N-th var_defs.name;
  + then in free_var_defs(), the names from 0th to (N-1)-th are freed;

                        IF ALLOCATING PROBLEM HAPPENED HERE!!! -+
                                                                 \
                                                                  |
          0th           1th                 (N-1)-th      N-th    V
          +-------------+-------------+-----+-------------+-----------
var_defs: | name | expr | name | expr | ... | name | expr | name | ///
          +-------------+-------------+-----+-------------+-----------

These two frees don't act on same name, so there was no "double free"
problem before. Conversely, after that commit, we get a "memory leak"
problem because the above "N-th var_defs.name" is not freed.

If enable CONFIG_DEBUG_KMEMLEAK and inject a fault at where the N-th
var_defs.expr allocated, then execute on shell like:
  $ echo 'hist:key=call_site:val=$v1,$v2:v1=bytes_req,v2=bytes_alloc' > \
/sys/kernel/debug/tracing/events/kmem/kmalloc/trigger

Then kmemleak reports:
  unreferenced object 0xffff8fb100ef3518 (size 8):
    comm "bash", pid 196, jiffies 4295681690 (age 28.538s)
    hex dump (first 8 bytes):
      76 31 00 00 b1 8f ff ff                          v1......
    backtrace:
      [<0000000038fe4895>] kstrdup+0x2d/0x60
      [<00000000c99c049a>] event_hist_trigger_parse+0x206f/0x20e0
      [<00000000ae70d2cc>] trigger_process_regex+0xc0/0x110
      [<0000000066737a4c>] event_trigger_write+0x75/0xd0
      [<000000007341e40c>] vfs_write+0xbb/0x2a0
      [<0000000087fde4c2>] ksys_write+0x59/0xd0
      [<00000000581e9cdf>] do_syscall_64+0x3a/0x80
      [<00000000cf3b065c>] entry_SYSCALL_64_after_hwframe+0x46/0xb0

Link: https://lkml.kernel.org/r/20220711014731.69520-1-zhengyejian1@huawei.com

Cc: stable@vger.kernel.org
Fixes: 46bbe5c671 ("tracing: fix double free")
Reported-by: Hulk Robot <hulkci@huawei.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-12 16:35:42 -04:00
Jakub Kicinski
0076cad301 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2022-07-09

We've added 94 non-merge commits during the last 19 day(s) which contain
a total of 125 files changed, 5141 insertions(+), 6701 deletions(-).

The main changes are:

1) Add new way for performing BTF type queries to BPF, from Daniel Müller.

2) Add inlining of calls to bpf_loop() helper when its function callback is
   statically known, from Eduard Zingerman.

3) Implement BPF TCP CC framework usability improvements, from Jörn-Thorben Hinz.

4) Add LSM flavor for attaching per-cgroup BPF programs to existing LSM
   hooks, from Stanislav Fomichev.

5) Remove all deprecated libbpf APIs in prep for 1.0 release, from Andrii Nakryiko.

6) Add benchmarks around local_storage to BPF selftests, from Dave Marchevsky.

7) AF_XDP sample removal (given move to libxdp) and various improvements around AF_XDP
   selftests, from Magnus Karlsson & Maciej Fijalkowski.

8) Add bpftool improvements for memcg probing and bash completion, from Quentin Monnet.

9) Add arm64 JIT support for BPF-2-BPF coupled with tail calls, from Jakub Sitnicki.

10) Sockmap optimizations around throughput of UDP transmissions which have been
    improved by 61%, from Cong Wang.

11) Rework perf's BPF prologue code to remove deprecated functions, from Jiri Olsa.

12) Fix sockmap teardown path to avoid sleepable sk_psock_stop, from John Fastabend.

13) Fix libbpf's cleanup around legacy kprobe/uprobe on error case, from Chuang Wang.

14) Fix libbpf's bpf_helpers.h to work with gcc for the case of its sec/pragma
    macro, from James Hilliard.

15) Fix libbpf's pt_regs macros for riscv to use a0 for RC register, from Yixun Lan.

16) Fix bpftool to show the name of type BPF_OBJ_LINK, from Yafang Shao.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (94 commits)
  selftests/bpf: Fix xdp_synproxy build failure if CONFIG_NF_CONNTRACK=m/n
  bpf: Correctly propagate errors up from bpf_core_composites_match
  libbpf: Disable SEC pragma macro on GCC
  bpf: Check attach_func_proto more carefully in check_return_code
  selftests/bpf: Add test involving restrict type qualifier
  bpftool: Add support for KIND_RESTRICT to gen min_core_btf command
  MAINTAINERS: Add entry for AF_XDP selftests files
  selftests, xsk: Rename AF_XDP testing app
  bpf, docs: Remove deprecated xsk libbpf APIs description
  selftests/bpf: Add benchmark for local_storage RCU Tasks Trace usage
  libbpf, riscv: Use a0 for RC register
  libbpf: Remove unnecessary usdt_rel_ip assignments
  selftests/bpf: Fix few more compiler warnings
  selftests/bpf: Fix bogus uninitialized variable warning
  bpftool: Remove zlib feature test from Makefile
  libbpf: Cleanup the legacy uprobe_event on failed add/attach_event()
  libbpf: Fix wrong variable used in perf_event_uprobe_open_legacy()
  libbpf: Cleanup the legacy kprobe_event on failed add/attach_event()
  selftests/bpf: Add type match test against kernel's task_struct
  selftests/bpf: Add nested type to type based tests
  ...
====================

Link: https://lore.kernel.org/r/20220708233145.32365-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-09 12:24:16 -07:00
Frederic Weisbecker
493c182282 context_tracking: Take NMI eqs entrypoints over RCU
The RCU dynticks counter is going to be merged into the context tracking
subsystem. Prepare with moving the NMI extended quiescent states
entrypoints to context tracking. For now those are dumb redirection to
existing RCU calls.

Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Cc: Yu Liao <liaoyu15@huawei.com>
Cc: Phil Auld <pauld@redhat.com>
Cc: Paul Gortmaker<paul.gortmaker@windriver.com>
Cc: Alex Belits <abelits@marvell.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
2022-07-05 13:32:59 -07:00
Frederic Weisbecker
6f0e6c1598 context_tracking: Take IRQ eqs entrypoints over RCU
The RCU dynticks counter is going to be merged into the context tracking
subsystem. Prepare with moving the IRQ extended quiescent states
entrypoints to context tracking. For now those are dumb redirection to
existing RCU calls.

[ paulmck: Apply Stephen Rothwell feedback from -next. ]
[ paulmck: Apply Nathan Chancellor feedback. ]

Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Cc: Yu Liao <liaoyu15@huawei.com>
Cc: Phil Auld <pauld@redhat.com>
Cc: Paul Gortmaker<paul.gortmaker@windriver.com>
Cc: Alex Belits <abelits@marvell.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
2022-07-05 13:32:59 -07:00
Jakub Kicinski
0d8730f07c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/ethernet/microchip/sparx5/sparx5_switchdev.c
  9c5de246c1 ("net: sparx5: mdb add/del handle non-sparx5 devices")
  fbb89d02e3 ("net: sparx5: Allow mdb entries to both CPU and ports")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-30 16:31:00 -07:00
Christoph Hellwig
cc5c516df0 block: simplify blktrace sysfs attribute creation
Add the trace attributes to the default gendisk attributes, just like
we already do for partitions.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220628171850.1313069-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-28 11:32:42 -06:00
Linus Torvalds
a237cfd6b7 block-5.19-2022-06-24
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmK19ZkQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpsyBD/9whWHSyTpVftMJHJC64E+orbjeu2jFOg3P
 XjrwQ6caqCu+JjsnZWi7EzQv/UyJrWgjX3blPOl+1kiiM4LyPkzHlaFTJFZWNmUz
 6JFmAGWybOywUauwynIPT32Jc7ccbqGMy7QbEUWNiAxa384F57fPx2rUcEgD8nSB
 GGr+ljZdEcfdJ6BRrwBexfS07aVrt8SYwnluO6h1YHbxlO9cw+6ga8Y0z6ncBVFW
 XDqkzwH3oqUaGlHqu879rtQpedaB2zz/sUvWiy+KbBTyN4K/vE5ja3/x8OyIcfIU
 8LWwOCSOZeo1i7juXNFr7Ay1ldGsn6D5QeT0oNGDUh893JUz4kVI+iyoIqvVkLTE
 4MQohMddahlsf95sH/xuvJQ2WMXmy1ZjkaaDCws55zqIr6baRuI44jOFElKy2Peu
 0X7qWwri9Uk/zzCo6LhOKbcvLBriy89YchY9I3thSWIrSTx9QoIbgs8z2kG2qYx8
 pbfSKASUfyXcdgOIanCmS2+On7T7lDLEflCm/WvTDoc07OI+CpsjnIFv1CtB/sy2
 bwC35rCuo+EukDZiFDtsNsvbeVnCo3l0WC+rozean5j6zz64hdpC/eofwJmn2RfZ
 U9Ob0aDl4P+8Jta6Iqlfae78ZPAPcCzyvLb0k0l3ceMIY65x3Ss85JEFF7SA/VwJ
 /Dn12frm4Q==
 =3uDs
 -----END PGP SIGNATURE-----

Merge tag 'block-5.19-2022-06-24' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - Series fixing issues with sysfs locking and name reuse (Christoph)

 - NVMe pull request via Christoph:
      - Fix the mixed up CRIMS/CRWMS constants (Joel Granados)
      - Add another broken identifier quirk (Leo Savernik)
      - Fix up a quirk because Samsung reuses PCI IDs over different
        products (Christoph Hellwig)

 - Remove old WARN_ON() that doesn't apply anymore (Li)

 - Fix for using a stale cached request value for rq-qos throttling
   mechanisms that may schedule(), like iocost (me)

 - Remove unused parameter to blk_independent_access_range() (Damien)

* tag 'block-5.19-2022-06-24' of git://git.kernel.dk/linux-block:
  block: remove WARN_ON() from bd_link_disk_holder
  nvme: move the Samsung X5 quirk entry to the core quirks
  nvme: fix the CRIMS and CRWMS definitions to match the spec
  nvme: add a bogus subsystem NQN quirk for Micron MTFDKBA2T0TFH
  block: pop cached rq before potentially blocking rq_qos_throttle()
  block: remove queue from struct blk_independent_access_range
  block: freeze the queue earlier in del_gendisk
  block: remove per-disk debugfs files in blk_unregister_queue
  block: serialize all debugfs operations using q->debugfs_mutex
  block: disable the elevator int del_gendisk
2022-06-24 11:07:54 -07:00
Jakub Kicinski
93817be8b6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-23 12:33:24 -07:00
Linus Torvalds
fa1796a835 Tracing fixes:
- Check for NULL in kretprobe_dispatcher()
   NULL can now be passed in, make sure it can handle it
 
 - Clean up unneeded #endif #ifdef of the same preprocessor check in the
   middle of the block.
 
 - Comment clean up
 
 - Remove unneeded initialization of the "ret" variable in
   __trace_uprobe_create()
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYrMu9hQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qpuZAP9gS8Xcd7nenV3i9j4lCFktWQrvQwvh
 wyNb9UuLqPVMUQEAkk4hzq38P2UvEOZ+v+WdJnXfOb3wpFhrxWFycz5ZVAw=
 =9WXA
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Check for NULL in kretprobe_dispatcher()

   NULL can now be passed in, make sure it can handle it

 - Clean up unneeded #endif #ifdef of the same preprocessor
   check in the middle of the block.

 - Comment clean up

 - Remove unneeded initialization of the "ret" variable in
   __trace_uprobe_create()

* tag 'trace-v5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing/uprobes: Remove unwanted initialization in __trace_uprobe_create()
  tracefs: Fix syntax errors in comments
  tracing: Simplify conditional compilation code in tracing_set_tracer()
  tracing/kprobes: Check whether get_kretprobe() returns NULL in kretprobe_dispatcher()
2022-06-23 12:24:49 -05:00
Delyan Kratunov
aca80dd95e uprobe: gate bpf call behind BPF_EVENTS
The call into bpf from uprobes needs to be gated now that it doesn't use
the trace_events.h helpers.

Randy found this as a randconfig build failure on linux-next [1].

  [1]: https://lore.kernel.org/linux-next/2de99180-7d55-2fdf-134d-33198c27cc58@infradead.org/

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Delyan Kratunov <delyank@fb.com>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/cb8bfbbcde87ed5d811227a393ef4925f2aadb7b.camel@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-20 17:04:03 -07:00
Jakub Kicinski
9fb424c4c2 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2022-06-17

We've added 72 non-merge commits during the last 15 day(s) which contain
a total of 92 files changed, 4582 insertions(+), 834 deletions(-).

The main changes are:

1) Add 64 bit enum value support to BTF, from Yonghong Song.

2) Implement support for sleepable BPF uprobe programs, from Delyan Kratunov.

3) Add new BPF helpers to issue and check TCP SYN cookies without binding to a
   socket especially useful in synproxy scenarios, from Maxim Mikityanskiy.

4) Fix libbpf's internal USDT address translation logic for shared libraries as
   well as uprobe's symbol file offset calculation, from Andrii Nakryiko.

5) Extend libbpf to provide an API for textual representation of the various
   map/prog/attach/link types and use it in bpftool, from Daniel Müller.

6) Provide BTF line info for RV64 and RV32 JITs, and fix a put_user bug in the
   core seen in 32 bit when storing BPF function addresses, from Pu Lehui.

7) Fix libbpf's BTF pointer size guessing by adding a list of various aliases
   for 'long' types, from Douglas Raillard.

8) Fix bpftool to readd setting rlimit since probing for memcg-based accounting
   has been unreliable and caused a regression on COS, from Quentin Monnet.

9) Fix UAF in BPF cgroup's effective program computation triggered upon BPF link
   detachment, from Tadeusz Struk.

10) Fix bpftool build bootstrapping during cross compilation which was pointing
    to the wrong AR process, from Shahab Vahedi.

11) Fix logic bug in libbpf's is_pow_of_2 implementation, from Yuze Chi.

12) BPF hash map optimization to avoid grabbing spinlocks of all CPUs when there
    is no free element. Also add a benchmark as reproducer, from Feng Zhou.

13) Fix bpftool's codegen to bail out when there's no BTF, from Michael Mullin.

14) Various minor cleanup and improvements all over the place.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (72 commits)
  bpf: Fix bpf_skc_lookup comment wrt. return type
  bpf: Fix non-static bpf_func_proto struct definitions
  selftests/bpf: Don't force lld on non-x86 architectures
  selftests/bpf: Add selftests for raw syncookie helpers in TC mode
  bpf: Allow the new syncookie helpers to work with SKBs
  selftests/bpf: Add selftests for raw syncookie helpers
  bpf: Add helpers to issue and check SYN cookies in XDP
  bpf: Allow helpers to accept pointers with a fixed size
  bpf: Fix documentation of th_len in bpf_tcp_{gen,check}_syncookie
  selftests/bpf: add tests for sleepable (uk)probes
  libbpf: add support for sleepable uprobe programs
  bpf: allow sleepable uprobe programs to attach
  bpf: implement sleepable uprobes by chaining gps
  bpf: move bpf_prog to bpf.h
  libbpf: Fix internal USDT address translation logic for shared libraries
  samples/bpf: Check detach prog exist or not in xdp_fwd
  selftests/bpf: Avoid skipping certain subtests
  selftests/bpf: Fix test_varlen verification failure with latest llvm
  bpftool: Do not check return value from libbpf_set_strict_mode()
  Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK"
  ...
====================

Link: https://lore.kernel.org/r/20220617220836.7373-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-17 19:35:19 -07:00
Gautam Menghani
12c3e0c92f tracing/uprobes: Remove unwanted initialization in __trace_uprobe_create()
Remove the unwanted initialization of variable 'ret'. This fixes the clang
scan warning: Value stored to 'ret' is never read [deadcode.DeadStores]

Link: https://lkml.kernel.org/r/20220612144232.145209-1-gautammenghani201@gmail.com

Signed-off-by: Gautam Menghani <gautammenghani201@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-06-17 19:12:07 -04:00
sunliming
f4b0d31809 tracing: Simplify conditional compilation code in tracing_set_tracer()
Two conditional compilation directives "#ifdef CONFIG_TRACER_MAX_TRACE"
are used consecutively, and no other code in between. Simplify conditional
the compilation code and only use one "#ifdef CONFIG_TRACER_MAX_TRACE".

Link: https://lkml.kernel.org/r/20220602140613.545069-1-sunliming@kylinos.cn

Signed-off-by: sunliming <sunliming@kylinos.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-06-17 18:42:17 -04:00
Masami Hiramatsu (Google)
cc72b72073 tracing/kprobes: Check whether get_kretprobe() returns NULL in kretprobe_dispatcher()
There is a small chance that get_kretprobe(ri) returns NULL in
kretprobe_dispatcher() when another CPU unregisters the kretprobe
right after __kretprobe_trampoline_handler().

To avoid this issue, kretprobe_dispatcher() checks the get_kretprobe()
return value again. And if it is NULL, it returns soon because that
kretprobe is under unregistering process.

This issue has been introduced when the kretprobe is decoupled
from the struct kretprobe_instance by commit d741bf41d7
("kprobes: Remove kretprobe hash"). Before that commit, the
struct kretprob_instance::rp directly points the kretprobe
and it is never be NULL.

Link: https://lkml.kernel.org/r/165366693881.797669.16926184644089588731.stgit@devnote2

Reported-by: Yonghong Song <yhs@fb.com>
Fixes: d741bf41d7 ("kprobes: Remove kretprobe hash")
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: bpf <bpf@vger.kernel.org>
Cc: Kernel Team <kernel-team@fb.com>
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-06-17 17:40:06 -04:00
Masami Hiramatsu (Google)
c0f3bb4054 rethook: Reject getting a rethook if RCU is not watching
Since the rethook_recycle() will involve the call_rcu() for reclaiming
the rethook_instance, the rethook must be set up at the RCU available
context (non idle). This rethook_recycle() in the rethook trampoline
handler is inevitable, thus the RCU available check must be done before
setting the rethook trampoline.

This adds a rcu_is_watching() check in the rethook_try_get() so that
it will return NULL if it is called when !rcu_is_watching().

Fixes: 54ecbe6f1e ("rethook: Add a generic return hook")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/165461827269.280167.7379263615545598958.stgit@devnote2
2022-06-17 21:53:35 +02:00
Christoph Hellwig
5cf9c91ba9 block: serialize all debugfs operations using q->debugfs_mutex
Various places like I/O schedulers or the QOS infrastructure try to
register debugfs files on demans, which can race with creating and
removing the main queue debugfs directory.  Use the existing
debugfs_mutex to serialize all debugfs operations that rely on
q->debugfs_dir or the directories hanging off it.

To make the teardown code a little simpler declare all debugfs dentry
pointers and not just the main one uncoditionally in blkdev.h.

Move debugfs_mutex next to the dentries that it protects and document
what it is used for.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220614074827.458955-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-17 07:31:05 -06:00
Jiri Olsa
eb5fb03256 bpf: Force cookies array to follow symbols sorting
When user specifies symbols and cookies for kprobe_multi link
interface it's very likely the cookies will be misplaced and
returned to wrong functions (via get_attach_cookie helper).

The reason is that to resolve the provided functions we sort
them before passing them to ftrace_lookup_symbols, but we do
not do the same sort on the cookie values.

Fixing this by using sort_r function with custom swap callback
that swaps cookie values as well.

Fixes: 0236fec57a ("bpf: Resolve symbols with ftrace_lookup_symbols for kprobe multi link")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220615112118.497303-4-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-16 19:42:21 -07:00
Jiri Olsa
eb1b2985fe ftrace: Keep address offset in ftrace_lookup_symbols
We want to store the resolved address on the same index as
the symbol string, because that's the user (bpf kprobe link)
code assumption.

Also making sure we don't store duplicates that might be
present in kallsyms.

Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Fixes: bed0d9a50d ("ftrace: Add ftrace_lookup_symbols function")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220615112118.497303-3-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-16 19:42:21 -07:00
Delyan Kratunov
8c7dcb84e3 bpf: implement sleepable uprobes by chaining gps
uprobes work by raising a trap, setting a task flag from within the
interrupt handler, and processing the actual work for the uprobe on the
way back to userspace. As a result, uprobe handlers already execute in a
might_fault/_sleep context. The primary obstacle to sleepable bpf uprobe
programs is therefore on the bpf side.

Namely, the bpf_prog_array attached to the uprobe is protected by normal
rcu. In order for uprobe bpf programs to become sleepable, it has to be
protected by the tasks_trace rcu flavor instead (and kfree() called after
a corresponding grace period).

Therefore, the free path for bpf_prog_array now chains a tasks_trace and
normal grace periods one after the other.

Users who iterate under tasks_trace read section would
be safe, as would users who iterate under normal read sections (from
non-sleepable locations).

The downside is that the tasks_trace latency affects all perf_event-attached
bpf programs (and not just uprobe ones). This is deemed safe given the
possible attach rates for kprobe/uprobe/tp programs.

Separately, non-sleepable programs need access to dynamically sized
rcu-protected maps, so bpf_run_prog_array_sleepables now conditionally takes
an rcu read section, in addition to the overarching tasks_trace section.

Signed-off-by: Delyan Kratunov <delyank@fb.com>
Link: https://lore.kernel.org/r/ce844d62a2fd0443b08c5ab02e95bc7149f9aeb1.1655248076.git.delyank@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-16 19:27:29 -07:00
Linus Torvalds
825464e79d Networking fixes for 5.19-rc2, including fixes from bpf and netfilter.
Current release - regressions:
   - eth: amt: fix possible null-ptr-deref in amt_rcv()
 
 Previous releases - regressions:
   - tcp: use alloc_large_system_hash() to allocate table_perturb
 
   - af_unix: fix a data-race in unix_dgram_peer_wake_me()
 
   - nfc: st21nfca: fix memory leaks in EVT_TRANSACTION handling
 
   - eth: ixgbe: fix unexpected VLAN rx in promisc mode on VF
 
 Previous releases - always broken:
   - ipv6: fix signed integer overflow in __ip6_append_data
 
   - netfilter:
     - nat: really support inet nat without l3 address
     - nf_tables: memleak flow rule from commit path
 
   - bpf: fix calling global functions from BPF_PROG_TYPE_EXT programs
 
   - openvswitch: fix misuse of the cached connection on tuple changes
 
   - nfc: nfcmrvl: fix memory leak in nfcmrvl_play_deferred
 
   - eth: altera: fix refcount leak in altera_tse_mdio_create
 
 Misc:
   - add Quentin Monnet to bpftool maintainers
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmKhykgSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkN7sQAIn+ZmzQqTm5MVWnlvt/GcRGjjMP2VQY
 60oS2re8QC773yWoP6PvXqxCSFc99paDCC5BmCK6DMLbp9yuVSp5W8iAPuFuyjXE
 /Nur4Ti57LcGJ8ZpcJheBD4cRFbf+xtsGzx9a1WhUDrCYASo7vqRes5Eos2dT7P7
 qjgTduhUtaj6S1CfenfTnYqemZPzSGa+1euDuQ/Bu4mjCPUTrNZZQVYjmfTYM9p1
 UzwfCQr9TtmRKo8wLFHnYDLoWHNpfp55SNL0ShAwIQqgldiJ2OdMje+a2Sa4m6uF
 etRz8H0WrGVqfneD424tdyZv4nwhHw5dnaSrGe8DGq98c4/lIIcVyC38oDAbfWqI
 l8p7ZmtvNid7rpgoQFcxKpb2TAYAI+jaFq5GySEhvj5ZAblNQgFyghfMGPoncXCO
 XW6va8TtP2lmHFScAljQiQb6GNwDO52x77/q14Jkwvr+DILRKXMZZ3hCGrKUn5JM
 lafGkdL5ufm+E9C9RlaWN3imb2KoRj+wdThgV79efEPGG1py7yLOPVMoOCP3qmLq
 torcGcfDi1LGb7ohQxN6tCMv0JgXjS5nd1i+qJnImpkhRrUmahOfmpnElHoPuzs3
 6FU8HR77Eo15x70Jt+WOMy4oXrNh2MeEm8/Fhpj84MEhKpxVn+2o/53M+++5h+ru
 YtiLwEri0dCA
 =rdoB
 -----END PGP SIGNATURE-----

Merge tag 'net-5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from bpf and netfilter.

  Current release - regressions:

   - eth: amt: fix possible null-ptr-deref in amt_rcv()

  Previous releases - regressions:

   - tcp: use alloc_large_system_hash() to allocate table_perturb

   - af_unix: fix a data-race in unix_dgram_peer_wake_me()

   - nfc: st21nfca: fix memory leaks in EVT_TRANSACTION handling

   - eth: ixgbe: fix unexpected VLAN rx in promisc mode on VF

  Previous releases - always broken:

   - ipv6: fix signed integer overflow in __ip6_append_data

   - netfilter:
       - nat: really support inet nat without l3 address
       - nf_tables: memleak flow rule from commit path

   - bpf: fix calling global functions from BPF_PROG_TYPE_EXT programs

   - openvswitch: fix misuse of the cached connection on tuple changes

   - nfc: nfcmrvl: fix memory leak in nfcmrvl_play_deferred

   - eth: altera: fix refcount leak in altera_tse_mdio_create

  Misc:

   - add Quentin Monnet to bpftool maintainers"

* tag 'net-5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (45 commits)
  net: amd-xgbe: fix clang -Wformat warning
  tcp: use alloc_large_system_hash() to allocate table_perturb
  net: dsa: realtek: rtl8365mb: fix GMII caps for ports with internal PHY
  net: dsa: mv88e6xxx: correctly report serdes link failure
  net: dsa: mv88e6xxx: fix BMSR error to be consistent with others
  net: dsa: mv88e6xxx: use BMSR_ANEGCOMPLETE bit for filling an_complete
  net: altera: Fix refcount leak in altera_tse_mdio_create
  net: openvswitch: fix misuse of the cached connection on tuple changes
  net: ethernet: mtk_eth_soc: fix misuse of mem alloc interface netdev[napi]_alloc_frag
  ip_gre: test csum_start instead of transport header
  au1000_eth: stop using virt_to_bus()
  ipv6: Fix signed integer overflow in l2tp_ip6_sendmsg
  ipv6: Fix signed integer overflow in __ip6_append_data
  nfc: nfcmrvl: Fix memory leak in nfcmrvl_play_deferred
  nfc: st21nfca: fix incorrect sizing calculations in EVT_TRANSACTION
  nfc: st21nfca: fix memory leaks in EVT_TRANSACTION handling
  nfc: st21nfca: fix incorrect validating logic in EVT_TRANSACTION
  net: ipv6: unexport __init-annotated seg6_hmac_init()
  net: xfrm: unexport __init-annotated xfrm4_protocol_init()
  net: mdio: unexport __init-annotated mdio_bus_init()
  ...
2022-06-09 12:06:52 -07:00
Dan Carpenter
fd58f7df24 bpf: Use safer kvmalloc_array() where possible
The kvmalloc_array() function is safer because it has a check for
integer overflows.  These sizes come from the user and I was not
able to see any bounds checking so an integer overflow seems like a
realistic concern.

Fixes: 0dcac27254 ("bpf: Add multi kprobe link")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/Yo9VRVMeHbALyjUH@kili
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-07 10:40:53 -07:00
Linus Torvalds
21873bd66b arm64 fixes for 5.19-rc1:
- Initialise jump labels before setup_machine_fdt(), needed by commit
   f5bda35fba ("random: use static branch for crng_ready()").
 
 - Sparse warnings: missing prototype, incorrect __user annotation.
 
 - Skip SVE kselftest if not sufficient vector lengths supported.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmKaQ/8ACgkQa9axLQDI
 XvE+eRAAjVQr4ehVgT75SIYq9cv/6A8cAlYzK4hi7cwhFLTii/9flZLmqTfjUwbY
 Ifnm1/c/ka11vAC4JfzhEcAjIm5jJ7Wlw4XqMhGC2woTjmCrDxSlG1qrdt1zZlgI
 H7RXCjCpfstCWFoZy4uDaaWZYgN6c91/IPuuamBUfADXu5NbeD9yfZALYUhiE9FP
 yZ61BLDP0f8l0x5PMmDATidZ/vtIp+Hfiq1iezX0VugIY4+v6nbi6/Ba20Wc419v
 cmDgjRc1ODaWnuWLUTiriUlt2VEtBs3LrGXa81WSrTKR9+DXih4Gwj4orpW7Rqx3
 CFeSgVCDd5V9/XGEExN5GtMtiolDWNG8dA58jwwaWHWgp8AveSGY8AxPT+8/30/S
 FQyVGjxrX1JUkIya3lbVuDLKHdzCKEX0O83dFQSm3bP6lUMZUCYCcFNpVpSYPsQr
 ZZea9dNWhwiI82z19p7igfWKwerBMdUeptxMm247jKmdz4H/i6xGtK6nJc731zCZ
 kjxoI6HCYAMH/r9kgWYldgeov/UNQXi5Q2ftsXkUepPPjpEd5YBJfkL6CzKgQ6u5
 wtOTI8fainjPVWPGbcpHnEjlwG5UMY80jcJpfkD54g9N5WxunCWf7L3OZj77toTp
 Z2kU2b189Di4bNOV9bCKFqrecqajNychMhXcc5anMfdN+284mdg=
 =25eW
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:
 "Most of issues addressed were introduced during this merging window.

   - Initialise jump labels before setup_machine_fdt(), needed by commit
     f5bda35fba ("random: use static branch for crng_ready()").

   - Sparse warnings: missing prototype, incorrect __user annotation.

   - Skip SVE kselftest if not sufficient vector lengths supported"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  kselftest/arm64: signal: Skip SVE signal test if not enough VLs supported
  arm64: Initialize jump labels before setup_machine_fdt()
  arm64: hibernate: Fix syntax errors in comments
  arm64: Remove the __user annotation for the restore_za_context() argument
  ftrace/fgraph: fix increased missing-prototypes warnings
2022-06-03 14:05:34 -07:00
Chengming Zhou
ff979b2a9d ftrace/fgraph: fix increased missing-prototypes warnings
After commit e999995c84 ("ftrace: cleanup ftrace_graph_caller enable
and disable") merged into the linux-next tree, the kernel test robot
(lkp@intel.com) has send out report that there are increased missing-prototypes
warnings caused by that commit.

COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross W=1 \
	O=build_dir ARCH=sh SHELL=/bin/bash kernel/trace/
warning: no previous prototype for 'ftrace_enable_ftrace_graph_caller' [-Wmissing-prototypes]
warning: no previous prototype for 'ftrace_disable_ftrace_graph_caller' [-Wmissing-prototypes]
warning: no previous prototype for 'ftrace_return_to_handler' [-Wmissing-prototypes]
warning: no previous prototype for 'ftrace_graph_sleep_time_control' [-Wmissing-prototypes]

BTW there are so many missing-prototypes warnings if build kernel with "W=1".

The increased warnings for 'ftrace_[enable,disable]_ftrace_graph_caller'
is caused by CONFIG_FUNCTION_GRAPH_TRACER && !CONFIG_DYNAMIC_FTRACE,
so the declarations in <linux/ftrace.h> can't be seen in fgraph.c.

And this warning can't reproduce on x86_64 since x86_64 select
HAVE_FUNCTION_GRAPH_TRACER only when DYNAMIC_FTRACE, so fgraph.c will
always see the declarations in <linux/ftrace.h>.

This patch fix the increased warnings by put the definitions in
CONFIG_DYNAMIC_FTRACE although there are no real problems exist.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20220506032737.23375-1-zhouchengming@bytedance.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-06-01 18:14:17 +01:00
Linus Torvalds
76bfd3de34 tracing updates for 5.19:
- The majority of the changes are for fixes and clean ups.
 
 Noticeable changes:
 
 - Rework trace event triggers code to be easier to interact with.
 
 - Support for embedding bootconfig with the kernel (as suppose to having it
   embedded in initram). This is useful for embedded boards without initram
   disks.
 
 - Speed up boot by parallelizing the creation of tracefs files.
 
 - Allow absolute ring buffer timestamps handle timestamps that use more than
   59 bits.
 
 - Added new tracing clock "TAI" (International Atomic Time)
 
 - Have weak functions show up in available_filter_function list as:
    __ftrace_invalid_address___<invalid-offset>
   instead of using the name of the function before it.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYpOgXRQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qjkKAQDbpemxvpFyJlZqT8KgEIXubu+ag2/q
 p0XDHaPS0zF9OQEAjTxg6GMEbnFYl6fzxZtOoEbiaQ7ppfdhRI8t6sSMVA8=
 =+nDD
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "The majority of the changes are for fixes and clean ups.

  Notable changes:

   - Rework trace event triggers code to be easier to interact with.

   - Support for embedding bootconfig with the kernel (as suppose to
     having it embedded in initram). This is useful for embedded boards
     without initram disks.

   - Speed up boot by parallelizing the creation of tracefs files.

   - Allow absolute ring buffer timestamps handle timestamps that use
     more than 59 bits.

   - Added new tracing clock "TAI" (International Atomic Time)

   - Have weak functions show up in available_filter_function list as:
     __ftrace_invalid_address___<invalid-offset> instead of using the
     name of the function before it"

* tag 'trace-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (52 commits)
  ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function
  tracing: Fix comments for event_trigger_separate_filter()
  x86/traceponit: Fix comment about irq vector tracepoints
  x86,tracing: Remove unused headers
  ftrace: Clean up hash direct_functions on register failures
  tracing: Fix comments of create_filter()
  tracing: Disable kcov on trace_preemptirq.c
  tracing: Initialize integer variable to prevent garbage return value
  ftrace: Fix typo in comment
  ftrace: Remove return value of ftrace_arch_modify_*()
  tracing: Cleanup code by removing init "char *name"
  tracing: Change "char *" string form to "char []"
  tracing/timerlat: Do not wakeup the thread if the trace stops at the IRQ
  tracing/timerlat: Print stacktrace in the IRQ handler if needed
  tracing/timerlat: Notify IRQ new max latency only if stop tracing is set
  kprobes: Fix build errors with CONFIG_KRETPROBES=n
  tracing: Fix return value of trace_pid_write()
  tracing: Fix potential double free in create_var_ref()
  tracing: Use strim() to remove whitespace instead of doing it manually
  ftrace: Deal with error return code of the ftrace_process_locs() function
  ...
2022-05-29 10:31:36 -07:00
Steven Rostedt (Google)
b39181f7c6 ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function
If an unused weak function was traced, it's call to fentry will still
exist, which gets added into the __mcount_loc table. Ftrace will use
kallsyms to retrieve the name for each location in __mcount_loc to display
it in the available_filter_functions and used to enable functions via the
name matching in set_ftrace_filter/notrace. Enabling these functions do
nothing but enable an unused call to ftrace_caller. If a traced weak
function is overridden, the symbol of the function would be used for it,
which will either created duplicate names, or if the previous function was
not traced, it would be incorrectly be listed in available_filter_functions
as a function that can be traced.

This became an issue with BPF[1] as there are tooling that enables the
direct callers via ftrace but then checks to see if the functions were
actually enabled. The case of one function that was marked notrace, but
was followed by an unused weak function that was traced. The unused
function's call to fentry was added to the __mcount_loc section, and
kallsyms retrieved the untraced function's symbol as the weak function was
overridden. Since the untraced function would not get traced, the BPF
check would detect this and fail.

The real fix would be to fix kallsyms to not show addresses of weak
functions as the function before it. But that would require adding code in
the build to add function size to kallsyms so that it can know when the
function ends instead of just using the start of the next known symbol.

In the mean time, this is a work around. Add a FTRACE_MCOUNT_MAX_OFFSET
macro that if defined, ftrace will ignore any function that has its call
to fentry/mcount that has an offset from the symbol that is greater than
FTRACE_MCOUNT_MAX_OFFSET.

If CONFIG_HAVE_FENTRY is defined for x86, define FTRACE_MCOUNT_MAX_OFFSET
to zero (unless IBT is enabled), which will have ftrace ignore all locations
that are not at the start of the function (or one after the ENDBR
instruction).

A worker thread is added at boot up to scan all the ftrace record entries,
and will mark any that fail the FTRACE_MCOUNT_MAX_OFFSET test as disabled.
They will still appear in the available_filter_functions file as:

  __ftrace_invalid_address___<invalid-offset>

(showing the offset that caused it to be invalid).

This is required for tools that use libtracefs (like trace-cmd does) that
scan the available_filter_functions and enable set_ftrace_filter and
set_ftrace_notrace using indexes of the function listed in the file (this
is a speedup, as enabling thousands of files via names is an O(n^2)
operation and can take minutes to complete, where the indexing takes less
than a second).

The invalid functions cannot be removed from available_filter_functions as
the names there correspond to the ftrace records in the array that manages
them (and the indexing depends on this).

[1] https://lore.kernel.org/all/20220412094923.0abe90955e5db486b7bca279@kernel.org/

Link: https://lkml.kernel.org/r/20220526141912.794c2786@gandalf.local.home

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-28 09:31:19 -04:00
sunliming
8d4a21b5ac tracing: Fix comments for event_trigger_separate_filter()
The parameter name in comments of event_trigger_separate_filter() is
inconsistent with actual parameter name, fix it.

Link: https://lkml.kernel.org/r/20220526072957.165655-1-sunliming@kylinos.cn

Signed-off-by: sunliming <sunliming@kylinos.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26 22:03:52 -04:00
Song Liu
7d54c15cb8 ftrace: Clean up hash direct_functions on register failures
We see the following GPF when register_ftrace_direct fails:

[ ] general protection fault, probably for non-canonical address \
  0x200000000000010: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
[...]
[ ] RIP: 0010:ftrace_find_rec_direct+0x53/0x70
[ ] Code: 48 c1 e0 03 48 03 42 08 48 8b 10 31 c0 48 85 d2 74 [...]
[ ] RSP: 0018:ffffc9000138bc10 EFLAGS: 00010206
[ ] RAX: 0000000000000000 RBX: ffffffff813e0df0 RCX: 000000000000003b
[ ] RDX: 0200000000000000 RSI: 000000000000000c RDI: ffffffff813e0df0
[ ] RBP: ffffffffa00a3000 R08: ffffffff81180ce0 R09: 0000000000000001
[ ] R10: ffffc9000138bc18 R11: 0000000000000001 R12: ffffffff813e0df0
[ ] R13: ffffffff813e0df0 R14: ffff888171b56400 R15: 0000000000000000
[ ] FS:  00007fa9420c7780(0000) GS:ffff888ff6a00000(0000) knlGS:000000000
[ ] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ ] CR2: 000000000770d000 CR3: 0000000107d50003 CR4: 0000000000370ee0
[ ] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ ] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ ] Call Trace:
[ ]  <TASK>
[ ]  register_ftrace_direct+0x54/0x290
[ ]  ? render_sigset_t+0xa0/0xa0
[ ]  bpf_trampoline_update+0x3f5/0x4a0
[ ]  ? 0xffffffffa00a3000
[ ]  bpf_trampoline_link_prog+0xa9/0x140
[ ]  bpf_tracing_prog_attach+0x1dc/0x450
[ ]  bpf_raw_tracepoint_open+0x9a/0x1e0
[ ]  ? find_held_lock+0x2d/0x90
[ ]  ? lock_release+0x150/0x430
[ ]  __sys_bpf+0xbd6/0x2700
[ ]  ? lock_is_held_type+0xd8/0x130
[ ]  __x64_sys_bpf+0x1c/0x20
[ ]  do_syscall_64+0x3a/0x80
[ ]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ ] RIP: 0033:0x7fa9421defa9
[ ] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 9 f8 [...]
[ ] RSP: 002b:00007ffed743bd78 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
[ ] RAX: ffffffffffffffda RBX: 00000000069d2480 RCX: 00007fa9421defa9
[ ] RDX: 0000000000000078 RSI: 00007ffed743bd80 RDI: 0000000000000011
[ ] RBP: 00007ffed743be00 R08: 0000000000bb7270 R09: 0000000000000000
[ ] R10: 00000000069da210 R11: 0000000000000246 R12: 0000000000000001
[ ] R13: 00007ffed743c4b0 R14: 00000000069d2480 R15: 0000000000000001
[ ]  </TASK>
[ ] Modules linked in: klp_vm(OK)
[ ] ---[ end trace 0000000000000000 ]---

One way to trigger this is:
  1. load a livepatch that patches kernel function xxx;
  2. run bpftrace -e 'kfunc:xxx {}', this will fail (expected for now);
  3. repeat #2 => gpf.

This is because the entry is added to direct_functions, but not removed.
Fix this by remove the entry from direct_functions when
register_ftrace_direct fails.

Also remove the last trailing space from ftrace.c, so we don't have to
worry about it anymore.

Link: https://lkml.kernel.org/r/20220524170839.900849-1-song@kernel.org

Cc: stable@vger.kernel.org
Fixes: 763e34e74b ("ftrace: Add register_ftrace_direct()")
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26 21:13:01 -04:00
sunliming
0a54f556b0 tracing: Fix comments of create_filter()
The name in comments of parameter "filter_string" in function
create_filter is annotated as "filter_str", just fix it.

Link: https://lkml.kernel.org/r/20220524063937.52873-1-sunliming@kylinos.cn

Signed-off-by: sunliming <sunliming@kylinos.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26 21:13:01 -04:00
Congyu Liu
bb5eb8f3b3 tracing: Disable kcov on trace_preemptirq.c
Functions in trace_preemptirq.c could be invoked from early interrupt
code that bypasses kcov trace function's in_task() check. Disable kcov
on this file to reduce random code coverage.

Link: https://lkml.kernel.org/r/20220523063033.1778974-1-liu3101@purdue.edu

Acked-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Congyu Liu <liu3101@purdue.edu>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26 21:13:00 -04:00
Gautam Menghani
154827f8e5 tracing: Initialize integer variable to prevent garbage return value
Initialize the integer variable to 0 to fix the clang scan warning:
Undefined or garbage value returned to caller
[core.uninitialized.UndefReturn]
        return ret;

Link: https://lkml.kernel.org/r/20220522061826.1751-1-gautammenghani201@gmail.com

Cc: stable@vger.kernel.org
Fixes: 8993665abc ("tracing/boot: Support multiple handlers for per-event histogram")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Gautam Menghani <gautammenghani201@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26 21:13:00 -04:00
Julia Lawall
50c697819d ftrace: Fix typo in comment
Spelling mistake (triple letters) in comment.
Detected with the help of Coccinelle.

Link: https://lkml.kernel.org/r/20220521111145.81697-81-Julia.Lawall@inria.fr

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26 21:13:00 -04:00
Li kunyu
3a2bfec0b0 ftrace: Remove return value of ftrace_arch_modify_*()
All instances of the function ftrace_arch_modify_prepare() and
  ftrace_arch_modify_post_process() return zero. There's no point in
  checking their return value. Just have them be void functions.

Link: https://lkml.kernel.org/r/20220518023639.4065-1-kunyu@nfschina.com

Signed-off-by: Li kunyu <kunyu@nfschina.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26 21:13:00 -04:00
liqiong
2decd16f47 tracing: Cleanup code by removing init "char *name"
The pointer is assigned to "type->name" anyway. no need to
initialize with "preemption".

Link: https://lkml.kernel.org/r/20220513075221.26275-1-liqiong@nfschina.com

Signed-off-by: liqiong <liqiong@nfschina.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26 21:13:00 -04:00
liqiong
2d601b9864 tracing: Change "char *" string form to "char []"
The "char []" string form declares a single variable. It is better
than "char *" which creates two variables in the final assembly.

Link: https://lkml.kernel.org/r/20220512143230.28796-1-liqiong@nfschina.com

Signed-off-by: liqiong <liqiong@nfschina.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26 21:13:00 -04:00
Daniel Bristot de Oliveira
9c556e5a4d tracing/timerlat: Do not wakeup the thread if the trace stops at the IRQ
There is no need to wakeup the timerlat/ thread if stop tracing is hit
at the timerlat's IRQ handler.

Return before waking up timerlat's thread.

Link: https://lkml.kernel.org/r/b392356c91b56aedd2b289513cc56a84cf87e60d.1652175637.git.bristot@kernel.org

Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26 21:13:00 -04:00
Daniel Bristot de Oliveira
4dd2aea24e tracing/timerlat: Print stacktrace in the IRQ handler if needed
If print_stack and stop_tracing_us are set, and stop_tracing_us is hit
with latency higher than or equal to print_stack, print the
stack at the IRQ handler as it is useful to define the root cause for
the IRQ latency.

Link: https://lkml.kernel.org/r/fd04530ce98ae9270e41bb124ee5bf67b05ecfed.1652175637.git.bristot@kernel.org

Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26 21:13:00 -04:00
Daniel Bristot de Oliveira
aa748949b4 tracing/timerlat: Notify IRQ new max latency only if stop tracing is set
Currently, the notification of a new max latency is sent from
timerlat's IRQ handler anytime a new max latency is found.

While this behavior is not wrong, the send IPI overhead itself
will increase the thread latency and that is not the desired
effect (tracing overhead).

Moreover, the thread will notify a new max latency again because
the thread latency as it is always higher than the IRQ latency
that woke it up.

The only case in which it is helpful to notify a new max latency
from IRQ is when stop tracing (for the IRQ) is set, as in this
case, the thread will not be dispatched.

Notify a new max latency from the IRQ handler only if stop tracing is
set for the IRQ handler.

Link: https://lkml.kernel.org/r/2c2d9a56c0886c8402ba320de32856cbbb10c2bb.1652175637.git.bristot@kernel.org

Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Reported-by: Clark Williams <williams@redhat.com>
Fixes: a955d7eac1 ("trace: Add timerlat tracer")
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26 21:13:00 -04:00
Wonhyuk Yang
b27f266f74 tracing: Fix return value of trace_pid_write()
Setting set_event_pid with trailing whitespace lead to endless write
system calls like below.

    $ strace echo "123 " > /sys/kernel/debug/tracing/set_event_pid
    execve("/usr/bin/echo", ["echo", "123 "], ...) = 0
    ...
    write(1, "123 \n", 5)                   = 4
    write(1, "\n", 1)                       = 0
    write(1, "\n", 1)                       = 0
    write(1, "\n", 1)                       = 0
    write(1, "\n", 1)                       = 0
    write(1, "\n", 1)                       = 0
    ....

This is because, the result of trace_get_user's are not returned when it
read at least one pid. To fix it, update read variable even if
parser->idx == 0.

The result of applied patch is below.

    $ strace echo "123 " > /sys/kernel/debug/tracing/set_event_pid
    execve("/usr/bin/echo", ["echo", "123 "], ...) = 0
    ...
    write(1, "123 \n", 5)                   = 5
    close(1)                                = 0

Link: https://lkml.kernel.org/r/20220503050546.288911-1-vvghjk1234@gmail.com

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Baik Song An <bsahn@etri.re.kr>
Cc: Hong Yeon Kim <kimhy@etri.re.kr>
Cc: Taeung Song <taeung@reallinux.co.kr>
Cc: linuxgeek@linuxgeek.io
Cc: stable@vger.kernel.org
Fixes: 4909010788 ("tracing: Add set_event_pid directory for future use")
Signed-off-by: Wonhyuk Yang <vvghjk1234@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26 21:12:59 -04:00
Keita Suzuki
99696a2592 tracing: Fix potential double free in create_var_ref()
In create_var_ref(), init_var_ref() is called to initialize the fields
of variable ref_field, which is allocated in the previous function call
to create_hist_field(). Function init_var_ref() allocates the
corresponding fields such as ref_field->system, but frees these fields
when the function encounters an error. The caller later calls
destroy_hist_field() to conduct error handling, which frees the fields
and the variable itself. This results in double free of the fields which
are already freed in the previous function.

Fix this by storing NULL to the corresponding fields when they are freed
in init_var_ref().

Link: https://lkml.kernel.org/r/20220425063739.3859998-1-keitasuzuki.park@sslab.ics.keio.ac.jp

Fixes: 067fe038e7 ("tracing: Add variable reference handling to hist triggers")
CC: stable@vger.kernel.org
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Keita Suzuki <keitasuzuki.park@sslab.ics.keio.ac.jp>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26 21:12:59 -04:00
Yuntao Wang
cb24693d94 tracing: Use strim() to remove whitespace instead of doing it manually
The tracing_set_trace_write() function just removes the trailing whitespace
from the user supplied tracer name, but the leading whitespace should also
be removed.

In addition, if the user supplied tracer name contains only a few
whitespace characters, the first one will not be removed using the current
method, which results it a single whitespace character left in the buf.

To fix all of these issues, we use strim() to correctly remove both the
leading and trailing whitespace.

Link: https://lkml.kernel.org/r/20220121095623.1826679-1-ytcoode@gmail.com

Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26 21:12:59 -04:00
Yuntao Wang
2889c658b2 ftrace: Deal with error return code of the ftrace_process_locs() function
The ftrace_process_locs() function may return -ENOMEM error code, which
should be handled by the callers.

Link: https://lkml.kernel.org/r/20220120065949.1813231-1-ytcoode@gmail.com

Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26 21:12:59 -04:00
Yuntao Wang
e4931b824a tracing: Use trace_create_file() to simplify creation of tracefs entries
Creating tracefs entries with tracefs_create_file() followed by pr_warn()
is tedious and repetitive, we can use trace_create_file() to simplify
this process and make the code more readable.

Link: https://lkml.kernel.org/r/20220114131052.534382-1-ytcoode@gmail.com

Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26 21:12:52 -04:00
Linus Torvalds
44d35720c9 sysctl changes for v5.19-rc1
For two kernel releases now kernel/sysctl.c has been being cleaned up
 slowly, since the tables were grossly long, sprinkled with tons of #ifdefs and
 all this caused merge conflicts with one susbystem or another.
 
 This tree was put together to help try to avoid conflicts with these cleanups
 going on different trees at time. So nothing exciting on this pull request,
 just cleanups.
 
 I actually had this sysctl-next tree up since v5.18 but I missed sending a
 pull request for it on time during the last merge window. And so these changes
 have been being soaking up on sysctl-next and so linux-next for a while.
 The last change was merged May 4th.
 
 Most of the compile issues were reported by 0day and fixed.
 
 To help avoid a conflict with bpf folks at Daniel Borkmann's request
 I merged bpf-next/pr/bpf-sysctl into sysctl-next to get the effor which
 moves the BPF sysctls from kernel/sysctl.c to BPF core.
 
 Possible merge conflicts and known resolutions as per linux-next:
 
 bfp:
 https://lkml.kernel.org/r/20220414112812.652190b5@canb.auug.org.au
 
 rcu:
 https://lkml.kernel.org/r/20220420153746.4790d532@canb.auug.org.au
 
 powerpc:
 https://lkml.kernel.org/r/20220520154055.7f964b76@canb.auug.org.au
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmKOq8ASHG1jZ3JvZkBr
 ZXJuZWwub3JnAAoJEM4jHQowkoinDAkQAJVo5YVM9f74UwYp4PQhTpjxJBCjRoZD
 z1u9bp5rMj2ujTC8Fr7VmzKaHrb8+r1C1WvCvZtIzemYNB4lZUrHpVDYfXuXiPRB
 ihPmEjhlPO5PFBx6cVCpI3cu9bEhG00rLc1QXnABx/pXwNPcOTJAGZJVamZvqubk
 chjgZrb7N+adHPfvS55v1+zpwdeKfpp5U3zuu5qlT/nn0GS0HCVzOj5fj4oC4wtJ
 IqfUubo+FX50Ga58yQABWNrjaPD9Crykz5ohVazy3ElQl0hJ4VsK65ct3blqc2vz
 1Bb8kPpWuv6aZ5nr1lCVE8qvF4ZIL33ySvpg5BSdWLQEDrBbSpzvJe9Yn7wgR+eq
 y7fhpO24+zRM82EoDMEvyxX9u1n1RsvoXRtf3ds9BGf63MUxk8a1cgjlU6vuyO2U
 JhDmfM1xzdKvPoY4COOnHzcAiIqzItTqKd09N5y0cahmYstROU8lvp9huhTAHqk1
 SjQMbLIZG7OnX8ZeQcR1EB8sq/IOPZT48ejj0iJmQ8FyMaep71MOQLYyLPAq4lgh
 JHXm8P6QdB57jfJbqAeNSyZoK0qdxOUR/83Zcah7Jjns6vkju1DNatEsaEEI2y2M
 4n7/rkHeZ3TyFHBUX4e9FomKvGLsAalDBRiqsuxLSOPMU8rGrNLAslOAtKwvp90X
 4ht3M2VP098l
 =btwh
 -----END PGP SIGNATURE-----

Merge tag 'sysctl-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux

Pull sysctl updates from Luis Chamberlain:
 "For two kernel releases now kernel/sysctl.c has been being cleaned up
  slowly, since the tables were grossly long, sprinkled with tons of
  #ifdefs and all this caused merge conflicts with one susbystem or
  another.

  This tree was put together to help try to avoid conflicts with these
  cleanups going on different trees at time. So nothing exciting on this
  pull request, just cleanups.

  Thanks a lot to the Uniontech and Huawei folks for doing some of this
  nasty work"

* tag 'sysctl-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: (28 commits)
  sched: Fix build warning without CONFIG_SYSCTL
  reboot: Fix build warning without CONFIG_SYSCTL
  kernel/kexec_core: move kexec_core sysctls into its own file
  sysctl: minor cleanup in new_dir()
  ftrace: fix building with SYSCTL=y but DYNAMIC_FTRACE=n
  fs/proc: Introduce list_for_each_table_entry for proc sysctl
  mm: fix unused variable kernel warning when SYSCTL=n
  latencytop: move sysctl to its own file
  ftrace: fix building with SYSCTL=n but DYNAMIC_FTRACE=y
  ftrace: Fix build warning
  ftrace: move sysctl_ftrace_enabled to ftrace.c
  kernel/do_mount_initrd: move real_root_dev sysctls to its own file
  kernel/delayacct: move delayacct sysctls to its own file
  kernel/acct: move acct sysctls to its own file
  kernel/panic: move panic sysctls to its own file
  kernel/lockdep: move lockdep sysctls to its own file
  mm: move page-writeback sysctls to their own file
  mm: move oom_kill sysctls to their own file
  kernel/reboot: move reboot sysctls to its own file
  sched: Move energy_aware sysctls to topology.c
  ...
2022-05-26 16:57:20 -07:00
Li Huafei
e35c2d8e22 tracing: Reset the function filter after completing trampoline/graph selftest
The direct trampoline and graph coexistence test sets global_ops to
trace only 'trace_selftest_dynamic_test_func', but does not reset it
after the test is completed, resulting in the function filter being set
already after the system starts. Although it can be reset through the
tracefs interface, it is more or less confusing to the user, and we
should reset it to trace all functions after the trampoline/graph test
completes.

Link: https://lkml.kernel.org/r/20220427034119.24668-1-lihuafei1@huawei.com
Link: https://lore.kernel.org/all/20220418073958.104029-1-lihuafei1@huawei.com/

Fixes: 130c080658 ("tracing: Add trampoline/graph selftest")
Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-25 16:57:37 -04:00
Steven Rostedt (Google)
499f12168a tracing: Have event format check not flag %p* on __get_dynamic_array()
The print fmt check against trace events to make sure that the format does
not use pointers that may be freed from the time of the trace to the time
the event is read, gives a false positive on %pISpc when reading data that
was saved in __get_dynamic_array() when it is perfectly fine to do so, as
the data being read is on the ring buffer.

Link: https://lore.kernel.org/all/20220407144524.2a592ed6@canb.auug.org.au/

Cc: stable@vger.kernel.org
Fixes: 5013f454a3 ("tracing: Add check of trace event print fmts for dereferencing pointers")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-25 16:57:37 -04:00
Linus Torvalds
7e062cda7d Networking changes for 5.19.
Core
 ----
 
  - Support TCPv6 segmentation offload with super-segments larger than
    64k bytes using the IPv6 Jumbogram extension header (AKA BIG TCP).
 
  - Generalize skb freeing deferral to per-cpu lists, instead of
    per-socket lists.
 
  - Add a netdev statistic for packets dropped due to L2 address
    mismatch (rx_otherhost_dropped).
 
  - Continue work annotating skb drop reasons.
 
  - Accept alternative netdev names (ALT_IFNAME) in more netlink
    requests.
 
  - Add VLAN support for AF_PACKET SOCK_RAW GSO.
 
  - Allow receiving skb mark from the socket as a cmsg.
 
  - Enable memcg accounting for veth queues, sysctl tables and IPv6.
 
 BPF
 ---
 
  - Add libbpf support for User Statically-Defined Tracing (USDTs).
 
  - Speed up symbol resolution for kprobes multi-link attachments.
 
  - Support storing typed pointers to referenced and unreferenced
    objects in BPF maps.
 
  - Add support for BPF link iterator.
 
  - Introduce access to remote CPU map elements in BPF per-cpu map.
 
  - Allow middle-of-the-road settings for the
    kernel.unprivileged_bpf_disabled sysctl.
 
  - Implement basic types of dynamic pointers e.g. to allow for
    dynamically sized ringbuf reservations without extra memory copies.
 
 Protocols
 ---------
 
  - Retire port only listening_hash table, add a second bind table
    hashed by port and address. Avoid linear list walk when binding
    to very popular ports (e.g. 443).
 
  - Add bridge FDB bulk flush filtering support allowing user space
    to remove all FDB entries matching a condition.
 
  - Introduce accept_unsolicited_na sysctl for IPv6 to implement
    router-side changes for RFC9131.
 
  - Support for MPTCP path manager in user space.
 
  - Add MPTCP support for fallback to regular TCP for connections
    that have never connected additional subflows or transmitted
    out-of-sequence data (partial support for RFC8684 fallback).
 
  - Avoid races in MPTCP-level window tracking, stabilize and improve
    throughput.
 
  - Support lockless operation of GRE tunnels with seq numbers enabled.
 
  - WiFi support for host based BSS color collision detection.
 
  - Add support for SO_TXTIME/SCM_TXTIME on CAN sockets.
 
  - Support transmission w/o flow control in CAN ISOTP (ISO 15765-2).
 
  - Support zero-copy Tx with TLS 1.2 crypto offload (sendfile).
 
  - Allow matching on the number of VLAN tags via tc-flower.
 
  - Add tracepoint for tcp_set_ca_state().
 
 Driver API
 ----------
 
  - Improve error reporting from classifier and action offload.
 
  - Add support for listing line cards in switches (devlink).
 
  - Add helpers for reporting page pool statistics with ethtool -S.
 
  - Add support for reading clock cycles when using PTP virtual clocks,
    instead of having the driver convert to time before reporting.
    This makes it possible to report time from different vclocks.
 
  - Support configuring low-latency Tx descriptor push via ethtool.
 
  - Separate Clause 22 and Clause 45 MDIO accesses more explicitly.
 
 New hardware / drivers
 ----------------------
 
  - Ethernet:
    - Marvell's Octeon NIC PCI Endpoint support (octeon_ep)
    - Sunplus SP7021 SoC (sp7021_emac)
    - Add support for Renesas RZ/V2M (in ravb)
    - Add support for MediaTek mt7986 switches (in mtk_eth_soc)
 
  - Ethernet PHYs:
    - ADIN1100 industrial PHYs (w/ 10BASE-T1L and SQI reporting)
    - TI DP83TD510 PHY
    - Microchip LAN8742/LAN88xx PHYs
 
  - WiFi:
    - Driver for pureLiFi X, XL, XC devices (plfxlc)
    - Driver for Silicon Labs devices (wfx)
    - Support for WCN6750 (in ath11k)
    - Support Realtek 8852ce devices (in rtw89)
 
  - Mobile:
    - MediaTek T700 modems (Intel 5G 5000 M.2 cards)
 
  - CAN:
   - ctucanfd: add support for CTU CAN FD open-source IP core
     from Czech Technical University in Prague
 
 Drivers
 -------
 
  - Delete a number of old drivers still using virt_to_bus().
 
  - Ethernet NICs:
    - intel: support TSO on tunnels MPLS
    - broadcom: support multi-buffer XDP
    - nfp: support VF rate limiting
    - sfc: use hardware tx timestamps for more than PTP
    - mlx5: multi-port eswitch support
    - hyper-v: add support for XDP_REDIRECT
    - atlantic: XDP support (including multi-buffer)
    - macb: improve real-time perf by deferring Tx processing to NAPI
 
  - High-speed Ethernet switches:
    - mlxsw: implement basic line card information querying
    - prestera: add support for traffic policing on ingress and egress
 
  - Embedded Ethernet switches:
    - lan966x: add support for packet DMA (FDMA)
    - lan966x: add support for PTP programmable pins
    - ti: cpsw_new: enable bc/mc storm prevention
 
  - Qualcomm 802.11ax WiFi (ath11k):
    - Wake-on-WLAN support for QCA6390 and WCN6855
    - device recovery (firmware restart) support
    - support setting Specific Absorption Rate (SAR) for WCN6855
    - read country code from SMBIOS for WCN6855/QCA6390
    - enable keep-alive during WoWLAN suspend
    - implement remain-on-channel support
 
  - MediaTek WiFi (mt76):
    - support Wireless Ethernet Dispatch offloading packet movement
      between the Ethernet switch and WiFi interfaces
    - non-standard VHT MCS10-11 support
    - mt7921 AP mode support
    - mt7921 IPv6 NS offload support
 
  - Ethernet PHYs:
    - micrel: ksz9031/ksz9131: cabletest support
    - lan87xx: SQI support for T1 PHYs
    - lan937x: add interrupt support for link detection
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmKNMPQACgkQMUZtbf5S
 IrsRARAAuDyYs6jFYB3p+xazZdOnbF4iAgVv71+DQGvmsCl6CB9OrsNZMlvE85OL
 Q3gjcRbgjrkN4lhgI8DmiGYbsUJnAvVjFdNjccz1Z/vTLYvuIM0ol54MUp5S+9WY
 StncOJkOGJxxR/Gi5gzVmejPDsysU3Jik+hm/fpIcz8pybXxAsFKU5waY5qfl+/T
 TZepfV0VCfqRDjqcF1qA5+jJZNU8pdodQlZ1+mh8bwu6Jk1ZkWkj6Ov8MWdwQldr
 LnPeK/9hIGzkdJYHZfajxA3t8D0K5CHzSuih2bJ9ry8ZXgVBkXEThew778/R5izW
 uB0YZs9COFlrIP7XHjtRTy/2xHOdYIPlj2nWhVdfuQDX8Crvt4VRN6EZ1rjko1ZJ
 WanfG6WHF8NH5pXBRQbh3kIMKBnYn6OIzuCfCQSqd+niHcxFIM4vRiggeXI5C5TW
 vJgEWfK6X+NfDiFVa3xyCrEmp5ieA/pNecpwd8rVkql+MtFAAw4vfsotLKOJEAru
 J/XL6UE+YuLqIJV9ACZ9x1AFXXAo661jOxBunOo4VXhXVzWS9lYYz5r5ryIkgT/8
 /Fr0zjANJWgfIuNdIBtYfQ4qG+LozGq038VA06RhFUAZ5tF9DzhqJs2Q2AFuWWBC
 ewCePJVqo1j2Ceq2mGonXRt47OEnlePoOxTk9W+cKZb7ZWE+zEo=
 =Wjii
 -----END PGP SIGNATURE-----

Merge tag 'net-next-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core
  ----

   - Support TCPv6 segmentation offload with super-segments larger than
     64k bytes using the IPv6 Jumbogram extension header (AKA BIG TCP).

   - Generalize skb freeing deferral to per-cpu lists, instead of
     per-socket lists.

   - Add a netdev statistic for packets dropped due to L2 address
     mismatch (rx_otherhost_dropped).

   - Continue work annotating skb drop reasons.

   - Accept alternative netdev names (ALT_IFNAME) in more netlink
     requests.

   - Add VLAN support for AF_PACKET SOCK_RAW GSO.

   - Allow receiving skb mark from the socket as a cmsg.

   - Enable memcg accounting for veth queues, sysctl tables and IPv6.

  BPF
  ---

   - Add libbpf support for User Statically-Defined Tracing (USDTs).

   - Speed up symbol resolution for kprobes multi-link attachments.

   - Support storing typed pointers to referenced and unreferenced
     objects in BPF maps.

   - Add support for BPF link iterator.

   - Introduce access to remote CPU map elements in BPF per-cpu map.

   - Allow middle-of-the-road settings for the
     kernel.unprivileged_bpf_disabled sysctl.

   - Implement basic types of dynamic pointers e.g. to allow for
     dynamically sized ringbuf reservations without extra memory copies.

  Protocols
  ---------

   - Retire port only listening_hash table, add a second bind table
     hashed by port and address. Avoid linear list walk when binding to
     very popular ports (e.g. 443).

   - Add bridge FDB bulk flush filtering support allowing user space to
     remove all FDB entries matching a condition.

   - Introduce accept_unsolicited_na sysctl for IPv6 to implement
     router-side changes for RFC9131.

   - Support for MPTCP path manager in user space.

   - Add MPTCP support for fallback to regular TCP for connections that
     have never connected additional subflows or transmitted
     out-of-sequence data (partial support for RFC8684 fallback).

   - Avoid races in MPTCP-level window tracking, stabilize and improve
     throughput.

   - Support lockless operation of GRE tunnels with seq numbers enabled.

   - WiFi support for host based BSS color collision detection.

   - Add support for SO_TXTIME/SCM_TXTIME on CAN sockets.

   - Support transmission w/o flow control in CAN ISOTP (ISO 15765-2).

   - Support zero-copy Tx with TLS 1.2 crypto offload (sendfile).

   - Allow matching on the number of VLAN tags via tc-flower.

   - Add tracepoint for tcp_set_ca_state().

  Driver API
  ----------

   - Improve error reporting from classifier and action offload.

   - Add support for listing line cards in switches (devlink).

   - Add helpers for reporting page pool statistics with ethtool -S.

   - Add support for reading clock cycles when using PTP virtual clocks,
     instead of having the driver convert to time before reporting. This
     makes it possible to report time from different vclocks.

   - Support configuring low-latency Tx descriptor push via ethtool.

   - Separate Clause 22 and Clause 45 MDIO accesses more explicitly.

  New hardware / drivers
  ----------------------

   - Ethernet:
      - Marvell's Octeon NIC PCI Endpoint support (octeon_ep)
      - Sunplus SP7021 SoC (sp7021_emac)
      - Add support for Renesas RZ/V2M (in ravb)
      - Add support for MediaTek mt7986 switches (in mtk_eth_soc)

   - Ethernet PHYs:
      - ADIN1100 industrial PHYs (w/ 10BASE-T1L and SQI reporting)
      - TI DP83TD510 PHY
      - Microchip LAN8742/LAN88xx PHYs

   - WiFi:
      - Driver for pureLiFi X, XL, XC devices (plfxlc)
      - Driver for Silicon Labs devices (wfx)
      - Support for WCN6750 (in ath11k)
      - Support Realtek 8852ce devices (in rtw89)

   - Mobile:
      - MediaTek T700 modems (Intel 5G 5000 M.2 cards)

   - CAN:
      - ctucanfd: add support for CTU CAN FD open-source IP core from
        Czech Technical University in Prague

  Drivers
  -------

   - Delete a number of old drivers still using virt_to_bus().

   - Ethernet NICs:
      - intel: support TSO on tunnels MPLS
      - broadcom: support multi-buffer XDP
      - nfp: support VF rate limiting
      - sfc: use hardware tx timestamps for more than PTP
      - mlx5: multi-port eswitch support
      - hyper-v: add support for XDP_REDIRECT
      - atlantic: XDP support (including multi-buffer)
      - macb: improve real-time perf by deferring Tx processing to NAPI

   - High-speed Ethernet switches:
      - mlxsw: implement basic line card information querying
      - prestera: add support for traffic policing on ingress and egress

   - Embedded Ethernet switches:
      - lan966x: add support for packet DMA (FDMA)
      - lan966x: add support for PTP programmable pins
      - ti: cpsw_new: enable bc/mc storm prevention

   - Qualcomm 802.11ax WiFi (ath11k):
      - Wake-on-WLAN support for QCA6390 and WCN6855
      - device recovery (firmware restart) support
      - support setting Specific Absorption Rate (SAR) for WCN6855
      - read country code from SMBIOS for WCN6855/QCA6390
      - enable keep-alive during WoWLAN suspend
      - implement remain-on-channel support

   - MediaTek WiFi (mt76):
      - support Wireless Ethernet Dispatch offloading packet movement
        between the Ethernet switch and WiFi interfaces
      - non-standard VHT MCS10-11 support
      - mt7921 AP mode support
      - mt7921 IPv6 NS offload support

   - Ethernet PHYs:
      - micrel: ksz9031/ksz9131: cabletest support
      - lan87xx: SQI support for T1 PHYs
      - lan937x: add interrupt support for link detection"

* tag 'net-next-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1809 commits)
  ptp: ocp: Add firmware header checks
  ptp: ocp: fix PPS source selector debugfs reporting
  ptp: ocp: add .init function for sma_op vector
  ptp: ocp: vectorize the sma accessor functions
  ptp: ocp: constify selectors
  ptp: ocp: parameterize input/output sma selectors
  ptp: ocp: revise firmware display
  ptp: ocp: add Celestica timecard PCI ids
  ptp: ocp: Remove #ifdefs around PCI IDs
  ptp: ocp: 32-bit fixups for pci start address
  Revert "net/smc: fix listen processing for SMC-Rv2"
  ath6kl: Use cc-disable-warning to disable -Wdangling-pointer
  selftests/bpf: Dynptr tests
  bpf: Add dynptr data slices
  bpf: Add bpf_dynptr_read and bpf_dynptr_write
  bpf: Dynptr support for ring buffers
  bpf: Add bpf_dynptr_from_mem for local dynptrs
  bpf: Add verifier support for dynptrs
  bpf: Suppress 'passing zero to PTR_ERR' warning
  bpf: Introduce bpf_arch_text_invalidate for bpf_prog_pack
  ...
2022-05-25 12:22:58 -07:00
Linus Torvalds
6f3f04c190 Scheduler changes in this cycle were:
- Updates to scheduler metrics:
 
     - PELT fixes & enhancements
     - PSI fixes & enhancements
     - Refactor cpu_util_without()
 
  - Updates to instrumentation/debugging:
 
     - Remove sched_trace_*() helper functions - can be done via debug info
     - Fix double update_rq_clock() warnings
 
  - Introduce & use "preemption model accessors" to simplify some of
    the Kconfig complexity.
 
  - Make softirq handling RT-safe.
 
  - Misc smaller fixes & cleanups.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmKLvXYRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hcXg//fJ1jAB9pQOg/Su9wwwbcOeaXNUpQA38e
 970nXdK6i7w+YeAT2x1ikIQZq5S/px7k9S4Fzks8U9LMhnKPxhjdnG6J69h5XLuB
 z1BtRJBB6W8BAYWzAeq1M+R8whQylciOMZOBSjeTIEdpYBK7c9QA/R1DkDqPRlBA
 7nW0mFbpYcK8Q1n1ItjP0wkpiHG4q8orp+BXiPG8rjiHdCa3GFt7g38hiqNls64H
 fOQ/Ka25tBSYrmeqQY3QsWKnKNHKQRLNareHAwI/x4V8F8d4tn/OmJzmWGDdtprn
 6/gi/E99ej1j5Do8sgx/oTp/aVg4j8AsurrpGFd4/er+euoG4UyHr42UhX6zmFM6
 /KIinp0Z/V2n9okgI9LUZ2x7mD682iXDilNDgiSAwu1bNDUvxBXPD30gThh+KasA
 HxeKxTzb4/dZV4ih4xUMsCOjUT4NFZT2rmiMorUystgyNRk28DtFCdBMtrs/zVtG
 qAktb7v5g76pKAmV4nQu4imZeSD+f+RJP2fuSUYQCJbCxQfthTZkn8GfCMYEdY7Y
 sDyBx4Te8Vu/dcnal9qMpY/m5EPruPQAkvC9zK4YvkvLUmGC742PG/xHfCdC9J2m
 Adbl/Cmn7tD9dOGYbHPsrViqwIiZUcjbnBlMN5DjJXQF6kWNOIXUEguZpBocminP
 1CSy0+gyI6o=
 =GY8N
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:

 - Updates to scheduler metrics:
     - PELT fixes & enhancements
     - PSI fixes & enhancements
     - Refactor cpu_util_without()

 - Updates to instrumentation/debugging:
     - Remove sched_trace_*() helper functions - can be done via debug
       info
     - Fix double update_rq_clock() warnings

 - Introduce & use "preemption model accessors" to simplify some of the
   Kconfig complexity.

 - Make softirq handling RT-safe.

 - Misc smaller fixes & cleanups.

* tag 'sched-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  topology: Remove unused cpu_cluster_mask()
  sched: Reverse sched_class layout
  sched/deadline: Remove superfluous rq clock update in push_dl_task()
  sched/core: Avoid obvious double update_rq_clock warning
  smp: Make softirq handling RT safe in flush_smp_call_function_queue()
  smp: Rename flush_smp_call_function_from_idle()
  sched: Fix missing prototype warnings
  sched/fair: Remove cfs_rq_tg_path()
  sched/fair: Remove sched_trace_*() helper functions
  sched/fair: Refactor cpu_util_without()
  sched/fair: Revise comment about lb decision matrix
  sched/psi: report zeroes for CPU full at the system level
  sched/fair: Delete useless condition in tg_unthrottle_up()
  sched/fair: Fix cfs_rq_clock_pelt() for throttled cfs_rq
  sched/fair: Move calculate of avg_load to a better location
  mailmap: Update my email address to @redhat.com
  MAINTAINERS: Add myself as scheduler topology reviewer
  psi: Fix trigger being fired unexpectedly at initial
  ftrace: Use preemption model accessors for trace header printout
  kcsan: Use preemption model accessors
2022-05-24 11:11:13 -07:00
Linus Torvalds
22922deae1 Objtool changes for this cycle were:
- Comprehensive interface overhaul:
    =================================
 
    Objtool's interface has some issues:
 
      - Several features are done unconditionally, without any way to turn
        them off.  Some of them might be surprising.  This makes objtool
        tricky to use, and prevents porting individual features to other
        arches.
 
      - The config dependencies are too coarse-grained.  Objtool enablement is
        tied to CONFIG_STACK_VALIDATION, but it has several other features
        independent of that.
 
      - The objtool subcmds ("check" and "orc") are clumsy: "check" is really
        a subset of "orc", so it has all the same options.  The subcmd model
        has never really worked for objtool, as it only has a single purpose:
        "do some combination of things on an object file".
 
      - The '--lto' and '--vmlinux' options are nonsensical and have
        surprising behavior.
 
    Overhaul the interface:
 
       - get rid of subcmds
 
       - make all features individually selectable
 
       - remove and/or clarify confusing/obsolete options
 
       - update the documentation
 
       - fix some bugs found along the way
 
  - Fix x32 regression
 
  - Fix Kbuild cleanup bugs
 
  - Add scripts/objdump-func helper script to disassemble a single function from an object file.
 
  - Rewrite scripts/faddr2line to be section-aware, by basing it on 'readelf',
    moving it away from 'nm', which doesn't handle multiple sections well,
    which can result in decoding failure.
 
  - Rewrite & fix symbol handling - which had a number of bugs wrt. object files
    that don't have global symbols - which is rare but possible. Also fix a
    bunch of symbol handling bugs found along the way.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmKLtcURHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jVQg//QM8nCNadJAVS9exVGX1DZI9pnf3OJaA9
 gOFML7Lv3MC+Lwdxt6Iv020rFVaeAnOcjPsis3dppFz62FZzzMWoemn5irg2BFiJ
 dp++UtJWTfKxgU2BHydU9uXD0kcJkD4AjBCIaFsgmTjAz8QvMGa9j0smuUm3cDSL
 0Bdid+LhkQqW3P2FiLWsSAzh4vqZmdwpXgERZRql8qD3NYk5hV4QDKs3gMguktat
 9gos4kGt0uwKfiEvmeNEXkoAwUsTvE/vqaOy9cVxxCqcWrrC+yQeBpwSoqhHK526
 dyHlwlYvBaPFqZnmquVUv21iv1MU6dUBJPhNIChke0NDTwVzSXdI75207FARyk5J
 3igSFEfJcU9zMvhAAsAjzD/uQP2ATowg5qa/V2xyWwtyaRgBleRffYiDsbhgDoNc
 R4/vI+vn/fQXouMhmmjPNYzu9uHQ+k89wQCJIY8Bswf7oNu6nKL3jJb/a/a7xhsH
 ZNqv+M0KEENTZcjBU2UHGyImApmkTlsp2mxUiiHs7QoV1hTfz+TcTXKPM1mIuJB8
 /HrVpv64CZ3S7p4JyGBUTNpci4mBjgBmwwAf16+dtaxyxxfoqReVWh3+bzsZbH+B
 kRjezWHh7/yCsoyDm7/LPgyPKEbozLLzMsTsjVJeWgeTgZ+xuqku3PTVctyzAI21
 DVL5oZe3iK4=
 =ARdm
 -----END PGP SIGNATURE-----

Merge tag 'objtool-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool updates from Ingo Molnar:

 - Comprehensive interface overhaul:
   =================================

   Objtool's interface has some issues:

     - Several features are done unconditionally, without any way to
       turn them off. Some of them might be surprising. This makes
       objtool tricky to use, and prevents porting individual features
       to other arches.

     - The config dependencies are too coarse-grained. Objtool
       enablement is tied to CONFIG_STACK_VALIDATION, but it has several
       other features independent of that.

     - The objtool subcmds ("check" and "orc") are clumsy: "check" is
       really a subset of "orc", so it has all the same options.

       The subcmd model has never really worked for objtool, as it only
       has a single purpose: "do some combination of things on an object
       file".

     - The '--lto' and '--vmlinux' options are nonsensical and have
       surprising behavior.

   Overhaul the interface:

      - get rid of subcmds

      - make all features individually selectable

      - remove and/or clarify confusing/obsolete options

      - update the documentation

      - fix some bugs found along the way

 - Fix x32 regression

 - Fix Kbuild cleanup bugs

 - Add scripts/objdump-func helper script to disassemble a single
   function from an object file.

 - Rewrite scripts/faddr2line to be section-aware, by basing it on
   'readelf', moving it away from 'nm', which doesn't handle multiple
   sections well, which can result in decoding failure.

 - Rewrite & fix symbol handling - which had a number of bugs wrt.
   object files that don't have global symbols - which is rare but
   possible. Also fix a bunch of symbol handling bugs found along the
   way.

* tag 'objtool-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  objtool: Fix objtool regression on x32 systems
  objtool: Fix symbol creation
  scripts/faddr2line: Fix overlapping text section failures
  scripts: Create objdump-func helper script
  objtool: Remove libsubcmd.a when make clean
  objtool: Remove inat-tables.c when make clean
  objtool: Update documentation
  objtool: Remove --lto and --vmlinux in favor of --link
  objtool: Add HAVE_NOINSTR_VALIDATION
  objtool: Rename "VMLINUX_VALIDATION" -> "NOINSTR_VALIDATION"
  objtool: Make noinstr hacks optional
  objtool: Make jump label hack optional
  objtool: Make static call annotation optional
  objtool: Make stack validation frame-pointer-specific
  objtool: Add CONFIG_OBJTOOL
  objtool: Extricate sls from stack validation
  objtool: Rework ibt and extricate from stack validation
  objtool: Make stack validation optional
  objtool: Add option to print section addresses
  objtool: Don't print parentheses in function addresses
  ...
2022-05-24 10:36:38 -07:00