Commit Graph

1296625 Commits

Author SHA1 Message Date
Sam James
eb9b9a6f5a tools: Drop nonsensical -O6
-O6 is very much not-a-thing. Really, this should've been dropped
entirely in 49b3cd306e ("tools: Set the maximum optimization level
according to the compiler being used") instead of just passing it for
not-Clang.

Just collapse it down to -O3, instead of "-O6 unless Clang, in which case
-O3".

GCC interprets > -O3 as -O3. It doesn't even interpret > -O3 as -Ofast,
which is a good thing, given -Ofast has specific (non-)requirements for
code built using it. So, this does nothing except look a bit daft.

Remove the silliness and also save a few lines in the Makefiles accordingly.

Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: Jesper Juhl <jesperjuhl76@gmail.com>
Signed-off-by: Sam James <sam@gentoo.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Bill Wendling <morbo@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: llvm@lists.linux.dev
Link: https://lore.kernel.org/r/4f01524fa4ea91c7146a41e26ceaf9dae4c127e4.1725821201.git.sam@gentoo.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11 13:08:36 -03:00
Ian Rogers
89c0a55e55 perf pmu: To info add event_type_desc
All PMU events are assumed to be "Kernel PMU event", however, this
isn't true for fake PMUs and won't be true with the addition of more
software PMUs. Make the PMU's type description name configurable -
largely for printing callbacks.

Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20240907050830.6752-5-irogers@google.com
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Clément Le Goffic <clement.legoffic@foss.st.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Junhao He <hejunhao3@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Xu Yang <xu.yang_2@nxp.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11 11:29:20 -03:00
Ian Rogers
f08cc25843 perf evsel: Add accessor for tool_event
Currently tool events use a dedicated variable within the evsel. Later
changes will move this to the unused struct perf_event_attr config for
these events. Add an accessor to allow the later change to be well
typed and avoid changing all uses.

Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20240907050830.6752-4-irogers@google.com
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Clément Le Goffic <clement.legoffic@foss.st.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Junhao He <hejunhao3@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Xu Yang <xu.yang_2@nxp.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11 11:28:27 -03:00
Ian Rogers
925320737a perf pmus: Fake PMU clean up
Rather than passing a fake PMU around, just pass that the fake PMU
should be used - true when doing testing. Move the fake PMU into
pmus.[ch] and try to abstract the PMU's properties in pmu.c, ie so
there is less "if fake_pmu" in non-PMU code. Give the fake PMU a made
up type number.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Clément Le Goffic <clement.legoffic@foss.st.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Junhao He <hejunhao3@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xu Yang <xu.yang_2@nxp.com>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240907050830.6752-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11 11:27:42 -03:00
Ian Rogers
d3d5c1a00f perf list: Avoid potential out of bounds memory read
If a desc string is 0 length then -1 will be out of bounds, add a
check.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Clément Le Goffic <clement.legoffic@foss.st.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Junhao He <hejunhao3@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xu Yang <xu.yang_2@nxp.com>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240907050830.6752-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11 11:26:27 -03:00
Andrew Kreimer
4ae354d73a perf help: Fix a typo ("bellow")
Fix a typo in comments.

Reported-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Andrew Kreimer <algonell@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-janitors@vger.kernel.org
Link: https://lore.kernel.org/r/20240907131006.18510-1-algonell@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11 11:24:12 -03:00
Changbin Du
74298dd8ac perf ftrace: Detect whether ftrace is enabled on system
To make error messages more accurate, this change detects whether ftrace is
enabled on system by checking trace file "set_ftrace_pid".

Before:

  # perf ftrace
  failed to reset ftrace
  #

After:

  # perf ftrace
  ftrace is not supported on this system
  #

Committer testing:

Doing it in an unprivileged toolbox container on Fedora 40:

Before:

  acme@number:~/git/perf-tools-next$ toolbox enter perf
  ⬢[acme@toolbox perf-tools-next]$ sudo su -
  ⬢[root@toolbox ~]# ~acme/bin/perf ftrace
  failed to reset ftrace
  ⬢[root@toolbox ~]#

After this patch:

  ⬢[root@toolbox ~]# ~acme/bin/perf ftrace
  ftrace is not supported on this system
  ⬢[root@toolbox ~]#

Maybe we could check if we are in such as situation, inside an
unprivileged container, and provide a HINT line?

Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Changbin Du <changbin.du@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240911100126.900779-1-changbin.du@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11 09:35:35 -03:00
Arnaldo Carvalho de Melo
83420d5f58 perf test shell probe_vfs_getname: Remove extraneous '=' from probe line number regex
Thomas reported the vfs_getname perf tests failing on s/390, it seems it
was just to some extraneous '=' somehow getting into the regexp, remove
it, now:

  root@x1:~# perf test getname
   91: Add vfs_getname probe to get syscall args filenames             : Ok
   93: Use vfs_getname probe to get syscall args filenames             : FAILED!
  126: Check open filename arg using perf trace + vfs_getname          : Ok
  root@x1:~#

Second one remains a mistery, have to take some time to nail it down.

Reported-by: Thomas Richter <tmricht@linux.ibm.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Vasily Gorbik <gor@linux.ibm.com>,
Link: https://lore.kernel.org/lkml/1d7f3b7b-9edc-4d90-955c-9345428563f1@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11 09:35:34 -03:00
Arnaldo Carvalho de Melo
9327f0ecad perf build: Require at least clang 16.0.6 to build BPF skeletons
Howard reported problems using perf features that use BPF:

  perf $ clang -v
  Debian clang version 15.0.6
  Target: x86_64-pc-linux-gnu
  Thread model: posix
  InstalledDir: /bin
  Found candidate GCC installation: /bin/../lib/gcc/x86_64-linux-gnu/12
  Selected GCC installation: /bin/../lib/gcc/x86_64-linux-gnu/12
  Candidate multilib: .;@m64
  Selected multilib: .;@m64
  perf $ ./perf trace -e write --max-events=1
  libbpf: prog 'sys_enter_rename': BPF program load failed: Permission denied
  libbpf: prog 'sys_enter_rename': -- BEGIN PROG LOAD LOG --
  0: R1=ctx() R10=fp0

But it works with:

  perf $ clang -v
  Debian clang version 16.0.6 (15~deb12u1)
  Target: x86_64-pc-linux-gnu
  Thread model: posix
  InstalledDir: /bin
  Found candidate GCC installation: /bin/../lib/gcc/x86_64-linux-gnu/12
  Selected GCC installation: /bin/../lib/gcc/x86_64-linux-gnu/12
  Candidate multilib: .;@m64
  Selected multilib: .;@m64
  perf $ ./perf trace -e write --max-events=1
       0.000 ( 0.009 ms): gmain/1448 write(fd: 4, buf: \1\0\0\0\0\0\0\0, count: 8)                         = 8 (kworker/0:0-eve)
  perf $

So lets make that the required version, if you happen to have a slightly
older version where this work, please report so that we can adjust the
minimum required version.

Reported-by: Howard Chu <howardchu95@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZuGL9ROeTV2uXoSp@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11 09:35:34 -03:00
Arnaldo Carvalho de Melo
4c1af9bf97 perf trace: If a syscall arg is marked as 'const', assume it is coming _from_ userspace
We need to decide where to copy syscall arg contents, if at the
syscalls:sys_entry hook, meaning is something that is coming from
user to kernel space, or if it is a response, i.e. if it is something
the _kernel_ is filling in and thus going to userspace.

Since we have 'const' used in those syscalls, and unsure about this
being consistent, doing:

  root@number:~# echo $(grep const /sys/kernel/tracing/events/syscalls/sys_enter_*/format  | grep struct | cut -c47- | cut -d'/' -f1)
  clock_nanosleep clock_settime epoll_pwait2 futex io_pgetevents landlock_create_ruleset listmount mq_getsetattr mq_notify mq_timedreceive mq_timedsend preadv2 preadv prlimit64 process_madvise process_vm_readv process_vm_readv process_vm_writev process_vm_writev pwritev2 pwritev readv rt_sigaction rt_sigtimedwait semtimedop statmount timerfd_settime timer_settime vmsplice writev
  root@number:~#

Seems to indicate that we can use that for the ones that have the
'const' to mark it as coming from user space, do it.

Most notable/frequent syscall that now gets BTF pretty printed in a
system wide 'perf trace' session is:

  root@number:~# perf trace
     21.160 (         ): MediaSu~isor #/1028597 futex(uaddr: 0x7f49e1dfe964, op: WAIT_BITSET|PRIVATE_FLAG, utime: (struct __kernel_timespec){.tv_sec = (__kernel_time64_t)50290,.tv_nsec = (long long int)810362837,}, val3: MATCH_ANY) ...
      21.166 ( 0.000 ms): RemVidChild/6995 futex(uaddr: 0x7f49fcc7fa00, op: WAKE|PRIVATE_FLAG, val: 1)           = 0
      21.169 ( 0.001 ms): RemVidChild/6995 sendmsg(fd: 25<socket:[78915]>, msg: 0x7f49e9af9da0, flags: DONTWAIT) = 280
      21.172 ( 0.289 ms): RemVidChild/6995 futex(uaddr: 0x7f49fcc7fa58, op: WAIT_BITSET|PRIVATE_FLAG|CLOCK_REALTIME, val3: MATCH_ANY) = 0
      21.463 ( 0.000 ms): RemVidChild/6995 futex(uaddr: 0x7f49fcc7fa00, op: WAKE|PRIVATE_FLAG, val: 1)           = 0
      21.467 ( 0.001 ms): RemVidChild/6995 futex(uaddr: 0x7f49e28bb964, op: WAKE|PRIVATE_FLAG, val: 1)           = 1
      21.160 ( 0.314 ms): MediaSu~isor #/1028597  ... [continued]: futex())                                            = 0
      21.469 (         ): RemVidChild/6995 futex(uaddr: 0x7f49fcc7fa5c, op: WAIT_BITSET|PRIVATE_FLAG|CLOCK_REALTIME, val3: MATCH_ANY) ...
      21.475 ( 0.000 ms): MediaSu~isor #/1028597 futex(uaddr: 0x7f49d0223040, op: WAKE|PRIVATE_FLAG, val: 1)           = 0
      21.478 ( 0.001 ms): MediaSu~isor #/1028597 futex(uaddr: 0x7f49e26ac964, op: WAKE|PRIVATE_FLAG, val: 1)           = 1
  ^Croot@number:~#
  root@number:~# cat /sys/kernel/tracing/events/syscalls/sys_enter_futex/format
  name: sys_enter_futex
  ID: 454
  format:
  	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
  	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
  	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
  	field:int common_pid;	offset:4;	size:4;	signed:1;

  	field:int __syscall_nr;	offset:8;	size:4;	signed:1;
  	field:u32 * uaddr;	offset:16;	size:8;	signed:0;
  	field:int op;	offset:24;	size:8;	signed:0;
  	field:u32 val;	offset:32;	size:8;	signed:0;
  	field:const struct __kernel_timespec * utime;	offset:40;	size:8;	signed:0;
  	field:u32 * uaddr2;	offset:48;	size:8;	signed:0;
  	field:u32 val3;	offset:56;	size:8;	signed:0;

  print fmt: "uaddr: 0x%08lx, op: 0x%08lx, val: 0x%08lx, utime: 0x%08lx, uaddr2: 0x%08lx, val3: 0x%08lx", ((unsigned long)(REC->uaddr)), ((unsigned long)(REC->op)), ((unsigned long)(REC->val)), ((unsigned long)(REC->utime)), ((unsigned long)(REC->uaddr2)), ((unsigned long)(REC->val3))
  root@number:~#

Suggested-by: Ian Rogers <irogers@google.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/CAP-5=fWnuQrrBoTn6Rrn6vM_xQ2fCoc9i-AitD7abTcNi-4o1Q@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11 09:35:34 -03:00
Yang Li
e37b315c17 perf parse-events: Remove duplicated include in parse-events.c
The header files parse-events.h is included twice in parse-events.c,
so one inclusion of each can be removed.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=10822
Link: https://lore.kernel.org/r/20240910005522.35994-1-yang.lee@linux.alibaba.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11 09:35:27 -03:00
Ian Rogers
02b2705017 perf callchain: Allow symbols to be optional when resolving a callchain
In uses like 'perf inject' it is not necessary to gather the symbol for
each call chain location, the map for the sample IP is wanted so that
build IDs and the like can be injected. Make gathering the symbol in the
callchain_cursor optional.

For a 'perf inject -B' command this lowers the peak RSS from 54.1MB to
29.6MB by avoiding loading symbols.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Casey Chen <cachen@purestorage.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Link: https://lore.kernel.org/r/20240909203740.143492-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 17:32:47 -03:00
Ian Rogers
64eed019f3 perf inject: Lazy build-id mmap2 event insertion
Add -B option that lazily inserts mmap2 events thereby dropping all
mmap events without samples. This is similar to the behavior of -b
where only build_id events are inserted when a dso is accessed in a
sample.

File size savings can be significant in system-wide mode, consider:

  $ perf record -g -a -o perf.data sleep 1
  $ perf inject -B -i perf.data -o perf.new.data
  $ ls -al perf.data perf.new.data
           5147049 perf.data
           2248493 perf.new.data

Give test coverage of the new option in pipe test.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Casey Chen <cachen@purestorage.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Link: https://lore.kernel.org/r/20240909203740.143492-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 17:32:47 -03:00
Ian Rogers
d762ba020d perf inject: Add new mmap2-buildid-all option
Add an option that allows all mmap or mmap2 events to be rewritten as
mmap2 events with build IDs.

This is similar to the existing -b/--build-ids and --buildid-all options
except instead of adding a build_id event an existing mmap/mmap2 event
is used as a template and a new mmap2 event synthesized from it.

As mmap2 events are typical this avoids the insertion of build_id
events.

Add test coverage to the pipe test.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Casey Chen <cachen@purestorage.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Link: https://lore.kernel.org/r/20240909203740.143492-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 17:32:47 -03:00
Ian Rogers
ae39ba1655 perf inject: Fix build ID injection
Build ID injection wasn't inserting a sample ID and aligning events to
64 bytes rather than 8. No sample ID means events are unordered and two
different build_id events for the same path, as happens when a file is
replaced, can't be differentiated.

Add in sample ID insertion for the build_id events alongside some
refactoring. The refactoring better aligns the function arguments for
different use cases, such as synthesizing build_id events without
needing to have a dso. The misc bits are explicitly passed as with
callchains the maps/dsos may span user and kernel land, so using
sample->cpumode isn't good enough.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Casey Chen <cachen@purestorage.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Link: https://lore.kernel.org/r/20240909203740.143492-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 17:32:47 -03:00
Namhyung Kim
02648783c2 perf annotate-data: Add pr_debug_scope()
The pr_debug_scope() is to print more information about the scope DIE
during the instruction tracking so that it can help finding relevant
debug info and the source code like inlined functions more easily.

  $ perf --debug type-profile annotate --data-type
  ...
  -----------------------------------------------------------
  find data type for 0(reg0, reg12) at set_task_cpu+0xdd
  CU for kernel/sched/core.c (die:0x1268dae)
  frame base: cfa=1 fbreg=7
  scope: [3/3] (die:12b6d28) [inlined] set_task_rq       <<<--- (here)
  bb: [9f - dd]
  var [9f] reg3 type='struct task_struct*' size=0x8 (die:0x126aff0)
  var [9f] reg6 type='unsigned int' size=0x4 (die:0x1268e0d)

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240909214251.3033827-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 17:32:47 -03:00
Namhyung Kim
c8b9358778 perf annotate: Treat 'call' instruction as stack operation
I found some portion of mem-store events sampled on CALL instruction
which has no memory access.  But it actually saves a return address
into stack.  It should be considered as a stack operation like RET
instruction.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240909214251.3033827-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 17:32:47 -03:00
James Clark
332f60ac05 perf build: Remove unused feature test target
llvm-version was removed in commit 56b11a2126 ("perf bpf: Remove
support for embedding clang for compiling BPF events (-e foo.c)") but
some parts were left in the Makefile so finish removing them.

Signed-off-by: James Clark <james.clark@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Bill Wendling <morbo@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Daniel Wagner <dwagner@suse.de>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Manu Bretelle <chantr4@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Monnet <qmo@kernel.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Link: https://lore.kernel.org/r/20240910140405.568791-2-james.clark@linaro.org
[ Removed one leftover, 'llvm-version' from FEATURE_TESTS_EXTRA ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 17:32:47 -03:00
James Clark
206dcfca1f perf build: Autodetect minimum required llvm-dev version
The new LLVM addr2line feature requires a minimum version of 13 to
compile. Add a feature check for the version so that NO_LLVM=1 doesn't
need to be explicitly added. Leave the existing llvm feature check
intact because it's used by tools other than Perf.

This fixes the following compilation error when the llvm-dev version
doesn't match:

  util/llvm-c-helpers.cpp: In function 'char* llvm_name_for_code(dso*, const char*, u64)':
  util/llvm-c-helpers.cpp:178:21: error: 'std::remove_reference_t<llvm::DILineInfo>' {aka 'struct llvm::DILineInfo'} has no member named 'StartAddress'
    178 |   addr, res_or_err->StartAddress ? *res_or_err->StartAddress : 0);

Fixes: c3f8644c21 ("perf report: Support LLVM for addr2line()")
Signed-off-by: James Clark <james.clark@linaro.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Bill Wendling <morbo@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Manu Bretelle <chantr4@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Monnet <qmo@kernel.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Link: https://lore.kernel.org/r/20240910140405.568791-1-james.clark@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 17:32:46 -03:00
Arnaldo Carvalho de Melo
375f9262ac perf trace: Mark the rlim arg in the prlimit64 and setrlimit syscalls as coming from user space
With that it uses the generic BTF based pretty printer:

  root@number:~# perf trace -e prlimit64
       0.000 ( 0.004 ms): :3417020/3417020 prlimit64(resource: NOFILE, old_rlim: 0x7fb8842fe3b0)      = 0
       0.126 ( 0.003 ms): Chroot Helper/3417022 prlimit64(resource: NOFILE, old_rlim: 0x7fb8842fdfd0) = 0
      12.557 ( 0.005 ms): firefox/3417020 prlimit64(resource: STACK, old_rlim: 0x7ffe9ade1b80)        = 0
      26.640 ( 0.006 ms): MainThread/3417020 prlimit64(resource: STACK, old_rlim: 0x7ffe9ade1780)     = 0
      27.553 ( 0.002 ms): Web Content/3417020 prlimit64(resource: AS, old_rlim: 0x7ffe9ade1660)       = 0
      29.405 ( 0.003 ms): Web Content/3417020 prlimit64(resource: NOFILE, old_rlim: 0x7ffe9ade0c80)   = 0
      30.471 ( 0.002 ms): Web Content/3417020 prlimit64(resource: RTTIME, old_rlim: 0x7ffe9ade1370)   = 0
      30.485 ( 0.001 ms): Web Content/3417020 prlimit64(resource: RTTIME, new_rlim: (struct rlimit64){.rlim_cur = (__u64)50000,.rlim_max = (__u64)200000,}) = 0
      31.779 ( 0.001 ms): Web Content/3417020 prlimit64(resource: STACK, old_rlim: 0x7ffe9ade1670)    = 0
  ^Croot@number:~#

Better than before, still needs improvements in the configurability of
the libbpf BTF dumper to get it to the strace output standard.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZuBQI-f8CGpuhIdH@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 17:32:46 -03:00
Arnaldo Carvalho de Melo
f3f16112c6 perf trace: Support collecting 'union's with the BPF augmenter
And reuse the BTF based struct pretty printer, with that we can offer
initial support for the 'bpf' syscall's second argument, a 'union
bpf_attr' pointer.

But this is not that satisfactory as the libbpf btf dumper will pretty
print _all_ the union, we need to have a way to say that the first arg
selects the type for the union member to be pretty printed, something
like what pahole does translating the PERF_RECORD_ selector into a name,
and using that name to find a matching struct.

In the case of 'union bpf_attr' it would map PROG_LOAD to one of the
union members, but unfortunately there is no such mapping:

  root@number:~# pahole bpf_attr
  union bpf_attr {
  	struct {
  		__u32              map_type;           /*     0     4 */
  		__u32              key_size;           /*     4     4 */
  		__u32              value_size;         /*     8     4 */
  		__u32              max_entries;        /*    12     4 */
  		__u32              map_flags;          /*    16     4 */
  		__u32              inner_map_fd;       /*    20     4 */
  		__u32              numa_node;          /*    24     4 */
  		char               map_name[16];       /*    28    16 */
  		__u32              map_ifindex;        /*    44     4 */
  		__u32              btf_fd;             /*    48     4 */
  		__u32              btf_key_type_id;    /*    52     4 */
  		__u32              btf_value_type_id;  /*    56     4 */
  		__u32              btf_vmlinux_value_type_id; /*    60     4 */
  		/* --- cacheline 1 boundary (64 bytes) --- */
  		__u64              map_extra;          /*    64     8 */
  		__s32              value_type_btf_obj_fd; /*    72     4 */
  		__s32              map_token_fd;       /*    76     4 */
  	};                                             /*     0    80 */
  	struct {
  		__u32              map_fd;             /*     0     4 */

  		/* XXX 4 bytes hole, try to pack */

  		__u64              key;                /*     8     8 */
  		union {
  			__u64      value;              /*    16     8 */
  			__u64      next_key;           /*    16     8 */
  		};                                     /*    16     8 */
  		__u64              flags;              /*    24     8 */
  	};                                             /*     0    32 */
  	struct {
  		__u64              in_batch;           /*     0     8 */
  		__u64              out_batch;          /*     8     8 */
  		__u64              keys;               /*    16     8 */
  		__u64              values;             /*    24     8 */
  		__u32              count;              /*    32     4 */
  		__u32              map_fd;             /*    36     4 */
  		__u64              elem_flags;         /*    40     8 */
  		__u64              flags;              /*    48     8 */
  	} batch;                                       /*     0    56 */
  	struct {
  		__u32              prog_type;          /*     0     4 */
  		__u32              insn_cnt;           /*     4     4 */
  		__u64              insns;              /*     8     8 */
  		__u64              license;            /*    16     8 */
  		__u32              log_level;          /*    24     4 */
  		__u32              log_size;           /*    28     4 */
  		__u64              log_buf;            /*    32     8 */
  		__u32              kern_version;       /*    40     4 */
  		__u32              prog_flags;         /*    44     4 */
  		char               prog_name[16];      /*    48    16 */
  		/* --- cacheline 1 boundary (64 bytes) --- */
  		__u32              prog_ifindex;       /*    64     4 */
  		__u32              expected_attach_type; /*    68     4 */
  		__u32              prog_btf_fd;        /*    72     4 */
  		__u32              func_info_rec_size; /*    76     4 */
  		__u64              func_info;          /*    80     8 */
  		__u32              func_info_cnt;      /*    88     4 */
  		__u32              line_info_rec_size; /*    92     4 */
  		__u64              line_info;          /*    96     8 */
  		__u32              line_info_cnt;      /*   104     4 */
  		__u32              attach_btf_id;      /*   108     4 */
  		union {
  			__u32      attach_prog_fd;     /*   112     4 */
  			__u32      attach_btf_obj_fd;  /*   112     4 */
  		};                                     /*   112     4 */
  		__u32              core_relo_cnt;      /*   116     4 */
  		__u64              fd_array;           /*   120     8 */
  		/* --- cacheline 2 boundary (128 bytes) --- */
  		__u64              core_relos;         /*   128     8 */
  		__u32              core_relo_rec_size; /*   136     4 */
  		__u32              log_true_size;      /*   140     4 */
  		__s32              prog_token_fd;      /*   144     4 */
  	};                                             /*     0   152 */
  	struct {
  		__u64              pathname;           /*     0     8 */
  		__u32              bpf_fd;             /*     8     4 */
  		__u32              file_flags;         /*    12     4 */
  		__s32              path_fd;            /*    16     4 */
  	};                                             /*     0    24 */
  	struct {
  		union {
  			__u32      target_fd;          /*     0     4 */
  			__u32      target_ifindex;     /*     0     4 */
  		};                                     /*     0     4 */
  		__u32              attach_bpf_fd;      /*     4     4 */
  		__u32              attach_type;        /*     8     4 */
  		__u32              attach_flags;       /*    12     4 */
  		__u32              replace_bpf_fd;     /*    16     4 */
  		union {
  			__u32      relative_fd;        /*    20     4 */
  			__u32      relative_id;        /*    20     4 */
  		};                                     /*    20     4 */
  		__u64              expected_revision;  /*    24     8 */
  	};                                             /*     0    32 */
  	struct {
  		__u32              prog_fd;            /*     0     4 */
  		__u32              retval;             /*     4     4 */
  		__u32              data_size_in;       /*     8     4 */
  		__u32              data_size_out;      /*    12     4 */
  		__u64              data_in;            /*    16     8 */
  		__u64              data_out;           /*    24     8 */
  		__u32              repeat;             /*    32     4 */
  		__u32              duration;           /*    36     4 */
  		__u32              ctx_size_in;        /*    40     4 */
  		__u32              ctx_size_out;       /*    44     4 */
  		__u64              ctx_in;             /*    48     8 */
  		__u64              ctx_out;            /*    56     8 */
  		/* --- cacheline 1 boundary (64 bytes) --- */
  		__u32              flags;              /*    64     4 */
  		__u32              cpu;                /*    68     4 */
  		__u32              batch_size;         /*    72     4 */
  	} test;                                        /*     0    80 */
  	struct {
  		union {
  			__u32      start_id;           /*     0     4 */
  			__u32      prog_id;            /*     0     4 */
  			__u32      map_id;             /*     0     4 */
  			__u32      btf_id;             /*     0     4 */
  			__u32      link_id;            /*     0     4 */
  		};                                     /*     0     4 */
  		__u32              next_id;            /*     4     4 */
  		__u32              open_flags;         /*     8     4 */
  	};                                             /*     0    12 */
  	struct {
  		__u32              bpf_fd;             /*     0     4 */
  		__u32              info_len;           /*     4     4 */
  		__u64              info;               /*     8     8 */
  	} info;                                        /*     0    16 */
  	struct {
  		union {
  			__u32      target_fd;          /*     0     4 */
  			__u32      target_ifindex;     /*     0     4 */
  		};                                     /*     0     4 */
  		__u32              attach_type;        /*     4     4 */
  		__u32              query_flags;        /*     8     4 */
  		__u32              attach_flags;       /*    12     4 */
  		__u64              prog_ids;           /*    16     8 */
  		union {
  			__u32      prog_cnt;           /*    24     4 */
  			__u32      count;              /*    24     4 */
  		};                                     /*    24     4 */

  		/* XXX 4 bytes hole, try to pack */

  		__u64              prog_attach_flags;  /*    32     8 */
  		__u64              link_ids;           /*    40     8 */
  		__u64              link_attach_flags;  /*    48     8 */
  		__u64              revision;           /*    56     8 */
  	} query;                                       /*     0    64 */
  	struct {
  		__u64              name;               /*     0     8 */
  		__u32              prog_fd;            /*     8     4 */

  		/* XXX 4 bytes hole, try to pack */

  		__u64              cookie;             /*    16     8 */
  	} raw_tracepoint;                              /*     0    24 */
  	struct {
  		__u64              btf;                /*     0     8 */
  		__u64              btf_log_buf;        /*     8     8 */
  		__u32              btf_size;           /*    16     4 */
  		__u32              btf_log_size;       /*    20     4 */
  		__u32              btf_log_level;      /*    24     4 */
  		__u32              btf_log_true_size;  /*    28     4 */
  		__u32              btf_flags;          /*    32     4 */
  		__s32              btf_token_fd;       /*    36     4 */
  	};                                             /*     0    40 */
  	struct {
  		__u32              pid;                /*     0     4 */
  		__u32              fd;                 /*     4     4 */
  		__u32              flags;              /*     8     4 */
  		__u32              buf_len;            /*    12     4 */
  		__u64              buf;                /*    16     8 */
  		__u32              prog_id;            /*    24     4 */
  		__u32              fd_type;            /*    28     4 */
  		__u64              probe_offset;       /*    32     8 */
  		__u64              probe_addr;         /*    40     8 */
  	} task_fd_query;                               /*     0    48 */
  	struct {
  		union {
  			__u32      prog_fd;            /*     0     4 */
  			__u32      map_fd;             /*     0     4 */
  		};                                     /*     0     4 */
  		union {
  			__u32      target_fd;          /*     4     4 */
  			__u32      target_ifindex;     /*     4     4 */
  		};                                     /*     4     4 */
  		__u32              attach_type;        /*     8     4 */
  		__u32              flags;              /*    12     4 */
  		union {
  			__u32      target_btf_id;      /*    16     4 */
  			struct {
  				__u64 iter_info;       /*    16     8 */
  				__u32 iter_info_len;   /*    24     4 */
  			};                             /*    16    16 */
  			struct {
  				__u64 bpf_cookie;      /*    16     8 */
  			} perf_event;                  /*    16     8 */
  			struct {
  				__u32 flags;           /*    16     4 */
  				__u32 cnt;             /*    20     4 */
  				__u64 syms;            /*    24     8 */
  				__u64 addrs;           /*    32     8 */
  				__u64 cookies;         /*    40     8 */
  			} kprobe_multi;                /*    16    32 */
  			struct {
  				__u32 target_btf_id;   /*    16     4 */

  				/* XXX 4 bytes hole, try to pack */

  				__u64 cookie;          /*    24     8 */
  			} tracing;                     /*    16    16 */
  			struct {
  				__u32 pf;              /*    16     4 */
  				__u32 hooknum;         /*    20     4 */
  				__s32 priority;        /*    24     4 */
  				__u32 flags;           /*    28     4 */
  			} netfilter;                   /*    16    16 */
  			struct {
  				union {
  					__u32  relative_fd; /*    16     4 */
  					__u32  relative_id; /*    16     4 */
  				};                     /*    16     4 */

  				/* XXX 4 bytes hole, try to pack */

  				__u64 expected_revision; /*    24     8 */
  			} tcx;                         /*    16    16 */
  			struct {
  				__u64 path;            /*    16     8 */
  				__u64 offsets;         /*    24     8 */
  				__u64 ref_ctr_offsets; /*    32     8 */
  				__u64 cookies;         /*    40     8 */
  				__u32 cnt;             /*    48     4 */
  				__u32 flags;           /*    52     4 */
  				__u32 pid;             /*    56     4 */
  			} uprobe_multi;                /*    16    48 */
  			struct {
  				union {
  					__u32  relative_fd; /*    16     4 */
  					__u32  relative_id; /*    16     4 */
  				};                     /*    16     4 */

  				/* XXX 4 bytes hole, try to pack */

  				__u64 expected_revision; /*    24     8 */
  			} netkit;                      /*    16    16 */
  		};                                     /*    16    48 */
  	} link_create;                                 /*     0    64 */
  	struct {
  		__u32              link_fd;            /*     0     4 */
  		union {
  			__u32      new_prog_fd;        /*     4     4 */
  			__u32      new_map_fd;         /*     4     4 */
  		};                                     /*     4     4 */
  		__u32              flags;              /*     8     4 */
  		union {
  			__u32      old_prog_fd;        /*    12     4 */
  			__u32      old_map_fd;         /*    12     4 */
  		};                                     /*    12     4 */
  	} link_update;                                 /*     0    16 */
  	struct {
  		__u32              link_fd;            /*     0     4 */
  	} link_detach;                                 /*     0     4 */
  	struct {
  		__u32              type;               /*     0     4 */
  	} enable_stats;                                /*     0     4 */
  	struct {
  		__u32              link_fd;            /*     0     4 */
  		__u32              flags;              /*     4     4 */
  	} iter_create;                                 /*     0     8 */
  	struct {
  		__u32              prog_fd;            /*     0     4 */
  		__u32              map_fd;             /*     4     4 */
  		__u32              flags;              /*     8     4 */
  	} prog_bind_map;                               /*     0    12 */
  	struct {
  		__u32              flags;              /*     0     4 */
  		__u32              bpffs_fd;           /*     4     4 */
  	} token_create;                                /*     0     8 */
  };

  root@number:~#

So this is one case where BTF gets us only that far, not getting all
the way to automate the pretty printing of unions designed like 'union
bpf_attr', we will need a custom pretty printer for this union, as using
the libbpf union BTF dumper is way too verbose:

  root@number:~# perf trace --max-events 1 -e bpf bpftool map
       0.000 ( 0.054 ms): bpftool/3409073 bpf(cmd: PROG_LOAD, uattr: (union bpf_attr){(struct){.map_type = (__u32)1,.key_size = (__u32)2,.value_size = (__u32)2755142048,.max_entries = (__u32)32764,.map_flags = (__u32)150263906,.inner_map_fd = (__u32)21920,},(struct){.map_fd = (__u32)1,.key = (__u64)140723063628192,(union){.value = (__u64)94145833392226,.next_key = (__u64)94145833392226,},},.batch = (struct){.in_batch = (__u64)8589934593,.out_batch = (__u64)140723063628192,.keys = (__u64)94145833392226,},(struct){.prog_type = (__u32)1,.insn_cnt = (__u32)2,.insns = (__u64)140723063628192,.license = (__u64)94145833392226,},(struct){.pathname = (__u64)8589934593,.bpf_fd = (__u32)2755142048,.file_flags = (__u32)32764,.path_fd = (__s32)150263906,},(struct){(union){.target_fd = (__u32)1,.target_ifindex = (__u32)1,},.attach_bpf_fd = (__u32)2,.attach_type = (__u32)2755142048,.attach_flags = (__u32)32764,.replace_bpf_fd = (__u32)150263906,(union){.relative_fd = (__u32)21920,.relative_id = (__u32)21920,},},.test = (struct){.prog_fd = (__u32)1,.retval = (__u32)2,.data_size_in = (__u32)2755142048,.data_size_out = (__u32)32764,.data_in = (__u64)94145833392226,},(struct){(union){.start_id = (__u32)1,.prog_id = (__u32)1,.map_id = (__u32)1,.btf_id = (__u32)1,.link_id = (__u32)1,},.next_id = (__u32)2,.open_flags = (__u32)2755142048,},.info = (struct){.bpf_fd = (__u32)1,.info_len = (__u32)2,.info = (__u64)140723063628192,},.query = (struct){(union){.target_fd = (__u32)1,.target_ifindex = (__u32)1,},.attach_type = (__u32)2,.query_flags = (__u32)2755142048,.attach_flags = (__u32)32764,.prog_ids = (__u64)94145833392226,},.raw_tracepoint = (struct){.name = (__u64)8589934593,.prog_fd = (__u32)2755142048,.cookie = (__u64)94145833392226,},(struct){.btf = (__u64)8589934593,.btf_log_buf = (__u64)140723063628192,.btf_size = (__u32)150263906,.btf_log_size = (__u32)21920,},.task_fd_query = (struct){.pid = (__u32)1,.fd = (__u32)2,.flags = (__u32)2755142048,.buf_len = (__u32)32764,.buf = (__u64)94145833392226,},.link_create = (struct){(union){.prog_fd = (__u32)1,.map_fd = (__u32)1,},(u) = 3
  root@number:~# 2: prog_array  name hid_jmp_table  flags 0x0
  	key 4B  value 4B  max_entries 1024  memlock 8440B
  	owner_prog_type tracing  owner jited
  13: hash_of_maps  name cgroup_hash  flags 0x0
  	key 8B  value 4B  max_entries 2048  memlock 167584B
  	pids systemd(1)
  960: array  name libbpf_global  flags 0x0
  	key 4B  value 32B  max_entries 1  memlock 280B
  961: array  name pid_iter.rodata  flags 0x480
  	key 4B  value 4B  max_entries 1  memlock 8192B
  	btf_id 1846  frozen
  	pids bpftool(3409073)
  962: array  name libbpf_det_bind  flags 0x0
  	key 4B  value 32B  max_entries 1  memlock 280B

  root@number:~#

For simpler unions this may be better than not seeing any payload, so
keep it there.

Acked-by: Howard Chu <howardchu95@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZuBLat8cbadILNLA@x1
[ Removed needless parenteses in the if block leading to the trace__btf_scnprintf() call, as per Howard's review comments ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 17:31:51 -03:00
Howard Chu
3278024540 perf trace: Add --force-btf for debugging
If --force-btf is enabled, prefer btf_dump general pretty printer to
perf trace's customized pretty printers.

Mostly for debug purposes.

Committer testing:

diff before/after shows we need several improvements to be able to
compare the changes, first we need to cut off/disable mutable data such
as pids and timestamps, then what is left are the buffer addresses
passed from userspace, returned from kernel space, maybe we can ask
'perf trace' to go on making those reproducible.

That would entail a Pointer Address Translation (PAT) like for
networking, that would, for simple, reproducible if not for these
details, workloads, that we would then use in our regression tests.

Enough digression, this is one such diff:

   openat(dfd: CWD, filename: "/usr/share/locale/locale.alias", flags: RDONLY|CLOEXEC) = 3
  -fstat(fd: 3, statbuf: 0x7fff01f212a0)                                 = 0
  -read(fd: 3, buf: 0x5596bab2d630, count: 4096)                         = 2998
  -read(fd: 3, buf: 0x5596bab2d630, count: 4096)                         = 0
  +fstat(fd: 3, statbuf: 0x7ffc163cf0e0)                                 = 0
  +read(fd: 3, buf: 0x55b4e0631630, count: 4096)                         = 2998
  +read(fd: 3, buf: 0x55b4e0631630, count: 4096)                         = 0
   close(fd: 3)                                                          = 0
   openat(dfd: CWD, filename: "/usr/share/locale/en_US.UTF-8/LC_MESSAGES/coreutils.mo") = -1 ENOENT (No such file or directory)
   openat(dfd: CWD, filename: "/usr/share/locale/en_US.utf8/LC_MESSAGES/coreutils.mo") = -1 ENOENT (No such file or directory)
  @@ -45,7 +45,7 @@
   openat(dfd: CWD, filename: "/usr/share/locale/en.UTF-8/LC_MESSAGES/coreutils.mo") = -1 ENOENT (No such file or directory)
   openat(dfd: CWD, filename: "/usr/share/locale/en.utf8/LC_MESSAGES/coreutils.mo") = -1 ENOENT (No such file or directory)
   openat(dfd: CWD, filename: "/usr/share/locale/en/LC_MESSAGES/coreutils.mo") = -1 ENOENT (No such file or directory)
  -{ .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7fff01f21990) = 0
  +(struct __kernel_timespec){.tv_sec = (__kernel_time64_t)1,}, rmtp: 0x7ffc163cf7d0) =

The problem more close to our hands is to make the libbpf BTF pretty
printer to have a mode that closely resembles what we're trying to
resemble: strace output.

Being able to run something with 'perf trace' and with 'strace' and get
the exact same output should be of interest of anybody wanting to have
strace and 'perf trace' regression tested against each other.

That last part is 'perf trace' shot at being something so useful as
strace... ;-)

Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240824163322.60796-8-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 09:52:27 -03:00
Howard Chu
a68fd6a6cd perf trace: Collect augmented data using BPF
Include trace_augment.h for TRACE_AUG_MAX_BUF, so that BPF reads
TRACE_AUG_MAX_BUF bytes of buffer maximum.

Determine what type of argument and how many bytes to read from user space, us ing the
value in the beauty_map. This is the relation of parameter type and its corres ponding
value in the beauty map, and how many bytes we read eventually:

string: 1                          -> size of string (till null)
struct: size of struct             -> size of struct
buffer: -1 * (index of paired len) -> value of paired len (maximum: TRACE_AUG_ MAX_BUF)

After reading from user space, we output the augmented data using
bpf_perf_event_output().

If the struct augmenter, augment_sys_enter() failed, we fall back to
using bpf_tail_call().

I have to make the payload 6 times the size of augmented_arg, to pass the
BPF verifier.

Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240815013626.935097-10-howardchu95@gmail.com
Link: https://lore.kernel.org/r/20240824163322.60796-7-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 09:52:20 -03:00
Howard Chu
b257fac12f perf trace: Pretty print buffer data
Define TRACE_AUG_MAX_BUF in trace_augment.h data, which is the maximum
buffer size we can augment. BPF will include this header too.

Print buffer in a way that's different than just printing a string, we
print all the control characters in \digits (such as \0 for null, and
\10 for newline, LF).

For character that has a bigger value than 127, we print the digits
instead of the character itself as well.

Committer notes:

Simplified the buffer scnprintf to avoid using multiple buffers as
discussed in the patch review thread.

We can't really all 'buf' args to SCA_BUF as we're collecting so far
just on the sys_enter path, so we would be printing the previous 'read'
arg buffer contents, not what the kernel puts there.

So instead of:
   static int syscall_fmt__cmp(const void *name, const void *fmtp)
  @@ -1987,8 +1989,6 @@ syscall_arg_fmt__init_array(struct syscall_arg_fmt *arg, struct tep_format_field
  -               else if (strstr(field->type, "char *") && strstr(field->name, "buf"))
  -                       arg->scnprintf = SCA_BUF;

Do:

static const struct syscall_fmt syscall_fmts[] = {
  +       { .name     = "write",      .errpid = true,
  +         .arg = { [1] = { .scnprintf = SCA_BUF /* buf */, from_user = true, }, }, },

Signed-off-by: Howard Chu <howardchu95@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240815013626.935097-8-howardchu95@gmail.com
Link: https://lore.kernel.org/r/20240824163322.60796-6-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 09:52:13 -03:00
Howard Chu
cb32035214 perf trace: Pretty print struct data
Change the arg->augmented.args to arg->augmented.args->value to skip the
header for customized pretty printers, since we collect data in BPF
using the general augment_sys_enter(), which always adds the header.

Use btf_dump API to pretty print augmented struct pointer.

Prefer existed pretty-printer than btf general pretty-printer.

set compact = true and skip_names = true, so that no newline character
and argument name are printed.

Committer notes:

Simplified the btf_dump_snprintf callback to avoid using multiple
buffers, as discussed in the thread accessible via the Link tag below.

Also made it do:

  dump_data_opts.skip_names = !arg->trace->show_arg_names;

I.e. show the type and struct field names according to that tunable, we
probably need another tunable just for this, but for now if the user
wants to see syscall names in addition to its value, it makes sense to
see the struct field names according to that tunable.

Committer testing:

The following have explicitely set beautifiers (SCA_FILENAME,
SCA_SOCKADDR and SCA_PERF_ATTR), SCA_FILENAME is here just because we
have been wiring up the "renameat2" ("renameat" until recently), so it
doesn't use the introduced generic fallback (btf_struct_scnprintf(), see
the definition of SCA_PERF_ATTR, SCA_SOCKADDR to see the more feature
rich beautifiers, that are not using BTF):

  root@number:~# rm -f 987654 ; touch 123456 ; perf trace -e rename* mv 123456 987654
       0.000 ( 0.039 ms): mv/258478 renameat2(olddfd: CWD, oldname: "123456", newdfd: CWD, newname: "987654", flags: NOREPLACE) = 0
  root@number:~# perf trace -e connect,sendto ping -c 1 www.google.com
       0.000 ( 0.014 ms): ping/258481 connect(fd: 5, uservaddr: { .family: LOCAL, path: /run/systemd/resolve/io.systemd.Resolve }, addrlen: 42) = 0
       0.040 ( 0.003 ms): ping/258481 sendto(fd: 5, buff: 0x55bc317a6980, len: 97, flags: DONTWAIT|NOSIGNAL) = 97
      18.742 ( 0.020 ms): ping/258481 sendto(fd: 5, buff: 0x7ffc04768df0, len: 20, addr: { .family: NETLINK }, addr_len: 0xc) = 20
  PING www.google.com (142.251.129.68) 56(84) bytes of data.
      18.783 ( 0.012 ms): ping/258481 connect(fd: 5, uservaddr: { .family: INET6, port: 0, addr: 2800:3f0:4004:810::2004 }, addrlen: 28) = 0
      18.797 ( 0.001 ms): ping/258481 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16)           = 0
      18.800 ( 0.004 ms): ping/258481 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 142.251.129.68 }, addrlen: 16) = 0
      18.815 ( 0.002 ms): ping/258481 connect(fd: 5, uservaddr: { .family: INET, port: 1025, addr: 142.251.129.68 }, addrlen: 16) = 0
      18.862 ( 0.023 ms): ping/258481 sendto(fd: 3, buff: 0x55bc317a0ac0, len: 64, addr: { .family: INET, port: 0, addr: 142.251.129.68 }, addr_len: 0x10) = 64
      63.330 ( 0.038 ms): ping/258481 connect(fd: 5, uservaddr: { .family: LOCAL, path: /run/systemd/resolve/io.systemd.Resolve }, addrlen: 42) = 0
      63.435 ( 0.010 ms): ping/258481 sendto(fd: 5, buff: 0x55bc317a8340, len: 110, flags: DONTWAIT|NOSIGNAL) = 110
  64 bytes from rio07s07-in-f4.1e100.net (142.251.129.68): icmp_seq=1 ttl=49 time=44.2 ms

  --- www.google.com ping statistics ---
  1 packets transmitted, 1 received, 0% packet loss, time 0ms
  rtt min/avg/max/mdev = 44.158/44.158/44.158/0.000 ms
  root@number:~# perf trace -e perf_event_open perf stat -e instructions,cache-misses,syscalls:sys_enter*sleep* sleep 1.23456789
       0.000 ( 0.010 ms): :258487/258487 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), config: 0xa00000000, disabled: 1, { bp_len, config2 }: 0x900000000, branch_sample_type: USER|COUNTERS, sample_regs_user: 0x3f1b7ffffffff, sample_stack_user: 258487, clockid: -599052088, sample_regs_intr: 0x60a000003eb, sample_max_stack: 14, sig_data: 120259084288 }, cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 3
       0.016 ( 0.002 ms): :258487/258487 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), config: 0x400000000, disabled: 1, { bp_len, config2 }: 0x900000000, branch_sample_type: USER|COUNTERS, sample_regs_user: 0x3f1b7ffffffff, sample_stack_user: 258487, clockid: -599044082, sample_regs_intr: 0x60a000003eb, sample_max_stack: 14, sig_data: 120259084288 }, cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
       1.838 ( 0.006 ms): perf/258487 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0xa00000001, sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 258488 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 5
       1.846 ( 0.002 ms): perf/258487 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x400000001, sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 258488 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 6
       1.849 ( 0.002 ms): perf/258487 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0xa00000003, sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 258488 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 7
       1.851 ( 0.002 ms): perf/258487 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x400000003, sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 258488 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 9
       1.853 ( 0.600 ms): perf/258487 perf_event_open(attr_uptr: { type: 2 (tracepoint), size: 136, config: 0x190 (syscalls:sys_enter_nanosleep), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 258488 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 10
       2.456 ( 0.016 ms): perf/258487 perf_event_open(attr_uptr: { type: 2 (tracepoint), size: 136, config: 0x196 (syscalls:sys_enter_clock_nanosleep), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 258488 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 11

   Performance counter stats for 'sleep 1.23456789':

           1,402,839      cpu_atom/instructions/
       <not counted>      cpu_core/instructions/                                                  (0.00%)
              11,066      cpu_atom/cache-misses/
       <not counted>      cpu_core/cache-misses/                                                  (0.00%)
                   0      syscalls:sys_enter_nanosleep
                   1      syscalls:sys_enter_clock_nanosleep

         1.236246714 seconds time elapsed

         0.000000000 seconds user
         0.001308000 seconds sys

  root@number:~#

Now if we use it even for the ones we have a specific beautifier in
tools/perf/trace/beauty, i.e. use btf_struct_scnprintf() for all
structs, by adding the following patch:

  @@ -2316,7 +2316,7 @@ static size_t syscall__scnprintf_args(struct syscall *sc, char *bf, size_t size,

   			default_scnprintf = sc->arg_fmt[arg.idx].scnprintf;

  -			if (default_scnprintf == NULL || default_scnprintf == SCA_PTR) {
  +			if (1 || (default_scnprintf == NULL || default_scnprintf == SCA_PTR)) {
   				btf_printed = trace__btf_scnprintf(trace, &arg, bf + printed,
   								   size - printed, val, field->type);
   				if (btf_printed) {

We get:

  root@number:~# perf trace -e connect,sendto ping -c 1 www.google.com
  PING www.google.com (142.251.129.68) 56(84) bytes of data.
       0.000 ( 0.015 ms): ping/283259 connect(fd: 5, uservaddr: (struct sockaddr){.sa_family = (sa_family_t)1,(union){.sa_data_min = (char[14])['/','r','u','n','/','s','y','s','t','e','m','d','/','r',],},}, addrlen: 42) = 0
       0.046 ( 0.004 ms): ping/283259 sendto(fd: 5, buff: 0x559b008ae980, len: 97, flags: DONTWAIT|NOSIGNAL) = 97
       0.353 ( 0.012 ms): ping/283259 sendto(fd: 5, buff: 0x7ffc01294960, len: 20, addr: (struct sockaddr){.sa_family = (sa_family_t)16,}, addr_len: 0xc) = 20
       0.377 ( 0.006 ms): ping/283259 connect(fd: 5, uservaddr: (struct sockaddr){.sa_family = (sa_family_t)2,}, addrlen: 16) = 0
       0.388 ( 0.010 ms): ping/283259 connect(fd: 5, uservaddr: (struct sockaddr){.sa_family = (sa_family_t)10,}, addrlen: 28) = 0
       0.402 ( 0.001 ms): ping/283259 connect(fd: 5, uservaddr: (struct sockaddr){.sa_family = (sa_family_t)2,(union){.sa_data_min = (char[14])[4,1,142,251,129,'D',],},}, addrlen: 16) = 0
       0.425 ( 0.045 ms): ping/283259 sendto(fd: 3, buff: 0x559b008a8ac0, len: 64, addr: (struct sockaddr){.sa_family = (sa_family_t)2,}, addr_len: 0x10) = 64
  64 bytes from rio07s07-in-f4.1e100.net (142.251.129.68): icmp_seq=1 ttl=49 time=44.1 ms

  --- www.google.com ping statistics ---
  1 packets transmitted, 1 received, 0% packet loss, time 0ms
  rtt min/avg/max/mdev = 44.113/44.113/44.113/0.000 ms
      44.849 ( 0.038 ms): ping/283259 connect(fd: 5, uservaddr: (struct sockaddr){.sa_family = (sa_family_t)1,(union){.sa_data_min = (char[14])['/','r','u','n','/','s','y','s','t','e','m','d','/','r',],},}, addrlen: 42) = 0
      44.927 ( 0.006 ms): ping/283259 sendto(fd: 5, buff: 0x559b008b03d0, len: 110, flags: DONTWAIT|NOSIGNAL) = 110
  root@number:~#

Which looks sane, i.e.:

  18.800 ( 0.004 ms): ping/258481 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 142.251.129.68 }, addrlen: 16) = 0

Becomes:

   0.402 ( 0.001 ms): ping/283259 connect(fd: 5, uservaddr: (struct sockaddr){.sa_family = (sa_family_t)2,(union){.sa_data_min = (char[14])[4,1,142,251,129,'D',],},}, addrlen: 16) = 0

And.

  #define AF_UNIX         1       /* Unix domain sockets          */
  #define AF_LOCAL        1       /* POSIX name for AF_UNIX       */
  #define AF_INET         2       /* Internet IP Protocol         */
  <SNIP>
  #define AF_INET6        10      /* IP version 6                 */

And 'D' == 68, so the preexisting sockaddr BPF collector is working with
the new generic BTF pretty printer (btf_struct_scnprintf()), its just
that it doesn't know about 'struct sockaddr' besides what is in BTF,
i.e. its an array of bytes, not an IPv4 address that needs extra
massaging.

Ditto for the 'struct perf_event_attr' case:

       1.851 ( 0.002 ms): perf/258487 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x400000003, sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 258488 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 9

Becomes:

       2.081 ( 0.002 ms): :283304/283304 perf_event_open(attr_uptr: (struct perf_event_attr){.size = (__u32)136,.config = (__u64)17179869187,.sample_type = (__u64)65536,.read_format = (__u64)3,.disabled = (__u64)0x1,.inherit = (__u64)0x1,.enable_on_exec = (__u64)0x1,.exclude_guest = (__u64)0x1,}, pid: 283305 (sleep), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 9

hex(17179869187) = 0x400000003, etc.

read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING is

enum perf_event_read_format {
        PERF_FORMAT_TOTAL_TIME_ENABLED          = 1U << 0,
        PERF_FORMAT_TOTAL_TIME_RUNNING          = 1U << 1,

and so on.

We need to work with the libbpf btf dump api to get one output that
matches the 'perf trace'/strace expectations/format, but having this in
this current form is already an improvement to 'perf trace', so lets
improve from what we have.

Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240815013626.935097-7-howardchu95@gmail.com
Link: https://lore.kernel.org/r/20240824163322.60796-5-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 09:52:07 -03:00
Howard Chu
7f40306728 perf trace: Add trace__bpf_sys_enter_beauty_map() to prepare for fetching data in BPF
Set up beauty_map, load it to BPF, in such format: if argument No.3 is a
struct of size 32 bytes (of syscall number 114) beauty_map[114][2] = 32;

if argument No.3 is a string (of syscall number 114) beauty_map[114][2] =
1;

if argument No.3 is a buffer, its size is indicated by argument No.4 (of
syscall number 114) beauty_map[114][2] = -4; /* -1 ~ -6, we'll read this
buffer size in BPF  */

Committer notes:

Moved syscall_arg_fmt__cache_btf_struct() from a ifdef
HAVE_LIBBPF_SUPPORT to closer to where it is used, that is ifdef'ed on
HAVE_BPF_SKEL and thus breaks the build when building with
BUILD_BPF_SKEL=0, as detected using 'make -C tools/perf build-test'.

Also add 'struct beauty_map_enter' to tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
as we're using it in this patch, otherwise we get this while trying to
build at this point in the original patch series:

  builtin-trace.c: In function ‘trace__init_syscalls_bpf_prog_array_maps’:
  builtin-trace.c:3725:58: error: ‘struct <anonymous>’ has no member named ‘beauty_map_enter’
   3725 |         int beauty_map_fd = bpf_map__fd(trace->skel->maps.beauty_map_enter);
        |

We also have to take into account syscall_arg_fmt.from_user when telling
the kernel what to copy in the sys_enter generic collector, we don't
want to collect bogus data in buffers that will only be available to us
at sys_exit time, i.e. after the kernel has filled it, so leave this for
when we have such a sys_exit based collector.

Committer testing:

Not wired up yet, so all continues to work, using the existing BPF
collector and userspace beautifiers that are augmentation aware:

  root@number:~# rm -f 987654 ; touch 123456 ; perf trace -e rename* mv 123456 987654
       0.000 ( 0.031 ms): mv/20888 renameat2(olddfd: CWD, oldname: "123456", newdfd: CWD, newname: "987654", flags: NOREPLACE) = 0
  root@number:~# perf trace -e connect,sendto ping -c 1 www.google.com
       0.000 ( 0.014 ms): ping/20892 connect(fd: 5, uservaddr: { .family: LOCAL, path: /run/systemd/resolve/io.systemd.Resolve }, addrlen: 42) = 0
       0.040 ( 0.003 ms): ping/20892 sendto(fd: 5, buff: 0x560b4ff17980, len: 97, flags: DONTWAIT|NOSIGNAL) = 97
       0.480 ( 0.017 ms): ping/20892 sendto(fd: 5, buff: 0x7ffd82d07150, len: 20, addr: { .family: NETLINK }, addr_len: 0xc) = 20
       0.526 ( 0.014 ms): ping/20892 connect(fd: 5, uservaddr: { .family: INET6, port: 0, addr: 2800:3f0:4004:810::2004 }, addrlen: 28) = 0
       0.542 ( 0.002 ms): ping/20892 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16)           = 0
       0.544 ( 0.004 ms): ping/20892 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 142.251.135.100 }, addrlen: 16) = 0
       0.559 ( 0.002 ms): ping/20892 connect(fd: 5, uservaddr: { .family: INET, port: 1025, addr: 142.251.135.100 }, addrlen: 16PING www.google.com (142.251.135.100) 56(84) bytes of data.
  ) = 0
       0.589 ( 0.058 ms): ping/20892 sendto(fd: 3, buff: 0x560b4ff11ac0, len: 64, addr: { .family: INET, port: 0, addr: 142.251.135.100 }, addr_len: 0x10) = 64
      45.250 ( 0.029 ms): ping/20892 connect(fd: 5, uservaddr: { .family: LOCAL, path: /run/systemd/resolve/io.systemd.Resolve }, addrlen: 42) = 0
      45.344 ( 0.012 ms): ping/20892 sendto(fd: 5, buff: 0x560b4ff19340, len: 111, flags: DONTWAIT|NOSIGNAL) = 111
  64 bytes from rio09s08-in-f4.1e100.net (142.251.135.100): icmp_seq=1 ttl=49 time=44.4 ms

  --- www.google.com ping statistics ---
  1 packets transmitted, 1 received, 0% packet loss, time 0ms
  rtt min/avg/max/mdev = 44.361/44.361/44.361/0.000 ms
  root@number:~#

Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240815013626.935097-4-howardchu95@gmail.com
Link: https://lore.kernel.org/r/20240824163322.60796-3-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 09:51:59 -03:00
Arnaldo Carvalho de Melo
d92f490cba perf trace: Mark bpf's attr as from_user
This one has no specific pretty printer right now, so will be handled by
the generic BTF based one later in this patch series.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10 09:51:51 -03:00
Arnaldo Carvalho de Melo
c790f2bafb perf trace: Introduce SCA_TIMESPEC_FROM_USER() to set .from_user = true
Paving the way for the generic BPF BTF based syscall arg augmenter.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-09 19:23:04 -03:00
Arnaldo Carvalho de Melo
be14a71984 perf trace: Introduce SCA_SOCKADDR_FROM_USER() to set .from_user = true
Paving the way for the generic BPF BTF based syscall arg augmenter.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-09 19:23:04 -03:00
Arnaldo Carvalho de Melo
690eda6508 perf trace: Introduce SCA_PERF_ATTR_FROM_USER() to set .from_user = true
Paving the way for the generic BPF BTF based syscall arg augmenter.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-09 19:23:04 -03:00
Arnaldo Carvalho de Melo
2f2e439ba5 perf trace: Mark which syscall arguments go from user space to kernel space
We need to know where to collect it in the BPF augmenters, if in the
sys_enter hook or in the sys_exit hook.

Start with the SCA_FILENAME one, that is just from user to kernel space.

The alternative, better, but takes a bit more time than I have now, is
to use the __user information that is already in the syscall args and
encoded in BTF via a tag, do it later.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-09 19:23:03 -03:00
Arnaldo Carvalho de Melo
c90a88d33a perf trace: Use a common encoding for augmented arguments, with size + error + payload
We were using a more compact format, without explicitely encoding the
size and possible error in the payload for an argument.

To do it generically, at least as Howard Chu did in his GSoC activities,
it is more convenient to use the same model that was being used for
string arguments, passing { size, error, payload }.

So use that for the non string syscall args we have so far:

  struct timespec
  struct perf_event_attr
  struct sockaddr (this one has even a variable size)

With this in place we have the userspace pretty printers:

  perf_event_attr___scnprintf()
  syscall_arg__scnprintf_augmented_sockaddr()
  syscall_arg__scnprintf_augmented_timespec()

Ready to have the generic BPF collector in tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
sending its generic payload and thus we'll use them instead of a generic
libbpf btf_dump interface that doesn't know about about the sockaddr
mux, perf_event_attr non-trivial fields (sample_type, etc), leaving it
as a (useful) fallback that prints just basic types until we put in
place a more sophisticated pretty printer infrastructure that associates
synthesized enums to struct fields using the header scrapers we have in
tools/perf/trace/beauty/, some of them in this list:

  $ ls tools/perf/trace/beauty/*.sh
  tools/perf/trace/beauty/arch_errno_names.sh
  tools/perf/trace/beauty/kcmp_type.sh
  tools/perf/trace/beauty/perf_ioctl.sh
  tools/perf/trace/beauty/statx_mask.sh
  tools/perf/trace/beauty/clone.sh
  tools/perf/trace/beauty/kvm_ioctl.sh
  tools/perf/trace/beauty/pkey_alloc_access_rights.sh
  tools/perf/trace/beauty/sync_file_range.sh
  tools/perf/trace/beauty/drm_ioctl.sh
  tools/perf/trace/beauty/madvise_behavior.sh
  tools/perf/trace/beauty/prctl_option.sh
  tools/perf/trace/beauty/usbdevfs_ioctl.sh
  tools/perf/trace/beauty/fadvise.sh
  tools/perf/trace/beauty/mmap_flags.sh
  tools/perf/trace/beauty/rename_flags.sh
  tools/perf/trace/beauty/vhost_virtio_ioctl.sh
  tools/perf/trace/beauty/fs_at_flags.sh
  tools/perf/trace/beauty/mmap_prot.sh
  tools/perf/trace/beauty/sndrv_ctl_ioctl.sh
  tools/perf/trace/beauty/x86_arch_prctl.sh
  tools/perf/trace/beauty/fsconfig.sh
  tools/perf/trace/beauty/mount_flags.sh
  tools/perf/trace/beauty/sndrv_pcm_ioctl.sh
  tools/perf/trace/beauty/fsmount.sh
  tools/perf/trace/beauty/move_mount_flags.sh
  tools/perf/trace/beauty/sockaddr.sh
  tools/perf/trace/beauty/fspick.sh
  tools/perf/trace/beauty/mremap_flags.sh
  tools/perf/trace/beauty/socket.sh
  $

Testing it:

  root@number:~# rm -f 987654 ; touch 123456 ; perf trace -e rename* mv 123456 987654
     0.000 ( 0.031 ms): mv/1193096 renameat2(olddfd: CWD, oldname: "123456", newdfd: CWD, newname: "987654", flags: NOREPLACE) = 0
  root@number:~# perf trace -e *nanosleep sleep 1.2345678901
       0.000 (1234.654 ms): sleep/1192697 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 234567891 }, rmtp: 0x7ffe1ea80460) = 0
  root@number:~# perf trace -e perf_event_open* perf stat -e cpu-clock sleep 1
       0.000 ( 0.011 ms): perf/1192701 perf_event_open(attr_uptr: { type: 1 (software), size: 136, config: 0 (PERF_COUNT_SW_CPU_CLOCK), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 1192702 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 3

   Performance counter stats for 'sleep 1':

                0.51 msec cpu-clock                        #    0.001 CPUs utilized

         1.001242090 seconds time elapsed

         0.000000000 seconds user
         0.001010000 seconds sys

  root@number:~# perf trace -e connect* ping -c 1 bsky.app
       0.000 ( 0.130 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: LOCAL, path: /run/systemd/resolve/io.systemd.Resolve }, addrlen: 42) = 0
      23.907 ( 0.006 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 3.20.108.158 }, addrlen: 16) = 0
      23.915 PING bsky.app (3.20.108.158) 56(84) bytes of data.
  ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16)           = 0
      23.917 ( 0.002 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 3.12.170.30 }, addrlen: 16) = 0
      23.921 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16)           = 0
      23.923 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 18.217.70.179 }, addrlen: 16) = 0
      23.925 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16)           = 0
      23.927 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 3.132.20.46 }, addrlen: 16) = 0
      23.930 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16)           = 0
      23.931 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 3.142.89.165 }, addrlen: 16) = 0
      23.934 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16)           = 0
      23.935 ( 0.002 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 18.119.147.159 }, addrlen: 16) = 0
      23.938 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16)           = 0
      23.940 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 3.22.38.164 }, addrlen: 16) = 0
      23.942 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16)           = 0
      23.944 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 3.13.14.133 }, addrlen: 16) = 0
      23.956 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 1025, addr: 3.20.108.158 }, addrlen: 16) = 0
  ^C
  --- bsky.app ping statistics ---
  1 packets transmitted, 0 received, 100% packet loss, time 0ms

  root@number:~#

Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/CAP-5=fW4=2GoP6foAN6qbrCiUzy0a_TzHbd8rvDsakTPfdzvfg@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-09 19:17:03 -03:00
Arnaldo Carvalho de Melo
c1632cc5ed perf trace augmented_syscalls.bpf: Move the renameat aumenter to renameat2, temporarily
While trying to shape Howard Chu's generic BPF augmenter transition into
the codebase I got stuck with the renameat2 syscall.

Until I noticed that the attempt at reusing augmenters were making it
use the 'openat' syscall augmenter, that collect just one string syscall
arg, for the 'renameat2' syscall, that takes two strings.

So, for the moment, just to help in this transition period, since
'renameat2' is what is used these days in the 'mv' utility, just make
the BPF collector be associated with the more widely used syscall,
hopefully the transition to Howard's generic BPF augmenter will cure
this, so get this out of the way for now!

So now we still have that odd "reuse", but for something we're not
testing so won't get in the way anymore:

  root@number:~# rm -f 987654 ; touch 123456 ; perf trace -vv -e rename* mv 123456 987654 |& grep renameat
  Reusing "openat" BPF sys_enter augmenter for "renameat"
       0.000 ( 0.079 ms): mv/1158612 renameat2(olddfd: CWD, oldname: "123456", newdfd: CWD, newname: "987654", flags: NOREPLACE) = 0
  root@number:~#

Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/CAP-5=fXjGYs=tpBgETK-P9U-CuXssytk9pSnTXpfphrmmOydWA@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-09 19:16:26 -03:00
Kan Liang
003265bb6f perf mem: Fix the wrong reference in parse_record_events()
A segmentation fault can be triggered when running
'perf mem record -e ldlat-loads'

The commit 35b38a71c9 ("perf mem: Rework command option handling")
moves the OPT_CALLBACK of event from __cmd_record() to cmd_mem().

When invoking the __cmd_record(), the 'mem' has been referenced (&).

So the &mem passed into the parse_record_events() is a double reference
(&&) of the original struct perf_mem mem.

But in the cmd_mem(), the &mem is the single reference (&) of the
original struct perf_mem mem.

Fixes: 35b38a71c9 ("perf mem: Rework command option handling")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240905170737.4070743-3-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-06 11:45:28 -03:00
Kan Liang
5ad7db2c3f perf mem: Fix missed p-core mem events on ADL and RPL
The p-core mem events are missed when launching 'perf mem record' on ADL
and RPL.

  root@number:~# perf mem record sleep 1
  Memory events are enabled on a subset of CPUs: 16-27
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.032 MB perf.data ]
  root@number:~# perf evlist
  cpu_atom/mem-loads,ldlat=30/P
  cpu_atom/mem-stores/P
  dummy:u

A variable 'record' in the 'struct perf_mem_event' is to indicate
whether a mem event in a mem_events[] should be recorded. The current
code only configure the variable for the first eligible PMU.

It's good enough for a non-hybrid machine or a hybrid machine which has
the same mem_events[].

However, if a different mem_events[] is used for different PMUs on a
hybrid machine, e.g., ADL or RPL, the 'record' for the second PMU never
get a chance to be set.

The mem_events[] of the second PMU are always ignored.

'perf mem' doesn't support the per-PMU configuration now. A per-PMU
mem_events[] 'record' variable doesn't make sense. Make it global.

That could also avoid searching for the per-PMU mem_events[] via
perf_pmu__mem_events_ptr every time.

Committer testing:

  root@number:~# perf evlist -g
  cpu_atom/mem-loads,ldlat=30/P
  cpu_atom/mem-stores/P
  {cpu_core/mem-loads-aux/,cpu_core/mem-loads,ldlat=30/}
  cpu_core/mem-stores/P
  dummy:u
  root@number:~#

The :S for '{cpu_core/mem-loads-aux/,cpu_core/mem-loads,ldlat=30/}' is
not being added by 'perf evlist -g', to be checked.

Fixes: abbdd79b78 ("perf mem: Clean up perf_mem_events__name()")
Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Closes: https://lore.kernel.org/lkml/Zthu81fA3kLC2CS2@x1/
Link: https://lore.kernel.org/r/20240905170737.4070743-2-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-06 11:45:17 -03:00
Kan Liang
6e05d28ff2 perf mem: Check mem_events for all eligible PMUs
The current perf_pmu__mem_events_init() only checks the availability of
the mem_events for the first eligible PMU. It works for non-hybrid
machines and hybrid machines that have the same mem_events.

However, it may bring issues if a hybrid machine has a different
mem_events on different PMU, e.g., Alder Lake and Raptor Lake. A
mem-loads-aux event is only required for the p-core. The mem_events on
both e-core and p-core should be checked and marked.

The issue was not found, because it's hidden by another bug, which only
records the mem-events for the e-core. The wrong check for the p-core
events didn't yell.

Fixes: abbdd79b78 ("perf mem: Clean up perf_mem_events__name()")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240905170737.4070743-1-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-06 11:45:07 -03:00
Andi Kleen
4bef6168c1 perf script python: Avoid buffer overflow in python PEBS register interface
Running a script that processes PEBS records gives buffer overflows
in valgrind.

The problem is that the allocation of the register string doesn't
include the terminating 0 byte. Fix this.

I also replaced the very magic "28" with a more reasonable larger buffer
that should fit all registers.  There's no need to conserve memory here.

  ==2106591== Memcheck, a memory error detector
  ==2106591== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
  ==2106591== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
  ==2106591== Command: ../perf script -i tcall.data gcov.py tcall.gcov
  ==2106591==
  ==2106591== Invalid write of size 1
  ==2106591==    at 0x713354: regs_map (trace-event-python.c:748)
  ==2106591==    by 0x7134EB: set_regs_in_dict (trace-event-python.c:784)
  ==2106591==    by 0x713E58: get_perf_sample_dict (trace-event-python.c:940)
  ==2106591==    by 0x716327: python_process_general_event (trace-event-python.c:1499)
  ==2106591==    by 0x7164E1: python_process_event (trace-event-python.c:1531)
  ==2106591==    by 0x44F9AF: process_sample_event (builtin-script.c:2549)
  ==2106591==    by 0x6294DC: evlist__deliver_sample (session.c:1534)
  ==2106591==    by 0x6296D0: machines__deliver_event (session.c:1573)
  ==2106591==    by 0x629C39: perf_session__deliver_event (session.c:1655)
  ==2106591==    by 0x625830: ordered_events__deliver_event (session.c:193)
  ==2106591==    by 0x630B23: do_flush (ordered-events.c:245)
  ==2106591==    by 0x630E7A: __ordered_events__flush (ordered-events.c:324)
  ==2106591==  Address 0x7186fe0 is 0 bytes after a block of size 0 alloc'd
  ==2106591==    at 0x484280F: malloc (vg_replace_malloc.c:442)
  ==2106591==    by 0x7134AD: set_regs_in_dict (trace-event-python.c:780)
  ==2106591==    by 0x713E58: get_perf_sample_dict (trace-event-python.c:940)
  ==2106591==    by 0x716327: python_process_general_event (trace-event-python.c:1499)
  ==2106591==    by 0x7164E1: python_process_event (trace-event-python.c:1531)
  ==2106591==    by 0x44F9AF: process_sample_event (builtin-script.c:2549)
  ==2106591==    by 0x6294DC: evlist__deliver_sample (session.c:1534)
  ==2106591==    by 0x6296D0: machines__deliver_event (session.c:1573)
  ==2106591==    by 0x629C39: perf_session__deliver_event (session.c:1655)
  ==2106591==    by 0x625830: ordered_events__deliver_event (session.c:193)
  ==2106591==    by 0x630B23: do_flush (ordered-events.c:245)
  ==2106591==    by 0x630E7A: __ordered_events__flush (ordered-events.c:324)
  ==2106591==
  ==2106591== Invalid read of size 1
  ==2106591==    at 0x484B6C6: strlen (vg_replace_strmem.c:502)
  ==2106591==    by 0x555D494: PyUnicode_FromString (unicodeobject.c:1899)
  ==2106591==    by 0x7134F7: set_regs_in_dict (trace-event-python.c:786)
  ==2106591==    by 0x713E58: get_perf_sample_dict (trace-event-python.c:940)
  ==2106591==    by 0x716327: python_process_general_event (trace-event-python.c:1499)
  ==2106591==    by 0x7164E1: python_process_event (trace-event-python.c:1531)
  ==2106591==    by 0x44F9AF: process_sample_event (builtin-script.c:2549)
  ==2106591==    by 0x6294DC: evlist__deliver_sample (session.c:1534)
  ==2106591==    by 0x6296D0: machines__deliver_event (session.c:1573)
  ==2106591==    by 0x629C39: perf_session__deliver_event (session.c:1655)
  ==2106591==    by 0x625830: ordered_events__deliver_event (session.c:193)
  ==2106591==    by 0x630B23: do_flush (ordered-events.c:245)
  ==2106591==  Address 0x7186fe0 is 0 bytes after a block of size 0 alloc'd
  ==2106591==    at 0x484280F: malloc (vg_replace_malloc.c:442)
  ==2106591==    by 0x7134AD: set_regs_in_dict (trace-event-python.c:780)
  ==2106591==    by 0x713E58: get_perf_sample_dict (trace-event-python.c:940)
  ==2106591==    by 0x716327: python_process_general_event (trace-event-python.c:1499)
  ==2106591==    by 0x7164E1: python_process_event (trace-event-python.c:1531)
  ==2106591==    by 0x44F9AF: process_sample_event (builtin-script.c:2549)
  ==2106591==    by 0x6294DC: evlist__deliver_sample (session.c:1534)
  ==2106591==    by 0x6296D0: machines__deliver_event (session.c:1573)
  ==2106591==    by 0x629C39: perf_session__deliver_event (session.c:1655)
  ==2106591==    by 0x625830: ordered_events__deliver_event (session.c:193)
  ==2106591==    by 0x630B23: do_flush (ordered-events.c:245)
  ==2106591==    by 0x630E7A: __ordered_events__flush (ordered-events.c:324)
  ==2106591==
  ==2106591== Invalid write of size 1
  ==2106591==    at 0x713354: regs_map (trace-event-python.c:748)
  ==2106591==    by 0x713539: set_regs_in_dict (trace-event-python.c:789)
  ==2106591==    by 0x713E58: get_perf_sample_dict (trace-event-python.c:940)
  ==2106591==    by 0x716327: python_process_general_event (trace-event-python.c:1499)
  ==2106591==    by 0x7164E1: python_process_event (trace-event-python.c:1531)
  ==2106591==    by 0x44F9AF: process_sample_event (builtin-script.c:2549)
  ==2106591==    by 0x6294DC: evlist__deliver_sample (session.c:1534)
  ==2106591==    by 0x6296D0: machines__deliver_event (session.c:1573)
  ==2106591==    by 0x629C39: perf_session__deliver_event (session.c:1655)
  ==2106591==    by 0x625830: ordered_events__deliver_event (session.c:193)
  ==2106591==    by 0x630B23: do_flush (ordered-events.c:245)
  ==2106591==    by 0x630E7A: __ordered_events__flush (ordered-events.c:324)
  ==2106591==  Address 0x7186fe0 is 0 bytes after a block of size 0 alloc'd
  ==2106591==    at 0x484280F: malloc (vg_replace_malloc.c:442)
  ==2106591==    by 0x7134AD: set_regs_in_dict (trace-event-python.c:780)
  ==2106591==    by 0x713E58: get_perf_sample_dict (trace-event-python.c:940)
  ==2106591==    by 0x716327: python_process_general_event (trace-event-python.c:1499)
  ==2106591==    by 0x7164E1: python_process_event (trace-event-python.c:1531)
  ==2106591==    by 0x44F9AF: process_sample_event (builtin-script.c:2549)
  ==2106591==    by 0x6294DC: evlist__deliver_sample (session.c:1534)
  ==2106591==    by 0x6296D0: machines__deliver_event (session.c:1573)
  ==2106591==    by 0x629C39: perf_session__deliver_event (session.c:1655)
  ==2106591==    by 0x625830: ordered_events__deliver_event (session.c:193)
  ==2106591==    by 0x630B23: do_flush (ordered-events.c:245)
  ==2106591==    by 0x630E7A: __ordered_events__flush (ordered-events.c:324)
  ==2106591==
  ==2106591== Invalid read of size 1
  ==2106591==    at 0x484B6C6: strlen (vg_replace_strmem.c:502)
  ==2106591==    by 0x555D494: PyUnicode_FromString (unicodeobject.c:1899)
  ==2106591==    by 0x713545: set_regs_in_dict (trace-event-python.c:791)
  ==2106591==    by 0x713E58: get_perf_sample_dict (trace-event-python.c:940)
  ==2106591==    by 0x716327: python_process_general_event (trace-event-python.c:1499)
  ==2106591==    by 0x7164E1: python_process_event (trace-event-python.c:1531)
  ==2106591==    by 0x44F9AF: process_sample_event (builtin-script.c:2549)
  ==2106591==    by 0x6294DC: evlist__deliver_sample (session.c:1534)
  ==2106591==    by 0x6296D0: machines__deliver_event (session.c:1573)
  ==2106591==    by 0x629C39: perf_session__deliver_event (session.c:1655)
  ==2106591==    by 0x625830: ordered_events__deliver_event (session.c:193)
  ==2106591==    by 0x630B23: do_flush (ordered-events.c:245)
  ==2106591==  Address 0x7186fe0 is 0 bytes after a block of size 0 alloc'd
  ==2106591==    at 0x484280F: malloc (vg_replace_malloc.c:442)
  ==2106591==    by 0x7134AD: set_regs_in_dict (trace-event-python.c:780)
  ==2106591==    by 0x713E58: get_perf_sample_dict (trace-event-python.c:940)
  ==2106591==    by 0x716327: python_process_general_event (trace-event-python.c:1499)
  ==2106591==    by 0x7164E1: python_process_event (trace-event-python.c:1531)
  ==2106591==    by 0x44F9AF: process_sample_event (builtin-script.c:2549)
  ==2106591==    by 0x6294DC: evlist__deliver_sample (session.c:1534)
  ==2106591==    by 0x6296D0: machines__deliver_event (session.c:1573)
  ==2106591==    by 0x629C39: perf_session__deliver_event (session.c:1655)
  ==2106591==    by 0x625830: ordered_events__deliver_event (session.c:193)
  ==2106591==    by 0x630B23: do_flush (ordered-events.c:245)
  ==2106591==    by 0x630E7A: __ordered_events__flush (ordered-events.c:324)
  ==2106591==
  73056 total, 29 ignored

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240905151058.2127122-2-ak@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-06 11:44:58 -03:00
Ian Rogers
f2dbc77909 perf jevents: Ignore sys when determining a model directory
Existing sys directories aren't placed under a model directory like
skylake.

Placing a sys directory there causes the `is_leaf_dir` test to fail and
consequently no events or metrics are generated for the model.

Ignore sys directories in this case and update the comments to
reflect why.

This change has no affect, but when testing with a sys directory for a
model people have reported running into the no event/metric issue.

Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20240904211705.915101-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-06 11:44:46 -03:00
Arnaldo Carvalho de Melo
92984e4468 Merge remote-tracking branch 'torvalds/master' into perf-tools-next
To pick up fixes from perf-tools/perf-tools, some of which were also in
perf-tools-next but were then indentified as being more appropriate to
go sooner, to fix regressions in v6.11.

Resolve a simple merge conflict in tools/perf/tests/pmu.c where a more
future proof approach to initialize all fields of a struct was used in
perf-tools-next, the one that is going into v6.11 is enough for the
segfault it addressed (using an uninitialized test_pmu.alias field).

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-06 11:42:59 -03:00
Linus Torvalds
b831f83e40 bpf-6.11-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmbaYFMACgkQ6rmadz2v
 bTq7JBAAipwHeOL3IYproQxGy+f0W3Uik9FNlavSQ3zpJHmTJcpf0ysXkqH23g2q
 26CF0R44gmGMkdbZsxbk3HLI2qRmzxmznYCDH0g7d9qwzQMhFHIiY7TW7UD/XbKx
 UHdHLb5PYrj+j94T1WGiQdvbZYDlpmdz5rFA9K/TBtBArqYp9mA4D/cIlTDBfFpk
 cjhSGVl9x/BKbiHKApxSGcR7Fh/+ux9mVdlssWQNhRfm3V2tbRSAw1i1/ydTG+4c
 bf/m0RSIDfPMxy1i7D0lNRbclzWVisTqNzDXHfQoRUJMuMDfsK4UZB/6gvh+2LKy
 D60vT8AfN5ygjJbLdFbwFGnEymjfsXWguyqfQB0d9Hj/2/EsZ01rI2ikJv9J+qKl
 wwZM3YeA3Q/V0mZ5wCONp2dn+s+82nga+fdvCRFz6SLkWQwgbW5BYHFF1c60V9MH
 Pbd9Y5VfCOEZRzR6RxbmguPrnoU1+BUwQeIAp9L73bllrzhtmh/aL/b03uw8/wUh
 I+peLxJ+DVp6wTudgvSMviMySWcztuz397G7TnFyG0V4nKe1+QxSaQWWw2HKvpy3
 i+m98qoWqbuJqz49FpEtX6x/17gZZNA0LK648D77nrOfsGWOLTKOZUDbNWbTPw9a
 Gojg5obJ8P82yO9UCYQLyGsAJxJrKZv3OEmqy0mRG1hrSMsozxg=
 =5Quw
 -----END PGP SIGNATURE-----

Merge tag 'bpf-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Alexei Starovoitov:

 - Fix crash when btf_parse_base() returns an error (Martin Lau)

 - Fix out of bounds access in btf_name_valid_section() (Jeongjun Park)

* tag 'bpf-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: Add a selftest to check for incorrect names
  bpf: add check for invalid name in btf_name_valid_section()
  bpf: Fix a crash when btf_parse_base() returns an error pointer
2024-09-05 20:10:53 -07:00
Linus Torvalds
d759ee240d Including fixes from can, bluetooth and wireless.
No known regressions at this point. Another calm week, but chances are
 that has more to do with vacation season than the quality of our work.
 
 Current release - new code bugs:
 
  - smc: prevent NULL pointer dereference in txopt_get
 
  - eth: ti: am65-cpsw: number of XDP-related fixes
 
 Previous releases - regressions:
 
  - Revert "Bluetooth: MGMT/SMP: Fix address type when using SMP over
    BREDR/LE", it breaks existing user space
 
  - Bluetooth: qca: if memdump doesn't work, re-enable IBS to avoid
    later problems with suspend
 
  - can: mcp251x: fix deadlock if an interrupt occurs during mcp251x_open
 
  - eth: r8152: fix the firmware communication error due to use
    of bulk write
 
  - ptp: ocp: fix serial port information export
 
  - eth: igb: fix not clearing TimeSync interrupts for 82580
 
  - Revert "wifi: ath11k: support hibernation", fix suspend on Lenovo
 
 Previous releases - always broken:
 
  - eth: intel: fix crashes and bugs when reconfiguration and resets
    happening in parallel
 
  - wifi: ath11k: fix NULL dereference in ath11k_mac_get_eirp_power()
 
 Misc:
 
  - docs: netdev: document guidance on cleanup.h
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmbaMbsACgkQMUZtbf5S
 IrtpUg/+J6rNaZuGVTHJQAjdSlMx/HzpN3GIbhYyUSg+iHNclqtxJ706b2vyrG88
 Rw5a+f8aQueONNsoFfa/ooU4cGsdO1oYlch0Wtuj5taCqy2SvtVqJAyiuDyNNjU0
 BQ1Rf7aRLI7enmEpZJN2FFu106YTVccBcLqhPkx0CPcEjV+p5RvypfeQL72H6ZKx
 +7/HzEl4bagHIQW3W1uJGNUdwNP7fP2/Kg7TrTJ1t629nLiJCxKL7LrsmebO5o9a
 v+2NxAa1eujTZ1k7ITcM0wYlxKOaGNFF4sT+dA+GfMe+SFssdhGeZYyv1t0zm5VI
 3apJ/pSHza1a/hXFa6PaOSw5M5LWn4bJOqeZLl/yIV0upu5xadWqmT0gVay8V9lY
 +x/MURGr3seuNRSMsaToHDIq+Us45Dt/qkDDNO/P+9R/BsJKCW05Pfqx3Mr/OHzv
 eeCPbXRh4YYBdrUicBWo04gSD+BUA53vW8FC3pxU5ieLOOcX4kOPeb8wNPHcXjMU
 73D+kyO1ufsfsFMkd3VfgDI1mMz+xpEuZ6pxs33tJ/1Ny7DdG1Q49xlQVh4Wnobk
 uQqUSzdoelOROeg1rwmsIbfwIvj5a5dVIyBu8TDjHlb/rk1QNTkyu+fFmQRWEotL
 fQ7U62wXlpoCT8WchSMtiU32IDJ2+Lwhwecguy1Z7kLOLrtL8XU=
 =ju86
 -----END PGP SIGNATURE-----

Merge tag 'net-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from can, bluetooth and wireless.

  No known regressions at this point. Another calm week, but chances are
  that has more to do with vacation season than the quality of our work.

  Current release - new code bugs:

   - smc: prevent NULL pointer dereference in txopt_get

   - eth: ti: am65-cpsw: number of XDP-related fixes

  Previous releases - regressions:

   - Revert "Bluetooth: MGMT/SMP: Fix address type when using SMP over
     BREDR/LE", it breaks existing user space

   - Bluetooth: qca: if memdump doesn't work, re-enable IBS to avoid
     later problems with suspend

   - can: mcp251x: fix deadlock if an interrupt occurs during
     mcp251x_open

   - eth: r8152: fix the firmware communication error due to use of bulk
     write

   - ptp: ocp: fix serial port information export

   - eth: igb: fix not clearing TimeSync interrupts for 82580

   - Revert "wifi: ath11k: support hibernation", fix suspend on Lenovo

  Previous releases - always broken:

   - eth: intel: fix crashes and bugs when reconfiguration and resets
     happening in parallel

   - wifi: ath11k: fix NULL dereference in ath11k_mac_get_eirp_power()

  Misc:

   - docs: netdev: document guidance on cleanup.h"

* tag 'net-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (61 commits)
  ila: call nf_unregister_net_hooks() sooner
  tools/net/ynl: fix cli.py --subscribe feature
  MAINTAINERS: fix ptp ocp driver maintainers address
  selftests: net: enable bind tests
  net: dsa: vsc73xx: fix possible subblocks range of CAPT block
  sched: sch_cake: fix bulk flow accounting logic for host fairness
  docs: netdev: document guidance on cleanup.h
  net: xilinx: axienet: Fix race in axienet_stop
  net: bridge: br_fdb_external_learn_add(): always set EXT_LEARN
  r8152: fix the firmware doesn't work
  fou: Fix null-ptr-deref in GRO.
  bareudp: Fix device stats updates.
  net: mana: Fix error handling in mana_create_txq/rxq's NAPI cleanup
  bpf, net: Fix a potential race in do_sock_getsockopt()
  net: dqs: Do not use extern for unused dql_group
  sch/netem: fix use after free in netem_dequeue
  usbnet: modern method to get random MAC
  MAINTAINERS: wifi: cw1200: add net-cw1200.h
  ice: do not bring the VSI up, if it was down before the XDP setup
  ice: remove ICE_CFG_BUSY locking from AF_XDP code
  ...
2024-09-05 17:08:01 -07:00
Linus Torvalds
f95359996a spi: Fixes for v6.11
A few small driver specific fixes (including some of the widespread work
 on fixing missing ID tables for module autoloading and the revert of
 some problematic PM work in spi-rockchip), some improvements to the
 MAINTAINERS information for the NXP drivers and the addition of a new
 device ID to spidev.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmbaK/gACgkQJNaLcl1U
 h9DzvAf+LDdI7hKt61JqwMNpmAB9n2AlXBCksfiWHVVXqoYaRBUQeMsoXjaAmkZ5
 CAZneRV0yud9JR9BdRWITbYkMc3XzoMVDN5Z6C329lQjAJKyOE4n/VhgODcT1Hcj
 PmKigUI0Ub6xEq7NyZZxl4/W/1Pp8Ybg2Z1+1aF4vW2SVMGg6+NlmEZitV61M64f
 ZHH4c6SpW1fQPcUeNzlhOg3v6dGuyDf1/Mn4gopdrNzXP8VUvltMtnKBo4xZk34o
 kMxXMgWQ8bDD6vHQP7P50D49y70OMmoySo4GkWpnyzuIQVEN1W/QoC3py2NbPhQE
 WtAgG/9sf90tuPgaDQ6h3fknbYEGZg==
 =fedt
 -----END PGP SIGNATURE-----

Merge tag 'spi-fix-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
 "A few small driver specific fixes (including some of the widespread
  work on fixing missing ID tables for module autoloading and the revert
  of some problematic PM work in spi-rockchip), some improvements to the
  MAINTAINERS information for the NXP drivers and the addition of a new
  device ID to spidev"

* tag 'spi-fix-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  MAINTAINERS: SPI: Add mailing list imx@lists.linux.dev for nxp spi drivers
  MAINTAINERS: SPI: Add freescale lpspi maintainer information
  spi: spi-fsl-lpspi: Fix off-by-one in prescale max
  spi: spidev: Add missing spi_device_id for jg10309-01
  spi: bcm63xx: Enable module autoloading
  spi: intel: Add check devm_kasprintf() returned value
  spi: spidev: Add an entry for elgin,jg10309-01
  spi: rockchip: Resolve unbalanced runtime PM / system PM handling
2024-09-05 16:49:10 -07:00
Linus Torvalds
2a66044754 regulator: Fix for v6.11
A fix from Doug Anderson for a missing stub, required to fix the build
 for some newly added users of devm_regulator_bulk_get_const() in
 !REGULATOR configurations.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmbaKwgACgkQJNaLcl1U
 h9A0Twf+Nw4D6a/LVuGTArpWXahY/HEP/BndU9Su4modr30SQY1/Ih4/PMAlIVyd
 RQvWTGXZdesxv04gOM9TSoLl7IUgVR5nrVPbj7H36C+Nw+l9e6lH24+PgVN96FIb
 jFXOAyI6ka+ghj3xRlxObDqnjxRBFdcY6WTw2RbaJIPj3XjY3EQ9gQhp2k75ktTS
 JWU0QX5Q9jaNirQGjLUuDRNyhIZZHur7VM1TuzH44qDmbZ/O5e3UKQUAWUGKJgxE
 bH64OYvQjWPnbhr0ABOPRd+0cVCAXbW+7KxcnhhHtIf/023YCAmMIDqAjsLlIzvP
 cb3sWErzFS7pzQUTE1RMMbbfQ/eqKw==
 =hPtk
 -----END PGP SIGNATURE-----

Merge tag 'regulator-fix-v6.11-stub' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fix from Mark Brown:
 "A fix from Doug Anderson for a missing stub, required to fix the build
  for some newly added users of devm_regulator_bulk_get_const() in
  !REGULATOR configurations"

* tag 'regulator-fix-v6.11-stub' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: core: Stub devm_regulator_bulk_get_const() if !CONFIG_REGULATOR
2024-09-05 16:41:16 -07:00
Linus Torvalds
6c5b3e30e5 Rust fixes for v6.11 (2nd)
Toolchain and infrastructure:
 
  - Fix builds for nightly compiler users now that 'new_uninit' was split
    into new features by using an alternative approach for the code that
    used what is now called the 'box_uninit_write' feature.
 
  - Allow the 'stable_features' lint to preempt upcoming warnings about
    them, since soon there will be unstable features that will become
    stable in nightly compilers.
 
  - Export bss symbols too.
 
 'kernel' crate:
 
  - 'block' module: fix wrong usage of lockdep API.
 
 'macros' crate:
 
  - Provide correct provenance when constructing 'THIS_MODULE'.
 
 Documentation:
 
  - Remove unintended indentation (blockquotes) in generated output.
 
  - Fix a couple typos.
 
 MAINTAINERS:
 
  - Remove Wedson as Rust maintainer.
 
  - Update Andreas' email.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPjU5OPd5QIZ9jqqOGXyLc2htIW0FAmbaIkMACgkQGXyLc2ht
 IW2PtQ//ZMLngRVmzOtek7oz3NfR7vAbMjwknC8uy9xKsBGGYpyI9LUdBziMB/SD
 +/moWfLacJwuFRRlgsPVjo6IcwCnMauO5V77cWWTbcH6dgigfD4fz/IpZlMuTdUA
 gQ8xAjr8jMK0uT3ME5v7QpXQYOE/1wvOHJpULO71On1l4JqxCbSRozxI4TTxFEmP
 mVRCV7c38NzX2YZyYkIzbkXNS8y0tbCTMC1HLCfROOrxbGh7eqjq/AZcHl2SkxHg
 C0u0Rc7BQtMkCQqtceEWPNloe2SlznbhqiJi6vOttM3LUFYCtS76zqrGnnqtFkMF
 zJuXYNpbjFDvZCfbOL50p0oYb5G/+FgZarK+TPXympPnyqoyWNufwS5N0Hn10wn8
 5GbI3ZGrg6q5SE39Sf/wiC9tMP4h2dUTeLwabk2yuW7QVBAxw52SkEtpRIevSN2n
 b0IUExRHilXMBY0FmkP3x+Ot+qkydxWH0c/rpf9NldNfP7F6QY384hn47/cl8bMo
 LUygXBjAbDTtKage97JkZTfmOUeS6nrYrQjLL/kUu1H+AdnhOUndtaHv6W5zArQQ
 hFU751+izJFiEw8wqw1MoK2CcRL5QNMDdrj+kwXaCMNSv0Ie3p4jqXuWXSVUBoMz
 l0pJMgjE1Dc2QYTDV3Jq8ptiiBBM9Oyxen/doBA0OJiMm9S1drQ=
 =nJge
 -----END PGP SIGNATURE-----

Merge tag 'rust-fixes-6.11-2' of https://github.com/Rust-for-Linux/linux

Pull Rust fixes from Miguel Ojeda:
 "Toolchain and infrastructure:

   - Fix builds for nightly compiler users now that 'new_uninit' was
     split into new features by using an alternative approach for the
     code that used what is now called the 'box_uninit_write' feature

   - Allow the 'stable_features' lint to preempt upcoming warnings about
     them, since soon there will be unstable features that will become
     stable in nightly compilers

   - Export bss symbols too

  'kernel' crate:

   - 'block' module: fix wrong usage of lockdep API

  'macros' crate:

   - Provide correct provenance when constructing 'THIS_MODULE'

  Documentation:

   - Remove unintended indentation (blockquotes) in generated output

   - Fix a couple typos

  MAINTAINERS:

   - Remove Wedson as Rust maintainer

   - Update Andreas' email"

* tag 'rust-fixes-6.11-2' of https://github.com/Rust-for-Linux/linux:
  MAINTAINERS: update Andreas Hindborg's email address
  MAINTAINERS: Remove Wedson as Rust maintainer
  rust: macros: provide correct provenance when constructing THIS_MODULE
  rust: allow `stable_features` lint
  docs: rust: remove unintended blockquote in Quick Start
  rust: alloc: eschew `Box<MaybeUninit<T>>::write`
  rust: kernel: fix typos in code comments
  docs: rust: remove unintended blockquote in Coding Guidelines
  rust: block: fix wrong usage of lockdep API
  rust: kbuild: fix export of bss symbols
2024-09-05 16:35:57 -07:00
Linus Torvalds
e4b42053b7 Tracing fixes for 6.11:
- Fix adding a new fgraph callback after function graph tracing has
   already started.
 
   If the new caller does not initialize its hash before registering the
   fgraph_ops, it can cause a NULL pointer dereference. Fix this by adding
   a new parameter to ftrace_graph_enable_direct() passing in the newly
   added gops directly and not rely on using the fgraph_array[], as entries
   in the fgraph_array[] must be initialized. Assign the new gops to the
   fgraph_array[] after it goes through ftrace_startup_subops() as that
   will properly initialize the gops->ops and initialize its hashes.
 
 - Fix a memory leak in fgraph storage memory test.
 
   If the "multiple fgraph storage on a function" boot up selftest
   fails in the registering of the function graph tracer, it will
   not free the memory it allocated for the filter. Break the loop
   up into two where it allocates the filters first and then registers
   the functions where any errors will do the appropriate clean ups.
 
 - Only clear the timerlat timers if it has an associated kthread.
 
   In the rtla tool that uses timerlat, if it was killed just as it
   was shutting down, the signals can free the kthread and the timer.
   But the closing of the timerlat files could cause the hrtimer_cancel()
   to be called on the already freed timer. As the kthread variable is
   is set to NULL when the kthreads are stopped and the timers are freed
   it can be used to know not to call hrtimer_cancel() on the timer if
   the kthread variable is NULL.
 
 - Use a cpumask to keep track of osnoise/timerlat kthreads
 
   The timerlat tracer can use user space threads for its analysis.
   With the killing of the rtla tool, the kernel can get confused
   between if it is using a user space thread to analyze or one of its
   own kernel threads. When this confusion happens, kthread_stop()
   can be called on a user space thread and bad things happen.
   As the kernel threads are per-cpu, a bitmask can be used to know
   when a kernel thread is used or when a user space thread is used.
 
 - Add missing interface_lock to osnoise/timerlat stop_kthread()
 
   The stop_kthread() function in osnoise/timerlat clears the
   osnoise kthread variable, and if it was a user space thread does
   a put_task on it. But this can race with the closing of the timerlat
   files that also does a put_task on the kthread, and if the race happens
   the task will have put_task called on it twice and oops.
 
 - Add cond_resched() to the tracing_iter_reset() loop.
 
   The latency tracers keep writing to the ring buffer without resetting
   when it issues a new "start" event (like interrupts being disabled).
   When reading the buffer with an iterator, the tracing_iter_reset()
   sets its pointer to that start event by walking through all the events
   in the buffer until it gets to the time stamp of the start event.
   In the case of a very large buffer, the loop that looks for the start
   event has been reported taking a very long time with a non preempt kernel
   that it can trigger a soft lock up warning. Add a cond_resched() into
   that loop to make sure that doesn't happen.
 
 - Use list_del_rcu() for eventfs ei->list variable
 
   It was reported that running loops of creating and deleting  kprobe events
   could cause a crash due to the eventfs list iteration hitting a LIST_POISON
   variable. This is because the list is protected by SRCU but when an item is
   deleted from the list, it was using list_del() which poisons the "next"
   pointer. This is what list_del_rcu() was to prevent.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZtohNBQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qtoNAQDQKjomYLCpLz2EqgHZ6VB81QVrHuqt
 cU7xuEfUJDzyyAEA/n0t6quIdjYRd6R2/KxGkP6By/805Coq4IZMTgNQmw0=
 =nZ7k
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Fix adding a new fgraph callback after function graph tracing has
   already started.

   If the new caller does not initialize its hash before registering the
   fgraph_ops, it can cause a NULL pointer dereference. Fix this by
   adding a new parameter to ftrace_graph_enable_direct() passing in the
   newly added gops directly and not rely on using the fgraph_array[],
   as entries in the fgraph_array[] must be initialized.

   Assign the new gops to the fgraph_array[] after it goes through
   ftrace_startup_subops() as that will properly initialize the
   gops->ops and initialize its hashes.

 - Fix a memory leak in fgraph storage memory test.

   If the "multiple fgraph storage on a function" boot up selftest fails
   in the registering of the function graph tracer, it will not free the
   memory it allocated for the filter. Break the loop up into two where
   it allocates the filters first and then registers the functions where
   any errors will do the appropriate clean ups.

 - Only clear the timerlat timers if it has an associated kthread.

   In the rtla tool that uses timerlat, if it was killed just as it was
   shutting down, the signals can free the kthread and the timer. But
   the closing of the timerlat files could cause the hrtimer_cancel() to
   be called on the already freed timer. As the kthread variable is is
   set to NULL when the kthreads are stopped and the timers are freed it
   can be used to know not to call hrtimer_cancel() on the timer if the
   kthread variable is NULL.

 - Use a cpumask to keep track of osnoise/timerlat kthreads

   The timerlat tracer can use user space threads for its analysis. With
   the killing of the rtla tool, the kernel can get confused between if
   it is using a user space thread to analyze or one of its own kernel
   threads. When this confusion happens, kthread_stop() can be called on
   a user space thread and bad things happen. As the kernel threads are
   per-cpu, a bitmask can be used to know when a kernel thread is used
   or when a user space thread is used.

 - Add missing interface_lock to osnoise/timerlat stop_kthread()

   The stop_kthread() function in osnoise/timerlat clears the osnoise
   kthread variable, and if it was a user space thread does a put_task
   on it. But this can race with the closing of the timerlat files that
   also does a put_task on the kthread, and if the race happens the task
   will have put_task called on it twice and oops.

 - Add cond_resched() to the tracing_iter_reset() loop.

   The latency tracers keep writing to the ring buffer without resetting
   when it issues a new "start" event (like interrupts being disabled).
   When reading the buffer with an iterator, the tracing_iter_reset()
   sets its pointer to that start event by walking through all the
   events in the buffer until it gets to the time stamp of the start
   event. In the case of a very large buffer, the loop that looks for
   the start event has been reported taking a very long time with a non
   preempt kernel that it can trigger a soft lock up warning. Add a
   cond_resched() into that loop to make sure that doesn't happen.

 - Use list_del_rcu() for eventfs ei->list variable

   It was reported that running loops of creating and deleting kprobe
   events could cause a crash due to the eventfs list iteration hitting
   a LIST_POISON variable. This is because the list is protected by SRCU
   but when an item is deleted from the list, it was using list_del()
   which poisons the "next" pointer. This is what list_del_rcu() was to
   prevent.

* tag 'trace-v6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing/timerlat: Add interface_lock around clearing of kthread in stop_kthread()
  tracing/timerlat: Only clear timer if a kthread exists
  tracing/osnoise: Use a cpumask to know what threads are kthreads
  eventfs: Use list_del_rcu() for SRCU protected list variable
  tracing: Avoid possible softlockup in tracing_iter_reset()
  tracing: Fix memory leak in fgraph storage selftest
  tracing: fgraph: Fix to add new fgraph_ops to array after ftrace_startup_subops()
2024-09-05 16:29:41 -07:00
Eric Dumazet
031ae72825 ila: call nf_unregister_net_hooks() sooner
syzbot found an use-after-free Read in ila_nf_input [1]

Issue here is that ila_xlat_exit_net() frees the rhashtable,
then call nf_unregister_net_hooks().

It should be done in the reverse way, with a synchronize_rcu().

This is a good match for a pre_exit() method.

[1]
 BUG: KASAN: use-after-free in rht_key_hashfn include/linux/rhashtable.h:159 [inline]
 BUG: KASAN: use-after-free in __rhashtable_lookup include/linux/rhashtable.h:604 [inline]
 BUG: KASAN: use-after-free in rhashtable_lookup include/linux/rhashtable.h:646 [inline]
 BUG: KASAN: use-after-free in rhashtable_lookup_fast+0x77a/0x9b0 include/linux/rhashtable.h:672
Read of size 4 at addr ffff888064620008 by task ksoftirqd/0/16

CPU: 0 UID: 0 PID: 16 Comm: ksoftirqd/0 Not tainted 6.11.0-rc4-syzkaller-00238-g2ad6d23f465a #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
Call Trace:
 <TASK>
  __dump_stack lib/dump_stack.c:93 [inline]
  dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119
  print_address_description mm/kasan/report.c:377 [inline]
  print_report+0x169/0x550 mm/kasan/report.c:488
  kasan_report+0x143/0x180 mm/kasan/report.c:601
  rht_key_hashfn include/linux/rhashtable.h:159 [inline]
  __rhashtable_lookup include/linux/rhashtable.h:604 [inline]
  rhashtable_lookup include/linux/rhashtable.h:646 [inline]
  rhashtable_lookup_fast+0x77a/0x9b0 include/linux/rhashtable.h:672
  ila_lookup_wildcards net/ipv6/ila/ila_xlat.c:132 [inline]
  ila_xlat_addr net/ipv6/ila/ila_xlat.c:652 [inline]
  ila_nf_input+0x1fe/0x3c0 net/ipv6/ila/ila_xlat.c:190
  nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline]
  nf_hook_slow+0xc3/0x220 net/netfilter/core.c:626
  nf_hook include/linux/netfilter.h:269 [inline]
  NF_HOOK+0x29e/0x450 include/linux/netfilter.h:312
  __netif_receive_skb_one_core net/core/dev.c:5661 [inline]
  __netif_receive_skb+0x1ea/0x650 net/core/dev.c:5775
  process_backlog+0x662/0x15b0 net/core/dev.c:6108
  __napi_poll+0xcb/0x490 net/core/dev.c:6772
  napi_poll net/core/dev.c:6841 [inline]
  net_rx_action+0x89b/0x1240 net/core/dev.c:6963
  handle_softirqs+0x2c4/0x970 kernel/softirq.c:554
  run_ksoftirqd+0xca/0x130 kernel/softirq.c:928
  smpboot_thread_fn+0x544/0xa30 kernel/smpboot.c:164
  kthread+0x2f0/0x390 kernel/kthread.c:389
  ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
 </TASK>

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x64620
flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
page_type: 0xbfffffff(buddy)
raw: 00fff00000000000 ffffea0000959608 ffffea00019d9408 0000000000000000
raw: 0000000000000000 0000000000000003 00000000bfffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as freed
page last allocated via order 3, migratetype Unmovable, gfp_mask 0x52dc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_ZERO), pid 5242, tgid 5242 (syz-executor), ts 73611328570, free_ts 618981657187
  set_page_owner include/linux/page_owner.h:32 [inline]
  post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1493
  prep_new_page mm/page_alloc.c:1501 [inline]
  get_page_from_freelist+0x2e4c/0x2f10 mm/page_alloc.c:3439
  __alloc_pages_noprof+0x256/0x6c0 mm/page_alloc.c:4695
  __alloc_pages_node_noprof include/linux/gfp.h:269 [inline]
  alloc_pages_node_noprof include/linux/gfp.h:296 [inline]
  ___kmalloc_large_node+0x8b/0x1d0 mm/slub.c:4103
  __kmalloc_large_node_noprof+0x1a/0x80 mm/slub.c:4130
  __do_kmalloc_node mm/slub.c:4146 [inline]
  __kmalloc_node_noprof+0x2d2/0x440 mm/slub.c:4164
  __kvmalloc_node_noprof+0x72/0x190 mm/util.c:650
  bucket_table_alloc lib/rhashtable.c:186 [inline]
  rhashtable_init_noprof+0x534/0xa60 lib/rhashtable.c:1071
  ila_xlat_init_net+0xa0/0x110 net/ipv6/ila/ila_xlat.c:613
  ops_init+0x359/0x610 net/core/net_namespace.c:139
  setup_net+0x515/0xca0 net/core/net_namespace.c:343
  copy_net_ns+0x4e2/0x7b0 net/core/net_namespace.c:508
  create_new_namespaces+0x425/0x7b0 kernel/nsproxy.c:110
  unshare_nsproxy_namespaces+0x124/0x180 kernel/nsproxy.c:228
  ksys_unshare+0x619/0xc10 kernel/fork.c:3328
  __do_sys_unshare kernel/fork.c:3399 [inline]
  __se_sys_unshare kernel/fork.c:3397 [inline]
  __x64_sys_unshare+0x38/0x40 kernel/fork.c:3397
page last free pid 11846 tgid 11846 stack trace:
  reset_page_owner include/linux/page_owner.h:25 [inline]
  free_pages_prepare mm/page_alloc.c:1094 [inline]
  free_unref_page+0xd22/0xea0 mm/page_alloc.c:2612
  __folio_put+0x2c8/0x440 mm/swap.c:128
  folio_put include/linux/mm.h:1486 [inline]
  free_large_kmalloc+0x105/0x1c0 mm/slub.c:4565
  kfree+0x1c4/0x360 mm/slub.c:4588
  rhashtable_free_and_destroy+0x7c6/0x920 lib/rhashtable.c:1169
  ila_xlat_exit_net+0x55/0x110 net/ipv6/ila/ila_xlat.c:626
  ops_exit_list net/core/net_namespace.c:173 [inline]
  cleanup_net+0x802/0xcc0 net/core/net_namespace.c:640
  process_one_work kernel/workqueue.c:3231 [inline]
  process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3312
  worker_thread+0x86d/0xd40 kernel/workqueue.c:3390
  kthread+0x2f0/0x390 kernel/kthread.c:389
  ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

Memory state around the buggy address:
 ffff88806461ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff88806461ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888064620000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                      ^
 ffff888064620080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ffff888064620100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

Fixes: 7f00feaf10 ("ila: Add generic ILA translation facility")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <tom@herbertland.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20240904144418.1162839-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-05 14:57:12 -07:00
Arkadiusz Kubalewski
6fda63c45f tools/net/ynl: fix cli.py --subscribe feature
Execution of command:
./tools/net/ynl/cli.py --spec Documentation/netlink/specs/dpll.yaml /
	--subscribe "monitor" --sleep 10
fails with:
  File "/repo/./tools/net/ynl/cli.py", line 109, in main
    ynl.check_ntf()
  File "/repo/tools/net/ynl/lib/ynl.py", line 924, in check_ntf
    op = self.rsp_by_value[nl_msg.cmd()]
KeyError: 19

Parsing Generic Netlink notification messages performs lookup for op in
the message. The message was not yet decoded, and is not yet considered
GenlMsg, thus msg.cmd() returns Generic Netlink family id (19) instead of
proper notification command id (i.e.: DPLL_CMD_PIN_CHANGE_NTF=13).

Allow the op to be obtained within NetlinkProtocol.decode(..) itself if the
op was not passed to the decode function, thus allow parsing of Generic
Netlink notifications without causing the failure.

Suggested-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/netdev/m2le0n5xpn.fsf@gmail.com/
Fixes: 0a966d606c ("tools/net/ynl: Fix extack decoding for directional ops")
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20240904135034.316033-1-arkadiusz.kubalewski@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-05 14:56:45 -07:00
Vadim Fedorenko
20d664ebd2 MAINTAINERS: fix ptp ocp driver maintainers address
While checking the latest series for ptp_ocp driver I realised that
MAINTAINERS file has wrong item about email on linux.dev domain.

Fixes: 795fd9342c ("ptp_ocp: adjust MAINTAINERS and mailmap")
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240904131855.559078-1-vadim.fedorenko@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-05 14:47:46 -07:00
Jamie Bainbridge
e4af74a53b selftests: net: enable bind tests
bind_wildcard is compiled but not run, bind_timewait is not compiled.

These two tests complete in a very short time, use the test harness
properly, and seem reasonable to enable.

The author of the tests confirmed via email that these were
intended to be run.

Enable these two tests.

Fixes: 13715acf8a ("selftest: Add test for bind() conflicts.")
Fixes: 2c042e8e54 ("tcp: Add selftest for bind() and TIME_WAIT.")
Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/5a009b26cf5fb1ad1512d89c61b37e2fac702323.1725430322.git.jamie.bainbridge@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-05 14:38:15 -07:00
Frank Li
c9ca76e823
MAINTAINERS: SPI: Add mailing list imx@lists.linux.dev for nxp spi drivers
Add mailing list imx@lists.linux.dev for nxp spi drivers(qspi, fspi and
dspi).

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Stefan Wahren <wahrenst@gmx.net>
Link: https://patch.msgid.link/20240905155230.1901787-1-Frank.Li@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2024-09-05 19:15:45 +01:00