Commit Graph

24171 Commits

Author SHA1 Message Date
Changbin Du
846e193980 perf ftrace: Add option '-m/--buffer-size' to set per-cpu buffer size
This adds an option '-m/--buffer-size' to allow us set the size of per-cpu
tracing buffer.

Committer testing:

Before running with this option:

  # find /sys/kernel/tracing/ -name buffer_size_kb | xargs cat
  1408
  1408
  1408
  1408
  1408
  1408
  1408
  1408
  1408
  #

Then, run:

  # perf ftrace -m 2048K | head -10
   2)               |  mutex_unlock() {
   2)   ==========> |
   2)               |    smp_irq_work_interrupt() {
   2)               |      irq_enter() {
   2)   0.121 us    |        rcu_irq_enter();
   2)   0.128 us    |        irqtime_account_irq();
   2)   0.719 us    |      }
   2)               |      __wake_up() {
   2)               |        __wake_up_common_lock() {
   2)   0.105 us    |          _raw_spin_lock_irqsave();
  #

Now look at those tracefs knobs:

  # find /sys/kernel/tracing/ -name buffer_size_kb | xargs cat
  2048
  2048
  2048
  2048
  2048
  2048
  2048
  2048
  2048
  #

This should be similar to the -m option in the other perf tools, such as
'perf record', 'perf trace', etc.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: http://lore.kernel.org/lkml/20200808023141.14227-5-changbin.du@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-14 08:37:46 -03:00
Changbin Du
68faab0f93 perf ftrace: Factor out function write_tracing_file_int()
We will reuse this function later.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: http://lore.kernel.org/lkml/20200808023141.14227-4-changbin.du@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-14 08:37:05 -03:00
Changbin Du
d6d81bfe42 perf ftrace: Add option '-F/--funcs' to list available functions
This adds an option '-F/--funcs' to list all available functions to
trace, which is read from tracing file 'available_filter_functions'.

  $ sudo ./perf ftrace -F | head
  trace_initcall_finish_cb
  initcall_blacklisted
  do_one_initcall
  do_one_initcall
  trace_initcall_start_cb
  run_init_process
  try_to_run_init_process
  match_dev_by_label
  match_dev_by_uuid
  rootfs_init_fs_context
  $

Committer notes:

This is the same command line option and for the same purpose as in
'perf probe'.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: http://lore.kernel.org/lkml/20200808023141.14227-3-changbin.du@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-14 08:36:38 -03:00
Changbin Du
eb6d31ae22 perf ftrace: Select function/function_graph tracer automatically
The '-g/-G' options have already implied function_graph tracer should be
used instead of function tracer. So we don't need extra option
'--tracer' in this case.

This patch changes the behavior as below:

  - If '-g' or '-G' option is on, then function_graph tracer is used.
  - If '-T' or '-N' option is on, then function tracer is used.
  - The function_graph has priority over function tracer.
  - The option '--tracer' only take effect if neither -g/-G nor -T/-N
    is specified.

Here are some examples.

This will start tracing all functions using default tracer:

  $ sudo perf ftrace

This will trace all functions using function graph tracer:

  $ sudo perf ftrace -G '*'

This will trace function vfs_read using function graph tracer:

  $ sudo perf ftrace -G vfs_read

This will trace function vfs_read using function tracer:

  $ sudo perf ftrace -T vfs_read

Committer notes:

Using '-h -G' will tell what that option is about, so to further clarify
the above examples:

  # perf ftrace -h -G

    -G, --graph-funcs <func>	Set graph filter on given functions

  # perf ftrace -h -g

    -g, --nograph-funcs <func>	Set nograph filter on given functions

  # perf ftrace -h -T

    -T, --trace-funcs <func>	trace given functions only

  # perf ftrace -h -N

    -N, --notrace-funcs <func>	do not trace given functions

  #

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: http://lore.kernel.org/lkml/20200808023141.14227-2-changbin.du@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-14 08:36:31 -03:00
Linus Torvalds
a1d21081a6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:
 "Some merge window fallout, some longer term fixes:

   1) Handle headroom properly in lapbether and x25_asy drivers, from
      Xie He.

   2) Fetch MAC address from correct r8152 device node, from Thierry
      Reding.

   3) In the sw kTLS path we should allow MSG_CMSG_COMPAT in sendmsg,
      from Rouven Czerwinski.

   4) Correct fdputs in socket layer, from Miaohe Lin.

   5) Revert troublesome sockptr_t optimization, from Christoph Hellwig.

   6) Fix TCP TFO key reading on big endian, from Jason Baron.

   7) Missing CAP_NET_RAW check in nfc, from Qingyu Li.

   8) Fix inet fastreuse optimization with tproxy sockets, from Tim
      Froidcoeur.

   9) Fix 64-bit divide in new SFC driver, from Edward Cree.

  10) Add a tracepoint for prandom_u32 so that we can more easily
      perform usage analysis. From Eric Dumazet.

  11) Fix rwlock imbalance in AF_PACKET, from John Ogness"

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (49 commits)
  net: openvswitch: introduce common code for flushing flows
  af_packet: TPACKET_V3: fix fill status rwlock imbalance
  random32: add a tracepoint for prandom_u32()
  Revert "ipv4: tunnel: fix compilation on ARCH=um"
  net: accept an empty mask in /sys/class/net/*/queues/rx-*/rps_cpus
  net: ethernet: stmmac: Disable hardware multicast filter
  net: stmmac: dwmac1000: provide multicast filter fallback
  ipv4: tunnel: fix compilation on ARCH=um
  vsock: fix potential null pointer dereference in vsock_poll()
  sfc: fix ef100 design-param checking
  net: initialize fastreuse on inet_inherit_port
  net: refactor bind_bucket fastreuse into helper
  net: phy: marvell10g: fix null pointer dereference
  net: Fix potential memory leak in proto_register()
  net: qcom/emac: add missed clk_disable_unprepare in error path of emac_clks_phase1_init
  ionic_lif: Use devm_kcalloc() in ionic_qcq_alloc()
  net/nfc/rawsock.c: add CAP_NET_RAW check.
  hinic: fix strncpy output truncated compile warnings
  drivers/net/wan/x25_asy: Added needed_headroom and a skb->len check
  net/tls: Fix kmap usage
  ...
2020-08-13 20:03:11 -07:00
Andrii Nakryiko
4fccd2ff74 selftests/bpf: Make test_varlen work with 32-bit user-space arch
Despite bpftool generating data section memory layout that will work for
32-bit architectures on user-space side, BPF programs should be careful to not
use ambiguous types like `long`, which have different size in 32-bit and
64-bit environments. Fix that in test by using __u64 explicitly, which is
a recommended approach anyway.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200813204945.1020225-10-andriin@fb.com
2020-08-13 16:45:41 -07:00
Andrii Nakryiko
0f993845d7 tools/bpftool: Generate data section struct with conservative alignment
The comment in the code describes this in good details. Generate such a memory
layout that would work both on 32-bit and 64-bit architectures for user-space.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200813204945.1020225-9-andriin@fb.com
2020-08-13 16:45:41 -07:00
Andrii Nakryiko
5705d70583 selftests/bpf: Correct various core_reloc 64-bit assumptions
Ensure that types are memory layout- and field alignment-compatible regardless
of 32/64-bitness mix of libbpf and BPF architecture.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200813204945.1020225-8-andriin@fb.com
2020-08-13 16:45:41 -07:00
Andrii Nakryiko
4c01925f58 libbpf: Enforce 64-bitness of BTF for BPF object files
BPF object files are always targeting 64-bit BPF target architecture, so
enforce that at BTF level as well.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200813204945.1020225-7-andriin@fb.com
2020-08-13 16:45:41 -07:00
Andrii Nakryiko
eed7818adf selftests/bpf: Fix btf_dump test cases on 32-bit arches
Fix btf_dump test cases by hard-coding BPF's pointer size of 8 bytes for cases
where it's impossible to deterimne the pointer size (no long type in BTF). In
cases where it's known, validate libbpf correctly determines it as 8.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200813204945.1020225-6-andriin@fb.com
2020-08-13 16:45:41 -07:00
Andrii Nakryiko
44ad23dfbc libbpf: Handle BTF pointer sizes more carefully
With libbpf and BTF it is pretty common to have libbpf built for one
architecture, while BTF information was generated for a different architecture
(typically, but not always, BPF). In such case, the size of a pointer might
differ betweem architectures. libbpf previously was always making an
assumption that pointer size for BTF is the same as native architecture
pointer size, but that breaks for cases where libbpf is built as 32-bit
library, while BTF is for 64-bit architecture.

To solve this, add heuristic to determine pointer size by searching for `long`
or `unsigned long` integer type and using its size as a pointer size. Also,
allow to override the pointer size with a new API btf__set_pointer_size(), for
cases where application knows which pointer size should be used. User
application can check what libbpf "guessed" by looking at the result of
btf__pointer_size(). If it's not 0, then libbpf successfully determined a
pointer size, otherwise native arch pointer size will be used.

For cases where BTF is parsed from ELF file, use ELF's class (32-bit or
64-bit) to determine pointer size.

Fixes: 8a138aed4a ("bpf: btf: Add BTF support to libbpf")
Fixes: 351131b51c ("libbpf: add btf_dump API for BTF-to-C conversion")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200813204945.1020225-5-andriin@fb.com
2020-08-13 16:45:41 -07:00
Andrii Nakryiko
15728ad3e7 libbpf: Fix BTF-defined map-in-map initialization on 32-bit host arches
Libbpf built in 32-bit mode should be careful about not conflating 64-bit BPF
pointers in BPF ELF file and host architecture pointers. This patch fixes
issue of incorrect initializating of map-in-map inner map slots due to such
difference.

Fixes: 646f02ffdd ("libbpf: Add BTF-defined map-in-map support")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200813204945.1020225-4-andriin@fb.com
2020-08-13 16:45:41 -07:00
Andrii Nakryiko
9028bbcc3e selftest/bpf: Fix compilation warnings in 32-bit mode
Fix compilation warnings emitted when compiling selftests for 32-bit platform
(x86 in my case).

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200813204945.1020225-3-andriin@fb.com
2020-08-13 16:45:41 -07:00
Andrii Nakryiko
09f44b753a tools/bpftool: Fix compilation warnings in 32-bit mode
Fix few compilation warnings in bpftool when compiling in 32-bit mode.
Abstract away u64 to pointer conversion into a helper function.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200813204945.1020225-2-andriin@fb.com
2020-08-13 16:45:41 -07:00
John Fastabend
9efa9e4997 bpf, selftests: Add tests to sock_ops for loading sk
Add tests to directly accesse sock_ops sk field. Then use it to
ensure a bad pointer access will fault if something goes wrong.
We do three tests:

The first test ensures when we read sock_ops sk pointer into the
same register that we don't fault as described earlier. Here r9
is chosen as the temp register.  The xlated code is,

  36: (7b) *(u64 *)(r1 +32) = r9
  37: (61) r9 = *(u32 *)(r1 +28)
  38: (15) if r9 == 0x0 goto pc+3
  39: (79) r9 = *(u64 *)(r1 +32)
  40: (79) r1 = *(u64 *)(r1 +0)
  41: (05) goto pc+1
  42: (79) r9 = *(u64 *)(r1 +32)

The second test ensures the temp register selection does not collide
with in-use register r9. Shown here r8 is chosen because r9 is the
sock_ops pointer. The xlated code is as follows,

  46: (7b) *(u64 *)(r9 +32) = r8
  47: (61) r8 = *(u32 *)(r9 +28)
  48: (15) if r8 == 0x0 goto pc+3
  49: (79) r8 = *(u64 *)(r9 +32)
  50: (79) r9 = *(u64 *)(r9 +0)
  51: (05) goto pc+1
  52: (79) r8 = *(u64 *)(r9 +32)

And finally, ensure we didn't break the base case where dst_reg does
not equal the source register,

  56: (61) r2 = *(u32 *)(r1 +28)
  57: (15) if r2 == 0x0 goto pc+1
  58: (79) r2 = *(u64 *)(r1 +0)

Notice it takes us an extra four instructions when src reg is the
same as dst reg. One to save the reg, two to restore depending on
the branch taken and a goto to jump over the second restore.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/159718355325.4728.4163036953345999636.stgit@john-Precision-5820-Tower
2020-08-13 22:40:43 +02:00
John Fastabend
8e0c151756 bpf, selftests: Add tests for sock_ops load with r9, r8.r7 registers
Loads in sock_ops case when using high registers requires extra logic to
ensure the correct temporary value is used. We need to ensure the temp
register does not use either the src_reg or dst_reg. Lets add an asm
test to force the logic is triggered.

The xlated code is here,

  30: (7b) *(u64 *)(r9 +32) = r7
  31: (61) r7 = *(u32 *)(r9 +28)
  32: (15) if r7 == 0x0 goto pc+2
  33: (79) r7 = *(u64 *)(r9 +0)
  34: (63) *(u32 *)(r7 +916) = r8
  35: (79) r7 = *(u64 *)(r9 +32)

Notice r9 and r8 are not used for temp registers and r7 is chosen.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/159718353345.4728.8805043614257933227.stgit@john-Precision-5820-Tower
2020-08-13 22:40:43 +02:00
John Fastabend
86ed4be68f bpf, selftests: Add tests for ctx access in sock_ops with single register
To verify fix ("bpf: sock_ops ctx access may stomp registers in corner case")
we want to force compiler to generate the following code when accessing a
field with BPF_TCP_SOCK_GET_COMMON,

     r1 = *(u32 *)(r1 + 96) // r1 is skops ptr

Rather than depend on clang to do this we add the test with inline asm to
the tcpbpf test. This saves us from having to create another runner and
ensures that if we break this again test_tcpbpf will crash.

With above code we get the xlated code,

  11: (7b) *(u64 *)(r1 +32) = r9
  12: (61) r9 = *(u32 *)(r1 +28)
  13: (15) if r9 == 0x0 goto pc+4
  14: (79) r9 = *(u64 *)(r1 +32)
  15: (79) r1 = *(u64 *)(r1 +0)
  16: (61) r1 = *(u32 *)(r1 +2348)
  17: (05) goto pc+1
  18: (79) r9 = *(u64 *)(r1 +32)

We also add the normal case where src_reg != dst_reg so we can compare
code generation easily from llvm-objdump and ensure that case continues
to work correctly. The normal code is xlated to,

  20: (b7) r1 = 0
  21: (61) r1 = *(u32 *)(r3 +28)
  22: (15) if r1 == 0x0 goto pc+2
  23: (79) r1 = *(u64 *)(r3 +0)
  24: (61) r1 = *(u32 *)(r1 +2348)

Where the temp variable is not used.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/159718351457.4728.3295119261717842496.stgit@john-Precision-5820-Tower
2020-08-13 22:40:43 +02:00
Toke Høiland-Jørgensen
23ab656be2 libbpf: Prevent overriding errno when logging errors
Turns out there were a few more instances where libbpf didn't save the
errno before writing an error message, causing errno to be overridden by
the printf() return and the error disappearing if logging is enabled.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200813142905.160381-1-toke@redhat.com
2020-08-13 22:30:31 +02:00
Alexander Gordeev
2db13a9b30 perf bench numa: Use numa_node_to_cpus() to bind tasks to nodes
It is currently assumed that each node contains at most nr_cpus/nr_nodes
CPUs and nodes' CPU ranges do not overlap.

That assumption is generally incorrect as there are archs where a CPU
number does not depend on to its node number.

This update removes the described assumption by simply calling
numa_node_to_cpus() interface and using the returned mask for binding
CPUs to nodes.

Also, variable types and names made consistent in functions using
cpumask.

Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Balamuruhan S <bala24@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Link: http://lore.kernel.org/lkml/20200813113247.GA2014@oc3871087118.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-13 10:02:27 -03:00
Alexander Gordeev
509f68e327 perf bench numa: Fix cpumask memory leak in node_has_cpus()
Couple numa_allocate_cpumask() and numa_free_cpumask() functions

Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Balamuruhan S <bala24@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Link: http://lore.kernel.org/lkml/20200813113041.GA1685@oc3871087118.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-13 10:02:00 -03:00
Daniel Díaz
fa5c893181 tools build feature: Quote CC and CXX for their arguments
When using a cross-compilation environment, such as OpenEmbedded,
the CC an CXX variables are set to something more than just a
command: there are arguments (such as --sysroot) that need to be
passed on to the compiler so that the right set of headers and
libraries are used.

For the particular case that our systems detected, CC is set to
the following:

  export CC="aarch64-linaro-linux-gcc  --sysroot=/oe/build/tmp/work/machine/perf/1.0-r9/recipe-sysroot"

Without quotes, detection is as follows:

  Auto-detecting system features:
  ...                         dwarf: [ OFF ]
  ...            dwarf_getlocations: [ OFF ]
  ...                         glibc: [ OFF ]
  ...                          gtk2: [ OFF ]
  ...                        libbfd: [ OFF ]
  ...                        libcap: [ OFF ]
  ...                        libelf: [ OFF ]
  ...                       libnuma: [ OFF ]
  ...        numa_num_possible_cpus: [ OFF ]
  ...                       libperl: [ OFF ]
  ...                     libpython: [ OFF ]
  ...                     libcrypto: [ OFF ]
  ...                     libunwind: [ OFF ]
  ...            libdw-dwarf-unwind: [ OFF ]
  ...                          zlib: [ OFF ]
  ...                          lzma: [ OFF ]
  ...                     get_cpuid: [ OFF ]
  ...                           bpf: [ OFF ]
  ...                        libaio: [ OFF ]
  ...                       libzstd: [ OFF ]
  ...        disassembler-four-args: [ OFF ]

  Makefile.config:414: *** No gnu/libc-version.h found, please install glibc-dev[el].  Stop.
  Makefile.perf:230: recipe for target 'sub-make' failed
  make[1]: *** [sub-make] Error 2
  Makefile:69: recipe for target 'all' failed
  make: *** [all] Error 2

With CC and CXX quoted, some of those features are now detected.

Fixes: e3232c2f39 ("tools build feature: Use CC and CXX from parent")
Signed-off-by: Daniel Díaz <daniel.diaz@linaro.org>
Reviewed-by: Thomas Hebb <tommyhebb@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andriin@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@chromium.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Yonghong Song <yhs@fb.com>
Link: http://lore.kernel.org/lkml/20200812221518.2869003-1-daniel.diaz@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-13 10:00:19 -03:00
Jiri Olsa
b2fe96a350 perf tools: Fix module symbol processing
The 'dso->kernel' condition is true also for kernel modules now,
and there are several places that were omited by the initial change:

  - we need to identify modules separately in dso__process_kernel_symbol
  - we need to set 'dso->kernel' for module from buildid table
  - there's no need to use 'dso->kernel || kmodule' in one condition

Committer testing:

Before:

  # perf test -v object
  <SNIP>
  Objdump command is: objdump -z -d --start-address=0xffffffff813e682f --stop-address=0xffffffff813e68af /usr/lib/debug/lib/modules/5.7.14-200.fc32.x86_64/vmlinux
  Bytes read match those read by objdump
  Reading object code for memory address: 0xffffffffc02dc257
  File is: /lib/modules/5.7.14-200.fc32.x86_64/kernel/arch/x86/crypto/crc32c-intel.ko.xz
  On file address is: 0xffffffffc02dc2e7
  dso__data_read_offset failed
  test child finished with -1
  ---- end ----
  Object code reading: FAILED!
  #

After:

  # perf test object
  26: Object code reading                                   : Ok
  # perf test object
  26: Object code reading                                   : Ok
  # perf test object
  26: Object code reading                                   : Ok
  # perf test object
  26: Object code reading                                   : Ok
  # perf test object
  26: Object code reading                                   : Ok
  #

Fixes: 02213cec64 ("perf maps: Mark module DSOs with kernel type")
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
2020-08-13 09:57:40 -03:00
Jiri Olsa
1c695c88a1 perf tools: Rename 'enum dso_kernel_type' to 'enum dso_space_type'
Rename enum dso_kernel_type to enum dso_space_type, which seems like
better fit.

Committer notes:

This is used with 'struct dso'->kernel, which once was a boolean, so
DSO_SPACE__USER is zero, !zero means some sort of kernel space, be it
the host kernel space or a guest kernel space.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-13 09:53:21 -03:00
Rob Herring
ce746d43a1 libperf: Fix man page typos
Fix various typos and inconsistent capitalization of CPU in the libperf
man pages.

Signed-off-by: Rob Herring <robh@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20200807193241.3904545-1-robh@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-13 09:34:26 -03:00
Michael Petlan
194cb6b50f perf test: Allow multiple probes in record+script_probe_vfs_getname.sh
Sometimes when adding a kprobe by perf, it results in multiple probe
points, such as the following:

  # ./perf probe -l
    probe:vfs_getname    (on getname_flags:73@fs/namei.c with pathname)
    probe:vfs_getname_1  (on getname_flags:73@fs/namei.c with pathname)
    probe:vfs_getname_2  (on getname_flags:73@fs/namei.c with pathname)
  # cat /sys/kernel/debug/tracing/kprobe_events
  p:probe/vfs_getname _text+5501804 pathname=+0(+0(%gpr31)):string
  p:probe/vfs_getname_1 _text+5505388 pathname=+0(+0(%gpr31)):string
  p:probe/vfs_getname_2 _text+5508396 pathname=+0(+0(%gpr31)):string

In this test, we need to record all of them and expect any of them in
the perf-script output, since it's not clear which one will be used for
the desired syscall:

  # perf stat -e probe:vfs_getname\* -- touch /tmp/nic

   Performance counter stats for 'touch /tmp/nic':

                31      probe:vfs_getname_2
                 0      probe:vfs_getname_1
                 1      probe:vfs_getname
       0.001421826 seconds time elapsed

       0.001506000 seconds user
       0.000000000 seconds sys

If the test relies only on probe:vfs_getname, it might easily miss the
relevant data.

Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
LPU-Reference: 20200722135845.29958-1-mpetlan@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-13 09:34:26 -03:00
Vincent Whitchurch
1beaef29c3 perf bench mem: Always memset source before memcpy
For memcpy, the source pages are memset to zero only when --cycles is
used.  This leads to wildly different results with or without --cycles,
since all sources pages are likely to be mapped to the same zero page
without explicit writes.

Before this fix:

$ export cmd="./perf stat -e LLC-loads -- ./perf bench \
  mem memcpy -s 1024MB -l 100 -f default"
$ $cmd

         2,935,826      LLC-loads
       3.821677452 seconds time elapsed

$ $cmd --cycles

       217,533,436      LLC-loads
       8.616725985 seconds time elapsed

After this fix:

$ $cmd

       214,459,686      LLC-loads
       8.674301124 seconds time elapsed

$ $cmd --cycles

       214,758,651      LLC-loads
       8.644480006 seconds time elapsed

Fixes: 47b5757bac ("perf bench mem: Move boilerplate memory allocation to the infrastructure")
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel@axis.com
Link: http://lore.kernel.org/lkml/20200810133404.30829-1-vincent.whitchurch@axis.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-13 09:34:26 -03:00
David Ahern
d566a9c2d4 perf sched: Prefer sched_waking event when it exists
Commit fbd705a0c6 ("sched: Introduce the 'trace_sched_waking'
tracepoint") added sched_waking tracepoint which should be preferred
over sched_wakeup when analyzing scheduling delays.

Update 'perf sched record' to collect sched_waking events if it exists
and fallback to sched_wakeup if it does not. Similarly, update timehist
command to skip sched_wakeup events if the session includes sched_waking
(ie., sched_waking is preferred over sched_wakeup).

Signed-off-by: David Ahern <dsahern@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lore.kernel.org/lkml/20200807164844.44870-1-dsahern@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-13 09:34:26 -03:00
Fabian Frederick
d8bb9abe21 selftests: netfilter: kill running process only
Avoid noise like the following:
nft_flowtable.sh: line 250: kill: (4691) - No such process

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-13 04:25:16 +02:00
Fabian Frederick
dd08734d8a selftests: netfilter: add MTU arguments to flowtables
Add some documentation, default values defined in original
script and Originator/Link/Responder arguments
using getopts like in tools/power/cpupower/bench/cpufreq-bench_plot.sh

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-13 04:25:11 +02:00
Fabian Frederick
6d006a4e38 selftests: netfilter: add checktool function
avoid repeating the same test for different toolcheck

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-13 04:25:08 +02:00
Jean-Philippe Brucker
702eddc77a libbpf: Handle GCC built-in types for Arm NEON
When building Arm NEON (SIMD) code from lib/raid6/neon.uc, GCC emits
DWARF information using a base type "__Poly8_t", which is internal to
GCC and not recognized by Clang. This causes build failures when
building with Clang a vmlinux.h generated from an arm64 kernel that was
built with GCC.

	vmlinux.h:47284:9: error: unknown type name '__Poly8_t'
	typedef __Poly8_t poly8x16_t[16];
	        ^~~~~~~~~

The polyX_t types are defined as unsigned integers in the "Arm C
Language Extension" document (101028_Q220_00_en). Emit typedefs based on
standard integer types for the GCC internal types, similar to those
emitted by Clang.

Including linux/kernel.h to use ARRAY_SIZE() incidentally redefined
max(), causing a build bug due to different types, hence the seemingly
unrelated change.

Reported-by: Jakov Petrina <jakov.petrina@sartura.hr>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200812143909.3293280-1-jean-philippe@linaro.org
2020-08-12 18:11:51 -07:00
Andrii Nakryiko
8faf7fc597 tools/bpftool: Make skeleton code C++17-friendly by dropping typeof()
Seems like C++17 standard mode doesn't recognize typeof() anymore. This can
be tested by compiling test_cpp test with -std=c++17 or -std=c++1z options.
The use of typeof in skeleton generated code is unnecessary, all types are
well-known at the time of code generation, so remove all typeof()'s to make
skeleton code more future-proof when interacting with C++ compilers.

Fixes: 985ead416d ("bpftool: Add skeleton codegen command")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200812025907.1371956-1-andriin@fb.com
2020-08-12 18:09:45 -07:00
Linus Torvalds
9ad57f6dfc Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:

 - most of the rest of MM (memcg, hugetlb, vmscan, proc, compaction,
   mempolicy, oom-kill, hugetlbfs, migration, thp, cma, util,
   memory-hotplug, cleanups, uaccess, migration, gup, pagemap),

 - various other subsystems (alpha, misc, sparse, bitmap, lib, bitops,
   checkpatch, autofs, minix, nilfs, ufs, fat, signals, kmod, coredump,
   exec, kdump, rapidio, panic, kcov, kgdb, ipc).

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (164 commits)
  mm/gup: remove task_struct pointer for all gup code
  mm: clean up the last pieces of page fault accountings
  mm/xtensa: use general page fault accounting
  mm/x86: use general page fault accounting
  mm/sparc64: use general page fault accounting
  mm/sparc32: use general page fault accounting
  mm/sh: use general page fault accounting
  mm/s390: use general page fault accounting
  mm/riscv: use general page fault accounting
  mm/powerpc: use general page fault accounting
  mm/parisc: use general page fault accounting
  mm/openrisc: use general page fault accounting
  mm/nios2: use general page fault accounting
  mm/nds32: use general page fault accounting
  mm/mips: use general page fault accounting
  mm/microblaze: use general page fault accounting
  mm/m68k: use general page fault accounting
  mm/ia64: use general page fault accounting
  mm/hexagon: use general page fault accounting
  mm/csky: use general page fault accounting
  ...
2020-08-12 11:24:12 -07:00
Tiezhu Yang
aaa3e7fb81 selftests: kmod: use variable NAME in kmod_test_0001()
Patch series "kmod/umh: a few fixes".

Tiezhu Yang had sent out a patch set with a slew of kmod selftest fixes,
and one patch which modified kmod to return 254 when a module was not
found.  This opened up pandora's box about why that was being used for and
low and behold its because when UMH_WAIT_PROC is used we call a
kernel_wait4() call but have never unwrapped the error code.  The commit
log for that fix details the rationale for the approach taken.  I'd
appreciate some review on that, in particular nfs folks as it seems a case
was never really hit before.

This patch (of 5):

Use the variable NAME instead of "\000" directly in kmod_test_0001().

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Sergei Trofimovich <slyfox@gentoo.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Sergey Kvachonok <ravenexp@gmail.com>
Cc: Tony Vroon <chainsaw@gentoo.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Link: http://lkml.kernel.org/r/20200610154923.27510-1-mcgrof@kernel.org
Link: http://lkml.kernel.org/r/20200610154923.27510-2-mcgrof@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-12 10:58:01 -07:00
Ralph Campbell
b0fc0f3fca mm/migrate: add migrate-shared test for migrate_vma_*()
Add a migrate_vma_*() self test for mmap(MAP_SHARED) to verify that
!vma_anonymous() ranges won't be migrated.

Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: "Bharata B Rao" <bharata@linux.ibm.com>
Cc: Shuah Khan <shuah@kernel.org>
Link: http://lkml.kernel.org/r/20200710194840.7602-3-rcampbell@nvidia.com
Link: http://lkml.kernel.org/r/20200709165711.26584-3-rcampbell@nvidia.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-12 10:57:56 -07:00
Roman Gushchin
90631e1dea kselftests: cgroup: add perpcu memory accounting test
Add a simple test to check the percpu memory accounting.  The test creates
a cgroup tree with 1000 child cgroups and checks values of memory.current
and memory.stat::percpu.

Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Tobin C. Harding <tobin@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Waiman Long <longman@redhat.com>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Bixuan Cui <cuibixuan@huawei.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Link: http://lkml.kernel.org/r/20200608230819.832349-6-guro@fb.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-12 10:57:55 -07:00
Colin Ian King
f9f9506826 perf bench: Fix a couple of spelling mistakes in options text
There are a couple of spelling mistakes in the text. Fix these.

Signed-off-by: Colin King <colin.king@canonical.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-janitors@vger.kernel.org
Link: http://lore.kernel.org/lkml/20200812064647.200132-1-colin.king@canonical.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-12 09:10:25 -03:00
Alexander Gordeev
85372c6974 perf bench numa: Fix benchmark names
Standard benchmark names let users know the tests specifics.  For
example "2x1-bw-process" name tells that two processes one thread each
are run and the RAM bandwidth is measured.

Several benchmarks names do not correspond to their actual running
configuration. Fix that and also some whitespace and comment
inconsistencies.

Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/6b6f2084f132ee8e9203dc7c32f9deb209b87a68.1597004831.git.agordeev@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-12 09:08:42 -03:00
Alexander Gordeev
72d69c2a4e perf bench numa: Fix number of processes in "2x3-convergence" test
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/d949f5f48e17fc816f3beecf8479f1b2480345e4.1597004831.git.agordeev@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-12 09:06:38 -03:00
Arnaldo Carvalho de Melo
6016e03487 tools headers UAPI: Sync kvm.h headers with the kernel sources
To pick the changes in:

  3edd68399d ("KVM: x86: Add a capability for GUEST_MAXPHYADDR < HOST_MAXPHYADDR support")
  1aa561b1a4 ("kvm: x86: Add "last CPU" to some KVM_EXIT information")
  23a60f8344 ("s390/kvm: diagnose 0x318 sync and reset")

That do not result in any change in tooling, as the additions are not
being used in any table generator.

This silences these perf build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
  diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Collin Walling <walling@linux.ibm.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mohammed Gamal <mgamal@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-12 09:02:40 -03:00
Arnaldo Carvalho de Melo
fe452fb843 tools include UAPI: Sync linux/vhost.h with the kernel sources
To get the changes in:

  25abc060d2 ("vhost-vdpa: support IOTLB batching hints")

This doesn't result in any changes in tooling, no new ioctls to be
picked up by the id->string table generators, etc.

Silencing this perf build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/vhost.h' differs from latest version at 'include/uapi/linux/vhost.h'
  diff -u tools/include/uapi/linux/vhost.h include/uapi/linux/vhost.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-12 08:57:07 -03:00
Arnaldo Carvalho de Melo
23db762be2 tools headers kvm s390: Sync headers with the kernel sources
To pick the changes in:

  23a60f8344 ("s390/kvm: diagnose 0x318 sync and reset")

None of them trigger any changes in tooling, this time this is just to silence
these perf build warnings:

  Warning: Kernel ABI header at 'tools/arch/s390/include/uapi/asm/kvm.h' differs from latest version at 'arch/s390/include/uapi/asm/kvm.h'
  diff -u tools/arch/s390/include/uapi/asm/kvm.h arch/s390/include/uapi/asm/kvm.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Collin Walling <walling@linux.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-12 08:52:32 -03:00
Arnaldo Carvalho de Melo
f3cf7fa963 perf trace beauty: Use the autogenerated protocol family table
That helps us not to lose new protocol families when they are
introduced, replacing that hardcoded, dated family->string table.

To recap what this allows us to do:

  # perf trace -e syscalls:sys_enter_socket/max-stack=10/ --filter=family==INET --max-events=1
     0.000 fetchmail/41097 syscalls:sys_enter_socket(family: INET, type: DGRAM|CLOEXEC|NONBLOCK, protocol: IP)
                                       __GI___socket (inlined)
                                       reopen (/usr/lib64/libresolv-2.31.so)
                                       send_dg (/usr/lib64/libresolv-2.31.so)
                                       __res_context_send (/usr/lib64/libresolv-2.31.so)
                                       __GI___res_context_query (inlined)
                                       __GI___res_context_search (inlined)
                                       _nss_dns_gethostbyname4_r (/usr/lib64/libnss_dns-2.31.so)
                                       gaih_inet.constprop.0 (/usr/lib64/libc-2.31.so)
                                       __GI_getaddrinfo (inlined)
                                       [0x15cb2] (/usr/bin/fetchmail)
  #

More work is still needed to allow for the more natura strace-like
syscall name usage instead of the trace event name:

  # perf trace -e socket/max-stack=10,family==INET/ --max-events=1

I.e. to allow for modifiers to follow the syscall name and for logical
expressions to be accepted as filters to use with that syscall, be it as
trace event filters or BPF based ones.

Using -v we can see how the trace event filter is built:

  # perf trace -v -e syscalls:sys_enter_socket/call-graph=dwarf/ --filter=family==INET --max-events=2
  <SNIP>
  New filter for syscalls:sys_enter_socket: (family==0x2) && (common_pid != 41384 && common_pid != 2836)
  <SNIP>

  $ tools/perf/trace/beauty/socket.sh | grep -w 2
	[2] = "INET",
  $

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-12 08:43:51 -03:00
Arnaldo Carvalho de Melo
58277f502f perf trace beauty: Add script to autogenerate socket families table
To use with 'perf trace', to convert the protocol families to strings,
e.g:

  $ tools/perf/trace/beauty/socket.sh
  static const char *socket_families[] = {
  	[0] = "UNSPEC",
  	[1] = "LOCAL",
  	[2] = "INET",
  	[3] = "AX25",
  	[4] = "IPX",
  	[5] = "APPLETALK",
  	[6] = "NETROM",
  	[7] = "BRIDGE",
  	[8] = "ATMPVC",
  	[9] = "X25",
  	[10] = "INET6",
  	[11] = "ROSE",
  	[12] = "DECnet",
  	[13] = "NETBEUI",
  	[14] = "SECURITY",
  	[15] = "KEY",
  	[16] = "NETLINK",
  	[17] = "PACKET",
  	[18] = "ASH",
  	[19] = "ECONET",
  	[20] = "ATMSVC",
  	[21] = "RDS",
  	[22] = "SNA",
  	[23] = "IRDA",
  	[24] = "PPPOX",
  	[25] = "WANPIPE",
  	[26] = "LLC",
  	[27] = "IB",
  	[28] = "MPLS",
  	[29] = "CAN",
  	[30] = "TIPC",
  	[31] = "BLUETOOTH",
  	[32] = "IUCV",
  	[33] = "RXRPC",
  	[34] = "ISDN",
  	[35] = "PHONET",
  	[36] = "IEEE802154",
  	[37] = "CAIF",
  	[38] = "ALG",
  	[39] = "NFC",
  	[40] = "VSOCK",
  	[41] = "KCM",
  	[42] = "QIPCRTR",
  	[43] = "SMC",
  	[44] = "XDP",
  };
  $

This uses a copy of include/linux/socket.h that is kept in a directory
to be used just for these table generation scripts and for checking if
the kernel has a new file that maybe gets something new for these
tables.

This allows us to:

- Avoid accessing files outside tools/, in the kernel sources, that may
  be changed in unexpected ways and thus break these scripts.

- Notice when those files change and thus check if the changes don't
  break those scripts, update them to automatically get the new
  definitions, a new socket family, for instance.

- Not add then to the tools/include/ where it may end up used while
  building the tools and end up requiring dragging yet more stuff from
  the kernel or plain break the build in some of the myriad environments
  where perf may be built.

This will replace the previous static array in tools/perf/ that was
dated and was already missing the AF_KCM, AF_QIPCRTR, AF_SMC and AF_XDP
families.

The next cset will wire this up to the perf build process.

At some point this must be made into a library to be used in places such
as libtraceevent, bpftrace, etc.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-12 08:38:36 -03:00
Linus Torvalds
57b0779392 Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:

 - IRQ bypass support for vdpa and IFC

 - MLX5 vdpa driver

 - Endianness fixes for virtio drivers

 - Misc other fixes

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (71 commits)
  vdpa/mlx5: fix up endian-ness for mtu
  vdpa: Fix pointer math bug in vdpasim_get_config()
  vdpa/mlx5: Fix pointer math in mlx5_vdpa_get_config()
  vdpa/mlx5: fix memory allocation failure checks
  vdpa/mlx5: Fix uninitialised variable in core/mr.c
  vdpa_sim: init iommu lock
  virtio_config: fix up warnings on parisc
  vdpa/mlx5: Add VDPA driver for supported mlx5 devices
  vdpa/mlx5: Add shared memory registration code
  vdpa/mlx5: Add support library for mlx5 VDPA implementation
  vdpa/mlx5: Add hardware descriptive header file
  vdpa: Modify get_vq_state() to return error code
  net/vdpa: Use struct for set/get vq state
  vdpa: remove hard coded virtq num
  vdpasim: support batch updating
  vhost-vdpa: support IOTLB batching hints
  vhost-vdpa: support get/set backend features
  vhost: generialize backend features setting/getting
  vhost-vdpa: refine ioctl pre-processing
  vDPA: dont change vq irq after DRIVER_OK
  ...
2020-08-11 14:34:17 -07:00
Linus Torvalds
4bf5e36118 Merge tag 'libnvdimm-for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updayes from Vishal Verma:
 "You'd normally receive this pull request from Dan Williams, but he's
  busy watching a newborn (Congrats Dan!), so I'm watching libnvdimm
  this cycle.

  This adds a new feature in libnvdimm - 'Runtime Firmware Activation',
  and a few small cleanups and fixes in libnvdimm and DAX. I'd
  originally intended to make separate topic-based pull requests - one
  for libnvdimm, and one for DAX, but some of the DAX material fell out
  since it wasn't quite ready.

  Summary:

   - add 'Runtime Firmware Activation' support for NVDIMMs that
     advertise the relevant capability

   - misc libnvdimm and DAX cleanups"

* tag 'libnvdimm-for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  libnvdimm/security: ensure sysfs poll thread woke up and fetch updated attr
  libnvdimm/security: the 'security' attr never show 'overwrite' state
  libnvdimm/security: fix a typo
  ACPI: NFIT: Fix ARS zero-sized allocation
  dax: Fix incorrect argument passed to xas_set_err()
  ACPI: NFIT: Add runtime firmware activate support
  PM, libnvdimm: Add runtime firmware activation support
  libnvdimm: Convert to DEVICE_ATTR_ADMIN_RO()
  drivers/dax: Expand lock scope to cover the use of addresses
  fs/dax: Remove unused size parameter
  dax: print error message by pr_info() in __generic_fsdax_supported()
  driver-core: Introduce DEVICE_ATTR_ADMIN_{RO,RW}
  tools/testing/nvdimm: Emulate firmware activation commands
  tools/testing/nvdimm: Prepare nfit_ctl_test() for ND_CMD_CALL emulation
  tools/testing/nvdimm: Add command debug messages
  tools/testing/nvdimm: Cleanup dimm index passing
  ACPI: NFIT: Define runtime firmware activation commands
  ACPI: NFIT: Move bus_dsm_mask out of generic nvdimm_bus_descriptor
  libnvdimm: Validate command family indices
2020-08-11 10:59:19 -07:00
Stanislav Fomichev
da7bdfdd23 selftests/bpf: Fix v4_to_v6 in sk_lookup
I'm getting some garbage in bytes 8 and 9 when doing conversion
from sockaddr_in to sockaddr_in6 (leftover from AF_INET?). Let's
explicitly clear the higher bytes.

Fixes: 0ab5539f85 ("selftests/bpf: Tests for BPF_SK_LOOKUP attach point")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20200807223846.4190917-1-sdf@google.com
2020-08-11 15:36:51 +02:00
Jianlin Lv
0390c429db selftests/bpf: Fix segmentation fault in test_progs
test_progs reports the segmentation fault as below:

  $ sudo ./test_progs -t mmap --verbose
  test_mmap:PASS:skel_open_and_load 0 nsec
  [...]
  test_mmap:PASS:adv_mmap1 0 nsec
  test_mmap:PASS:adv_mmap2 0 nsec
  test_mmap:PASS:adv_mmap3 0 nsec
  test_mmap:PASS:adv_mmap4 0 nsec
  Segmentation fault

This issue was triggered because mmap() and munmap() used inconsistent
length parameters; mmap() creates a new mapping of 3 * page_size, but the
length parameter set in the subsequent re-map and munmap() functions is
4 * page_size; this leads to the destruction of the process space.

To fix this issue, first create 4 pages of anonymous mapping, then do all
the mmap() with MAP_FIXED.

Another issue is that when unmap the second page fails, the length
parameter to delete tmp1 mappings should be 4 * page_size.

Signed-off-by: Jianlin Lv <Jianlin.Lv@arm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200810153940.125508-1-Jianlin.Lv@arm.com
2020-08-11 15:36:45 +02:00
Yonghong Song
63fe3fd393 libbpf: Do not use __builtin_offsetof for offsetof
Commit 5fbc220862 ("tools/libpf: Add offsetof/container_of macro
in bpf_helpers.h") added a macro offsetof() to get the offset of a
structure member:

   #define offsetof(TYPE, MEMBER)  ((size_t)&((TYPE *)0)->MEMBER)

In certain use cases, size_t type may not be available so
Commit da7a35062b ("libbpf bpf_helpers: Use __builtin_offsetof
for offsetof") changed to use __builtin_offsetof which removed
the dependency on type size_t, which I suggested.

But using __builtin_offsetof will prevent CO-RE relocation
generation in case that, e.g., TYPE is annotated with "preserve_access_info"
where a relocation is desirable in case the member offset is changed
in a different kernel version. So this patch reverted back to
the original macro but using "unsigned long" instead of "site_t".

Fixes: da7a35062b ("libbpf bpf_helpers: Use __builtin_offsetof for offsetof")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/bpf/20200811030852.3396929-1-yhs@fb.com
2020-08-11 15:11:07 +02:00
Linus Torvalds
00e4db5125 Merge tag 'perf-tools-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools updates from Arnaldo Carvalho de Melo:
 "New features:

   - Introduce controlling how 'perf stat' and 'perf record' works via a
     control file descriptor, allowing starting with events configured
     but disabled until commands are received via the control file
     descriptor. This allows, for instance for tools such as Intel VTune
     to make further use of perf as its Linux platform driver.

   - Improve 'perf record' to to register in a perf.data file header the
     clockid used to help later correlate things like syslog files and
     perf events recorded.

   - Add basic syscall and find_next_bit benchmarks to 'perf bench'.

   - Allow using computed metrics in calculating other metrics. For
     instance:

	  {
	    .metric_expr    = "l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit",
	    .metric_name    = "DCache_L2_All_Hits",
	  },
	  {
	    .metric_expr    = "max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss",
	    .metric_name    = "DCache_L2_All_Miss",
	  },
	  {
	     .metric_expr    = "dcache_l2_all_hits + dcache_l2_all_miss",
	     .metric_name    = "DCache_L2_All",
	  }

   - Add suport for 'd_ratio', '>' and '<' operators to the expression
     resolver used in calculating metrics in 'perf stat'.

  Support for new kernel features:

   - Support TEXT_POKE and KSYMBOL_TYPE_OOL perf metadata events to cope
     with things like ftrace, trampolines, i.e. changes in the kernel
     text that gets in the way of properly decoding Intel PT hardware
     traces, for instance.

  Intel PT:

   - Add various knobs to reduce the volume of Intel PT traces by
     reducing the level of details such as decoding just some types of
     packets (e.g., FUP/TIP, PSB+), also filtering by time range.

   - Add new itrace options (log flags to the 'd' option, error flags to
     the 'e' one, etc), controlling how Intel PT is transformed into
     perf events, document some missing options (e.g., how to synthesize
     callchains).

  BPF:

   - Properly report BPF errors when parsing events.

   - Do not setup side-band events if LIBBPF is not linked, fixing a
     segfault.

  Libraries:

   - Improvements to the libtraceevent plugin mechanism.

   - Improve libtracevent support for KVM trace events SVM exit reasons.

   - Add a libtracevent plugins for decoding syscalls/sys_enter_futex
     and for tlb_flush.

   - Ensure sample_period is set libpfm4 events in 'perf test'.

   - Fixup libperf namespacing, to make sure what is in libperf has the
     perf_ namespace while what is now only in tools/perf/ doesn't use
     that prefix.

  Arch specific:

   - Improve the testing of vendor events and metrics in 'perf test'.

   - Allow no ARM CoreSight hardware tracer sink to be specified on
     command line.

   - Fix arm_spe_x recording when mixed with other perf events.

   - Add s390 idle functions 'psw_idle' and 'psw_idle_exit' to list of
     idle symbols.

   - List kernel supplied event aliases for arm64 in 'perf list'.

   - Add support for extended register capability in PowerPC 9 and 10.

   - Added nest IMC power9 metric events.

  Miscellaneous:

   - No need to setup sample_regs_intr/sample_regs_user for dummy
     events.

   - Update various copies of kernel headers, some causing perf to
     handle new syscalls, MSRs, etc.

   - Improve usage of flex and yacc, enabling warnings and addressing
     the fallout.

   - Add missing '--output' option to 'perf kmem' so that it can pass it
     along to 'perf record'.

   - 'perf probe' fixes related to adding multiple probes on the same
     address for the same event.

   - Make 'perf probe' warn if the target function is a GNU indirect
     function.

   - Remove //anon mmap events from 'perf inject jit' to fix supporting
     both using ELF files for generated functions and the perf-PID.map
     approaches"

* tag 'perf-tools-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (144 commits)
  perf record: Skip side-band event setup if HAVE_LIBBPF_SUPPORT is not set
  perf tools powerpc: Add support for extended regs in power10
  perf tools powerpc: Add support for extended register capability
  tools headers UAPI: Sync drm/i915_drm.h with the kernel sources
  tools arch x86: Sync asm/cpufeatures.h with the kernel sources
  tools arch x86: Sync the msr-index.h copy with the kernel sources
  tools headers UAPI: update linux/in.h copy
  tools headers API: Update close_range affected files
  perf script: Add 'tod' field to display time of day
  perf script: Change the 'enum perf_output_field' enumerators to be 64 bits
  perf data: Add support to store time of day in CTF data conversion
  perf tools: Move clockid_res_ns under clock struct
  perf header: Store clock references for -k/--clockid option
  perf tools: Add clockid_name function
  perf clockid: Move parse_clockid() to new clockid object
  tools lib traceevent: Handle possible strdup() error in tep_add_plugin_path() API
  libtraceevent: Fixed description of tep_add_plugin_path() API
  libtraceevent: Fixed type in PRINT_FMT_STING
  libtraceevent: Fixed broken indentation in parse_ip4_print_args()
  libtraceevent: Improve error handling of tep_plugin_add_option() API
  ...
2020-08-10 19:21:38 -07:00