Commit Graph

26757 Commits

Author SHA1 Message Date
James Clark
1ac9e0b573 perf cs-etm: Set time on synthesised samples to preserve ordering
The following attribute is set when synthesising samples in
timed decoding mode:

    attr.sample_type |= PERF_SAMPLE_TIME;

This results in new samples that appear to have timestamps but
because we don't assign any timestamps to the samples, when the
resulting inject file is opened again, the synthesised samples
will be on the wrong side of the MMAP or COMM events.

For example, this results in the samples being associated with
the perf binary, rather than the target of the record:

    perf record -e cs_etm/@tmc_etr0/u top
    perf inject -i perf.data -o perf.inject --itrace=i100il
    perf report -i perf.inject

Where 'Command' == perf should show as 'top':

    # Overhead  Command  Source Shared Object  Source Symbol           Target Symbol           Basic Block Cycles
    # ........  .......  ....................  ......................  ......................  ..................
    #
        31.08%  perf     [unknown]             [.] 0x000000000040c3f8  [.] 0x000000000040c3e8  -

If the perf.data file is opened directly with perf, without the
inject step, then this already works correctly because the
events are synthesised after the COMM and MMAP events and
no second sorting happens. Re-sorting only happens when opening
the perf.inject file for the second time so timestamps are
needed.

Using the timestamp from the AUX record mirrors the current
behaviour when opening directly with perf, because the events
are generated on the call to cs_etm__process_queues().

The ETM trace could optionally contain time stamps, but there is
no way to correlate this with the kernel time. So, the best available
time value is that of the AUX_RECORD header. This patch uses
the timestamp from the header for all the samples. The ordering of the
samples are implicit in the trace and thus is fine with respect to
relative ordering.

Reviewed-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Co-developed-by: Al Grant <al.grant@arm.com>
Signed-off-by: Al Grant <al.grant@arm.com>
Signed-off-by: James Clark <james.clark@arm.com>
Acked-by: Suzuki K Poulos <suzuki.poulose@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Branislav Rankov <branislav.rankov@arm.com>
Cc: Denis Nikitin <denik@chromium.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Link: https://lore.kernel.org/r/20210510143248.27423-3-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-05-12 15:47:09 -03:00
James Clark
aadd6ba409 perf cs-etm: Refactor timestamp variable names
Remove ambiguity in variable names relating to timestamps.

A later commit will save the sample kernel timestamp in one of the etm
structs, so name all elements appropriately to avoid confusion.

This is also removes some ambiguity arising from the fact that the
--timestamp argument to perf record refers to sample kernel timestamps,
and the /timestamp/ event modifier refers to CS timestamps, so the term
is overloaded.

Signed-off-by: James Clark <james.clark@arm.com>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Al Grant <al.grant@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Branislav Rankov <branislav.rankov@arm.com>
Cc: Denis Nikitin <denik@chromium.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Link: https://lore.kernel.org/r/20210510143248.27423-2-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-05-12 15:47:04 -03:00
André Almeida
f4addd54b1 selftests: futex: Expand timeout test
Improve futex timeout testing by checking all the operations that
supports timeout and their available modes.

Signed-off-by: André Almeida <andrealmeid@collabora.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210427135328.11013-3-andrealmeid@collabora.com
2021-05-12 20:44:59 +02:00
André Almeida
c7d84e7ff5 selftests: futex: Correctly include headers dirs
When building selftests, the build system will install uapi linux
headers at usr/include in kernel source's root directory. When building
with a different output folder, the headers will be installed at
kselftests/usr/include.

Add both paths so we can build the tests using up-to-date headers.

Currently, this is uncommon to happen since it's rare to find a
build system with an outdated futex header, but it happens
when testing new futex operations.

Signed-off-by: André Almeida <andrealmeid@collabora.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210427135328.11013-2-andrealmeid@collabora.com
2021-05-12 20:44:58 +02:00
Lei Zhao
046b243a6a perf x86 kvm-stat: Support to analyze kvm MSR
usage:
    - kvm stat
      run a command and gather performance counter statistics

    - show the result:
      perf kvm stat report --event=msr

See the msr events:

Analyze events for all VMs, all VCPUs:

MSR Access Samples  Samples% Time%  Min Time Max Time  Avg time

  0x6e0:W   67007  98.17%   98.31%  0.59us   10.69us  0.90us ( +-  0.10% )
  0x830:W    1186   1.74%    1.60%  0.53us  108.34us  0.82us ( +- 11.02% )
   0x3b:R      66   0.10%    0.09%  0.56us    1.26us  0.80us ( +-  3.24% )

Total Samples:68259, Total events handled time:61150.95us.

Signed-off-by: Lei Zhao <zhaolei27@baidu.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/1618470001-7239-1-git-send-email-lirongqing@baidu.com
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-05-12 12:43:12 -03:00
Namhyung Kim
07b747f99a perf stat: Use aggregated counts directly
The ps->res_stats is for repeated runs, so the interval code should
not touch it.  Actually the aggregated counts are available in the
counter->counts->aggr, so we can (and should) use it directly IMHO.

No functional change intended.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210423023833.1430520-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-05-12 12:43:11 -03:00
Adrian Hunter
66286ed3e8 perf record: Set timestamp boundary for AUX area events
AUX area data is not processed by 'perf record' and consequently the
 --timestamp-boundary option may result in no values for "time of first
sample" and "time of last sample". However there are non-sample events
that can be used instead, namely 'itrace_start' and 'aux'.
'itrace_start' is issued before tracing starts, and 'aux' is issued
every time data is ready.

Implement tool callbacks for those two for 'perf record', to update the
timestamp boundary.

Example:

 $ perf record -e intel_pt//u --timestamp-boundary uname
 Linux
 [ perf record: Woken up 1 times to write data ]
 [ perf record: Captured and wrote 0.022 MB perf.data ]
 $ perf script --header-only | grep "time of"
 # time of first sample : 4574.835541
 # time of last sample : 4574.835907
 $ perf script --itrace=be -F-ip | head -1
           uname 13752 [001]  4574.835589:          1 branches:uH:
 $ perf script --itrace=be -F-ip | tail -1
           uname 13752 [001]  4574.835867:          1 branches:uH:
 $

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lore.kernel.org/lkml/20210503064222.5319-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-05-12 12:43:11 -03:00
Adrian Hunter
e3ff42bdeb perf intel-pt: Parse VM Time Correlation options and set up decoding
Add parsing and validation of VM Time Correlation options, and pass
parameters to the decoder. Also update the Intel PT documentation
accordingly.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20210430070309.17624-13-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-05-12 12:43:11 -03:00
Adrian Hunter
fa8f949d16 perf intel-pt: Add VM Time Correlation to decoder
VM Time Correlation means determining if each TSC packet belongs to a VM
Guest or the Host. When the trace is "in context" that is indicated by
the NR flag in the PIP packet. However, when tracing kernel-only,
userspace only, or using address filters, the trace can be "out of context"
in which case timing packets are produced but not PIP packets.

Nevertheless, it is very unlikely the VM Guest timestamps will be in
the same range as the Host timestamps. Host time ranges are established
by a starting side-band event timestamp, and subsequently by the buffer
timestamp, written when the buffer is copied to the perf.data file.

This patch supports updating the VM Guest timestamp packets, assuming an
unchanging (during perf record) VMX TSC Offset and no VMX TSC scaling.

Furthermore, it is possible to determine what the VMX TSC Offset is,
although not necessarily at the start. The dry-run option lets that
information be determined so that the user can pass it to a subsequent
run. For more detail, refer to the example in the Intel PT documentation
in a subsequent patch.

VM Time Correlation is also performed on the TSC value in PEBs-via-PT
records.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20210430070309.17624-12-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-05-12 12:43:11 -03:00
Adrian Hunter
31c7e27dae perf intel-pt: Better 7-byte timestamp wraparound logic
A timestamp should not go backwards. If it does it is assumed that the
 7-byte TSC packet value has wrapped. Improve that logic so that it will
not allow the timestamp to go past the buffer timestamp (which is recorded
when the buffer is copied out)

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20210430070309.17624-11-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-05-12 12:43:11 -03:00
Adrian Hunter
5ac35d778a perf intel-pt: Pass the first timestamp to the decoder
VM Time Correlation will use time ranges to determine whether a TSC packet
belongs to the Host or Guest. To start, the first non-zero timestamp is
needed. Pass that to the decoder.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20210430070309.17624-10-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-05-12 12:43:11 -03:00
Adrian Hunter
0fc9d33894 perf intel-pt: Add a tree for VMCS information
Even when VMX TSC Offset is not changing (during perf record), different
virtual machines can have different TSC Offsets. There is a Virtual Machine
Control Structure (VMCS) for each virtual CPU, the address of which is
reported to Intel PT in the VMCS packet. We do not know which VMCS belongs
to which virtual machine, so use a tree to keep track of VMCS information.
Then the decoder will be able to use the current VMCS value to look up the
current TSC Offset.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20210430070309.17624-9-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-05-12 12:43:11 -03:00
Adrian Hunter
335358cc30 perf intel-pt: Let overlap detection handle VM timestamps
Intel PT timestamps are affected by virtualization. While TSC packets can
still be considered to be unique, the TSC values need not be in order any
more. Adjust the algorithm accordingly.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20210430070309.17624-8-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-05-12 12:43:11 -03:00
Adrian Hunter
6aa3afc9c8 perf auxtrace: Allow buffers to be mapped read / write
To support in-place update, allow buffers to be mapped read / write.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20210430070309.17624-7-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-05-12 12:43:11 -03:00
Adrian Hunter
83d7f5f1ad perf inject: Add --vm-time-correlation option
Intel PT timestamps are affected by virtualization. Add a new option
that will allow the Intel PT decoder to correlate the timestamps and
translate the virtual machine timestamps to host timestamps.

The advantages of making this a separate step, rather than a part of
normal decoding are that it is simpler to implement, and it needs to
be done only once.

This patch adds only the option. Later patches add Intel PT support.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20210430070309.17624-6-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-05-12 12:43:10 -03:00
Adrian Hunter
2a525f6a55 perf inject: Add facility to do in place update
When there is a need to modify only timestamps, it is much simpler and
quicker to do it to the existing file rather than re-write all the
contents.

In preparation for that, add the ability to modify the input file in place.
In practice that just means making the file descriptor and mmaps writable.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20210430070309.17624-5-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-05-12 12:43:10 -03:00
Adrian Hunter
e9d6473963 perf intel-pt: Support Z itrace option for timeless decoding
Correlating virtual machine TSC packets is not supported at present, so
instead support the Z itrace option.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20210430070309.17624-4-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-05-12 12:43:10 -03:00
Adrian Hunter
856ecd6ab4 perf intel-pt: Move synth_opts initialization earlier
Move synth_opts initialization earlier, so it can be used earlier.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20210430070309.17624-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-05-12 12:43:10 -03:00
Adrian Hunter
18f4949427 perf auxtrace: Add Z itrace option for timeless decoding
Issues correlating timestamps can be avoided with timeless decoding. Add
an option for that, so that timeless decoding can be used even when
timestamps are present.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20210430070309.17624-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-05-12 12:43:10 -03:00
Peter Zijlstra
e2d9494bef objtool: Provide stats for jump_labels
Add objtool --stats to count the jump_label sites it encounters.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210506194158.153101906@infradead.org
2021-05-12 14:54:56 +02:00
Peter Zijlstra
6d37b83c5d objtool: Rewrite jump_label instructions
When a jump_entry::key has bit1 set, rewrite the instruction to be a
NOP. This allows the compiler/assembler to emit JMP (and thus decide
on which encoding to use).

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210506194158.091028792@infradead.org
2021-05-12 14:54:56 +02:00
Peter Zijlstra
cbf82a3dc2 objtool: Decode jump_entry::key addend
Teach objtool about the the low bits in the struct static_key pointer.

That is, the low two bits of @key in:

  struct jump_entry {
	s32 code;
	s32 target;
	long key;
  }

as found in the __jump_table section. Since @key has a relocation to
the variable (to be resolved by the linker), the low two bits will be
reflected in the relocation's addend.

As such, find the reloc and store the addend, such that we can access
these bits.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210506194158.028024143@infradead.org
2021-05-12 14:54:55 +02:00
Peter Zijlstra
25cf0d8aa2 objtool: Rewrite hashtable sizing
Currently objtool has 5 hashtables and sizes them 16 or 20 bits
depending on the --vmlinux argument.

However, a single side doesn't really work well for the 5 tables,
which among them, cover 3 different uses. Also, while vmlinux is
larger, there is still a very wide difference between a defconfig and
allyesconfig build, which again isn't optimally covered by a single
size.

Another aspect is the cost of elf_hash_init(), which for large tables
dominates the runtime for small input files. It turns out that all it
does it assign NULL, something that is required when using malloc().
However, when we allocate memory using mmap(), we're guaranteed to get
zero filled pages.

Therefore, rewrite the whole thing to:

 1) use more dynamic sized tables, depending on the input file,
 2) avoid the need for elf_hash_init() entirely by using mmap().

This speeds up a regular kernel build (100s to 98s for
x86_64-defconfig), and potentially dramatically speeds up vmlinux
processing.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210506194157.452881700@infradead.org
2021-05-12 14:54:50 +02:00
Chris Hyser
9f26990074 kselftest: Add test for core sched prctl interface
Provides a selftest and examples of using the interface.

[peterz: updated to not use sched_debug]
Signed-off-by: Chris Hyser <chris.hyser@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Don Hiatt <dhiatt@digitalocean.com>
Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123309.100860030@infradead.org
2021-05-12 11:43:32 +02:00
Chris Hyser
7ac592aa35 sched: prctl() core-scheduling interface
This patch provides support for setting and copying core scheduling
'task cookies' between threads (PID), processes (TGID), and process
groups (PGID).

The value of core scheduling isn't that tasks don't share a core,
'nosmt' can do that. The value lies in exploiting all the sharing
opportunities that exist to recover possible lost performance and that
requires a degree of flexibility in the API.

From a security perspective (and there are others), the thread,
process and process group distinction is an existent hierarchal
categorization of tasks that reflects many of the security concerns
about 'data sharing'. For example, protecting against cache-snooping
by a thread that can just read the memory directly isn't all that
useful.

With this in mind, subcommands to CREATE/SHARE (TO/FROM) provide a
mechanism to create and share cookies. CREATE/SHARE_TO specify a
target pid with enum pidtype used to specify the scope of the targeted
tasks. For example, PIDTYPE_TGID will share the cookie with the
process and all of it's threads as typically desired in a security
scenario.

API:

  prctl(PR_SCHED_CORE, PR_SCHED_CORE_GET, tgtpid, pidtype, &cookie)
  prctl(PR_SCHED_CORE, PR_SCHED_CORE_CREATE, tgtpid, pidtype, NULL)
  prctl(PR_SCHED_CORE, PR_SCHED_CORE_SHARE_TO, tgtpid, pidtype, NULL)
  prctl(PR_SCHED_CORE, PR_SCHED_CORE_SHARE_FROM, srcpid, pidtype, NULL)

where 'tgtpid/srcpid == 0' implies the current process and pidtype is
kernel enum pid_type {PIDTYPE_PID, PIDTYPE_TGID, PIDTYPE_PGID, ...}.

For return values, EINVAL, ENOMEM are what they say. ESRCH means the
tgtpid/srcpid was not found. EPERM indicates lack of PTRACE permission
access to tgtpid/srcpid. ENODEV indicates your machines lacks SMT.

[peterz: complete rewrite]
Signed-off-by: Chris Hyser <chris.hyser@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Don Hiatt <dhiatt@digitalocean.com>
Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123309.039845339@infradead.org
2021-05-12 11:43:31 +02:00
David S. Miller
df6f823703 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2021-05-11

The following pull-request contains BPF updates for your *net* tree.

We've added 13 non-merge commits during the last 8 day(s) which contain
a total of 21 files changed, 817 insertions(+), 382 deletions(-).

The main changes are:

1) Fix multiple ringbuf bugs in particular to prevent writable mmap of
   read-only pages, from Andrii Nakryiko & Thadeu Lima de Souza Cascardo.

2) Fix verifier alu32 known-const subregister bound tracking for bitwise
   operations and/or/xor, from Daniel Borkmann.

3) Reject trampoline attachment for functions with variable arguments,
   and also add a deny list of other forbidden functions, from Jiri Olsa.

4) Fix nested bpf_bprintf_prepare() calls used by various helpers by
   switching to per-CPU buffers, from Florent Revest.

5) Fix kernel compilation with BTF debug info on ppc64 due to pahole
   missing TCP-CC functions like cubictcp_init, from Martin KaFai Lau.

6) Add a kconfig entry to provide an option to disallow unprivileged
   BPF by default, from Daniel Borkmann.

7) Fix libbpf compilation for older libelf when GELF_ST_VISIBILITY()
   macro is not available, from Arnaldo Carvalho de Melo.

8) Migrate test_tc_redirect to test_progs framework as prep work
   for upcoming skb_change_head() fix & selftest, from Jussi Maki.

9) Fix a libbpf segfault in add_dummy_ksym_var() if BTF is not
   present, from Ian Rogers.

10) Fix tx_only micro-benchmark in xdpsock BPF sample with proper frame
    size, from Magnus Karlsson.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-11 16:05:56 -07:00
Andrii Nakryiko
e5670fa029 libbpf: Treat STV_INTERNAL same as STV_HIDDEN for functions
Do the same global -> static BTF update for global functions with STV_INTERNAL
visibility to turn on static BPF verification mode.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210507054119.270888-7-andrii@kernel.org
2021-05-11 15:07:17 -07:00
Andrii Nakryiko
247b8634e6 libbpf: Fix ELF symbol visibility update logic
Fix silly bug in updating ELF symbol's visibility.

Fixes: a46349227c ("libbpf: Add linker extern resolution support for functions and global variables")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210507054119.270888-6-andrii@kernel.org
2021-05-11 15:07:17 -07:00
Andrii Nakryiko
31332ccb75 bpftool: Stop emitting static variables in BPF skeleton
As discussed in [0], stop emitting static variables in BPF skeletons to avoid
issues with name-conflicting static variables across multiple
statically-linked BPF object files.

Users using static variables to pass data between BPF programs and user-space
should do a trivial one-time switch according to the following simple rules:
  - read-only `static volatile const` variables should be converted to
    `volatile const`;
  - read/write `static volatile` variables should just drop `static volatile`
    modifiers to become global variables/symbols. To better handle older Clang
    versions, such newly converted global variables should be explicitly
    initialized with a specific value or `= 0`/`= {}`, whichever is
    appropriate.

  [0] https://lore.kernel.org/bpf/CAEf4BzZo7_r-hsNvJt3w3kyrmmBJj7ghGY8+k4nvKF0KLjma=w@mail.gmail.com/T/#m664d4b0d6b31ac8b2669360e0fc2d6962e9f5ec1

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210507054119.270888-5-andrii@kernel.org
2021-05-11 15:07:17 -07:00
Andrii Nakryiko
256eab48e7 selftests/bpf: Stop using static variables for passing data to/from user-space
In preparation of skipping emitting static variables in BPF skeletons, switch
all current selftests uses of static variables to pass data between BPF and
user-space to use global variables.

All non-read-only `static volatile` variables become just plain global
variables by dropping `static volatile` part.

Read-only `static volatile const` variables, though, still require `volatile`
modifier, otherwise compiler will ignore whatever values are set from
user-space.

Few static linker tests are using name-conflicting static variables to
validate that static linker still properly handles static variables and
doesn't trip up on name conflicts.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210507054119.270888-4-andrii@kernel.org
2021-05-11 15:07:17 -07:00
Andrii Nakryiko
fdbf5ddeb8 libbpf: Add per-file linker opts
For better future extensibility add per-file linker options. Currently
the set of available options is empty. This changes bpf_linker__add_file()
API, but it's not a breaking change as bpf_linker APIs hasn't been released
yet.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210507054119.270888-3-andrii@kernel.org
2021-05-11 15:07:17 -07:00
Andrii Nakryiko
37f05601ea bpftool: Strip const/volatile/restrict modifiers from .bss and .data vars
Similarly to .rodata, strip any const/volatile/restrict modifiers when
generating BPF skeleton. They are not helpful and actually just get in the way.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210507054119.270888-2-andrii@kernel.org
2021-05-11 15:07:17 -07:00
Jussi Maki
096eccdef0 selftests/bpf: Rewrite test_tc_redirect.sh as prog_tests/tc_redirect.c
As discussed in [0], this ports test_tc_redirect.sh to the test_progs
framework and removes the old test.

This makes it more in line with rest of the tests and makes it possible
to run this test case with vmtest.sh and under the bpf CI.

The upcoming skb_change_head() helper fix in [0] is depending on it and
extending the test case to redirect a packet from L3 device to veth.

  [0] https://lore.kernel.org/bpf/20210427135550.807355-1-joamaki@gmail.com

Signed-off-by: Jussi Maki <joamaki@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210505085925.783985-1-joamaki@gmail.com
2021-05-11 23:15:43 +02:00
Arnaldo Carvalho de Melo
67e7ec0bd4 libbpf: Provide GELF_ST_VISIBILITY() define for older libelf
Where that macro isn't available.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/YJaspEh0qZr4LYOc@kernel.org
2021-05-11 23:07:33 +02:00
Björn Töpel
d25fba0e34 tools/memory-model: Fix smp_mb__after_spinlock() spelling
A misspelled git-grep regex revealed that smp_mb__after_spinlock()
was misspelled in explanation.txt.  This commit adds the missing "_".

Fixes: 1c27b644c0 ("Automate memory-barriers.txt; provide Linux-kernel memory model")
[ paulmck: Apply Alan Stern commit-log feedback. ]
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10 16:27:20 -07:00
Paul E. McKenney
3d78668e5b torture: Don't cap remote runs by build-system number of CPUs
Currently, if a torture scenario requires more CPUs than are present
on the build system, kvm.sh and friends limit the CPUs available to
that scenario.  This makes total sense when the build system and the
system running the scenarios are one and the same, but not so much when
remote systems might well have more CPUs.

This commit therefore introduces a --remote flag to kvm.sh that suppresses
this CPU-limiting behavior, and causes kvm-remote.sh to use this flag.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10 16:05:07 -07:00
Paul E. McKenney
c43d3b0083 torture: Make kvm-remote.sh account for network failure in pathname checks
In a long-duration kvm-remote.sh run, almost all of the remote accesses will
be simple file-existence checks.  These are thus the most likely to be caught
out by network failures, which do happen from time to time.

This commit therefore takes a first step towards tolerating temporary
network outages by making the file-existence checks repeat in the face of
such an outage.  They also print a message every minute during a outage,
allowing the user to take appropriate action.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10 16:05:07 -07:00
Paul E. McKenney
d4240d628f rcutorture: Add BUSTED-BOOST to test RCU priority boosting tests
This commit adds the BUSTED-BOOST rcutorture scenario, which can be
used to test rcutorture's ability to test RCU priority boosting.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10 16:05:07 -07:00
Paul E. McKenney
00ad25f601 torture: Set kvm.sh language to English
Some of the code invoked directly and indirectly from kvm.sh parses
the output of commands.  This parsing assumes English, which can cause
failures if the user has set some other language.  In a few cases,
there are language-independent commands available, but this is not
always the case.  Therefore, as an alternative to polyglot parsing,
this commit sets the LANG environment variable to en_US.UTF-8.

Reported-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10 16:05:06 -07:00
Frederic Weisbecker
f8c8484dbd torture: Correctly fetch number of CPUs for non-English languages
Grepping for "CPU" on lscpu output isn't always successful, depending
on the local language setting.  As a result, the build can be aborted
early with:

	"make: the '-j' option requires a positive integer argument"

This commit therefore uses the human-language-independent approach
available via the getconf command, both in kvm-build.sh and in
kvm-remote.sh.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10 16:05:06 -07:00
Paul E. McKenney
226dd39d23 torture: Make kvm-find-errors.sh account for kvm-remote.sh
Currently, kvm-find-errors.sh assumes that if "--buildonly" appears in
the log file, then the run did builds but ran no kernels.  This breaks
with kvm-remote.sh, which uses kvm.sh to do a build, then kvm-again.sh
to run the kernels built on remote systems.  This commit therefore adds
a check for a kvm-remote.sh run.

While in the area, this commit checks for "--build-only" as well as
"--build-only".

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10 16:05:06 -07:00
Paul E. McKenney
b09751d752 torture: Make the build machine control N in "make -jN"
Given remote rcutorture runs, it is quite possible that the build system
will have fewer CPUs than the system(s) running the actual test scenarios.
In such cases, using the number of CPUs on the test systems can overload
the build system, slowing down the build or, worse, OOMing the build
system.  This commit therefore uses the build system's CPU count to set
N in "make -jN", and by tradition sets "N" to double the CPU count.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10 16:05:06 -07:00
Paul E. McKenney
f254a0b527 torture: Make kvm.sh use abstracted kvm-end-run-stats.sh
This commit reduces duplicate code by making kvm.sh use the new
kvm-end-run-stats.sh script rather than taking its historical approach
of open-coding it.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10 16:05:06 -07:00
Paul E. McKenney
ee8fef9137 torture: Abstract end-of-run summary
This commit abstractst the end-of-run summary from kvm-again.sh, and,
while in the area, brings its format into line with that of kvm.sh.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10 16:05:06 -07:00
Paul E. McKenney
32dbdaf71a torture: Fix grace-period rate output
The kvm-again.sh script relies on shell comments added to the qemu-cmd
file, but this means that code extracting values from the QEMU command in
this file must grep out those commment.  Which kvm-recheck-rcu.sh failed
to do, which destroyed its grace-period-per-second calculation.  This
commit therefore adds the needed "grep -v '^#'" to kvm-recheck-rcu.sh.

Fixes: 315957cad4 ("torture: Prepare for splitting qemu execution from kvm-test-1-run.sh")
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10 16:05:06 -07:00
Paul E. McKenney
0092eae4cb torture: Add kvm-remote.sh script for distributed rcutorture test runs
This commit adds a kvm-remote.sh script that prepares a tarball that
is then downloaded to the remote system(s) and executed.  The user is
responsible for having set up the remote systems to run qemu, but all the
kernel builds are done on the system running the kvm-remote.sh script.
The user is also responsible for setting up the remote systems so that
ssh can be run non-interactively, given that ssh is used to poll the
remote systems in order to detect completion of each batch.

See the script's header comment for usage information.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10 16:05:05 -07:00
Paul E. McKenney
179141865d rcuscale: Allow CPU hotplug to be enabled
It is no longer possible to disable CPU hotplug in many configurations,
which means that the CONFIG_HOTPLUG_CPU=n lines in rcuscale's Kconfig
options are just a source of useless diagnostics.  In addition, rcuscale
doesn't do CPU-hotplug operations in any case.  This commit therefore
changes these lines to read CONFIG_HOTPLUG_CPU=y.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10 16:05:05 -07:00
Paul E. McKenney
68d415f91f refscale: Allow CPU hotplug to be enabled
It is no longer possible to disable CPU hotplug in many configurations,
which means that the CONFIG_HOTPLUG_CPU=n lines in refscale's Kconfig
options are just a source of useless diagnostics.  In addition, refscale
doesn't do CPU-hotplug operations in any case.  This commit therefore
changes these lines to read CONFIG_HOTPLUG_CPU=y.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10 16:05:05 -07:00
Paul E. McKenney
fb4855c362 torture: Make kvm-again.sh use "scenarios" rather than "batches" file
This commit saves a few lines of code by making kvm-again.sh use the
"scenarios" file rather than the "batches" file, both of which are
generated by kvm.sh.

This results in a break point because new versions of kvm-again.sh cannot
handle "res" directories produced by old versions of kvm.sh, which lack
the "scenarios" file.  In the unlikely event that this becomes a problem,
a trivial script suffices to convert the "batches" file to a "scenarios"
file, and this script may be easily extracted from kvm.sh.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10 16:05:05 -07:00
Paul E. McKenney
3d2cc4fec8 torture: Add "scenarios" option to kvm.sh --dryrun parameter
This commit adds "--dryrun scenarios" to kvm.sh, which prints something
like this:

1.  TREE03
2.  TREE07
3.  SRCU-P SRCU-N
4.  TREE01 TRACE01
5.  TREE02 TRACE02
6.  TREE04 RUDE01 TASKS01
7.  TREE05 TASKS03 SRCU-T SRCU-U
8.  TASKS02 TINY01 TINY02 TREE09

This format is more convenient for scripts that run batches of scenarios.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10 16:05:05 -07:00