Commit Graph

30609 Commits

Author SHA1 Message Date
Andrii Nakryiko
f366006342 libbpf: move xsk.{c,h} into selftests/bpf
Remove deprecated xsk APIs from libbpf. But given we have selftests
relying on this, move those files (with minimal adjustments to make them
compilable) under selftests/bpf.

We also remove all the removed APIs from libbpf.map, while overall
keeping version inheritance chain, as most APIs are backwards
compatible so there is no need to reassign them as LIBBPF_1.0.0 versions.

Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220627211527.2245459-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-28 13:13:32 -07:00
Arnaldo Carvalho de Melo
7fe718fb8f tools headers UAPI: Sync linux/kvm.h with the kernel sources
To pick the changes in:

  bfbab44568 ("KVM: arm64: Implement PSCI SYSTEM_SUSPEND")
  7b33a09d03 ("KVM: arm64: Add support for userspace to suspend a vCPU")
  ffbb61d09f ("KVM: x86: Accept KVM_[GS]ET_TSC_KHZ as a VM ioctl.")
  661a20fab7 ("KVM: x86/xen: Advertise and document KVM_XEN_HVM_CONFIG_EVTCHN_SEND")
  fde0451be8 ("KVM: x86/xen: Support per-vCPU event channel upcall via local APIC")
  28d1629f75 ("KVM: x86/xen: Kernel acceleration for XENVER_version")
  5363952605 ("KVM: x86/xen: handle PV timers oneshot mode")
  942c2490c2 ("KVM: x86/xen: Add KVM_XEN_VCPU_ATTR_TYPE_VCPU_ID")
  2fd6df2f2b ("KVM: x86/xen: intercept EVTCHNOP_send from guests")
  35025735a7 ("KVM: x86/xen: Support direct injection of event channel events")

That automatically adds support for this new ioctl:

  $ tools/perf/trace/beauty/kvm_ioctl.sh > before
  $ cp include/uapi/linux/kvm.h tools/include/uapi/linux/kvm.h
  $ tools/perf/trace/beauty/kvm_ioctl.sh > after
  $ diff -u before after
  --- before	2022-06-28 12:13:07.281150509 -0300
  +++ after	2022-06-28 12:13:16.423392896 -0300
  @@ -98,6 +98,7 @@
   	[0xcc] = "GET_SREGS2",
   	[0xcd] = "SET_SREGS2",
   	[0xce] = "GET_STATS_FD",
  +	[0xd0] = "XEN_HVM_EVTCHN_SEND",
   	[0xe0] = "CREATE_DEVICE",
   	[0xe1] = "SET_DEVICE_ATTR",
   	[0xe2] = "GET_DEVICE_ATTR",
  $

This silences these perf build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
  diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oliver Upton <oupton@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lore.kernel.org/lkml/Yrs4RE+qfgTaWdAt@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-28 14:20:22 -03:00
Ian Rogers
579d6c6d77 perf bpf: 8 byte align bpil data
bpil data is accessed assuming 64-bit alignment resulting in undefined
behavior as the data is just byte aligned. With an -fsanitize=undefined
build the following errors are observed:

  $ sudo perf record -a sleep 1
  util/bpf-event.c:310:22: runtime error: load of misaligned address 0x55f61084520f for type '__u64', which requires 8 byte alignment
  0x55f61084520f: note: pointer points here
   a8 fe ff ff 3c  51 d3 c0 ff ff ff ff 04  84 d3 c0 ff ff ff ff d8  aa d3 c0 ff ff ff ff a4  c0 d3 c0
               ^
  util/bpf-event.c:311:20: runtime error: load of misaligned address 0x55f61084522f for type '__u32', which requires 4 byte alignment
  0x55f61084522f: note: pointer points here
   ff ff ff ff c7  17 00 00 f1 02 00 00 1f  04 00 00 58 04 00 00 00  00 00 00 0f 00 00 00 63  02 00 00
               ^
  util/bpf-event.c:198:33: runtime error: member access within misaligned address 0x55f61084523f for type 'const struct bpf_func_info', which requires 4 byte alignment
  0x55f61084523f: note: pointer points here
   58 04 00 00 00  00 00 00 0f 00 00 00 63  02 00 00 3b 00 00 00 ab  02 00 00 44 00 00 00 14  03 00 00

Correct this by rouding up the data sizes and aligning the pointers.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Dave Marchevsky <davemarchevsky@fb.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Monnet <quentin@isovalent.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/20220614014714.1407239-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-28 12:05:25 -03:00
Arnaldo Carvalho de Melo
117c49505b tools kvm headers arm64: Update KVM headers from the kernel sources
To pick the changes from:

  2cde51f1e1 ("KVM: arm64: Hide KVM_REG_ARM_*_BMAP_BIT_COUNT from userspace")
  b22216e1a6 ("KVM: arm64: Add vendor hypervisor firmware register")
  428fd6788d ("KVM: arm64: Add standard hypervisor firmware register")
  05714cab7d ("KVM: arm64: Setup a framework for hypercall bitmap firmware registers")
  18f3976fdb ("KVM: arm64: uapi: Add kvm_debug_exit_arch.hsr_high")
  a5905d6af4 ("KVM: arm64: Allow SMCCC_ARCH_WORKAROUND_3 to be discovered and migrated")

That don't causes any changes in tooling (when built on x86), only
addresses this perf build warning:

  Warning: Kernel ABI header at 'tools/arch/arm64/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm64/include/uapi/asm/kvm.h'
  diff -u tools/arch/arm64/include/uapi/asm/kvm.h arch/arm64/include/uapi/asm/kvm.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: James Morse <james.morse@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Raghavendra Rao Ananta <rananta@google.com>
Link: https://lore.kernel.org/lkml/YrsWcDQyJC+xsfmm@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-28 11:56:01 -03:00
Namhyung Kim
49c692b7df perf offcpu: Accept allowed sample types only
As offcpu-time event is synthesized at the end, it could not get the
all the sample info.  Define OFFCPU_SAMPLE_TYPES for allowed ones and
mask out others in evsel__config() to prevent parse errors.

Because perf sample parsing assumes a specific ordering with the
sample types, setting unsupported one would make it fail to read
data like perf record -d/--data.

Fixes: edc41a1099 ("perf record: Enable off-cpu analysis with BPF")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Blake Jones <blakejones@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: bpf@vger.kernel.org
Link: http://lore.kernel.org/lkml/20220624231313.367909-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-28 11:45:45 -03:00
Namhyung Kim
d6838ec44b perf offcpu: Fix build failure on old kernels
Old kernels have a 'struct task_struct' which contains a "state" field
and newer kernels have "__state" instead.

While the get_task_state() in the BPF code handles that in some way, it
assumed the current kernel has the new definition and it caused a build
error on old kernels.

We should not assume anything and access them carefully.  Do not use
'task struct' directly access it instead using new and old definitions
in a row.

Fixes: edc41a1099 ("perf record: Enable off-cpu analysis with BPF")
Reported-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Blake Jones <blakejones@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: bpf@vger.kernel.org
Link: http://lore.kernel.org/lkml/20220624231313.367909-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-28 11:41:26 -03:00
Victor Nogueira
88153e29c1 selftests: tc-testing: Add testcases to test new flush behaviour
Add tdc test cases to verify new flush behaviour is correct, which do
the following:

- Try to flush only one action which is being referenced by a filter
- Try to flush three actions where the last one (index 3) is being
  referenced by a filter

Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-27 21:51:28 -07:00
Arnaldo Carvalho de Melo
f8d8661940 tools headers UAPI: Synch KVM's svm.h header with the kernel
To pick up the changes from:

  d5af44dde5 ("x86/sev: Provide support for SNP guest request NAEs")
  0afb6b660a ("x86/sev: Use SEV-SNP AP creation to start secondary CPUs")
  dc3f3d2474 ("x86/mm: Validate memory when changing the C-bit")
  cbd3d4f7c4 ("x86/sev: Check SEV-SNP features support")

That gets these new SVM exit reasons:

+       { SVM_VMGEXIT_PSC,              "vmgexit_page_state_change" }, \
+       { SVM_VMGEXIT_GUEST_REQUEST,    "vmgexit_guest_request" }, \
+       { SVM_VMGEXIT_EXT_GUEST_REQUEST, "vmgexit_ext_guest_request" }, \
+       { SVM_VMGEXIT_AP_CREATION,      "vmgexit_ap_creation" }, \
+       { SVM_VMGEXIT_HV_FEATURES,      "vmgexit_hypervisor_feature" }, \

Addressing this perf build warning:

  Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/svm.h' differs from latest version at 'arch/x86/include/uapi/asm/svm.h'
  diff -u tools/arch/x86/include/uapi/asm/svm.h arch/x86/include/uapi/asm/svm.h

This causes these changes:

  CC      /tmp/build/perf-urgent/arch/x86/util/kvm-stat.o
  LD      /tmp/build/perf-urgent/arch/x86/util/perf-in.o
  LD      /tmp/build/perf-urgent/arch/x86/perf-in.o
  LD      /tmp/build/perf-urgent/arch/perf-in.o
  LD      /tmp/build/perf-urgent/perf-in.o
  LINK    /tmp/build/perf-urgent/perf

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-26 12:32:55 -03:00
Arnaldo Carvalho de Melo
e2213a2dc6 tools include UAPI: Sync linux/vhost.h with the kernel sources
To get the changes in:

  84d7c8fd3a ("vhost-vdpa: introduce uAPI to set group ASID")
  2d1fcb7758 ("vhost-vdpa: uAPI to get virtqueue group id")
  a0c95f2011 ("vhost-vdpa: introduce uAPI to get the number of address spaces")
  3ace88bd37 ("vhost-vdpa: introduce uAPI to get the number of virtqueue groups")
  175d493c3c ("vhost: move the backend feature bits to vhost_types.h")

Silencing this perf build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/vhost.h' differs from latest version at 'include/uapi/linux/vhost.h'
  diff -u tools/include/uapi/linux/vhost.h include/uapi/linux/vhost.h

To pick up these changes and support them:

  $ tools/perf/trace/beauty/vhost_virtio_ioctl.sh > before
  $ cp include/uapi/linux/vhost.h tools/include/uapi/linux/vhost.h
  $ tools/perf/trace/beauty/vhost_virtio_ioctl.sh > after
  $ diff -u before after
  --- before	2022-06-26 12:04:35.982003781 -0300
  +++ after	2022-06-26 12:04:43.819972476 -0300
  @@ -28,6 +28,7 @@
   	[0x74] = "VDPA_SET_CONFIG",
   	[0x75] = "VDPA_SET_VRING_ENABLE",
   	[0x77] = "VDPA_SET_CONFIG_CALL",
  +	[0x7C] = "VDPA_SET_GROUP_ASID",
   };
   static const char *vhost_virtio_ioctl_read_cmds[] = {
   	[0x00] = "GET_FEATURES",
  @@ -39,5 +40,8 @@
   	[0x76] = "VDPA_GET_VRING_NUM",
   	[0x78] = "VDPA_GET_IOVA_RANGE",
   	[0x79] = "VDPA_GET_CONFIG_SIZE",
  +	[0x7A] = "VDPA_GET_AS_NUM",
  +	[0x7B] = "VDPA_GET_VRING_GROUP",
   	[0x80] = "VDPA_GET_VQS_COUNT",
  +	[0x81] = "VDPA_GET_GROUP_NUM",
   };
  $

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Gautam Dawar <gautam.dawar@xilinx.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/Yrh3xMYbfeAD0MFL@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-26 12:32:55 -03:00
Gang Li
448ce0e6ea perf stat: Enable ignore_missing_thread
perf already support ignore_missing_thread for -p, but not yet
applied to `perf stat -p <pid>`. This patch enables ignore_missing_thread
for `perf stat -p <pid>`.

Committer notes:

And here is a refresher about the 'ignore_missing_thread' knob, from a
previous patch using it:

  ca8000684e ("perf evsel: Enable ignore_missing_thread for pid option")

  ---
    While monitoring a multithread process with pid option, perf sometimes
    may return sys_perf_event_open failure with 3(No such process) if any of
    the process's threads die before we open the event. However, we want
    perf continue monitoring the remaining threads and do not exit with
    error.
  ---

Signed-off-by: Gang Li <ligang.bdlg@bytedance.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220622030037.15005-1-ligang.bdlg@bytedance.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-26 12:32:55 -03:00
Raul Silvera
37ed2cddcb perf inject: Adjust output data offset for backward compatibility
When 'perf inject' creates a new file, it reuses the data offset from
the input file. If there has been a change on the size of the header, as
happened in v5.12 -> v5.13, the new offsets will be wrong, resulting in
a corrupted output file.

This change adds the function perf_session__data_offset to compute the
data offset based on the current header size, and uses that instead of
the offset from the original input file.

Signed-off-by: Raul Silvera <rsilvera@google.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Colin Ian King <colin.king@intel.com>
Cc: Dave Marchevsky <davemarchevsky@fb.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220621152725.2668041-1-rsilvera@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-26 12:32:55 -03:00
Arnaldo Carvalho de Melo
3713e2494b perf trace beauty: Fix generation of errno id->str table on ALT Linux
For some reason using:

         cat <<EoFuncBegin
  static const char *errno_to_name__$arch(int err)
  {
         switch (err) {
  EoFuncBegin

In tools/perf/trace/beauty/arch_errno_names.sh isn't working on ALT
Linux sisyphus (development version), which could be some distro
specific glitch, so just get this done in an alternative way that works
everywhere while giving notice to the people working on that distro to
try and figure our what really took place.

Cc: Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-26 12:32:55 -03:00
Adrian Hunter
ab66fdace8 perf build-id: Fix caching files with a wrong build ID
Build ID events associate a file name with a build ID.  However, when
using perf inject, there is no guarantee that the file on the current
machine at the current time has that build ID. Fix by comparing the
build IDs and skip adding to the cache if they are different.

Example:

  $ echo "int main() {return 0;}" > prog.c
  $ gcc -o prog prog.c
  $ perf record --buildid-all ./prog
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.019 MB perf.data ]
  $ file-buildid() { file $1 | awk -F= '{print $2}' | awk -F, '{print $1}' ; }
  $ file-buildid prog
  444ad9be165d8058a48ce2ffb4e9f55854a3293e
  $ file-buildid ~/.debug/$(pwd)/prog/444ad9be165d8058a48ce2ffb4e9f55854a3293e/elf
  444ad9be165d8058a48ce2ffb4e9f55854a3293e
  $ echo "int main() {return 1;}" > prog.c
  $ gcc -o prog prog.c
  $ file-buildid prog
  885524d5aaa24008a3e2b06caa3ea95d013c0fc5

Before:

  $ perf buildid-cache --purge $(pwd)/prog
  $ perf inject -i perf.data -o junk
  $ file-buildid ~/.debug/$(pwd)/prog/444ad9be165d8058a48ce2ffb4e9f55854a3293e/elf
  885524d5aaa24008a3e2b06caa3ea95d013c0fc5
  $

After:

  $ perf buildid-cache --purge $(pwd)/prog
  $ perf inject -i perf.data -o junk
  $ file-buildid ~/.debug/$(pwd)/prog/444ad9be165d8058a48ce2ffb4e9f55854a3293e/elf

  $

Fixes: 454c407ec1 ("perf: add perf-inject builtin")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Tom Zanussi <tzanussi@gmail.com>
Link: https://lore.kernel.org/r/20220621125144.5623-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-26 12:32:55 -03:00
Arnaldo Carvalho de Melo
4b3f7644ae tools headers cpufeatures: Sync with the kernel sources
To pick the changes from:

  d6d0c7f681 ("x86/cpufeatures: Add PerfMonV2 feature bit")
  296d5a17e7 ("KVM: SEV-ES: Use V_TSC_AUX if available instead of RDTSC/MSR_TSC_AUX intercepts")
  f30903394e ("x86/cpufeatures: Add virtual TSC_AUX feature bit")
  8ad7e8f696 ("x86/fpu/xsave: Support XSAVEC in the kernel")
  59bd54a84d ("x86/tdx: Detect running as a TDX guest in early boot")
  a77d41ac3a ("x86/cpufeatures: Add AMD Fam19h Branch Sampling feature")

This only causes these perf files to be rebuilt:

  CC       /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
  CC       /tmp/build/perf/bench/mem-memset-x86-64-asm.o

And addresses this perf build warning:

  Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
  diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
  Warning: Kernel ABI header at 'tools/arch/x86/include/asm/disabled-features.h' differs from latest version at 'arch/x86/include/asm/disabled-features.h'
  diff -u tools/arch/x86/include/asm/disabled-features.h arch/x86/include/asm/disabled-features.h

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Babu Moger <babu.moger@amd.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/YrDkgmwhLv+nKeOo@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-26 12:32:43 -03:00
Arnaldo Carvalho de Melo
0fdd435cb4 tools headers UAPI: Sync drm/i915_drm.h with the kernel sources
To pick up the changes in:

  ecf8eca51f ("drm/i915/xehp: Add compute engine ABI")
  991b4de327 ("drm/i915/uapi: Add kerneldoc for engine class enum")
  c94fde8f51 ("drm/i915/uapi: Add DRM_I915_QUERY_GEOMETRY_SUBSLICES")
  1c671ad753 ("drm/i915/doc: Link query items to their uapi structs")
  a2e5402691 ("drm/i915/doc: Convert perf UAPI comments to kerneldoc")
  462ac1cdf4 ("drm/i915/doc: Convert drm_i915_query_topology_info comment to kerneldoc")
  034d47b25b ("drm/i915/uapi: Document DRM_I915_QUERY_HWCONFIG_BLOB")
  78e1fb3112 ("drm/i915/uapi: Add query for hwconfig blob")

That don't add any new ioctl, so no changes in tooling.

This silences this perf build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/drm/i915_drm.h' differs from latest version at 'include/uapi/drm/i915_drm.h'
  diff -u tools/include/uapi/drm/i915_drm.h include/uapi/drm/i915_drm.h

Cc: John Harrison <John.C.Harrison@intel.com>
Cc: Matt Atwood <matthew.s.atwood@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://lore.kernel.org/lkml/YrDi4ALYjv9Mdocq@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-26 12:32:43 -03:00
Adrian Hunter
342cb0d806 perf inject: Fix missing free in copy_kcore_dir()
Free string allocated by asprintf().

Fixes: d8fc085509 ("perf inject: Keep a copy of kcore_dir")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220620103904.7960-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-26 12:32:15 -03:00
liujing
1da9e27415 tc-testing: gitignore, delete plugins directory
when we modfying kernel, commit it to our environment building. we find a error
that is "tools/testing/selftests/tc-testing/plugins" failed: No such file or directory"

we find plugins directory is ignored in
"tools/testing/selftests/tc-testing/.gitignore", but the plugins directory
is need in "tools/testing/selftests/tc-testing/Makefile"

Signed-off-by: liujing <liujing@cmss.chinamobile.com>
Link: https://lore.kernel.org/r/20220622121237.5832-1-liujing@cmss.chinamobile.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-24 16:23:47 -07:00
Daniel Müller
fd75733da2 bpf: Merge "types_are_compat" logic into relo_core.c
BPF type compatibility checks (bpf_core_types_are_compat()) are
currently duplicated between kernel and user space. That's a historical
artifact more than intentional doing and can lead to subtle bugs where
one implementation is adjusted but another is forgotten.

That happened with the enum64 work, for example, where the libbpf side
was changed (commit 23b2a3a8f6 ("libbpf: Add enum64 relocation
support")) to use the btf_kind_core_compat() helper function but the
kernel side was not (commit 6089fb325c ("bpf: Add btf enum64
support")).

This patch addresses both the duplication issue, by merging both
implementations and moving them into relo_core.c, and fixes the alluded
to kind check (by giving preference to libbpf's already adjusted logic).

For discussion of the topic, please refer to:
https://lore.kernel.org/bpf/CAADnVQKbWR7oarBdewgOBZUPzryhRYvEbkhyPJQHHuxq=0K1gw@mail.gmail.com/T/#mcc99f4a33ad9a322afaf1b9276fb1f0b7add9665

Changelog:
v1 -> v2:
- limited libbpf recursion limit to 32
- changed name to __bpf_core_types_are_compat
- included warning previously present in libbpf version
- merged kernel and user space changes into a single patch

Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220623182934.2582827-1-deso@posteo.net
2022-06-24 14:15:37 -07:00
Jiri Olsa
b168852eb8 perf tools: Rework prologue generation code
Some functions we use for bpf prologue generation are going to be
deprecated. This change reworks current code not to use them.

We need to replace following functions/struct:
   bpf_program__set_prep
   bpf_program__nth_fd
   struct bpf_prog_prep_result

Currently we use bpf_program__set_prep to hook perf callback before
program is loaded and provide new instructions with the prologue.

We replace this function/ality by taking instructions for specific
program, attaching prologue to them and load such new ebpf programs
with prologue using separate bpf_prog_load calls (outside libbpf
load machinery).

Before we can take and use program instructions, we need libbpf to
actually load it. This way we get the final shape of its instructions
with all relocations and verifier adjustments).

There's one glitch though.. perf kprobe program already assumes
generated prologue code with proper values in argument registers,
so loading such program directly will fail in the verifier.

That's where the fallback pre-load handler fits in and prepends
the initialization code to the program. Once such program is loaded
we take its instructions, cut off the initialization code and prepend
the prologue.

I know.. sorry ;-)

To have access to the program when loading this patch adds support to
register 'fallback' section handler to take care of perf kprobe programs.
The fallback means that it handles any section definition besides the
ones that libbpf handles.

The handler serves two purposes:
  - allows perf programs to have special arguments in section name
  - allows perf to use pre-load callback where we can attach init
    code (zeroing all argument registers) to each perf program

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/bpf/20220616202214.70359-2-jolsa@kernel.org
2022-06-24 13:36:23 -07:00
Linus Torvalds
e946554905 ARM64:
* Fix a regression with pKVM when kmemleak is enabled
 
 * Add Oliver Upton as an official KVM/arm64 reviewer
 
 selftests:
 
 * deal with compiler optimizations around hypervisor exits
 
 x86:
 
 * MAINTAINERS reorganization
 
 * Two SEV fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmK1cjIUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMayQf+JOOggLacjPPa/t/CE8kIcbX0IWc+
 epEdq/f0qgxJlAjUB9YKgMr2Io9jPScyTdY8t6uS0WyZ7Q1NyAogXfds/dF4wElm
 IMWWfLTSU3gzCmzPh8n6SfbWtRGJKsOukK0cDIIh86h5YnXDmeyVjJrDvEVQOnzG
 TjHOKYuFXGPj8/NKwcrxqBFHK9DBNxn9b/UBRArG+5AZM0mx3Jl8LJMYUDEIyAyO
 yhNfTh7gPPidEiJLkFDyHWKg5rhO3fbn8UrncY+eTmSBqMHvvY0+eka6urwihN0v
 ExmKqy00ES51c/6r+zqsqYICqVSqiaNNWF4lp1HTp7LrUBtxyZAqnkBbHQ==
 =U2ol
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "ARM64:

   - Fix a regression with pKVM when kmemleak is enabled

   - Add Oliver Upton as an official KVM/arm64 reviewer

  selftests:

   - deal with compiler optimizations around hypervisor exits

  x86:

   - MAINTAINERS reorganization

   - Two SEV fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: SEV: Init target VMCBs in sev_migrate_from
  KVM: x86/svm: add __GFP_ACCOUNT to __sev_dbg_{en,de}crypt_user()
  MAINTAINERS: Reorganize KVM/x86 maintainership
  selftests: KVM: Handle compiler optimizations in ucall
  KVM: arm64: Add Oliver as a reviewer
  KVM: arm64: Prevent kmemleak from accessing pKVM memory
  tools/kvm_stat: fix display of error when multiple processes are found
2022-06-24 12:17:47 -07:00
Jakub Sitnicki
935336c191 selftests/bpf: Test sockmap update when socket has ULP
Cover the scenario when we cannot insert a socket into the sockmap, because
it has it is using ULP. Failed insert should not have any effect on the ULP
state. This is a regression test.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/r/20220623091231.417138-1-jakub@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-24 11:21:50 -07:00
Eduard Zingerman
41188e9e9d selftest/bpf: Test for use-after-free bug fix in inline_bpf_loop
This test verifies that bpf_loop() inlining works as expected when
address of `env->prog` is updated. This address is updated upon BPF
program reallocation.

Reallocation is handled by bpf_prog_realloc(), which reuses old memory
if page boundary is not crossed. The value of `len` in the test is
chosen to cross this boundary on bpf_loop() patching.

Verify that the use-after-free bug in inline_bpf_loop() reported by
Dan Carpenter is fixed.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220624020613.548108-3-eddyz87@gmail.com
2022-06-24 16:51:00 +02:00
Hangbin Liu
0a2ff7cc8a Bonding: add per-port priority for failover re-selection
Add per port priority support for bonding active slave re-selection during
failover. A higher number means higher priority in selection. The primary
slave still has the highest priority. This option also follows the
primary_reselect rules.

This option could only be configured via netlink.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jonathan Toppins <jtoppins@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-24 11:27:59 +01:00
Dimitris Michailidis
b968080808 selftests/net: pass ipv6_args to udpgso_bench's IPv6 TCP test
udpgso_bench.sh has been running its IPv6 TCP test with IPv4 arguments
since its initial conmit. Looks like a typo.

Fixes: 3a687bef14 ("selftests: udp gso benchmark")
Cc: willemb@google.com
Signed-off-by: Dimitris Michailidis <dmichail@fungible.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20220623000234.61774-1-dmichail@fungible.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-23 21:19:03 -07:00
Jakub Kicinski
93817be8b6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-23 12:33:24 -07:00
Jörn-Thorben Hinz
6dc7a0baf1 selftests/bpf: Fix rare segfault in sock_fields prog test
test_sock_fields__detach() got called with a null pointer here when one
of the CHECKs or ASSERTs up to the test_sock_fields__open_and_load()
call resulted in a jump to the "done" label.

A skeletons *__detach() is not safe to call with a null pointer, though.
This led to a segfault.

Go the easy route and only call test_sock_fields__destroy() which is
null-pointer safe and includes detaching.

Came across this while looking[1] to introduce the usage of
bpf_tcp_helpers.h (included in progs/test_sock_fields.c) together with
vmlinux.h.

[1] https://lore.kernel.org/bpf/629bc069dd807d7ac646f836e9dca28bbc1108e2.camel@mailbox.tu-berlin.de/

Fixes: 8f50f16ff3 ("selftests/bpf: Extend verifier and bpf_sock tests for dst_port loads")
Signed-off-by: Jörn-Thorben Hinz <jthinz@mailbox.tu-berlin.de>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20220621070116.307221-1-jthinz@mailbox.tu-berlin.de
2022-06-23 10:52:12 -07:00
Jörn-Thorben Hinz
f14a3f644a selftests/bpf: Test a BPF CC implementing the unsupported get_info()
Test whether a TCP CC implemented in BPF providing get_info() is
rejected correctly. get_info() is unsupported in a BPF CC. The check for
required functions in a BPF CC has moved, this test ensures unsupported
functions are still rejected correctly.

Signed-off-by: Jörn-Thorben Hinz <jthinz@mailbox.tu-berlin.de>
Reviewed-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20220622191227.898118-6-jthinz@mailbox.tu-berlin.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-23 09:49:58 -07:00
Jörn-Thorben Hinz
0735627d78 selftests/bpf: Test an incomplete BPF CC
Test whether a TCP CC implemented in BPF providing neither cong_avoid()
nor cong_control() is correctly rejected. This check solely depends on
tcp_register_congestion_control() now, which is invoked during
bpf_map__attach_struct_ops().

Signed-off-by: Jörn-Thorben Hinz <jthinz@mailbox.tu-berlin.de>
Link: https://lore.kernel.org/r/20220622191227.898118-5-jthinz@mailbox.tu-berlin.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-23 09:49:57 -07:00
Jörn-Thorben Hinz
6e945d57cc selftests/bpf: Test a BPF CC writing sk_pacing_*
Test whether a TCP CC implemented in BPF is allowed to write
sk_pacing_rate and sk_pacing_status in struct sock. This is needed when
cong_control() is implemented and used.

Signed-off-by: Jörn-Thorben Hinz <jthinz@mailbox.tu-berlin.de>
Link: https://lore.kernel.org/r/20220622191227.898118-4-jthinz@mailbox.tu-berlin.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-23 09:49:57 -07:00
Raghavendra Rao Ananta
9e2f6498ef selftests: KVM: Handle compiler optimizations in ucall
The selftests, when built with newer versions of clang, is found
to have over optimized guests' ucall() function, and eliminating
the stores for uc.cmd (perhaps due to no immediate readers). This
resulted in the userspace side always reading a value of '0', and
causing multiple test failures.

As a result, prevent the compiler from optimizing the stores in
ucall() with WRITE_ONCE().

Suggested-by: Ricardo Koller <ricarkol@google.com>
Suggested-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Message-Id: <20220615185706.1099208-1-rananta@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-23 10:26:41 -04:00
Linus Torvalds
399bd66e21 Networking fixes for 5.19-rc4, including fixes from bpf and netfilter.
Current release - regressions:
   - netfilter: cttimeout: fix slab-out-of-bounds read in cttimeout_net_exit
 
 Current release - new code bugs:
   - bpf: ftrace: keep address offset in ftrace_lookup_symbols
 
   - bpf: force cookies array to follow symbols sorting
 
 Previous releases - regressions:
   - ipv4: ping: fix bind address validity check
 
   - tipc: fix use-after-free read in tipc_named_reinit
 
   - eth: veth: add updating of trans_start
 
 Previous releases - always broken:
   - sock: redo the psock vs ULP protection check
 
   - netfilter: nf_dup_netdev: fix skb_under_panic
 
   - bpf: fix request_sock leak in sk lookup helpers
 
   - eth: igb: fix a use-after-free issue in igb_clean_tx_ring
 
   - eth: ice: prohibit improper channel config for DCB
 
   - eth: at803x: fix null pointer dereference on AR9331 phy
 
   - eth: virtio_net: fix xdp_rxq_info bug after suspend/resume
 
 Misc:
   - eth: hinic: replace memcpy() with direct assignment
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmK0P+0SHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkmBkP/1m5Et04wgtlEfQJtudZj0Sadra0tu6P
 vaYlqtiRNMziSY/hxG1p4w7giM4gD7fD3S12Pc/ueCaUwxxILN/eZ/hNgCq9huf6
 IbmVmfq6YNZwDaNzFDP8UcIqjnxbg1B3XD41dN7+FggA9ogGFkOvuAcJByzdANVX
 BLOkQmGP22+pNJmniH3KYvCZlHIa+LVeRjdjdM+1/LKDs2pxpBi97obyzb5zUiE5
 c5E7+BhkGI9X6V1TuHVCHIEFssYNWLiTJcw76HptWmK9Z/DlDEeVlHzKbAMNTycl
 I8eTLXnqgye0KCKOqJ4fN+YN42ypdDzrUILKMHGEddG1lOot/2XChgp8+EqMY7Nx
 Gjpjh28jTsKdCZMFF3lxDGxeonHciP6lZA80g3GNk4FWUVrqnKEYpdy+6psTkpDr
 HahjmFWylGXfmPIKJrsiVGIyxD4ObkRF6SSH7L8j5tAVGxaB5MDFrCws136kACCk
 ZyZiXTS0J3Cn1fAb2/vGKgDFhbEWykITYPaiVo7pyrO1jju5qQTtiKiABpcX0Ejs
 WxvPA8HB61+kEapIzBLhhxRl25CXTleGE986au2MVh0I/HuQBxVExrRE9FgThjwk
 YbSKhR2JOcD5B94HRQXVsQ05q02JzxmB0kVbqSLcIAbCOuo++LZCIdwR5XxSpF6s
 AAFhqQycWowh
 =JFWo
 -----END PGP SIGNATURE-----

Merge tag 'net-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from bpf and netfilter.

  Current release - regressions:

   - netfilter: cttimeout: fix slab-out-of-bounds read in
     cttimeout_net_exit

Current release - new code bugs:

   - bpf: ftrace: keep address offset in ftrace_lookup_symbols

   - bpf: force cookies array to follow symbols sorting

  Previous releases - regressions:

   - ipv4: ping: fix bind address validity check

   - tipc: fix use-after-free read in tipc_named_reinit

   - eth: veth: add updating of trans_start

  Previous releases - always broken:

   - sock: redo the psock vs ULP protection check

   - netfilter: nf_dup_netdev: fix skb_under_panic

   - bpf: fix request_sock leak in sk lookup helpers

   - eth: igb: fix a use-after-free issue in igb_clean_tx_ring

   - eth: ice: prohibit improper channel config for DCB

   - eth: at803x: fix null pointer dereference on AR9331 phy

   - eth: virtio_net: fix xdp_rxq_info bug after suspend/resume

  Misc:

   - eth: hinic: replace memcpy() with direct assignment"

* tag 'net-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
  net: openvswitch: fix parsing of nw_proto for IPv6 fragments
  sock: redo the psock vs ULP protection check
  Revert "net/tls: fix tls_sk_proto_close executed repeatedly"
  virtio_net: fix xdp_rxq_info bug after suspend/resume
  igb: Make DMA faster when CPU is active on the PCIe link
  net: dsa: qca8k: reduce mgmt ethernet timeout
  net: dsa: qca8k: reset cpu port on MTU change
  MAINTAINERS: Add a maintainer for OCP Time Card
  hinic: Replace memcpy() with direct assignment
  Revert "drivers/net/ethernet/neterion/vxge: Fix a use-after-free bug in vxge-main.c"
  net: phy: smsc: Disable Energy Detect Power-Down in interrupt mode
  ice: ethtool: Prohibit improper channel config for DCB
  ice: ethtool: advertise 1000M speeds properly
  ice: Fix switchdev rules book keeping
  ice: ignore protocol field in GTP offload
  netfilter: nf_dup_netdev: add and use recursion counter
  netfilter: nf_dup_netdev: do not push mac header a second time
  selftests: netfilter: correct PKTGEN_SCRIPT_PATHS in nft_concat_range.sh
  net/tls: fix tls_sk_proto_close executed repeatedly
  erspan: do not assume transport header is always set
  ...
2022-06-23 09:01:01 -05:00
Dave Marchevsky
7308748925 selftests/bpf: Add benchmark for local_storage get
Add a benchmarks to demonstrate the performance cliff for local_storage
get as the number of local_storage maps increases beyond current
local_storage implementation's cache size.

"sequential get" and "interleaved get" benchmarks are added, both of
which do many bpf_task_storage_get calls on sets of task local_storage
maps of various counts, while considering a single specific map to be
'important' and counting task_storage_gets to the important map
separately in addition to normal 'hits' count of all gets. Goal here is
to mimic scenario where a particular program using one map - the
important one - is running on a system where many other local_storage
maps exist and are accessed often.

While "sequential get" benchmark does bpf_task_storage_get for map 0, 1,
..., {9, 99, 999} in order, "interleaved" benchmark interleaves 4
bpf_task_storage_gets for the important map for every 10 map gets. This
is meant to highlight performance differences when important map is
accessed far more frequently than non-important maps.

A "hashmap control" benchmark is also included for easy comparison of
standard bpf hashmap lookup vs local_storage get. The benchmark is
similar to "sequential get", but creates and uses BPF_MAP_TYPE_HASH
instead of local storage. Only one inner map is created - a hashmap
meant to hold tid -> data mapping for all tasks. Size of the hashmap is
hardcoded to my system's PID_MAX_LIMIT (4,194,304). The number of these
keys which are actually fetched as part of the benchmark is
configurable.

Addition of this benchmark is inspired by conversation with Alexei in a
previous patchset's thread [0], which highlighted the need for such a
benchmark to motivate and validate improvements to local_storage
implementation. My approach in that series focused on improving
performance for explicitly-marked 'important' maps and was rejected
with feedback to make more generally-applicable improvements while
avoiding explicitly marking maps as important. Thus the benchmark
reports both general and important-map-focused metrics, so effect of
future work on both is clear.

Regarding the benchmark results. On a powerful system (Skylake, 20
cores, 256gb ram):

Hashmap Control
===============
        num keys: 10
hashmap (control) sequential    get:  hits throughput: 20.900 ± 0.334 M ops/s, hits latency: 47.847 ns/op, important_hits throughput: 20.900 ± 0.334 M ops/s

        num keys: 1000
hashmap (control) sequential    get:  hits throughput: 13.758 ± 0.219 M ops/s, hits latency: 72.683 ns/op, important_hits throughput: 13.758 ± 0.219 M ops/s

        num keys: 10000
hashmap (control) sequential    get:  hits throughput: 6.995 ± 0.034 M ops/s, hits latency: 142.959 ns/op, important_hits throughput: 6.995 ± 0.034 M ops/s

        num keys: 100000
hashmap (control) sequential    get:  hits throughput: 4.452 ± 0.371 M ops/s, hits latency: 224.635 ns/op, important_hits throughput: 4.452 ± 0.371 M ops/s

        num keys: 4194304
hashmap (control) sequential    get:  hits throughput: 3.043 ± 0.033 M ops/s, hits latency: 328.587 ns/op, important_hits throughput: 3.043 ± 0.033 M ops/s

Local Storage
=============
        num_maps: 1
local_storage cache sequential  get:  hits throughput: 47.298 ± 0.180 M ops/s, hits latency: 21.142 ns/op, important_hits throughput: 47.298 ± 0.180 M ops/s
local_storage cache interleaved get:  hits throughput: 55.277 ± 0.888 M ops/s, hits latency: 18.091 ns/op, important_hits throughput: 55.277 ± 0.888 M ops/s

        num_maps: 10
local_storage cache sequential  get:  hits throughput: 40.240 ± 0.802 M ops/s, hits latency: 24.851 ns/op, important_hits throughput: 4.024 ± 0.080 M ops/s
local_storage cache interleaved get:  hits throughput: 48.701 ± 0.722 M ops/s, hits latency: 20.533 ns/op, important_hits throughput: 17.393 ± 0.258 M ops/s

        num_maps: 16
local_storage cache sequential  get:  hits throughput: 44.515 ± 0.708 M ops/s, hits latency: 22.464 ns/op, important_hits throughput: 2.782 ± 0.044 M ops/s
local_storage cache interleaved get:  hits throughput: 49.553 ± 2.260 M ops/s, hits latency: 20.181 ns/op, important_hits throughput: 15.767 ± 0.719 M ops/s

        num_maps: 17
local_storage cache sequential  get:  hits throughput: 38.778 ± 0.302 M ops/s, hits latency: 25.788 ns/op, important_hits throughput: 2.284 ± 0.018 M ops/s
local_storage cache interleaved get:  hits throughput: 43.848 ± 1.023 M ops/s, hits latency: 22.806 ns/op, important_hits throughput: 13.349 ± 0.311 M ops/s

        num_maps: 24
local_storage cache sequential  get:  hits throughput: 19.317 ± 0.568 M ops/s, hits latency: 51.769 ns/op, important_hits throughput: 0.806 ± 0.024 M ops/s
local_storage cache interleaved get:  hits throughput: 24.397 ± 0.272 M ops/s, hits latency: 40.989 ns/op, important_hits throughput: 6.863 ± 0.077 M ops/s

        num_maps: 32
local_storage cache sequential  get:  hits throughput: 13.333 ± 0.135 M ops/s, hits latency: 75.000 ns/op, important_hits throughput: 0.417 ± 0.004 M ops/s
local_storage cache interleaved get:  hits throughput: 16.898 ± 0.383 M ops/s, hits latency: 59.178 ns/op, important_hits throughput: 4.717 ± 0.107 M ops/s

        num_maps: 100
local_storage cache sequential  get:  hits throughput: 6.360 ± 0.107 M ops/s, hits latency: 157.233 ns/op, important_hits throughput: 0.064 ± 0.001 M ops/s
local_storage cache interleaved get:  hits throughput: 7.303 ± 0.362 M ops/s, hits latency: 136.930 ns/op, important_hits throughput: 1.907 ± 0.094 M ops/s

        num_maps: 1000
local_storage cache sequential  get:  hits throughput: 0.452 ± 0.010 M ops/s, hits latency: 2214.022 ns/op, important_hits throughput: 0.000 ± 0.000 M ops/s
local_storage cache interleaved get:  hits throughput: 0.542 ± 0.007 M ops/s, hits latency: 1843.341 ns/op, important_hits throughput: 0.136 ± 0.002 M ops/s

Looking at the "sequential get" results, it's clear that as the
number of task local_storage maps grows beyond the current cache size
(16), there's a significant reduction in hits throughput. Note that
current local_storage implementation assigns a cache_idx to maps as they
are created. Since "sequential get" is creating maps 0..n in order and
then doing bpf_task_storage_get calls in the same order, the benchmark
is effectively ensuring that a map will not be in cache when the program
tries to access it.

For "interleaved get" results, important-map hits throughput is greatly
increased as the important map is more likely to be in cache by virtue
of being accessed far more frequently. Throughput still reduces as #
maps increases, though.

To get a sense of the overhead of the benchmark program, I
commented out bpf_task_storage_get/bpf_map_lookup_elem in
local_storage_bench.c and ran the benchmark on the same host as the
'real' run. Results:

Hashmap Control
===============
        num keys: 10
hashmap (control) sequential    get:  hits throughput: 54.288 ± 0.655 M ops/s, hits latency: 18.420 ns/op, important_hits throughput: 54.288 ± 0.655 M ops/s

        num keys: 1000
hashmap (control) sequential    get:  hits throughput: 52.913 ± 0.519 M ops/s, hits latency: 18.899 ns/op, important_hits throughput: 52.913 ± 0.519 M ops/s

        num keys: 10000
hashmap (control) sequential    get:  hits throughput: 53.480 ± 1.235 M ops/s, hits latency: 18.699 ns/op, important_hits throughput: 53.480 ± 1.235 M ops/s

        num keys: 100000
hashmap (control) sequential    get:  hits throughput: 54.982 ± 1.902 M ops/s, hits latency: 18.188 ns/op, important_hits throughput: 54.982 ± 1.902 M ops/s

        num keys: 4194304
hashmap (control) sequential    get:  hits throughput: 50.858 ± 0.707 M ops/s, hits latency: 19.662 ns/op, important_hits throughput: 50.858 ± 0.707 M ops/s

Local Storage
=============
        num_maps: 1
local_storage cache sequential  get:  hits throughput: 110.990 ± 4.828 M ops/s, hits latency: 9.010 ns/op, important_hits throughput: 110.990 ± 4.828 M ops/s
local_storage cache interleaved get:  hits throughput: 161.057 ± 4.090 M ops/s, hits latency: 6.209 ns/op, important_hits throughput: 161.057 ± 4.090 M ops/s

        num_maps: 10
local_storage cache sequential  get:  hits throughput: 112.930 ± 1.079 M ops/s, hits latency: 8.855 ns/op, important_hits throughput: 11.293 ± 0.108 M ops/s
local_storage cache interleaved get:  hits throughput: 115.841 ± 2.088 M ops/s, hits latency: 8.633 ns/op, important_hits throughput: 41.372 ± 0.746 M ops/s

        num_maps: 16
local_storage cache sequential  get:  hits throughput: 115.653 ± 0.416 M ops/s, hits latency: 8.647 ns/op, important_hits throughput: 7.228 ± 0.026 M ops/s
local_storage cache interleaved get:  hits throughput: 138.717 ± 1.649 M ops/s, hits latency: 7.209 ns/op, important_hits throughput: 44.137 ± 0.525 M ops/s

        num_maps: 17
local_storage cache sequential  get:  hits throughput: 112.020 ± 1.649 M ops/s, hits latency: 8.927 ns/op, important_hits throughput: 6.598 ± 0.097 M ops/s
local_storage cache interleaved get:  hits throughput: 128.089 ± 1.960 M ops/s, hits latency: 7.807 ns/op, important_hits throughput: 38.995 ± 0.597 M ops/s

        num_maps: 24
local_storage cache sequential  get:  hits throughput: 92.447 ± 5.170 M ops/s, hits latency: 10.817 ns/op, important_hits throughput: 3.855 ± 0.216 M ops/s
local_storage cache interleaved get:  hits throughput: 128.844 ± 2.808 M ops/s, hits latency: 7.761 ns/op, important_hits throughput: 36.245 ± 0.790 M ops/s

        num_maps: 32
local_storage cache sequential  get:  hits throughput: 102.042 ± 1.462 M ops/s, hits latency: 9.800 ns/op, important_hits throughput: 3.194 ± 0.046 M ops/s
local_storage cache interleaved get:  hits throughput: 126.577 ± 1.818 M ops/s, hits latency: 7.900 ns/op, important_hits throughput: 35.332 ± 0.507 M ops/s

        num_maps: 100
local_storage cache sequential  get:  hits throughput: 111.327 ± 1.401 M ops/s, hits latency: 8.983 ns/op, important_hits throughput: 1.113 ± 0.014 M ops/s
local_storage cache interleaved get:  hits throughput: 131.327 ± 1.339 M ops/s, hits latency: 7.615 ns/op, important_hits throughput: 34.302 ± 0.350 M ops/s

        num_maps: 1000
local_storage cache sequential  get:  hits throughput: 101.978 ± 0.563 M ops/s, hits latency: 9.806 ns/op, important_hits throughput: 0.102 ± 0.001 M ops/s
local_storage cache interleaved get:  hits throughput: 141.084 ± 1.098 M ops/s, hits latency: 7.088 ns/op, important_hits throughput: 35.430 ± 0.276 M ops/s

Adjusting for overhead, latency numbers for "hashmap control" and
"sequential get" are:

hashmap_control_1k:   ~53.8ns
hashmap_control_10k:  ~124.2ns
hashmap_control_100k: ~206.5ns
sequential_get_1:     ~12.1ns
sequential_get_10:    ~16.0ns
sequential_get_16:    ~13.8ns
sequential_get_17:    ~16.8ns
sequential_get_24:    ~40.9ns
sequential_get_32:    ~65.2ns
sequential_get_100:   ~148.2ns
sequential_get_1000:  ~2204ns

Clearly demonstrating a cliff.

In the discussion for v1 of this patch, Alexei noted that local_storage
was 2.5x faster than a large hashmap when initially implemented [1]. The
benchmark results show that local_storage is 5-10x faster: a
long-running BPF application putting some pid-specific info into a
hashmap for each pid it sees will probably see on the order of 10-100k
pids. Bench numbers for hashmaps of this size are ~10x slower than
sequential_get_16, but as the number of local_storage maps grows far
past local_storage cache size the performance advantage shrinks and
eventually reverses.

When running the benchmarks it may be necessary to bump 'open files'
ulimit for a successful run.

  [0]: https://lore.kernel.org/all/20220420002143.1096548-1-davemarchevsky@fb.com
  [1]: https://lore.kernel.org/bpf/20220511173305.ftldpn23m4ski3d3@MBP-98dd607d3435.dhcp.thefacebook.com/

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20220620222554.270578-1-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-22 19:14:33 -07:00
Linus Torvalds
de5c208d53 linux-kselftest-fixes-5.19-rc4
This Kselftest fixes update for Linux 5.19-rc4 consists of compile
 time fixes and run-time resources leaks.
 
 -- Fix clang cross compilation
 -- Fix resource leak when return error
 -- fix compile error for dma_map_benchmark
 -- Fix regression - make use of GUP_TEST_FILE macro
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmKzUYsACgkQCwJExA0N
 Qxx3txAA6u4Ot7xF4L5ZC2VSpiAfnVwyiusfUFgOIcqTTb/tVhtHR2YOlNp49ETi
 LRA1vBM+OjmS5BDGEq10M1aiI4xyUB0jtEdts/pzouzk+rluIyjQy94taYsxVSq3
 ZUBpe1h09MZAX4s5tZfpJDyOOkFU8kuFKFASUGzdu/lr/NHC97rehwLW5KYG0F2J
 Pe4H1l5YM9ZODl5rgaMML2azCDqnp6/WpRU4Ov2W6q8T5X/cLwNxv7AnacMb4ajx
 IP9n4rOR05EBqsk9JZ0qJ237PsaffWN/6heczqHPuwPiyL/VKpKDSZTrAu2yBl1v
 oXS+kdGj1alvxDlfYPd5fMbvZyYyLQyinNwN/2X5MNYB9qnUc+DJIbqHbLtpjg+h
 hpeN55mAHOW/yhQDcnFIgE9RuGPPkC0WKu1RtOu6ovKCGb8Ar1cJQA/5qjnfFOhW
 HeiqD6IMuVo07oi5KxH5nSXBNvm3n5avwAO3gUOiotZXMqFa2x9lE3GWv4Hf51W8
 7OXbOF6GL1m9S5WF9Rhol6iviLRxMEESV9oAsKpi+AeyNEjkwrrJpvEIeRpPfm62
 DmspylVBEUlOPiMKwf2A6/MqhVMN8vt4ssAobWbcYHdlee4521FiGe6durLs5z/Y
 YECCoBCIiiFQaojhSdt1Vuz5oq9Ht/hITwMcf7AEIzA6b+A2YYQ=
 =orZI
 -----END PGP SIGNATURE-----

Merge tag 'linux-kselftest-fixes-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull Kselftest fixes from Shuah Khan:
 "Compile time fixes and run-time resources leaks:

   - Fix clang cross compilation

   - Fix resource leak when return error

   - fix compile error for dma_map_benchmark

   - Fix regression - make use of GUP_TEST_FILE macro"

* tag 'linux-kselftest-fixes-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests: make use of GUP_TEST_FILE macro
  selftests: vm: Fix resource leak when return error
  selftests dma: fix compile error for dma_map_benchmark
  selftests: Fix clang cross compilation
2022-06-22 14:08:06 -05:00
Jakub Kicinski
53664d51d3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

1) Use get_random_u32() instead of prandom_u32_state() in nft_meta
   and nft_numgen, from Florian Westphal.

2) Incorrect list head in nfnetlink_cttimeout in recent update coming
   from previous development cycle. Also from Florian.

3) Incorrect path to pktgen scripts for nft_concat_range.sh selftest.
   From Jie2x Zhou.

4) Two fixes for the for nft_fwd and nft_dup egress support, from Florian.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_dup_netdev: add and use recursion counter
  netfilter: nf_dup_netdev: do not push mac header a second time
  selftests: netfilter: correct PKTGEN_SCRIPT_PATHS in nft_concat_range.sh
  netfilter: cttimeout: fix slab-out-of-bounds read typo in cttimeout_net_exit
  netfilter: use get_random_u32 instead of prandom
====================

Link: https://lore.kernel.org/r/20220621085618.3975-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-21 22:41:41 -07:00
Jie2x Zhou
5d79d8af8d selftests: netfilter: correct PKTGEN_SCRIPT_PATHS in nft_concat_range.sh
Before change:
make -C netfilter
 TEST: performance
   net,port                                                      [SKIP]
   perf not supported
   port,net                                                      [SKIP]
   perf not supported
   net6,port                                                     [SKIP]
   perf not supported
   port,proto                                                    [SKIP]
   perf not supported
   net6,port,mac                                                 [SKIP]
   perf not supported
   net6,port,mac,proto                                           [SKIP]
   perf not supported
   net,mac                                                       [SKIP]
   perf not supported

After change:
   net,mac                                                       [ OK ]
     baseline (drop from netdev hook):               2061098pps
     baseline hash (non-ranged entries):             1606741pps
     baseline rbtree (match on first field only):    1191607pps
     set with  1000 full, ranged entries:            1639119pps
ok 8 selftests: netfilter: nft_concat_range.sh

Fixes: 611973c1e0 ("selftests: netfilter: Introduce tests for sets with range concatenation")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jie2x Zhou <jie2x.zhou@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-06-21 10:50:40 +02:00
Eduard Zingerman
0e1bf9ed20 selftests/bpf: BPF test_prog selftests for bpf_loop inlining
Two new test BPF programs for test_prog selftests checking bpf_loop
behavior. Both are corner cases for bpf_loop inlinig transformation:
 - check that bpf_loop behaves correctly when callback function is not
   a compile time constant
 - check that local function variables are not affected by allocating
   additional stack storage for registers spilled by loop inlining

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/r/20220620235344.569325-6-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-20 17:40:52 -07:00
Eduard Zingerman
f8acfdd044 selftests/bpf: BPF test_verifier selftests for bpf_loop inlining
A number of test cases for BPF selftests test_verifier to check how
bpf_loop inline transformation rewrites the BPF program. The following
cases are covered:
 - happy path
 - no-rewrite when flags is non-zero
 - no-rewrite when callback is non-constant
 - subprogno in insn_aux is updated correctly when dead sub-programs
   are removed
 - check that correct stack offsets are assigned for spilling of R6-R8
   registers

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/r/20220620235344.569325-5-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-20 17:40:51 -07:00
Eduard Zingerman
7a42008ca5 selftests/bpf: allow BTF specs and func infos in test_verifier tests
The BTF and func_info specification for test_verifier tests follows
the same notation as in prog_tests/btf.c tests. E.g.:

  ...
  .func_info = { { 0, 6 }, { 8, 7 } },
  .func_info_cnt = 2,
  .btf_strings = "\0int\0",
  .btf_types = {
    BTF_TYPE_INT_ENC(1, BTF_INT_SIGNED, 0, 32, 4),
    BTF_PTR_ENC(1),
  },
  ...

The BTF specification is loaded only when specified.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/r/20220620235344.569325-3-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-20 17:40:51 -07:00
Eduard Zingerman
933ff53191 selftests/bpf: specify expected instructions in test_verifier tests
Allows to specify expected and unexpected instruction sequences in
test_verifier test cases. The instructions are requested from kernel
after BPF program loading, thus allowing to check some of the
transformations applied by BPF verifier.

- `expected_insn` field specifies a sequence of instructions expected
  to be found in the program;
- `unexpected_insn` field specifies a sequence of instructions that
  are not expected to be found in the program;
- `INSN_OFF_MASK` and `INSN_IMM_MASK` values could be used to mask
  `off` and `imm` fields.
- `SKIP_INSNS` could be used to specify that some instructions in the
  (un)expected pattern are not important (behavior similar to usage of
  `\t` in `errstr` field).

The intended usage is as follows:

  {
	"inline simple bpf_loop call",
	.insns = {
	/* main */
	BPF_ALU64_IMM(BPF_MOV, BPF_REG_1, 1),
	BPF_RAW_INSN(BPF_LD | BPF_IMM | BPF_DW, BPF_REG_2,
			BPF_PSEUDO_FUNC, 0, 6),
    ...
	BPF_EXIT_INSN(),
	/* callback */
	BPF_ALU64_IMM(BPF_MOV, BPF_REG_0, 1),
	BPF_EXIT_INSN(),
	},
	.expected_insns = {
	BPF_ALU64_IMM(BPF_MOV, BPF_REG_1, 1),
	SKIP_INSNS(),
	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_CALL, 8, 1)
	},
	.unexpected_insns = {
	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0,
			INSN_OFF_MASK, INSN_IMM_MASK),
	},
	.prog_type = BPF_PROG_TYPE_TRACEPOINT,
	.result = ACCEPT,
	.runs = 0,
  },

Here it is expected that move of 1 to register 1 would remain in place
and helper function call instruction would be replaced by a relative
call instruction.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/r/20220620235344.569325-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-20 17:40:51 -07:00
Linus Torvalds
c5b3a0946b perf tool fixes for v5.19, 1st batch:
- Don't set data source if it's not a memory operation in ARM SPE (Statistical
   Profiling Extensions).
 
 - Fix handling of exponent floating point values in perf stat expressions.
 
 - Don't leak fd on failure on libperf open.
 
 - Fix 'perf test' CPU topology test for PPC guest systems.
 
 - Fix undefined behaviour on breakpoint account 'perf test' entry.
 
 - Record only user callchains on the "Check ARM64 callgraphs are complete in FP
   mode" 'perf test' entry.
 
 - Fix "perf stat CSV output linter" test on s390.
 
 - Sync batch of kernel headers with tools/perf/.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYq93OQAKCRCyPKLppCJ+
 Jy0RAQD+B7rvd/4VcbLY7hJkhH4h8C2J4Le59Owq3W8TZTTNDgEAupY1IDPnJlWZ
 BxIkcRKh0+aB6tVGMKIASlWs3ugc+wQ=
 =Ym7P
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-fixes-for-v5.19-2022-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tool fixes from Arnaldo Carvalho de Melo:

 - Don't set data source if it's not a memory operation in ARM SPE
   (Statistical Profiling Extensions).

 - Fix handling of exponent floating point values in perf stat
   expressions.

 - Don't leak fd on failure on libperf open.

 - Fix 'perf test' CPU topology test for PPC guest systems.

 - Fix undefined behaviour on breakpoint account 'perf test' entry.

 - Record only user callchains on the "Check ARM64 callgraphs are
   complete in FP mode" 'perf test' entry.

 - Fix "perf stat CSV output linter" test on s390.

 - Sync batch of kernel headers with tools/perf/.

* tag 'perf-tools-fixes-for-v5.19-2022-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  tools headers UAPI: Sync linux/prctl.h with the kernel sources
  perf metrics: Ensure at least 1 id per metric
  tools headers arm64: Sync arm64's cputype.h with the kernel sources
  tools headers UAPI: Sync x86's asm/kvm.h with the kernel sources
  perf arm-spe: Don't set data source if it's not a memory operation
  perf expr: Allow exponents on floating point values
  perf test topology: Use !strncmp(right platform) to fix guest PPC comparision check
  perf test: Record only user callchains on the "Check Arm64 callgraphs are complete in fp mode" test
  perf beauty: Update copy of linux/socket.h with the kernel sources
  perf test: Fix variable length array undefined behavior in bp_account
  libperf evsel: Open shouldn't leak fd on failure
  perf test: Fix "perf stat CSV output linter" test on s390
  perf unwind: Fix uninitialized variable
2022-06-20 09:31:03 -05:00
Maxim Mikityanskiy
e068c0776b selftests/bpf: Enable config options needed for xdp_synproxy test
This commit adds the kernel config options needed to run the recently
added xdp_synproxy test. Users without these options will hit errors
like this:

test_synproxy:FAIL:iptables -t raw -I PREROUTING         -i tmp1 -p
tcp -m tcp --syn --dport 8080 -j CT --notrack unexpected error: 256
(errno 22)

Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220620104939.4094104-1-maximmi@nvidia.com
2022-06-20 14:10:54 +02:00
Riccardo Paolo Bestetti
313c502fa3 ipv4: fix bind address validity regression tests
Commit 8ff978b8b2 ("ipv4/raw: support binding to nonlocal addresses")
introduces support for binding to nonlocal addresses, as well as some
basic test coverage for some of the related cases.

Commit b4a028c4d0 ("ipv4: ping: fix bind address validity check")
fixes a regression which incorrectly removed some checks for bind
address validation. In addition, it introduces regression tests for
those specific checks. However, those regression tests are defective, in
that they perform the tests using an incorrect combination of bind
flags. As a result, those tests fail when they should succeed.

This commit introduces additional regression tests for nonlocal binding
and fixes the defective regression tests. It also introduces new
set_sysctl calls for the ipv4_bind test group, as to perform the ICMP
binding tests it is necessary to allow ICMP socket creation by setting
the net.ipv4.ping_group_range knob.

Fixes: b4a028c4d0 ("ipv4: ping: fix bind address validity check")
Reported-by: Riccardo Paolo Bestetti <pbl@bestov.io>
Signed-off-by: Riccardo Paolo Bestetti <pbl@bestov.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-20 09:58:12 +01:00
Linus Torvalds
5d770f11a1 Build tool updates:
- Remove obsolete CONFIG_X86_SMAP reference from objtool
 
  - Fix overlapping text section failures in faddr2line for real
 
  - Remove OBJECT_FILES_NON_STANDARD usage from x86 ftrace and replace it
    with finegrained annotations so objtool can validate that code
    correctly.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmKvHbwTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoQlsEADC7BotE1Mi8HITScIlHT19bkZ7Bm6M
 g46RCtUw0u+lo7BclIetNYGWQ0z+D+1DbmZTBv5D22N/Wd6MrwOuuwRVtJ2dEosF
 l/EK1qKLxCGMdQ1x0KOmrjzHB4WmcUH0pQwDHa5N5s5IyrJRRFHtoLk3EYp6QIhi
 mIZTJgGUEe/aS7BtFM5xARprm7JxdbhXgXVbnC1ctzpOi2y6O7F4Ek3x4j5zTGV/
 49iuO2ljqmKdoFlPa1fhuw+O9sTUf+RmKLPEU+2nm7Wg+wlIwKIB3yDDBoZ9Q7lm
 YdorW0/506x4qOwa8KAjxXEb2qgZkL/fQcMS0jQQKNNDD8GHU+3YdjCNbrmZjA+T
 Ib0rs9YsxgLstnmqzU3QfarhgtdzIMhzF0cofoFDK0a8tHLqOhZF4j7wLmYo52ec
 6Hrt8IvJPW2b3TC6KfRAF+uWsDDoaum0OOFVjVu1b9G6fFf0i8h5JGWdkJh8cau9
 0qvAYg0sDjRf7zAZ6kbbGDVCljECyypIdCBF3ihA6yMhJX16VihEWTJyqIaoO+a0
 0XzPvkhX+6mY0kqn+5HvvhLEY4POA8pFKeTPx4eABvKnw7m01I9cJfHa7HEBQHe6
 ZJA7qWMBRrPfK2Ogrx8zohj2sOdHMHlIQU+6Ai+7+BHs13Nje4umGMu2YuZrnfIF
 gQ1ayCN8OsUwVA==
 =12KJ
 -----END PGP SIGNATURE-----

Merge tag 'objtool-urgent-2022-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull build tooling updates from Thomas Gleixner:

 - Remove obsolete CONFIG_X86_SMAP reference from objtool

 - Fix overlapping text section failures in faddr2line for real

 - Remove OBJECT_FILES_NON_STANDARD usage from x86 ftrace and replace it
   with finegrained annotations so objtool can validate that code
   correctly.

* tag 'objtool-urgent-2022-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ftrace: Remove OBJECT_FILES_NON_STANDARD usage
  faddr2line: Fix overlapping text section failures, the sequel
  objtool: Fix obsolete reference to CONFIG_X86_SMAP
2022-06-19 09:54:16 -05:00
Arnaldo Carvalho de Melo
140cd9ec8f tools headers UAPI: Sync linux/prctl.h with the kernel sources
To pick the changes in:

  9e4ab6c891 ("arm64/sme: Implement vector length configuration prctl()s")

That don't result in any changes in tooling:

  $ tools/perf/trace/beauty/prctl_option.sh > before
  $ cp include/uapi/linux/prctl.h tools/include/uapi/linux/prctl.h
  $ tools/perf/trace/beauty/prctl_option.sh > after
  $ diff -u before after
  $

Just silences this perf tools build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/prctl.h' differs from latest version at 'include/uapi/linux/prctl.h'
  diff -u tools/include/uapi/linux/prctl.h include/uapi/linux/prctl.h

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Link: http://lore.kernel.org/lkml/Yq81we+XFOqlBWyu@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-19 11:42:25 -03:00
Ian Rogers
c788ef61ef perf metrics: Ensure at least 1 id per metric
We may have no events for a metric evaluated to a constant. In such a
case ensure a tool event is at least evaluated for metric parsing and
displaying.

Fixes: 8586d2744f ("perf metrics: Don't add all tool events for sharing")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220618013957.999321-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-19 11:24:05 -03:00
Arnaldo Carvalho de Melo
37402d5d06 tools headers arm64: Sync arm64's cputype.h with the kernel sources
To get the changes in:

  cae889302e ("KVM: arm64: vgic-v3: List M1 Pro/Max as requiring the SEIS workaround")

That addresses this perf build warning:

  Warning: Kernel ABI header at 'tools/arch/arm64/include/asm/cputype.h' differs from latest version at 'arch/arm64/include/asm/cputype.h'
  diff -u tools/arch/arm64/include/asm/cputype.h arch/arm64/include/asm/cputype.h

Cc: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/lkml/Yq8w7p4omYKNwOij@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-19 11:23:04 -03:00
Arnaldo Carvalho de Melo
2e323f360a tools headers UAPI: Sync x86's asm/kvm.h with the kernel sources
To pick the changes in:

  f1a9761fbb ("KVM: x86: Allow userspace to opt out of hypercall patching")

That just rebuilds kvm-stat.c on x86, no change in functionality.

This silences these perf build warning:

  Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/kvm.h' differs from latest version at 'arch/x86/include/uapi/asm/kvm.h'
  diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h

Cc: Oliver Upton <oupton@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/lkml/Yq8qgiMwRcl9ds+f@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-19 11:22:59 -03:00
Leo Yan
51ba539f5b perf arm-spe: Don't set data source if it's not a memory operation
Except for memory load and store operations, ARM SPE records also can
support other operation types, bug when set the data source field the
current code assumes a record is a either load operation or store
operation, this leads to wrongly synthesize memory samples.

This patch strictly checks the record operation type, it only sets data
source only for the operation types ARM_SPE_LD and ARM_SPE_ST,
otherwise, returns zero for data source.  Therefore, we can synthesize
memory samples only when data source is a non-zero value, the function
arm_spe__is_memory_event() is useless and removed.

Fixes: e55ed3423c ("perf arm-spe: Synthesize memory event")
Reviewed-by: Ali Saidi <alisaidi@amazon.com>
Reviewed-by: German Gomez <german.gomez@arm.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Ali Saidi <alisaidi@amazon.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: alisaidi@amazon.com
Cc: Andrew Kilroy <andrew.kilroy@arm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Li Huafei <lihuafei1@huawei.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Forrington <nick.forrington@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: http://lore.kernel.org/lkml/20220517020326.18580-5-alisaidi@amazon.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-19 10:41:43 -03:00
Ian Rogers
e5287e6dd3 perf expr: Allow exponents on floating point values
Pass the optional exponent component through to strtod that already
supports it. We already have exponents in ScaleUnit and so this adds
uniformity.

Reported-by: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Reviewed-By: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20220527020653.4160884-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-19 10:41:43 -03:00
Athira Rajeev
b236371421 perf test topology: Use !strncmp(right platform) to fix guest PPC comparision check
commit cfd7092c31 ("perf test session topology: Fix test to skip
the test in guest environment") added check to skip the testcase if the
socket_id can't be fetched from topology info.

But the condition check uses strncmp which should be changed to !strncmp
and to correctly match platform.

Fix this condition check.

Fixes: cfd7092c31 ("perf test session topology: Fix test to skip the test in guest environment")
Reported-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Link: https://lore.kernel.org/r/20220610135939.63361-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-19 10:41:43 -03:00
Michael Petlan
72dcae8efd perf test: Record only user callchains on the "Check Arm64 callgraphs are complete in fp mode" test
The testcase 'Check Arm64 callgraphs are complete in fp mode' wants to
see the following output:

    610 leaf
    62f parent
    648 main

However, without excluding kernel callchains, the output might look like:

	ffffc2ff40ef1b5c arch_local_irq_enable
	ffffc2ff419d032c __schedule
	ffffc2ff419d06c0 schedule
	ffffc2ff40e4da30 do_notify_resume
	ffffc2ff40e421b0 work_pending
	             610 leaf
	             62f parent
	             648 main

Adding '--user-callchains' leaves only the wanted symbols in the chain.

Fixes: cd6382d827 ("perf test arm64: Test unwinding using fame-pointer (fp) mode")
Suggested-by: German Gomez <german.gomez@arm.com>
Reviewed-by: German Gomez <german.gomez@arm.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220614105207.26223-1-mpetlan@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-19 10:41:43 -03:00
Arnaldo Carvalho de Melo
67e7d77158 perf beauty: Update copy of linux/socket.h with the kernel sources
To pick the changes in:

  f94fd25cb0 ("tcp: pass back data left in socket after receive")

That don't result in any changes in the tables generated from that
header.

This silences this perf build warning:

  Warning: Kernel ABI header at 'tools/perf/trace/beauty/include/linux/socket.h' differs from latest version at 'include/linux/socket.h'
  diff -u tools/perf/trace/beauty/include/linux/socket.h include/linux/socket.h

Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/all/YqORj9d58AiGYl8b@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-19 10:41:43 -03:00
Ian Rogers
cc2145526c perf test: Fix variable length array undefined behavior in bp_account
Fix:

  tests/bp_account.c:154:9: runtime error: variable length array bound evaluates to non-positive value 0

by switching from a variable length to an allocated array.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20220610180247.444798-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-19 10:41:43 -03:00
Ian Rogers
94725994cf libperf evsel: Open shouldn't leak fd on failure
If perf_event_open() fails the fd is opened but it is only freed by
closing (not by delete).

Typically when an open fails you don't call close and so this results in
a memory leak. To avoid this, add a close when open fails.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-By: Kajol Jain <kjain@linux.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20220609052355.1300162-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-19 10:41:43 -03:00
Thomas Richter
ec906102e5 perf test: Fix "perf stat CSV output linter" test on s390
perf test -F 83 ("perf stat CSV output linter") fails on s390.

Reason is the wrong number of fields for certain CPU core/die/socket
related output.

On x84_64 the output of command:

  # ./perf stat -x, -A -a --no-merge true
  CPU0,1.50,msec,cpu-clock,1502781,100.00,1.052,CPUs utilized
  CPU1,1.48,msec,cpu-clock,1476113,100.00,1.034,CPUs utilized
  ...

results in 8 fields with 7 comma separators.

On s390 the output of command:

  #  ./perf stat -x, -A -a --no-merge -- true
  0.95,msec,cpu-clock,949800,100.00,1.060,CPUs utilized
  ...

results in 7 fields with 6 comma separators. Therefore this tests
fails on s390. Similar issues exist for per-die and per-socket output
which is not supported on s390.

I have rewritten the python program to count commas in each output line
into a bash function to achieve the same result. I hope this makes it a
bit easier.

Output before:

  # ./perf test -F 83
  83: perf stat CSV output linter  :
  Checking CSV output: no args [Success]
  Checking CSV output: system wide [Success]
  Checking CSV output: system wide Checking CSV output: \
	  system wide no aggregation 6.92,msec,cpu-clock,\
	  6918131,100.00,6.972,CPUs utilized
  ...
  RuntimeError: wrong number of fields. expected 7 in \
	  6.92,msec,cpu-clock,6918131,100.00,6.972,CPUs utilized

  FAILED!
  #

Output after:

  # ./perf test -F 83
  83: perf stat CSV output linter             :
  Checking CSV output: no args [Success]
  Checking CSV output: system wide [Success]
  Checking CSV output: system wide Checking CSV output:\
	  system wide no aggregation [Success]
  Checking CSV output: interval [Success]
  Checking CSV output: event [Success]
  Checking CSV output: per core [Success]
  Checking CSV output: per thread [Success]
  Checking CSV output: per die [Success]
  Checking CSV output: per node [Success]
  Checking CSV output: per socket [Success]
  Ok
  #

Committer notes:

Continues to work on x86_64

  $ perf test lint
   89: perf stat CSV output linter                                     : Ok
  $ perf test -v lint
  Couldn't bump rlimit(MEMLOCK), failures may take place when creating BPF maps, etc
   89: perf stat CSV output linter                                     :
  --- start ---
  test child forked, pid 53133
  Checking CSV output: no args [Success]
  Checking CSV output: system wide [Skip] paranoid and not root
  Checking CSV output: system wide [Skip] paranoid and not root
  Checking CSV output: interval [Success]
  Checking CSV output: event [Success]
  Checking CSV output: per core [Skip] paranoid and not root
  Checking CSV output: per thread [Skip] paranoid and not root
  Checking CSV output: per die [Skip] paranoid and not root
  Checking CSV output: per node [Skip] paranoid and not root
  Checking CSV output: per socket [Skip] paranoid and not root
  test child finished with 0
  ---- end ----
  perf stat CSV output linter: Ok
  $

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Claire Jensen <cjense@google.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: linux390-list@tuxmaker.boeblingen.de.ibm.com
Link: https://lore.kernel.org/r/20220603113034.2009728-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-19 10:41:43 -03:00
Ian Rogers
1d98cdf7fa perf unwind: Fix uninitialized variable
The 'ret' variable may be uninitialized on error goto paths.

Fixes: dc2cf4ca86 ("perf unwind: Fix segbase for ld.lld linked objects")
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Fangrui Song <maskray@google.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM-14 (x86-64)
Cc: Fangrui Song <maskray@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: llvm@lists.linux.dev
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Ullrich <sebasti@nullri.ch>
Link: https://lore.kernel.org/r/20220607000851.39798-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-19 10:41:43 -03:00
Jakub Kicinski
9fb424c4c2 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2022-06-17

We've added 72 non-merge commits during the last 15 day(s) which contain
a total of 92 files changed, 4582 insertions(+), 834 deletions(-).

The main changes are:

1) Add 64 bit enum value support to BTF, from Yonghong Song.

2) Implement support for sleepable BPF uprobe programs, from Delyan Kratunov.

3) Add new BPF helpers to issue and check TCP SYN cookies without binding to a
   socket especially useful in synproxy scenarios, from Maxim Mikityanskiy.

4) Fix libbpf's internal USDT address translation logic for shared libraries as
   well as uprobe's symbol file offset calculation, from Andrii Nakryiko.

5) Extend libbpf to provide an API for textual representation of the various
   map/prog/attach/link types and use it in bpftool, from Daniel Müller.

6) Provide BTF line info for RV64 and RV32 JITs, and fix a put_user bug in the
   core seen in 32 bit when storing BPF function addresses, from Pu Lehui.

7) Fix libbpf's BTF pointer size guessing by adding a list of various aliases
   for 'long' types, from Douglas Raillard.

8) Fix bpftool to readd setting rlimit since probing for memcg-based accounting
   has been unreliable and caused a regression on COS, from Quentin Monnet.

9) Fix UAF in BPF cgroup's effective program computation triggered upon BPF link
   detachment, from Tadeusz Struk.

10) Fix bpftool build bootstrapping during cross compilation which was pointing
    to the wrong AR process, from Shahab Vahedi.

11) Fix logic bug in libbpf's is_pow_of_2 implementation, from Yuze Chi.

12) BPF hash map optimization to avoid grabbing spinlocks of all CPUs when there
    is no free element. Also add a benchmark as reproducer, from Feng Zhou.

13) Fix bpftool's codegen to bail out when there's no BTF, from Michael Mullin.

14) Various minor cleanup and improvements all over the place.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (72 commits)
  bpf: Fix bpf_skc_lookup comment wrt. return type
  bpf: Fix non-static bpf_func_proto struct definitions
  selftests/bpf: Don't force lld on non-x86 architectures
  selftests/bpf: Add selftests for raw syncookie helpers in TC mode
  bpf: Allow the new syncookie helpers to work with SKBs
  selftests/bpf: Add selftests for raw syncookie helpers
  bpf: Add helpers to issue and check SYN cookies in XDP
  bpf: Allow helpers to accept pointers with a fixed size
  bpf: Fix documentation of th_len in bpf_tcp_{gen,check}_syncookie
  selftests/bpf: add tests for sleepable (uk)probes
  libbpf: add support for sleepable uprobe programs
  bpf: allow sleepable uprobe programs to attach
  bpf: implement sleepable uprobes by chaining gps
  bpf: move bpf_prog to bpf.h
  libbpf: Fix internal USDT address translation logic for shared libraries
  samples/bpf: Check detach prog exist or not in xdp_fwd
  selftests/bpf: Avoid skipping certain subtests
  selftests/bpf: Fix test_varlen verification failure with latest llvm
  bpftool: Do not check return value from libbpf_set_strict_mode()
  Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK"
  ...
====================

Link: https://lore.kernel.org/r/20220617220836.7373-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-17 19:35:19 -07:00
Jakub Kicinski
582573f1b2 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2022-06-17

We've added 12 non-merge commits during the last 4 day(s) which contain
a total of 14 files changed, 305 insertions(+), 107 deletions(-).

The main changes are:

1) Fix x86 JIT tailcall count offset on BPF-2-BPF call, from Jakub Sitnicki.

2) Fix a kprobe_multi link bug which misplaces BPF cookies, from Jiri Olsa.

3) Fix an infinite loop when processing a module's BTF, from Kumar Kartikeya Dwivedi.

4) Fix getting a rethook only in RCU available context, from Masami Hiramatsu.

5) Fix request socket refcount leak in sk lookup helpers, from Jon Maxwell.

6) Fix xsk xmit behavior which wrongly adds skb to already full cq, from Ciara Loftus.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  rethook: Reject getting a rethook if RCU is not watching
  fprobe, samples: Add use_trace option and show hit/missed counter
  bpf, docs: Update some of the JIT/maintenance entries
  selftest/bpf: Fix kprobe_multi bench test
  bpf: Force cookies array to follow symbols sorting
  ftrace: Keep address offset in ftrace_lookup_symbols
  selftests/bpf: Shuffle cookies symbols in kprobe multi test
  selftests/bpf: Test tail call counting with bpf2bpf and data on stack
  bpf, x86: Fix tail call count offset calculation on bpf2bpf call
  bpf: Limit maximum modifier chain length in btf_check_type_tags
  bpf: Fix request_sock leak in sk lookup helpers
  xsk: Fix generic transmit when completion queue reservation fails
====================

Link: https://lore.kernel.org/r/20220617202119.2421-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-17 18:30:01 -07:00
Riccardo Paolo Bestetti
b4a028c4d0 ipv4: ping: fix bind address validity check
Commit 8ff978b8b2 ("ipv4/raw: support binding to nonlocal addresses")
introduced a helper function to fold duplicated validity checks of bind
addresses into inet_addr_valid_or_nonlocal(). However, this caused an
unintended regression in ping_check_bind_addr(), which previously would
reject binding to multicast and broadcast addresses, but now these are
both incorrectly allowed as reported in [1].

This patch restores the original check. A simple reordering is done to
improve readability and make it evident that multicast and broadcast
addresses should not be allowed. Also, add an early exit for INADDR_ANY
which replaces lost behavior added by commit 0ce779a9f5 ("net: Avoid
unnecessary inet_addr_type() call when addr is INADDR_ANY").

Furthermore, this patch introduces regression selftests to catch these
specific cases.

[1] https://lore.kernel.org/netdev/CANP3RGdkAcDyAZoT1h8Gtuu0saq+eOrrTiWbxnOs+5zn+cpyKg@mail.gmail.com/

Fixes: 8ff978b8b2 ("ipv4/raw: support binding to nonlocal addresses")
Cc: Miaohe Lin <linmiaohe@huawei.com>
Reported-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Riccardo Paolo Bestetti <pbl@bestov.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-17 11:41:34 +01:00
Ido Schimmel
ed62af4546 selftests: spectrum-2: tc_flower_scale: Dynamically set scale target
Instead of hard coding the scale target in the test, dynamically set it
based on the maximum number of flow counters and their current
occupancy.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-17 10:31:33 +01:00
Petr Machata
be00853bfd selftests: mlxsw: Add a RIF counter scale test
This tests creates as many RIFs as possible, ideally more than there can be
RIF counters (though that is currently only possible on Spectrum-1). It
then tries to enable L3 HW stats on each of the RIFs. It also contains the
traffic test, which tries to run traffic through a log2 of those counters
and checks that the traffic is shown in the counter values.

Like with tc_flower traffic test, take a log2 subset of rules. The logic
behind picking log2 rules is that then every bit of the instantiated item's
number is exercised. This should catch issues whether they happen at the
high end, low end, or somewhere in between.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-17 10:31:33 +01:00
Petr Machata
dd5d20e17c selftests: mlxsw: tc_flower_scale: Add a traffic test
Add a test that checks that the created filters do actually trigger on
matching traffic.

Exercising all the rules would be a very lengthy process. Instead, take a
log2 subset of rules. The logic behind picking log2 rules is that then
every bit of the instantiated item's number is exercised. This should catch
issues whether they happen at the high end, low end, or somewhere in
between.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-17 10:31:33 +01:00
Petr Machata
35d5829e86 selftests: mlxsw: resource_scale: Pass target count to cleanup
The scale tests are verifying behavior of mlxsw when number of instances of
some resource reaches the ASIC capacity. The number of instances is
referred to as "target" number.

No scale tests so far needed to know this target number to clean up. E.g.
the tc_flower simply removes the clsact qdisc that all the tested filters
are hooked onto, and that takes care of collecting all the filters.

However, for the RIF counter test, which is being added in a future patch,
VLAN netdevices are created. These are created as part of the test, but of
course the cleanup needs to undo them again. For that it needs to know how
many there were. To support this usage, pass the target number to the
cleanup callback.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-17 10:31:33 +01:00
Petr Machata
8cad339db3 selftests: mlxsw: resource_scale: Allow skipping a test
The scale tests are currently testing two things: that some number of
instances of a given resource can actually be created; and that when an
attempt is made to create more than the supported amount, the failures are
noted and handled gracefully.

Sometimes the scale test depends on more than one resource. In particular,
a following patch will add a RIF counter scale test, which depends on the
number of RIF counters that can be bound, and also on the number of RIFs
that can be created.

When the test is limited by the auxiliary resource and not by the primary
one, there's no point trying to run the overflow test, because it would be
testing exhaustion of the wrong resource.

To support this use case, when the $test_get_target yields 0, skip the test
instead.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-17 10:31:33 +01:00
Petr Machata
3128b9f51e selftests: mlxsw: resource_scale: Introduce traffic tests
The scale tests are currently testing two things: that some number of
instances of a given resource can actually be created; and that when an
attempt is made to create more than the supported amount, the failures are
noted and handled gracefully.

However the ability to allocate the resource does not mean that the
resource actually works when passing traffic. For that, make it possible
for a given scale to also test traffic.

Traffic test is only run on the positive leg of the scale test (no point
trying to pass traffic when the expected outcome is that the resource will
not be allocated). Traffic tests are opt-in, if a given test does not
expose it, it is not run.

To this end, delay the test cleanup until after the traffic test is run.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-17 10:31:33 +01:00
Ido Schimmel
d3ffeb2dba selftests: mlxsw: resource_scale: Update scale target after test setup
The scale of each resource is tested in the following manner:

1. The scale target is queried.
2. The test setup is prepared.
3. The test is invoked.

In some cases, the occupancy of a resource changes as part of the second
step, requiring the test to return a scale target that takes this change
into account.

Make this more robust by re-querying the scale target after the second
step.

Another possible solution is to swap the first and second steps, but
when a test needs to be skipped (i.e., scale target is zero), the setup
would have been in vain.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-17 10:31:33 +01:00
Amit Cohen
e386a527fc selftests: mirror_gre_bridge_1q_lag: Enslave port to bridge before other configurations
Using mlxsw driver, the configurations are offloaded just in case that
there is a physical port which is enslaved to the virtual device
(e.g., to a bridge). In 'mirror_gre_bridge_1q_lag' test, the bridge gets an
address and route before there are ports in the bridge. It means that these
configurations are not offloaded.

Till now the test passes with mlxsw driver even that the RIF of the
bridge is not in the hardware, because the ARP packets are trapped in
layer 2 and also mirrored, so there is no real need of the RIF in hardware.
The previous patch changed the traps 'ARP_REQUEST' and 'ARP_RESPONSE' to
be done at layer 3 instead of layer 2. With this change the ARP packets are
not trapped during the test, as the RIF is not in the hardware because of
the order of configurations.

Reorder the configurations to make them to be offloaded, then the test will
pass with the change of the traps.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-17 10:31:33 +01:00
Andrii Nakryiko
08c79c9cd6 selftests/bpf: Don't force lld on non-x86 architectures
LLVM's lld linker doesn't have a universal architecture support (e.g.,
it definitely doesn't work on s390x), so be safe and force lld for
urandom_read and liburandom_read.so only on x86 architectures.

This should fix s390x CI runs.

Fixes: 3e6fe5ce4d ("libbpf: Fix internal USDT address translation logic for shared libraries")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220617045512.1339795-1-andrii@kernel.org
2022-06-17 10:16:01 +02:00
Maxim Mikityanskiy
784d5dc0ef selftests/bpf: Add selftests for raw syncookie helpers in TC mode
This commit extends selftests for the new BPF helpers
bpf_tcp_raw_{gen,check}_syncookie_ipv{4,6} to also test the TC BPF
functionality added in the previous commit.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20220615134847.3753567-7-maximmi@nvidia.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-16 21:20:30 -07:00
Maxim Mikityanskiy
fb5cd0ce70 selftests/bpf: Add selftests for raw syncookie helpers
This commit adds selftests for the new BPF helpers:
bpf_tcp_raw_{gen,check}_syncookie_ipv{4,6}.

xdp_synproxy_kern.c is a BPF program that generates SYN cookies on
allowed TCP ports and sends SYNACKs to clients, accelerating synproxy
iptables module.

xdp_synproxy.c is a userspace control application that allows to
configure the following options in runtime: list of allowed ports, MSS,
window scale, TTL.

A selftest is added to prog_tests that leverages the above programs to
test the functionality of the new helpers.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20220615134847.3753567-5-maximmi@nvidia.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-16 21:20:30 -07:00
Maxim Mikityanskiy
33bf988504 bpf: Add helpers to issue and check SYN cookies in XDP
The new helpers bpf_tcp_raw_{gen,check}_syncookie_ipv{4,6} allow an XDP
program to generate SYN cookies in response to TCP SYN packets and to
check those cookies upon receiving the first ACK packet (the final
packet of the TCP handshake).

Unlike bpf_tcp_{gen,check}_syncookie these new helpers don't need a
listening socket on the local machine, which allows to use them together
with synproxy to accelerate SYN cookie generation.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20220615134847.3753567-4-maximmi@nvidia.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-16 21:20:30 -07:00
Maxim Mikityanskiy
ac80287a6a bpf: Fix documentation of th_len in bpf_tcp_{gen,check}_syncookie
bpf_tcp_gen_syncookie expects the full length of the TCP header (with
all options), and bpf_tcp_check_syncookie accepts lengths bigger than
sizeof(struct tcphdr). Fix the documentation that says these lengths
should be exactly sizeof(struct tcphdr).

While at it, fix a typo in the name of struct ipv6hdr.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20220615134847.3753567-2-maximmi@nvidia.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-16 21:20:29 -07:00
Jiri Olsa
730067022c selftest/bpf: Fix kprobe_multi bench test
With [1] the available_filter_functions file contains records
starting with __ftrace_invalid_address___ and marking disabled
entries.

We need to filter them out for the bench test to pass only
resolvable symbols to kernel.

[1] commit b39181f7c6 ("ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function")

Fixes: b39181f7c6 ("ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220615112118.497303-5-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-16 19:42:21 -07:00
Jiri Olsa
ad8848535e selftests/bpf: Shuffle cookies symbols in kprobe multi test
There's a kernel bug that causes cookies to be misplaced and
the reason we did not catch this with this test is that we
provide bpf_fentry_test* functions already sorted by name.

Shuffling function bpf_fentry_test2 deeper in the list and
keeping the current cookie values as before will trigger
the bug.

The kernel fix is coming in following changes.

Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220615112118.497303-2-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-16 19:42:21 -07:00
Delyan Kratunov
cb3f4a4a46 selftests/bpf: add tests for sleepable (uk)probes
Add tests that ensure sleepable uprobe programs work correctly.
Add tests that ensure sleepable kprobe programs cannot attach.

Also add tests that attach both sleepable and non-sleepable
uprobe programs to the same location (i.e. same bpf_prog_array).

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Delyan Kratunov <delyank@fb.com>
Link: https://lore.kernel.org/r/c744e5bb7a5c0703f05444dc41f2522ba3579a48.1655248076.git.delyank@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-16 19:27:30 -07:00
Delyan Kratunov
c4cac71fc8 libbpf: add support for sleepable uprobe programs
Add section mappings for u(ret)probe.s programs.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Delyan Kratunov <delyank@fb.com>
Link: https://lore.kernel.org/r/aedbc3b74f3523f00010a7b0df8f3388cca59f16.1655248076.git.delyank@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-16 19:27:30 -07:00
Andrii Nakryiko
3e6fe5ce4d libbpf: Fix internal USDT address translation logic for shared libraries
Perform the same virtual address to file offset translation that libbpf
is doing for executable ELF binaries also for shared libraries.
Currently libbpf is making a simplifying and sometimes wrong assumption
that for shared libraries relative virtual addresses inside ELF are
always equal to file offsets.

Unfortunately, this is not always the case with LLVM's lld linker, which
now by default generates quite more complicated ELF segments layout.
E.g., for liburandom_read.so from selftests/bpf, here's an excerpt from
readelf output listing ELF segments (a.k.a. program headers):

  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  PHDR           0x000040 0x0000000000000040 0x0000000000000040 0x0001f8 0x0001f8 R   0x8
  LOAD           0x000000 0x0000000000000000 0x0000000000000000 0x0005e4 0x0005e4 R   0x1000
  LOAD           0x0005f0 0x00000000000015f0 0x00000000000015f0 0x000160 0x000160 R E 0x1000
  LOAD           0x000750 0x0000000000002750 0x0000000000002750 0x000210 0x000210 RW  0x1000
  LOAD           0x000960 0x0000000000003960 0x0000000000003960 0x000028 0x000029 RW  0x1000

Compare that to what is generated by GNU ld (or LLVM lld's with extra
-znoseparate-code argument which disables this cleverness in the name of
file size reduction):

  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  LOAD           0x000000 0x0000000000000000 0x0000000000000000 0x000550 0x000550 R   0x1000
  LOAD           0x001000 0x0000000000001000 0x0000000000001000 0x000131 0x000131 R E 0x1000
  LOAD           0x002000 0x0000000000002000 0x0000000000002000 0x0000ac 0x0000ac R   0x1000
  LOAD           0x002dc0 0x0000000000003dc0 0x0000000000003dc0 0x000262 0x000268 RW  0x1000

You can see from the first example above that for executable (Flg == "R E")
PT_LOAD segment (LOAD #2), Offset doesn't match VirtAddr columns.
And it does in the second case (GNU ld output).

This is important because all the addresses, including USDT specs,
operate in a virtual address space, while kernel is expecting file
offsets when performing uprobe attach. So such mismatches have to be
properly taken care of and compensated by libbpf, which is what this
patch is fixing.

Also patch clarifies few function and variable names, as well as updates
comments to reflect this important distinction (virtaddr vs file offset)
and to ephasize that shared libraries are not all that different from
executables in this regard.

This patch also changes selftests/bpf Makefile to force urand_read and
liburand_read.so to be built with Clang and LLVM's lld (and explicitly
request this ELF file size optimization through -znoseparate-code linker
parameter) to validate libbpf logic and ensure regressions don't happen
in the future. I've bundled these selftests changes together with libbpf
changes to keep the above description tied with both libbpf and
selftests changes.

Fixes: 74cc6311ce ("libbpf: Add USDT notes parsing and resolution logic")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220616055543.3285835-1-andrii@kernel.org
2022-06-17 01:20:10 +02:00
Joel Savitz
9b4d5c01eb selftests: make use of GUP_TEST_FILE macro
Commit 17de1e559c ("selftests: clarify common error when running
gup_test") had most of its hunks dropped due to a conflict with another
patch accepted into Linux around the same time that implemented the same
behavior as a subset of other changes.

However, the remaining hunk defines the GUP_TEST_FILE macro without
making use of it. This patch makes use of the macro in the two relevant
places.

Furthermore, the above mentioned commit's log message erroneously describes
the changes that were dropped from the patch.

This patch corrects the record.

Fixes: 17de1e559c ("selftests: clarify common error when running gup_test")

Signed-off-by: Joel Savitz <jsavitz@redhat.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: Nico Pache <npache@redhat.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-06-16 17:05:50 -06:00
Ding Xiang
3084a4ec7f selftests: vm: Fix resource leak when return error
When return on an error path, file handle need to be closed
to prevent resource leak

Signed-off-by: Ding Xiang <dingxiang@cmss.chinamobile.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-06-16 14:14:08 -06:00
Yu Liao
12a29115be selftests dma: fix compile error for dma_map_benchmark
When building selftests/dma:
$ make -C tools/testing/selftests TARGETS=dma
I hit the following compilation error:

dma_map_benchmark.c:13:10: fatal error: linux/map_benchmark.h: No such file or directory
 #include <linux/map_benchmark.h>
          ^~~~~~~~~~~~~~~~~~~~~~~

dma/Makefile does not include the map_benchmark.h path, so add
more including path, and fix include order in dma_map_benchmark.c

Fixes: 8ddde07a3d ("dma-mapping: benchmark: extract a common header file for map_benchmark definition")
Signed-off-by: Yu Liao <liaoyu15@huawei.com>
Tested-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-06-16 14:03:21 -06:00
Jakub Sitnicki
5e0b0a4c52 selftests/bpf: Test tail call counting with bpf2bpf and data on stack
Cover the case when tail call count needs to be passed from BPF function to
BPF function, and the caller has data on stack. Specifically when the size
of data allocated on BPF stack is not a multiple on 8.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220616162037.535469-3-jakub@cloudflare.com
2022-06-16 21:49:05 +02:00
Linus Torvalds
48a23ec6ff Mostly driver fixes.
Current release - regressions:
 
  - Revert "net: Add a second bind table hashed by port and address",
    needs more work
 
  - amd-xgbe: use platform_irq_count(), static setup of IRQ resources
    had been removed from DT core
 
  - dts: at91: ksz9477_evb: add phy-mode to fix port/phy validation
 
 Current release - new code bugs:
 
  - hns3: modify the ring param print info
 
 Previous releases - always broken:
 
  - axienet: make the 64b addressable DMA depends on 64b architectures
 
  - iavf: fix issue with MAC address of VF shown as zero
 
  - ice: fix PTP TX timestamp offset calculation
 
  - usb: ax88179_178a needs FLAG_SEND_ZLP
 
 Misc:
 
  - document some net.sctp.* sysctls
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmKrcx8ACgkQMUZtbf5S
 IrsKjA/9Ho+cxnGAvx7ngQepqAU8RQDFy7sQoFHiGqs+jMeph/E81PM2QDBR9g9h
 k/s7YRpLGuxWFT7KUJScNl0ZyPgSk5EHcqy202ToYyDQv+srLnh5bgbRykMF2Unc
 D4mf63a2pNo9S0L1PmMz87p+XaWIwblqQ0wbl5F97e7eAWel+y7rPCBqR0lZ9Il7
 w8rZp6iOVOhD495s1ikqOYUVCntepC9MQIo8iIE/WrREiOWmZNNbV8RzvuHRNQs6
 j9eLsukKwTfekQbzR3SXbYxyjwRowAQ3bD5sEL3MuqflsxRpVm5lEqN0AuVlAo3C
 IJZFSFqnusC4cSUYVdfWhYlx8om+uw4XKzfqQD/T7yobjoVA/Mmt/Uf7Mw6krR+g
 bI+/bpgX7WpLYQNtBFAils5pY36pthN+zg9FuU0v7tNLgC3AmQqA8sRI/fRCVJFV
 b1Wmk6Ldj1lCynX0KpzU6XSFGzP2Ht9CYReImiwvZbaABIoM14woHhRPrh8UGWIY
 sdpLoR+XRyL/0N1W7l0FgbGm/zOaEbh8fo0ZGYHLukXPUby6osiV36frzjxOj/NO
 DqNkPq4ajfWFvcWdqbfRKXwpLyM/Ki2WpQjvaNzDLOL74sDspr8wjnIOOLbuHv/8
 NW6tcWwfIu9nkDJOpRedh+O2gj6FKdruobdKVgQd376J0kxWLv0=
 =JSV9
 -----END PGP SIGNATURE-----

Merge tag 'net-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Mostly driver fixes.

  Current release - regressions:

   - Revert "net: Add a second bind table hashed by port and address",
     needs more work

   - amd-xgbe: use platform_irq_count(), static setup of IRQ resources
     had been removed from DT core

   - dts: at91: ksz9477_evb: add phy-mode to fix port/phy validation

  Current release - new code bugs:

   - hns3: modify the ring param print info

  Previous releases - always broken:

   - axienet: make the 64b addressable DMA depends on 64b architectures

   - iavf: fix issue with MAC address of VF shown as zero

   - ice: fix PTP TX timestamp offset calculation

   - usb: ax88179_178a needs FLAG_SEND_ZLP

  Misc:

   - document some net.sctp.* sysctls"

* tag 'net-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (31 commits)
  net: axienet: add missing error return code in axienet_probe()
  Revert "net: Add a second bind table hashed by port and address"
  net: ax25: Fix deadlock caused by skb_recv_datagram in ax25_recvmsg
  net: usb: ax88179_178a needs FLAG_SEND_ZLP
  MAINTAINERS: add include/dt-bindings/net to NETWORKING DRIVERS
  ARM: dts: at91: ksz9477_evb: fix port/phy validation
  net: bgmac: Fix an erroneous kfree() in bgmac_remove()
  ice: Fix memory corruption in VF driver
  ice: Fix queue config fail handling
  ice: Sync VLAN filtering features for DVM
  ice: Fix PTP TX timestamp offset calculation
  mlxsw: spectrum_cnt: Reorder counter pools
  docs: networking: phy: Fix a typo
  amd-xgbe: Use platform_irq_count()
  octeontx2-vf: Add support for adaptive interrupt coalescing
  xilinx:  Fix build on x86.
  net: axienet: Use iowrite64 to write all 64b descriptor pointers
  net: axienet: make the 64b addresable DMA depends on 64b archectures
  net: hns3: fix tm port shapping of fibre port is incorrect after driver initialization
  net: hns3: fix PF rss size initialization bug
  ...
2022-06-16 11:51:32 -07:00
Joanne Koong
593d1ebe00 Revert "net: Add a second bind table hashed by port and address"
This reverts:

commit d5a42de8bd ("net: Add a second bind table hashed by port and address")
commit 538aaf9b23 ("selftests: Add test for timing a bind request to a port with a populated bhash entry")
Link: https://lore.kernel.org/netdev/20220520001834.2247810-1-kuba@kernel.org/

There are a few things that need to be fixed here:
* Updating bhash2 in cases where the socket's rcv saddr changes
* Adding bhash2 hashbucket locks

Links to syzbot reports:
https://lore.kernel.org/netdev/00000000000022208805e0df247a@google.com/
https://lore.kernel.org/netdev/0000000000003f33bc05dfaf44fe@google.com/

Fixes: d5a42de8bd ("net: Add a second bind table hashed by port and address")
Reported-by: syzbot+015d756bbd1f8b5c8f09@syzkaller.appspotmail.com
Reported-by: syzbot+98fd2d1422063b0f8c44@syzkaller.appspotmail.com
Reported-by: syzbot+0a847a982613c6438fba@syzkaller.appspotmail.com
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/r/20220615193213.2419568-1-joannelkoong@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-16 11:07:59 -07:00
Dmitry Klochkov
933b5f9f98 tools/kvm_stat: fix display of error when multiple processes are found
Instead of printing an error message, kvm_stat script fails when we
restrict statistics to a guest by its name and there are multiple guests
with such name:

  # kvm_stat -g my_vm
  Traceback (most recent call last):
    File "/usr/bin/kvm_stat", line 1819, in <module>
      main()
    File "/usr/bin/kvm_stat", line 1779, in main
      options = get_options()
    File "/usr/bin/kvm_stat", line 1718, in get_options
      options = argparser.parse_args()
    File "/usr/lib64/python3.10/argparse.py", line 1825, in parse_args
      args, argv = self.parse_known_args(args, namespace)
    File "/usr/lib64/python3.10/argparse.py", line 1858, in parse_known_args
      namespace, args = self._parse_known_args(args, namespace)
    File "/usr/lib64/python3.10/argparse.py", line 2067, in _parse_known_args
      start_index = consume_optional(start_index)
    File "/usr/lib64/python3.10/argparse.py", line 2007, in consume_optional
      take_action(action, args, option_string)
    File "/usr/lib64/python3.10/argparse.py", line 1935, in take_action
      action(self, namespace, argument_values, option_string)
    File "/usr/bin/kvm_stat", line 1649, in __call__
      ' to specify the desired pid'.format(" ".join(pids)))
  TypeError: sequence item 0: expected str instance, int found

To avoid this, it's needed to convert pids int values to strings before
pass them to join().

Signed-off-by: Dmitry Klochkov <kdmitry556@gmail.com>
Message-Id: <20220614121141.160689-1-kdmitry556@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-15 08:14:20 -04:00
Yonghong Song
3831cd1f9f selftests/bpf: Avoid skipping certain subtests
Commit 704c91e59f ('selftests/bpf: Test "bpftool gen min_core_btf"')
added a test test_core_btfgen to test core relocation with btf
generated with 'bpftool gen min_core_btf'. Currently,
among 76 subtests, 25 are skipped.

  ...
  #46/69   core_reloc_btfgen/enumval:OK
  #46/70   core_reloc_btfgen/enumval___diff:OK
  #46/71   core_reloc_btfgen/enumval___val3_missing:OK
  #46/72   core_reloc_btfgen/enumval___err_missing:SKIP
  #46/73   core_reloc_btfgen/enum64val:OK
  #46/74   core_reloc_btfgen/enum64val___diff:OK
  #46/75   core_reloc_btfgen/enum64val___val3_missing:OK
  #46/76   core_reloc_btfgen/enum64val___err_missing:SKIP
  ...
  #46      core_reloc_btfgen:SKIP
  Summary: 1/51 PASSED, 25 SKIPPED, 0 FAILED

Alexei found that in the above core_reloc_btfgen/enum64val___err_missing
should not be skipped.

Currently, the core_reloc tests have some negative tests.
In Commit 704c91e59f, for core_reloc_btfgen, all negative tests
are skipped with the following condition
  if (!test_case->btf_src_file || test_case->fails) {
	test__skip();
	continue;
  }
This is too conservative. Negative tests do not fail
mkstemp() and run_btfgen() should not be skipped.
There are a few negative tests indeed failing run_btfgen()
and this patch added 'run_btfgen_fails' to mark these tests
so that they can be skipped for btfgen tests. With this,
we have
  ...
  #46/69   core_reloc_btfgen/enumval:OK
  #46/70   core_reloc_btfgen/enumval___diff:OK
  #46/71   core_reloc_btfgen/enumval___val3_missing:OK
  #46/72   core_reloc_btfgen/enumval___err_missing:OK
  #46/73   core_reloc_btfgen/enum64val:OK
  #46/74   core_reloc_btfgen/enum64val___diff:OK
  #46/75   core_reloc_btfgen/enum64val___val3_missing:OK
  #46/76   core_reloc_btfgen/enum64val___err_missing:OK
  ...
  Summary: 1/62 PASSED, 14 SKIPPED, 0 FAILED

Totally 14 subtests are skipped instead of 25.

Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220614055526.628299-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-14 16:18:29 -07:00
Yonghong Song
96752e1ec0 selftests/bpf: Fix test_varlen verification failure with latest llvm
With latest llvm15, test_varlen failed with the following verifier log:

  17: (85) call bpf_probe_read_kernel_str#115   ; R0_w=scalar(smin=-4095,smax=256)
  18: (bf) r1 = r0                      ; R0_w=scalar(id=1,smin=-4095,smax=256) R1_w=scalar(id=1,smin=-4095,smax=256)
  19: (67) r1 <<= 32                    ; R1_w=scalar(smax=1099511627776,umax=18446744069414584320,var_off=(0x0; 0xffffffff00000000),s32_min=0,s32_max=0,u32_max=)
  20: (bf) r2 = r1                      ; R1_w=scalar(id=2,smax=1099511627776,umax=18446744069414584320,var_off=(0x0; 0xffffffff00000000),s32_min=0,s32_max=0,u32)
  21: (c7) r2 s>>= 32                   ; R2=scalar(smin=-2147483648,smax=256)
  ; if (len >= 0) {
  22: (c5) if r2 s< 0x0 goto pc+7       ; R2=scalar(umax=256,var_off=(0x0; 0x1ff))
  ; payload4_len1 = len;
  23: (18) r2 = 0xffffc90000167418      ; R2_w=map_value(off=1048,ks=4,vs=1572,imm=0)
  25: (63) *(u32 *)(r2 +0) = r0         ; R0=scalar(id=1,smin=-4095,smax=256) R2_w=map_value(off=1048,ks=4,vs=1572,imm=0)
  26: (77) r1 >>= 32                    ; R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff))
  ; payload += len;
  27: (18) r6 = 0xffffc90000167424      ; R6_w=map_value(off=1060,ks=4,vs=1572,imm=0)
  29: (0f) r6 += r1                     ; R1_w=Pscalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R6_w=map_value(off=1060,ks=4,vs=1572,umax=4294967295,var_off=(0)
  ; len = bpf_probe_read_kernel_str(payload, MAX_LEN, &buf_in2[0]);
  30: (bf) r1 = r6                      ; R1_w=map_value(off=1060,ks=4,vs=1572,umax=4294967295,var_off=(0x0; 0xffffffff)) R6_w=map_value(off=1060,ks=4,vs=1572,um)
  31: (b7) r2 = 256                     ; R2_w=256
  32: (18) r3 = 0xffffc90000164100      ; R3_w=map_value(off=256,ks=4,vs=1056,imm=0)
  34: (85) call bpf_probe_read_kernel_str#115
  R1 unbounded memory access, make sure to bounds check any such access
  processed 27 insns (limit 1000000) max_states_per_insn 0 total_states 2 peak_states 2 mark_read 1
  -- END PROG LOAD LOG --
  libbpf: failed to load program 'handler32_signed'

The failure is due to
  20: (bf) r2 = r1                      ; R1_w=scalar(id=2,smax=1099511627776,umax=18446744069414584320,var_off=(0x0; 0xffffffff00000000),s32_min=0,s32_max=0,u32)
  21: (c7) r2 s>>= 32                   ; R2=scalar(smin=-2147483648,smax=256)
  22: (c5) if r2 s< 0x0 goto pc+7       ; R2=scalar(umax=256,var_off=(0x0; 0x1ff))
  26: (77) r1 >>= 32                    ; R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff))
  29: (0f) r6 += r1                     ; R1_w=Pscalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R6_w=map_value(off=1060,ks=4,vs=1572,umax=4294967295,var_off=(0)
where r1 has conservative value range compared to r2 and r1 is used later.

In llvm, commit [1] triggered the above code generation and caused
verification failure.

It may take a while for llvm to address this issue. In the main time,
let us change the variable 'len' type to 'long' and adjust condition properly.
Tested with llvm14 and latest llvm15, both worked fine.

 [1] https://reviews.llvm.org/D126647

Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220613233449.2860753-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-14 16:17:07 -07:00
Quentin Monnet
93270357da bpftool: Do not check return value from libbpf_set_strict_mode()
The function always returns 0, so we don't need to check whether the
return value is 0 or not.

This change was first introduced in commit a777e18f1b ("bpftool: Use
libbpf 1.0 API mode instead of RLIMIT_MEMLOCK"), but later reverted to
restore the unconditional rlimit bump in bpftool. Let's re-add it.

Co-developed-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220610112648.29695-3-quentin@isovalent.com
2022-06-14 22:18:56 +02:00
Quentin Monnet
6b4384ff10 Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK"
This reverts commit a777e18f1b.

In commit a777e18f1b ("bpftool: Use libbpf 1.0 API mode instead of
RLIMIT_MEMLOCK"), we removed the rlimit bump in bpftool, because the
kernel has switched to memcg-based memory accounting. Thanks to the
LIBBPF_STRICT_AUTO_RLIMIT_MEMLOCK, we attempted to keep compatibility
with other systems and ask libbpf to raise the limit for us if
necessary.

How do we know if memcg-based accounting is supported? There is a probe
in libbpf to check this. But this probe currently relies on the
availability of a given BPF helper, bpf_ktime_get_coarse_ns(), which
landed in the same kernel version as the memory accounting change. This
works in the generic case, but it may fail, for example, if the helper
function has been backported to an older kernel. This has been observed
for Google Cloud's Container-Optimized OS (COS), where the helper is
available but rlimit is still in use. The probe succeeds, the rlimit is
not raised, and probing features with bpftool, for example, fails.

A patch was submitted [0] to update this probe in libbpf, based on what
the cilium/ebpf Go library does [1]. It would lower the soft rlimit to
0, attempt to load a BPF object, and reset the rlimit. But it may induce
some hard-to-debug flakiness if another process starts, or the current
application is killed, while the rlimit is reduced, and the approach was
discarded.

As a workaround to ensure that the rlimit bump does not depend on the
availability of a given helper, we restore the unconditional rlimit bump
in bpftool for now.

  [0] https://lore.kernel.org/bpf/20220609143614.97837-1-quentin@isovalent.com/
  [1] https://github.com/cilium/ebpf/blob/v0.9.0/rlimit/rlimit.go#L39

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Yafang Shao <laoar.shao@gmail.com>
Cc: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20220610112648.29695-2-quentin@isovalent.com
2022-06-14 22:18:06 +02:00
Mark Brown
795285ef24 selftests: Fix clang cross compilation
Unlike GCC clang uses a single compiler image to support multiple target
architectures meaning that we can't simply rely on CROSS_COMPILE to select
the output architecture. Instead we must pass --target to the compiler to
tell it what to output, kselftest was not doing this so cross compilation
of kselftest using clang resulted in kselftest being built for the host
architecture.

More work is required to fix tests using custom rules but this gets the
bulk of things building.

Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-06-14 11:24:24 -06:00
Yonghong Song
c49a44b39b libbpf: Fix an unsigned < 0 bug
Andrii reported a bug with the following information:

  2859 	if (enum64_placeholder_id == 0) {
  2860 		enum64_placeholder_id = btf__add_int(btf, "enum64_placeholder", 1, 0);
  >>>     CID 394804:  Control flow issues  (NO_EFFECT)
  >>>     This less-than-zero comparison of an unsigned value is never true. "enum64_placeholder_id < 0U".
  2861 		if (enum64_placeholder_id < 0)
  2862 			return enum64_placeholder_id;
  2863    	...

Here enum64_placeholder_id declared as '__u32' so enum64_placeholder_id < 0
is always false. Declare enum64_placeholder_id as 'int' in order to capture
the potential error properly.

Fixes: f2a625889b ("libbpf: Add enum64 sanitization")
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220613054314.1251905-1-yhs@fb.com
2022-06-14 17:01:54 +02:00
Linus Torvalds
24625f7d91 ARM64:
* Properly reset the SVE/SME flags on vcpu load
 
 * Fix a vgic-v2 regression regarding accessing the pending
 state of a HW interrupt from userspace (and make the code
 common with vgic-v3)
 
 * Fix access to the idreg range for protected guests
 
 * Ignore 'kvm-arm.mode=protected' when using VHE
 
 * Return an error from kvm_arch_init_vm() on allocation failure
 
 * A bunch of small cleanups (comments, annotations, indentation)
 
 RISC-V:
 
 * Typo fix in arch/riscv/kvm/vmid.c
 
 * Remove broken reference pattern from MAINTAINERS entry
 
 x86-64:
 
 * Fix error in page tables with MKTME enabled
 
 * Dirty page tracking performance test extended to running a nested
   guest
 
 * Disable APICv/AVIC in cases that it cannot implement correctly
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmKjTIAUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNhPQgAiIVtp8aepujUM/NhkNyK3SIdLzlS
 oZCZiS6bvaecKXi/QvhBU0EBxAEyrovk3lmVuYNd41xI+PDjyaA4SDIl5DnToGUw
 bVPNFSYqjpF939vUUKjc0RCdZR4o5g3Od3tvWoHTHviS1a8aAe5o9pcpHpD0D6Mp
 Gc/o58nKAOPl3htcFKmjymqo3Y6yvkJU9NB7DCbL8T5mp5pJ959Mw1/LlmBaAzJC
 OofrynUm4NjMyAj/mAB1FhHKFyQfjBXLhiVlS0SLiiEA/tn9/OXyVFMKG+n5VkAZ
 Q337GMFe2RikEIuMEr3Rc4qbZK3PpxHhaj+6MPRuM0ho/P4yzl2Nyb/OhA==
 =h81Q
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "While last week's pull request contained miscellaneous fixes for x86,
  this one covers other architectures, selftests changes, and a bigger
  series for APIC virtualization bugs that were discovered during 5.20
  development. The idea is to base 5.20 development for KVM on top of
  this tag.

  ARM64:

   - Properly reset the SVE/SME flags on vcpu load

   - Fix a vgic-v2 regression regarding accessing the pending state of a
     HW interrupt from userspace (and make the code common with vgic-v3)

   - Fix access to the idreg range for protected guests

   - Ignore 'kvm-arm.mode=protected' when using VHE

   - Return an error from kvm_arch_init_vm() on allocation failure

   - A bunch of small cleanups (comments, annotations, indentation)

  RISC-V:

   - Typo fix in arch/riscv/kvm/vmid.c

   - Remove broken reference pattern from MAINTAINERS entry

  x86-64:

   - Fix error in page tables with MKTME enabled

   - Dirty page tracking performance test extended to running a nested
     guest

   - Disable APICv/AVIC in cases that it cannot implement correctly"

[ This merge also fixes a misplaced end parenthesis bug introduced in
  commit 3743c2f025 ("KVM: x86: inhibit APICv/AVIC on changes to APIC
  ID or APIC base") pointed out by Sean Christopherson ]

Link: https://lore.kernel.org/all/20220610191813.371682-1-seanjc@google.com/

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (34 commits)
  KVM: selftests: Restrict test region to 48-bit physical addresses when using nested
  KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2
  KVM: selftests: Clean up LIBKVM files in Makefile
  KVM: selftests: Link selftests directly with lib object files
  KVM: selftests: Drop unnecessary rule for STATIC_LIBS
  KVM: selftests: Add a helper to check EPT/VPID capabilities
  KVM: selftests: Move VMX_EPT_VPID_CAP_AD_BITS to vmx.h
  KVM: selftests: Refactor nested_map() to specify target level
  KVM: selftests: Drop stale function parameter comment for nested_map()
  KVM: selftests: Add option to create 2M and 1G EPT mappings
  KVM: selftests: Replace x86_page_size with PG_LEVEL_XX
  KVM: x86: SVM: fix nested PAUSE filtering when L0 intercepts PAUSE
  KVM: x86: SVM: drop preempt-safe wrappers for avic_vcpu_load/put
  KVM: x86: disable preemption around the call to kvm_arch_vcpu_{un|}blocking
  KVM: x86: disable preemption while updating apicv inhibition
  KVM: x86: SVM: fix avic_kick_target_vcpus_fast
  KVM: x86: SVM: remove avic's broken code that updated APIC ID
  KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base
  KVM: x86: document AVIC/APICv inhibit reasons
  KVM: x86/mmu: Set memory encryption "value", not "mask", in shadow PDPTRs
  ...
2022-06-14 07:57:18 -07:00
Linus Torvalds
8e8afafb0b Yet another hw vulnerability with a software mitigation: Processor MMIO
Stale Data.
 
 They are a class of MMIO-related weaknesses which can expose stale data
 by propagating it into core fill buffers. Data which can then be leaked
 using the usual speculative execution methods.
 
 Mitigations include this set along with microcode updates and are
 similar to MDS and TAA vulnerabilities: VERW now clears those buffers
 too.
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmKXMkMTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoWGPD/idalLIhhV5F2+hZIKm0WSnsBxAOh9K
 7y8xBxpQQ5FUfW3vm7Pg3ro6VJp7w2CzKoD4lGXzGHriusn3qst3vkza9Ay8xu8g
 RDwKe6hI+p+Il9BV9op3f8FiRLP9bcPMMReW/mRyYsOnJe59hVNwRAL8OG40PY4k
 hZgg4Psfvfx8bwiye5efjMSe4fXV7BUCkr601+8kVJoiaoszkux9mqP+cnnB5P3H
 zW1d1jx7d6eV1Y063h7WgiNqQRYv0bROZP5BJkufIoOHUXDpd65IRF3bDnCIvSEz
 KkMYJNXb3qh7EQeHS53NL+gz2EBQt+Tq1VH256qn6i3mcHs85HvC68gVrAkfVHJE
 QLJE3MoXWOqw+mhwzCRrEXN9O1lT/PqDWw8I4M/5KtGG/KnJs+bygmfKBbKjIVg4
 2yQWfMmOgQsw3GWCRjgEli7aYbDJQjany0K/qZTq54I41gu+TV8YMccaWcXgDKrm
 cXFGUfOg4gBm4IRjJ/RJn+mUv6u+/3sLVqsaFTs9aiib1dpBSSUuMGBh548Ft7g2
 5VbFVSDaLjB2BdlcG7enlsmtzw0ltNssmqg7jTK/L7XNVnvxwUoXw+zP7RmCLEYt
 UV4FHXraMKNt2ZketlomC8ui2hg73ylUp4pPdMXCp7PIXp9sVamRTbpz12h689VJ
 /s55bWxHkR6S
 =LBxT
 -----END PGP SIGNATURE-----

Merge tag 'x86-bugs-2022-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 MMIO stale data fixes from Thomas Gleixner:
 "Yet another hw vulnerability with a software mitigation: Processor
  MMIO Stale Data.

  They are a class of MMIO-related weaknesses which can expose stale
  data by propagating it into core fill buffers. Data which can then be
  leaked using the usual speculative execution methods.

  Mitigations include this set along with microcode updates and are
  similar to MDS and TAA vulnerabilities: VERW now clears those buffers
  too"

* tag 'x86-bugs-2022-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/speculation/mmio: Print SMT warning
  KVM: x86/speculation: Disable Fill buffer clear within guests
  x86/speculation/mmio: Reuse SRBDS mitigation for SBDS
  x86/speculation/srbds: Update SRBDS mitigation selection
  x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data
  x86/speculation/mmio: Enable CPU Fill buffer clearing on idle
  x86/bugs: Group MDS, TAA & Processor MMIO Stale Data mitigations
  x86/speculation/mmio: Add mitigation for Processor MMIO Stale Data
  x86/speculation: Add a common function for MD_CLEAR mitigation update
  x86/speculation/mmio: Enumerate Processor MMIO Stale Data bug
  Documentation: Add documentation for Processor MMIO Stale Data
2022-06-14 07:43:15 -07:00
Linus Torvalds
3cae0d8475 Random number generator fixes for Linux 5.19-rc2.
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmKktaEACgkQSfxwEqXe
 A65Q+RAAkyLfX6kxQlTZbglTTiapbGwpk7+VhM0pJfc3guXhIe8kNZ5cX/mfkGeZ
 f/ebUmWSJhqaNz1OIZeBZQ98ESwh8imfWaxWDkqsFh4c+hGsSp2xwIszMn3Hg+7L
 Sm/0Q71eZaSnRBGxWRVBbz3tTppUBS4nJxvFj8iM3jWWUffZa0m/w1lMvqc8kNJu
 kM1ONqb+CEuHOJyltUaH2qEXD6fQE3IOpPdC6PdsFqFX8qLN/pwZO6tpY8tYPt3j
 AUubp8F3eR4Y7WcmMi1b7BmiRgg47jdsS18aqRSH8CuYvIBbXHNM+tuK54zh/888
 d+s9Bj6ALR4z1/a8HXqtudCazYU+1VozWxVIELcpDWTX4wKgqVZ0HLEz7sAEfCgV
 wSIcIY8TRJlOTL43KenbJqbyIOsvAidqWEYz5ogF9WYUlaD82s2j8pUMj4wQoD5w
 VJqF2CaoewU0BOGK7rZFnElN5rPlfEJtNBZQOEo16BzA2tXilPgRFoOmKs9TMRQo
 dGotfp62rUuS14b9x7zc9be0QmtnmwxQKO9U6SRVd5X2HXU/E5PsqTsbt0PKlSbA
 qemw2CehPB+uFs0cqbbeI5VBpVnPQJkaflTfOVr04h7623KJ5Pnblv7/8iSedOOh
 L4ACxPO5I5MrM8rJAR4g85SYVg3A/HaTufA5XtR5QB/qmEaErXg=
 =1Lf0
 -----END PGP SIGNATURE-----

Merge tag 'random-5.19-rc2-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random

Pull random number generator fixes from Jason Donenfeld:

 - A fix for a 5.19 regression for a case in which early device tree
   initializes the RNG, which flips a static branch.

   On most plaforms, jump labels aren't initialized until much later, so
   this caused splats. On a few mailing list threads, we cooked up easy
   fixes for arm64, arm32, and risc-v. But then things looked slightly
   more involved for xtensa, powerpc, arc, and mips. And at that point,
   when we're patching 7 architectures in a place before the console is
   even available, it seems like the cost/risk just wasn't worth it.

   So random.c works around it now by checking the already exported
   `static_key_initialized` boolean, as though somebody already ran into
   this issue in the past. I'm not super jazzed about that; it'd be
   prettier to not have to complicate downstream code. But I suppose
   it's practical.

 - A few small code nits and adding a missing __init annotation.

 - A change to the default config values to use the cpu and bootloader's
   seeds for initializing the RNG earlier.

   This brings them into line with what all the distros do (Fedora/RHEL,
   Debian, Ubuntu, Gentoo, Arch, NixOS, Alpine, SUSE, and Void... at
   least), and moreover will now give us test coverage in various test
   beds that might have caught the above device tree bug earlier.

 - A change to WireGuard CI's configuration to increase test coverage
   around the RNG.

 - A documentation comment fix to unrelated maintainerless CRC code that
   I was asked to take, I guess because it has to do with polynomials
   (which the RNG thankfully no longer uses).

* tag 'random-5.19-rc2-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
  wireguard: selftests: use maximum cpu features and allow rng seeding
  random: remove rng_has_arch_random()
  random: credit cpu and bootloader seeds by default
  random: do not use jump labels before they are initialized
  random: account for arch randomness in bits
  random: mark bootloader randomness code as __init
  random: avoid checking crng_ready() twice in random_init()
  crc-itu-t: fix typo in CRC ITU-T polynomial comment
2022-06-12 10:33:38 -07:00
Feng Zhou
89eda98428 selftest/bpf/benchs: Add bpf_map benchmark
Add benchmark for hash_map to reproduce the worst case
that non-stop update when map's free is zero.

Just like this:
./run_bench_bpf_hashmap_full_update.sh
Setting up benchmark 'bpf-hashmap-ful-update'...
Benchmark 'bpf-hashmap-ful-update' started.
1:hash_map_full_perf 555830 events per sec
...

Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com>
Link: https://lore.kernel.org/r/20220610023308.93798-3-zhoufeng.zf@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-11 14:25:35 -07:00
Jason A. Donenfeld
17b0128a13 wireguard: selftests: use maximum cpu features and allow rng seeding
By forcing the maximum CPU that QEMU has available, we expose additional
capabilities, such as the RNDR instruction, which increases test
coverage. This then allows the CI to skip the fake seeding step in some
cases. Also enable STRICT_KERNEL_RWX to catch issues related to early
jump labels when the RNG is initialized at boot.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-06-11 15:38:08 +02:00
Linus Torvalds
825464e79d Networking fixes for 5.19-rc2, including fixes from bpf and netfilter.
Current release - regressions:
   - eth: amt: fix possible null-ptr-deref in amt_rcv()
 
 Previous releases - regressions:
   - tcp: use alloc_large_system_hash() to allocate table_perturb
 
   - af_unix: fix a data-race in unix_dgram_peer_wake_me()
 
   - nfc: st21nfca: fix memory leaks in EVT_TRANSACTION handling
 
   - eth: ixgbe: fix unexpected VLAN rx in promisc mode on VF
 
 Previous releases - always broken:
   - ipv6: fix signed integer overflow in __ip6_append_data
 
   - netfilter:
     - nat: really support inet nat without l3 address
     - nf_tables: memleak flow rule from commit path
 
   - bpf: fix calling global functions from BPF_PROG_TYPE_EXT programs
 
   - openvswitch: fix misuse of the cached connection on tuple changes
 
   - nfc: nfcmrvl: fix memory leak in nfcmrvl_play_deferred
 
   - eth: altera: fix refcount leak in altera_tse_mdio_create
 
 Misc:
   - add Quentin Monnet to bpftool maintainers
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmKhykgSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkN7sQAIn+ZmzQqTm5MVWnlvt/GcRGjjMP2VQY
 60oS2re8QC773yWoP6PvXqxCSFc99paDCC5BmCK6DMLbp9yuVSp5W8iAPuFuyjXE
 /Nur4Ti57LcGJ8ZpcJheBD4cRFbf+xtsGzx9a1WhUDrCYASo7vqRes5Eos2dT7P7
 qjgTduhUtaj6S1CfenfTnYqemZPzSGa+1euDuQ/Bu4mjCPUTrNZZQVYjmfTYM9p1
 UzwfCQr9TtmRKo8wLFHnYDLoWHNpfp55SNL0ShAwIQqgldiJ2OdMje+a2Sa4m6uF
 etRz8H0WrGVqfneD424tdyZv4nwhHw5dnaSrGe8DGq98c4/lIIcVyC38oDAbfWqI
 l8p7ZmtvNid7rpgoQFcxKpb2TAYAI+jaFq5GySEhvj5ZAblNQgFyghfMGPoncXCO
 XW6va8TtP2lmHFScAljQiQb6GNwDO52x77/q14Jkwvr+DILRKXMZZ3hCGrKUn5JM
 lafGkdL5ufm+E9C9RlaWN3imb2KoRj+wdThgV79efEPGG1py7yLOPVMoOCP3qmLq
 torcGcfDi1LGb7ohQxN6tCMv0JgXjS5nd1i+qJnImpkhRrUmahOfmpnElHoPuzs3
 6FU8HR77Eo15x70Jt+WOMy4oXrNh2MeEm8/Fhpj84MEhKpxVn+2o/53M+++5h+ru
 YtiLwEri0dCA
 =rdoB
 -----END PGP SIGNATURE-----

Merge tag 'net-5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from bpf and netfilter.

  Current release - regressions:

   - eth: amt: fix possible null-ptr-deref in amt_rcv()

  Previous releases - regressions:

   - tcp: use alloc_large_system_hash() to allocate table_perturb

   - af_unix: fix a data-race in unix_dgram_peer_wake_me()

   - nfc: st21nfca: fix memory leaks in EVT_TRANSACTION handling

   - eth: ixgbe: fix unexpected VLAN rx in promisc mode on VF

  Previous releases - always broken:

   - ipv6: fix signed integer overflow in __ip6_append_data

   - netfilter:
       - nat: really support inet nat without l3 address
       - nf_tables: memleak flow rule from commit path

   - bpf: fix calling global functions from BPF_PROG_TYPE_EXT programs

   - openvswitch: fix misuse of the cached connection on tuple changes

   - nfc: nfcmrvl: fix memory leak in nfcmrvl_play_deferred

   - eth: altera: fix refcount leak in altera_tse_mdio_create

  Misc:

   - add Quentin Monnet to bpftool maintainers"

* tag 'net-5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (45 commits)
  net: amd-xgbe: fix clang -Wformat warning
  tcp: use alloc_large_system_hash() to allocate table_perturb
  net: dsa: realtek: rtl8365mb: fix GMII caps for ports with internal PHY
  net: dsa: mv88e6xxx: correctly report serdes link failure
  net: dsa: mv88e6xxx: fix BMSR error to be consistent with others
  net: dsa: mv88e6xxx: use BMSR_ANEGCOMPLETE bit for filling an_complete
  net: altera: Fix refcount leak in altera_tse_mdio_create
  net: openvswitch: fix misuse of the cached connection on tuple changes
  net: ethernet: mtk_eth_soc: fix misuse of mem alloc interface netdev[napi]_alloc_frag
  ip_gre: test csum_start instead of transport header
  au1000_eth: stop using virt_to_bus()
  ipv6: Fix signed integer overflow in l2tp_ip6_sendmsg
  ipv6: Fix signed integer overflow in __ip6_append_data
  nfc: nfcmrvl: Fix memory leak in nfcmrvl_play_deferred
  nfc: st21nfca: fix incorrect sizing calculations in EVT_TRANSACTION
  nfc: st21nfca: fix memory leaks in EVT_TRANSACTION handling
  nfc: st21nfca: fix incorrect validating logic in EVT_TRANSACTION
  net: ipv6: unexport __init-annotated seg6_hmac_init()
  net: xfrm: unexport __init-annotated xfrm4_protocol_init()
  net: mdio: unexport __init-annotated mdio_bus_init()
  ...
2022-06-09 12:06:52 -07:00
David Matlack
e0f3f46e42 KVM: selftests: Restrict test region to 48-bit physical addresses when using nested
The selftests nested code only supports 4-level paging at the moment.
This means it cannot map nested guest physical addresses with more than
48 bits. Allow perf_test_util nested mode to work on hosts with more
than 48 physical addresses by restricting the guest test region to
48-bits.

While here, opportunistically fix an off-by-one error when dealing with
vm_get_max_gfn(). perf_test_util.c was treating this as the maximum
number of GFNs, rather than the maximum allowed GFN. This didn't result
in any correctness issues, but it did end up shifting the test region
down slightly when using huge pages.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-12-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09 10:52:27 -04:00
David Matlack
71d4896619 KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2
Add an option to dirty_log_perf_test that configures the vCPUs to run in
L2 instead of L1. This makes it possible to benchmark the dirty logging
performance of nested virtualization, which is particularly interesting
because KVM must shadow L1's EPT/NPT tables.

For now this support only works on x86_64 CPUs with VMX. Otherwise
passing -n results in the test being skipped.

Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-11-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09 10:52:27 -04:00
David Matlack
cf97d5e99f KVM: selftests: Clean up LIBKVM files in Makefile
Break up the long lines for LIBKVM and alphabetize each architecture.
This makes reading the Makefile easier, and will make reading diffs to
LIBKVM easier.

No functional change intended.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-10-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09 10:52:26 -04:00
David Matlack
cdc979dae2 KVM: selftests: Link selftests directly with lib object files
The linker does obey strong/weak symbols when linking static libraries,
it simply resolves an undefined symbol to the first-encountered symbol.
This means that defining __weak arch-generic functions and then defining
arch-specific strong functions to override them in libkvm will not
always work.

More specifically, if we have:

lib/generic.c:

  void __weak foo(void)
  {
          pr_info("weak\n");
  }

  void bar(void)
  {
          foo();
  }

lib/x86_64/arch.c:

  void foo(void)
  {
          pr_info("strong\n");
  }

And a selftest that calls bar(), it will print "weak". Now if you make
generic.o explicitly depend on arch.o (e.g. add function to arch.c that
is called directly from generic.c) it will print "strong". In other
words, it seems that the linker is free to throw out arch.o when linking
because generic.o does not explicitly depend on it, which causes the
linker to lose the strong symbol.

One solution is to link libkvm.a with --whole-archive so that the linker
doesn't throw away object files it thinks are unnecessary. However that
is a bit difficult to plumb since we are using the common selftests
makefile rules. An easier solution is to drop libkvm.a just link
selftests with all the .o files that were originally in libkvm.a.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-9-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09 10:52:25 -04:00
David Matlack
acf57736e7 KVM: selftests: Drop unnecessary rule for STATIC_LIBS
Drop the "all: $(STATIC_LIBS)" rule. The KVM selftests already depend
on $(STATIC_LIBS), so there is no reason to have an extra "all" rule.

Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-8-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09 10:52:25 -04:00
David Matlack
c363d95986 KVM: selftests: Add a helper to check EPT/VPID capabilities
Create a small helper function to check if a given EPT/VPID capability
is supported. This will be re-used in a follow-up commit to check for 1G
page support.

No functional change intended.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-7-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09 10:52:24 -04:00
David Matlack
b6c086d04c KVM: selftests: Move VMX_EPT_VPID_CAP_AD_BITS to vmx.h
This is a VMX-related macro so move it to vmx.h. While here, open code
the mask like the rest of the VMX bitmask macros.

No functional change intended.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-6-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09 10:52:24 -04:00
David Matlack
ce690e9c17 KVM: selftests: Refactor nested_map() to specify target level
Refactor nested_map() to specify that it explicityl wants 4K mappings
(the existing behavior) and push the implementation down into
__nested_map(), which can be used in subsequent commits to create huge
page mappings.

No function change intended.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-5-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09 10:52:23 -04:00
David Matlack
b8ca01ea19 KVM: selftests: Drop stale function parameter comment for nested_map()
nested_map() does not take a parameter named eptp_memslot. Drop the
comment referring to it.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-4-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09 10:52:23 -04:00
David Matlack
c5a0ccec4c KVM: selftests: Add option to create 2M and 1G EPT mappings
The current EPT mapping code in the selftests only supports mapping 4K
pages. This commit extends that support with an option to map at 2M or
1G. This will be used in a future commit to create large page mappings
to test eager page splitting.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-3-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09 10:52:22 -04:00
David Matlack
4ee602e78d KVM: selftests: Replace x86_page_size with PG_LEVEL_XX
x86_page_size is an enum used to communicate the desired page size with
which to map a range of memory. Under the hood they just encode the
desired level at which to map the page. This ends up being clunky in a
few ways:

 - The name suggests it encodes the size of the page rather than the
   level.
 - In other places in x86_64/processor.c we just use a raw int to encode
   the level.

Simplify this by adopting the kernel style of PG_LEVEL_XX enums and pass
around raw ints when referring to the level. This makes the code easier
to understand since these macros are very common in KVM MMU code.

Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-2-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09 10:52:22 -04:00
Paolo Bonzini
66da65005a KVM/riscv fixes for 5.19, take #1
- Typo fix in arch/riscv/kvm/vmid.c
 
 - Remove broken reference pattern from MAINTAINERS entry
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEZdn75s5e6LHDQ+f/rUjsVaLHLAcFAmKhgGkACgkQrUjsVaLH
 LAfCCw/+LEz6af/lm4PXr5CGkJ91xq2xMpU9o39jMBm0RltzsG7zt/90SaUdK/Oz
 MpX7CLFgb1Cm2ZZ/+l5cBlLc7NUaMMxHH9dpScyrYC8xAb75QYimpe/jfjuMyXjO
 IaYJB2WCs2gfTYXA58c4sB2WR5rLahLnQGJrwW2CfMSvpv/nAyEZyWYtgXw8tSxH
 oM06Z/cLWU53uWuX0hbKAVQMdAIrQK5H+z46bhbpFC6gk/XSvaBBEngoOiiE6lC6
 uM8i8ZIeUgqSeWWreczd6H25eYwyLuVxXHWSIgbdvEcvBUn0VDO+Ox4UA2ab3g3d
 uHubqdRY5GnrkbRK0ue6tXfON8NxGlKwlcc6kp9Vqxb3Jxjr2qwToTubHYAUVXUi
 XzrvSxoZRRikwstb1+PNXECCNYUHkNdj4FBA4WoF0Y3Br1IfSwZLUX+EKkY/DHv+
 L4MhFFNqsQPzVly2wNiyxuWwRQyxupHekeMQlp13P9vZnGcptxxEyuQlM1Hf40ST
 iiOC8L+TCQzc5dN156/KjQIUFPud4huJO+0xHQtang628yVzQazzcxD+ialPkcqt
 JnpMmNbvvNzFYLoB3dQ/36flmYRA6SbK4Tt4bdhls+UcnLnfHDZow7OLmX5yj8+A
 QiKx6IOS6KI10LXhVZguAmZuKjXajyLVaCWpBl0tV6XpV9Y5t98=
 =w6dT
 -----END PGP SIGNATURE-----

Merge tag 'kvm-riscv-fixes-5.19-1' of https://github.com/kvm-riscv/linux into HEAD

KVM/riscv fixes for 5.19, take #1

- Typo fix in arch/riscv/kvm/vmid.c

- Remove broken reference pattern from MAINTAINERS entry
2022-06-09 09:45:00 -04:00
Andrii Nakryiko
fe92833524 libbpf: Fix uprobe symbol file offset calculation logic
Fix libbpf's bpf_program__attach_uprobe() logic of determining
function's *file offset* (which is what kernel is actually expecting)
when attaching uprobe/uretprobe by function name. Previously calculation
was determining virtual address offset relative to base load address,
which (offset) is not always the same as file offset (though very
frequently it is which is why this went unnoticed for a while).

Fixes: 433966e3ae ("libbpf: Support function name-based attach uprobes")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Riham Selim <rihams@fb.com>
Cc: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20220606220143.3796908-1-andrii@kernel.org
2022-06-09 14:09:41 +02:00
Shahab Vahedi
0b817059a8 bpftool: Fix bootstrapping during a cross compilation
This change adjusts the Makefile to use "HOSTAR" as the archive tool
to keep the sanity of the build process for the bootstrap part in
check. For the rationale, please continue reading.

When cross compiling bpftool with buildroot, it leads to an invocation
like:

$ AR="/path/to/buildroot/host/bin/arc-linux-gcc-ar" \
  CC="/path/to/buildroot/host/bin/arc-linux-gcc"    \
  ...
  make

Which in return fails while building the bootstrap section:

----------------------------------8<----------------------------------

  make: Entering directory '/src/bpftool-v6.7.0/src'
  ...                        libbfd: [ on  ]
  ...        disassembler-four-args: [ on  ]
  ...                          zlib: [ on  ]
  ...                        libcap: [ OFF ]
  ...               clang-bpf-co-re: [ on  ] <-- triggers bootstrap

  .
  .
  .

    LINK     /src/bpftool-v6.7.0/src/bootstrap/bpftool
  /usr/bin/ld: /src/bpftool-v6.7.0/src/bootstrap/libbpf/libbpf.a:
               error adding symbols: archive has no index; run ranlib
               to add one
  collect2: error: ld returned 1 exit status
  make: *** [Makefile:211: /src/bpftool-v6.7.0/src/bootstrap/bpftool]
            Error 1
  make: *** Waiting for unfinished jobs....
    AR       /src/bpftool-v6.7.0/src/libbpf/libbpf.a
    make[1]: Leaving directory '/src/bpftool-v6.7.0/libbpf/src'
    make: Leaving directory '/src/bpftool-v6.7.0/src'

---------------------------------->8----------------------------------

This occurs because setting "AR" confuses the build process for the
bootstrap section and it calls "arc-linux-gcc-ar" to create and index
"libbpf.a" instead of the host "ar".

Signed-off-by: Shahab Vahedi <shahab@synopsys.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/bpf/8d297f0c-cfd0-ef6f-3970-6dddb3d9a87a@synopsys.com
2022-06-09 14:01:21 +02:00
Jakub Kicinski
d5d4c36398 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2022-06-09

We've added 6 non-merge commits during the last 2 day(s) which contain
a total of 8 files changed, 49 insertions(+), 15 deletions(-).

The main changes are:

1) Fix an illegal copy_to_user() attempt seen by syzkaller through arm64
   BPF JIT compiler, from Eric Dumazet.

2) Fix calling global functions from BPF_PROG_TYPE_EXT programs by using
   the correct program context type, from Toke Høiland-Jørgensen.

3) Fix XSK TX batching invalid descriptor handling, from Maciej Fijalkowski.

4) Fix potential integer overflows in multi-kprobe link code by using safer
   kvmalloc_array() allocation helpers, from Dan Carpenter.

5) Add Quentin as bpftool maintainer, from Quentin Monnet.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  MAINTAINERS: Add a maintainer for bpftool
  xsk: Fix handling of invalid descriptors in XSK TX batching API
  selftests/bpf: Add selftest for calling global functions from freplace
  bpf: Fix calling global functions from BPF_PROG_TYPE_EXT programs
  bpf: Use safer kvmalloc_array() where possible
  bpf, arm64: Clear prog->jited_len along prog->jited
====================

Link: https://lore.kernel.org/r/20220608234133.32265-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08 20:31:21 -07:00
Linus Torvalds
34f4335c16 * Fix syzkaller NULL pointer dereference
* Fix TDP MMU performance issue with disabling dirty logging
 * Fix 5.14 regression with SVM TSC scaling
 * Fix indefinite stall on applying live patches
 * Fix unstable selftest
 * Fix memory leak from wrong copy-and-paste
 * Fix missed PV TLB flush when racing with emulation
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmKglysUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOJDAgArpPcAnJbeT2VQTQcp94e4tp9k1Sf
 gmUewajco4zFVB/sldE0fIporETkaX+FYYPiaNDdNgJ2lUw/HUJBN7KoFEYTZ37N
 Xx/qXiIXQYFw1bmxTnacLzIQtD3luMCzOs/6/Q7CAFZIBpUtUEjkMlQOBuxoKeG0
 B0iLCTJSw0taWcN170aN8G6T+5+bdR3AJW1k2wkgfESfYF9NfJoTUHQj9WTMzM2R
 aBRuXvUI/rWKvQY3DfoRmgg9Ig/SirSC+abbKIs4H08vZIEUlPk3WOZSKpsN/Wzh
 3XDnVRxgnaRLx6NI/ouI2UYJCmjPKbNcueGCf5IfUcHvngHjAEG/xxe4Qw==
 =zQ9u
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:

 - syzkaller NULL pointer dereference

 - TDP MMU performance issue with disabling dirty logging

 - 5.14 regression with SVM TSC scaling

 - indefinite stall on applying live patches

 - unstable selftest

 - memory leak from wrong copy-and-paste

 - missed PV TLB flush when racing with emulation

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: do not report a vCPU as preempted outside instruction boundaries
  KVM: x86: do not set st->preempted when going back to user space
  KVM: SVM: fix tsc scaling cache logic
  KVM: selftests: Make hyperv_clock selftest more stable
  KVM: x86/MMU: Zap non-leaf SPTEs when disabling dirty logging
  x86: drop bogus "cc" clobber from __try_cmpxchg_user_asm()
  KVM: x86/mmu: Check every prev_roots in __kvm_mmu_free_obsolete_roots()
  entry/kvm: Exit to user mode when TIF_NOTIFY_SIGNAL is set
  KVM: Don't null dereference ops->destroy
2022-06-08 09:16:31 -07:00
Jakub Kicinski
91ffb08932 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

1) Fix NAT support for NFPROTO_INET without layer 3 address,
   from Florian Westphal.

2) Use kfree_rcu(ptr, rcu) variant in nf_tables clean_net path.

3) Use list to collect flowtable hooks to be deleted.

4) Initialize list of hook field in flowtable transaction.

5) Release hooks on error for flowtable updates.

6) Memleak in hardware offload rule commit and abort paths.

7) Early bail out in case device does not support for hardware offload.
   This adds a new interface to net/core/flow_offload.c to check if the
   flow indirect block list is empty.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: bail out early if hardware offload is not supported
  netfilter: nf_tables: memleak flow rule from commit path
  netfilter: nf_tables: release new hooks on unsupported flowtable flags
  netfilter: nf_tables: always initialize flowtable hook list in transaction
  netfilter: nf_tables: delete flowtable hooks via transaction list
  netfilter: nf_tables: use kfree_rcu(ptr, rcu) to release hooks in clean_net path
  netfilter: nat: really support inet nat without l3 address
====================

Link: https://lore.kernel.org/r/20220606212055.98300-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-07 17:49:48 -07:00
Toke Høiland-Jørgensen
2cf7b7ffda selftests/bpf: Add selftest for calling global functions from freplace
Add a selftest that calls a global function with a context object parameter
from an freplace function to check that the program context type is
correctly converted to the freplace target when fetching the context type
from the kernel BTF.

v2:
- Trim includes
- Get rid of global function
- Use __noinline

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20220606075253.28422-2-toke@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-07 10:41:20 -07:00
Yonghong Song
f4db3dd528 selftests/bpf: Add a test for enum64 value relocations
Add a test for enum64 value relocations.
The test will be skipped if clang version is 14 or lower
since enum64 is only supported from version 15.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220607062718.3726307-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-07 10:20:44 -07:00
Yonghong Song
adc26d134e selftests/bpf: Test BTF_KIND_ENUM64 for deduplication
Add a few unit tests for BTF_KIND_ENUM64 deduplication.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220607062713.3725409-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-07 10:20:44 -07:00
Yonghong Song
3b5325186d selftests/bpf: Add BTF_KIND_ENUM64 unit tests
Add unit tests for basic BTF_KIND_ENUM64 encoding.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220607062708.3724845-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-07 10:20:44 -07:00
Yonghong Song
2b7301457f selftests/bpf: Test new enum kflag and enum64 API functions
Add tests to use the new enum kflag and enum64 API functions
in selftest btf_write.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220607062703.3724287-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-07 10:20:43 -07:00
Yonghong Song
d932815a43 selftests/bpf: Fix selftests failure
The kflag is supported now for BTF_KIND_ENUM.
So remove the test which tests verifier failure
due to existence of kflag.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220607062657.3723737-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-07 10:20:43 -07:00
Yonghong Song
58a53978fd bpftool: Add btf enum64 support
Add BTF_KIND_ENUM64 support.
For example, the following enum is defined in uapi bpf.h.
  $ cat core.c
  enum A {
        BPF_F_INDEX_MASK                = 0xffffffffULL,
        BPF_F_CURRENT_CPU               = BPF_F_INDEX_MASK,
        BPF_F_CTXLEN_MASK               = (0xfffffULL << 32),
  } g;
Compiled with
  clang -target bpf -O2 -g -c core.c
Using bpftool to dump types and generate format C file:
  $ bpftool btf dump file core.o
  ...
  [1] ENUM64 'A' encoding=UNSIGNED size=8 vlen=3
        'BPF_F_INDEX_MASK' val=4294967295ULL
        'BPF_F_CURRENT_CPU' val=4294967295ULL
        'BPF_F_CTXLEN_MASK' val=4503595332403200ULL
  $ bpftool btf dump file core.o format c
  ...
  enum A {
        BPF_F_INDEX_MASK = 4294967295ULL,
        BPF_F_CURRENT_CPU = 4294967295ULL,
        BPF_F_CTXLEN_MASK = 4503595332403200ULL,
  };
  ...

Note that for raw btf output, the encoding (UNSIGNED or SIGNED)
is printed out as well. The 64bit value is also represented properly
in BTF and C dump.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220607062652.3722649-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-07 10:20:43 -07:00
Yonghong Song
23b2a3a8f6 libbpf: Add enum64 relocation support
The enum64 relocation support is added. The bpf local type
could be either enum or enum64 and the remote type could be
either enum or enum64 too. The all combinations of local enum/enum64
and remote enum/enum64 are supported.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220607062647.3721719-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-07 10:20:43 -07:00
Yonghong Song
6ec7d79be2 libbpf: Add enum64 support for bpf linking
Add BTF_KIND_ENUM64 support for bpf linking, which is
very similar to BTF_KIND_ENUM.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220607062642.3721494-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-07 10:20:43 -07:00
Yonghong Song
f2a625889b libbpf: Add enum64 sanitization
When old kernel does not support enum64 but user space btf
contains non-zero enum kflag or enum64, libbpf needs to
do proper sanitization so modified btf can be accepted
by the kernel.

Sanitization for enum kflag can be achieved by clearing
the kflag bit. For enum64, the type is replaced with an
union of integer member types and the integer member size
must be smaller than enum64 size. If such an integer
type cannot be found, a new type is created and used
for union members.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220607062636.3721375-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-07 10:20:43 -07:00
Yonghong Song
d90ec262b3 libbpf: Add enum64 support for btf_dump
Add enum64 btf dumping support. For long long and unsigned long long
dump, suffixes 'LL' and 'ULL' are added to avoid compilation errors
in some cases.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220607062631.3720526-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-07 10:20:43 -07:00
Yonghong Song
2ef2026349 libbpf: Add enum64 deduplication support
Add enum64 deduplication support. BTF_KIND_ENUM64 handling
is very similar to BTF_KIND_ENUM.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220607062626.3720166-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-07 10:20:43 -07:00
Yonghong Song
dffbbdc2d9 libbpf: Add enum64 parsing and new enum64 public API
Add enum64 parsing support and two new enum64 public APIs:
  btf__add_enum64
  btf__add_enum64_value

Also add support of signedness for BTF_KIND_ENUM. The
BTF_KIND_ENUM API signatures are not changed. The signedness
will be changed from unsigned to signed if btf__add_enum_value()
finds any negative values.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220607062621.3719391-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-07 10:20:43 -07:00
Yonghong Song
8479aa7522 libbpf: Refactor btf__add_enum() for future code sharing
Refactor btf__add_enum() function to create a separate
function btf_add_enum_common() so later the common function
can be used to add enum64 btf type. There is no functionality
change for this patch.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220607062615.3718063-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-07 10:20:42 -07:00
Yonghong Song
b58b2b3a31 libbpf: Fix an error in 64bit relocation value computation
Currently, the 64bit relocation value in the instruction
is computed as follows:
  __u64 imm = insn[0].imm + ((__u64)insn[1].imm << 32)

Suppose insn[0].imm = -1 (0xffffffff) and insn[1].imm = 1.
With the above computation, insn[0].imm will first sign-extend
to 64bit -1 (0xffffffffFFFFFFFF) and then add 0x1FFFFFFFF,
producing incorrect value 0xFFFFFFFF. The correct value
should be 0x1FFFFFFFF.

Changing insn[0].imm to __u32 first will prevent 64bit sign
extension and fix the issue. Merging high and low 32bit values
also changed from '+' to '|' to be consistent with other
similar occurences in kernel and libbpf.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220607062610.3717378-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-07 10:20:42 -07:00
Yonghong Song
776281652d libbpf: Permit 64bit relocation value
Currently, the libbpf limits the relocation value to be 32bit
since all current relocations have such a limit. But with
BTF_KIND_ENUM64 support, the enum value could be 64bit.
So let us permit 64bit relocation value in libbpf.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220607062605.3716779-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-07 10:20:42 -07:00
Yonghong Song
6089fb325c bpf: Add btf enum64 support
Currently, BTF only supports upto 32bit enum value with BTF_KIND_ENUM.
But in kernel, some enum indeed has 64bit values, e.g.,
in uapi bpf.h, we have
  enum {
        BPF_F_INDEX_MASK                = 0xffffffffULL,
        BPF_F_CURRENT_CPU               = BPF_F_INDEX_MASK,
        BPF_F_CTXLEN_MASK               = (0xfffffULL << 32),
  };
In this case, BTF_KIND_ENUM will encode the value of BPF_F_CTXLEN_MASK
as 0, which certainly is incorrect.

This patch added a new btf kind, BTF_KIND_ENUM64, which permits
64bit value to cover the above use case. The BTF_KIND_ENUM64 has
the following three fields followed by the common type:
  struct bpf_enum64 {
    __u32 nume_off;
    __u32 val_lo32;
    __u32 val_hi32;
  };
Currently, btf type section has an alignment of 4 as all element types
are u32. Representing the value with __u64 will introduce a pad
for bpf_enum64 and may also introduce misalignment for the 64bit value.
Hence, two members of val_hi32 and val_lo32 are chosen to avoid these issues.

The kflag is also introduced for BTF_KIND_ENUM and BTF_KIND_ENUM64
to indicate whether the value is signed or unsigned. The kflag intends
to provide consistent output of BTF C fortmat with the original
source code. For example, the original BTF_KIND_ENUM bit value is 0xffffffff.
The format C has two choices, printing out 0xffffffff or -1 and current libbpf
prints out as unsigned value. But if the signedness is preserved in btf,
the value can be printed the same as the original source code.
The kflag value 0 means unsigned values, which is consistent to the default
by libbpf and should also cover most cases as well.

The new BTF_KIND_ENUM64 is intended to support the enum value represented as
64bit value. But it can represent all BTF_KIND_ENUM values as well.
The compiler ([1]) and pahole will generate BTF_KIND_ENUM64 only if the value has
to be represented with 64 bits.

In addition, a static inline function btf_kind_core_compat() is introduced which
will be used later when libbpf relo_core.c changed. Here the kernel shares the
same relo_core.c with libbpf.

  [1] https://reviews.llvm.org/D124641

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220607062600.3716578-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-07 10:20:42 -07:00
Vitaly Kuznetsov
eae260be3a KVM: selftests: Make hyperv_clock selftest more stable
hyperv_clock doesn't always give a stable test result, especially with
AMD CPUs. The test compares Hyper-V MSR clocksource (acquired either
with rdmsr() from within the guest or KVM_GET_MSRS from the host)
against rdtsc(). To increase the accuracy, increase the measured delay
(done with nop loop) by two orders of magnitude and take the mean rdtsc()
value before and after rdmsr()/KVM_GET_MSRS.

Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Tested-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220601144322.1968742-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-07 11:28:50 -04:00
Lina Wang
cf67838c44 selftests net: fix bpf build error
bpf_helpers.h has been moved to tools/lib/bpf since 5.10, so add more
including path.

Fixes: edae34a3ed ("selftests net: add UDP GRO fraglist + bpf self-tests")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Lina Wang <lina.wang@mediatek.com>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20220606064517.8175-1-lina.wang@mediatek.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-06-07 13:19:51 +02:00
Josh Poimboeuf
7b6c7a877c x86/ftrace: Remove OBJECT_FILES_NON_STANDARD usage
The file-wide OBJECT_FILES_NON_STANDARD annotation is used with
CONFIG_FRAME_POINTER to tell objtool to skip the entire file when frame
pointers are enabled.  However that annotation is now deprecated because
it doesn't work with IBT, where objtool runs on vmlinux.o instead of
individual translation units.

Instead, use more fine-grained function-specific annotations:

- The 'save_mcount_regs' macro does funny things with the frame pointer.
  Use STACK_FRAME_NON_STANDARD_FP to tell objtool to ignore the
  functions using it.

- The return_to_handler() "function" isn't actually a callable function.
  Instead of being called, it's returned to.  The real return address
  isn't on the stack, so unwinding is already doomed no matter which
  unwinder is used.  So just remove the STT_FUNC annotation, telling
  objtool to ignore it.  That also removes the implicit
  ANNOTATE_NOENDBR, which now needs to be made explicit.

Fixes the following warning:

  vmlinux.o: warning: objtool: __fentry__+0x16: return with modified stack frame

Fixes: ed53a0d971 ("x86/alternative: Use .ibt_endbr_seal to seal indirect calls")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lore.kernel.org/r/b7a7a42fe306aca37826043dac89e113a1acdbac.1654268610.git.jpoimboe@kernel.org
2022-06-06 11:50:22 -07:00
Linus Torvalds
e17fee8976 A single featurelet for delay accounting. Delayed a bit
because,unusually, it has dependencies on both the mm-stable and
 mm-nonmm-stable queues.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYpz3SAAKCRDdBJ7gKXxA
 jou6AP9bY89NifR7Tc8U59Xu4c9amphXS9rTJv7Ysj3GxBMoRwEAuXvvJTet6mEn
 UdmytDdb4BtAlx7Itd7IKu4S9JD6mQw=
 =bAU1
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull delay-accounting update from Andrew Morton:
 "A single featurette for delay accounting.

  Delayed a bit because, unusually, it had dependencies on both the
  mm-stable and mm-nonmm-stable queues"

* tag 'mm-nonmm-stable-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  delayacct: track delays from write-protect copy
2022-06-05 16:58:27 -07:00
Linus Torvalds
44688ffd11 A set of objtool fixes:
- Handle __ubsan_handle_builtin_unreachable() correctly and treat it as
     noreturn.
 
   - Allow architectures to select uaccess validation
 
   - Use the non-instrumented bit test for test_cpu_has() to prevent escape
     from non-instrumentable regions.
 
   - Use arch_ prefixed atomics for JUMP_LABEL=n builds to prevent escape
     from non-instrumentable regions.
 
   - Mark a few tiny inline as __always_inline to prevent GCC from bringing
     them out of line and instrumenting them.
 
   - Mark the empty stub context_tracking_enabled() as always inline as GCC
     brings them out of line and instruments the empty shell.
 
   - Annotate ex_handler_msr_mce() as dead end
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmKccvMTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoW39EAC1w/mSwn3b1lYkzOUcoe4EOMXzao2U
 my0ThnPNpa5k14Xfp6tOIpWRGTuW6mcVi4g+x4+LJo9V5tt5BxmMO1VFrTCQKn7H
 iJ1sWZNGq503aXldIT+pC0Zz67CIVnbGiz0D67aEYQ7w4ACdkubx8kcx5Of7BNbm
 KyQllP8XFXy7b+wgc8MrX1h/wPXNV9PBJwRAFrBw52c4s5euYui7iUNUm4RtKRem
 OpI3RFholAITLzvV8j+Xs9EmfUDjvmU3e1NEEas2n3MHm7tkYo5aSOSYX/Z7C5YD
 MvpMS3UAgwRGdaXvRVJK7eWcwayjODGGYrW9x9w9RMKM492uB4vAzfr4PE3Lru5G
 mnOxDjEP4QRK7Jl8bC0Idc5G6bxmw4DnQl7vkoaNYn3EyxKaEvREUokFKy5eWp3U
 klFQZXgQreUGSEkVA8VW7yT6knzVNsBk2WSFDUPdQZ0PV7JAVLyGZX8gEbhDyyim
 czkmI21A3hmGR97FKxyQ0I1N6q8eKSodZWbquPdOW52Jdt6pkpzUqPok9r74PK/p
 83ip/bNthbaR8FccNCHbnCLd8kvp6lsjqLqnMQHhMtUju6uRPRTzW1rxKik3Cbfh
 8VmqP6ltNGD7MkQW/jW+Vq7GIM+9onnEHbA/aEntH/ZKDHEefYtE66T0BjSrS6YK
 5dMr/vz4Jx1bwg==
 =eNVp
 -----END PGP SIGNATURE-----

Merge tag 'objtool-urgent-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool fixes from Thomas Gleixner:

 - Handle __ubsan_handle_builtin_unreachable() correctly and treat it as
   noreturn

 - Allow architectures to select uaccess validation

 - Use the non-instrumented bit test for test_cpu_has() to prevent
   escape from non-instrumentable regions

 - Use arch_ prefixed atomics for JUMP_LABEL=n builds to prevent escape
   from non-instrumentable regions

 - Mark a few tiny inline as __always_inline to prevent GCC from
   bringing them out of line and instrumenting them

 - Mark the empty stub context_tracking_enabled() as always inline as
   GCC brings them out of line and instruments the empty shell

 - Annotate ex_handler_msr_mce() as dead end

* tag 'objtool-urgent-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/extable: Annotate ex_handler_msr_mce() as a dead end
  context_tracking: Always inline empty stubs
  x86: Always inline on_thread_stack() and current_top_of_stack()
  jump_label,noinstr: Avoid instrumentation for JUMP_LABEL=n builds
  x86/cpu: Elide KCSAN for cpu_has() and friends
  objtool: Mark __ubsan_handle_builtin_unreachable() as noreturn
  objtool: Add CONFIG_HAVE_UACCESS_VALIDATION
2022-06-05 09:45:27 -07:00
Linus Torvalds
2981436374 hte: New subsystem for v5.19-rc1
This contains the new HTE subsystem that has been in the works for a
 couple of months now. The infrastructure provided allows for drivers to
 register as hardware timestamp providers, while consumers will be able
 to request events that they are interested in (such as GPIOs and IRQs)
 to be timestamped by the hardware providers.
 
 Note that this currently supports only one provider, but there seems to
 be enough interest in this functionality and we expect to see more
 drivers added once this is merged.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEiOrDCAFJzPfAjcif3SOs138+s6EFAmKZ77ATHHRyZWRpbmdA
 bnZpZGlhLmNvbQAKCRDdI6zXfz6zoTgVD/4tdJptbwblr8CCP5jklb81quuKBFu4
 2VaUpyGo4TDD4g0jchmAkmB0JsjO7mauURZ/vwWUfDx+uvsOJyf79smY/4OvL0v6
 MdYFeXb1rqDX8SiZSnpa+PKNI96l9/l1sbkbj1PPIod8hJSgXsASRP4lF21U97ZY
 QTI7u3kMJsUEvZhbEs9E2TXPAUO4+M8HfogJuEoaVRyHdwVHY1+Z+jlUsVXRd1qU
 XpIaKKMWF07FWrs2QUAbdqIUc8cITlcP+ExCc35PMwZnemWHnVwvk0mVyC0XD39P
 PHGlQOR2zTwJmijCwFkKTwuhGufE6bbvKvdns6gyTUlbzpQ4vcjPfVubt9ehX3dp
 acEJp5WdJFUhFU4dhjsLGVVzwE/L7vsZ3RPPh1j/Hjt0wYPg/EYPLr/wz+0wbIxt
 z4AtQZBLwrXSxXUuGkzl139kx0lTEtQZvfiziwi8BWrl6aqeBcGSNYHFLs2rDUTh
 sap+aEYRQ4cGWYfLMhv3yRLkkTlGxDMmEPM9VWGJb96osBcfidvgT+SE5qwoz6yg
 yJyMWwCrjYvYl+ZHwfKE3pEE4z9mrs7VLgcW9yWOF4UwFLuH0DyIEtCx/I4Uk+QG
 7F1HuV3mYZNlOwgJMZooSJ5Z9uzrn+eGzivhw4klJibFZBXqICCQjL5KvQ8wyuY7
 60Ns9+912HV/tQ==
 =sOJn
 -----END PGP SIGNATURE-----

Merge tag 'hte/for-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux

Pull hardware timestamping subsystem from Thierry Reding:
 "This contains the new HTE (hardware timestamping engine) subsystem
  that has been in the works for a couple of months now.

  The infrastructure provided allows for drivers to register as hardware
  timestamp providers, while consumers will be able to request events
  that they are interested in (such as GPIOs and IRQs) to be timestamped
  by the hardware providers.

  Note that this currently supports only one provider, but there seems
  to be enough interest in this functionality and we expect to see more
  drivers added once this is merged"

[ Linus Walleij mentions the Intel PMC in the Elkhart and Tiger Lake
  platforms as another future timestamp provider ]

* tag 'hte/for-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
  dt-bindings: timestamp: Correct id path
  dt-bindings: Renamed hte directory to timestamp
  hte: Uninitialized variable in hte_ts_get()
  hte: Fix off by one in hte_push_ts_ns()
  hte: Fix possible use-after-free in tegra_hte_test_remove()
  hte: Remove unused including <linux/version.h>
  MAINTAINERS: Add HTE Subsystem
  hte: Add Tegra HTE test driver
  tools: gpio: Add new hardware clock type
  gpiolib: cdev: Add hardware timestamp clock type
  gpio: tegra186: Add HTE support
  gpiolib: Add HTE support
  dt-bindings: Add HTE bindings
  hte: Add Tegra194 HTE kernel provider
  drivers: Add hardware timestamp engine (HTE) subsystem
  Documentation: Add HTE subsystem guide
2022-06-05 09:12:28 -07:00
Linus Torvalds
d0e60d46bc Bitmap patches for 5.19-rc1
This series includes the following patchsets:
  - bitmap: optimize bitmap_weight() usage(w/o bitmap_weight_cmp), from me;
  - lib/bitmap.c make bitmap_print_bitmask_to_buf parseable, from Mauro
    Carvalho Chehab;
  - include/linux/find: Fix documentation, from Anna-Maria Behnsen;
  - bitmap: fix conversion from/to fix-sized arrays, from me;
  - bitmap: Fix return values to be unsigned, from Kees Cook.
 
 It has been in linux-next for at least a week with no problems.
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEEi8GdvG6xMhdgpu/4sUSA/TofvsgFAmKaEzYACgkQsUSA/Tof
 vsiGKwv8Dgr3G0mLbSfmHZqdFMIsmSmwhxlEH6eBNtX6vjQbGafe/Buhj/1oSY8N
 NYC4+5Br6s7MmMRth3Kp6UECdl94TS3Ka06T+lVBKkG+C+B1w1/svqUMM2ZCQF3e
 Z5R/HhR6av9X9Qb2mWSasWLkWp629NjdtRsJSDWiVt1emVVwh+iwxQnMH9VuE+ao
 z3mvaQfSRhe4h+xCZOiohzFP+0jZb1EnPrQAIVzNUjigo7mglpNvVyO7p/8LU7gD
 dIjfGmSbtsHU72J+/0lotRqjhjORl1F/EILf8pIzx5Ga7ExUGhOzGWAj7/3uZxfA
 Cp1Z/QV271MGwv/sNdSPwCCJHf51eOmsbyOyUScjb3gFRwIStEa1jB4hKwLhS5wF
 3kh4kqu3WGuIQAdxkUpDBsy3CQjAPDkvtRJorwyWGbjwa9xUETESAgH7XCCTsgWc
 0sIuldWWaxC581+fAP1Dzmo8uuWBURTaOrVmRMILQHMTw54zoFyLY+VI9gEAT9aV
 gnPr3M4F
 =U7DN
 -----END PGP SIGNATURE-----

Merge tag 'bitmap-for-5.19-rc1' of https://github.com/norov/linux

Pull bitmap updates from Yury Norov:

 - bitmap: optimize bitmap_weight() usage, from me

 - lib/bitmap.c make bitmap_print_bitmask_to_buf parseable, from Mauro
   Carvalho Chehab

 - include/linux/find: Fix documentation, from Anna-Maria Behnsen

 - bitmap: fix conversion from/to fix-sized arrays, from me

 - bitmap: Fix return values to be unsigned, from Kees Cook

It has been in linux-next for at least a week with no problems.

* tag 'bitmap-for-5.19-rc1' of https://github.com/norov/linux: (31 commits)
  nodemask: Fix return values to be unsigned
  bitmap: Fix return values to be unsigned
  KVM: x86: hyper-v: replace bitmap_weight() with hweight64()
  KVM: x86: hyper-v: fix type of valid_bank_mask
  ia64: cleanup remove_siblinginfo()
  drm/amd/pm: use bitmap_{from,to}_arr32 where appropriate
  KVM: s390: replace bitmap_copy with bitmap_{from,to}_arr64 where appropriate
  lib/bitmap: add test for bitmap_{from,to}_arr64
  lib: add bitmap_{from,to}_arr64
  lib/bitmap: extend comment for bitmap_(from,to)_arr32()
  include/linux/find: Fix documentation
  lib/bitmap.c make bitmap_print_bitmask_to_buf parseable
  MAINTAINERS: add cpumask and nodemask files to BITMAP_API
  arch/x86: replace nodes_weight with nodes_empty where appropriate
  mm/vmstat: replace cpumask_weight with cpumask_empty where appropriate
  clocksource: replace cpumask_weight with cpumask_empty in clocksource.c
  genirq/affinity: replace cpumask_weight with cpumask_empty where appropriate
  irq: mips: replace cpumask_weight with cpumask_empty where appropriate
  drm/i915/pmu: replace cpumask_weight with cpumask_empty where appropriate
  arch/x86: replace cpumask_weight with cpumask_empty where appropriate
  ...
2022-06-04 14:04:27 -07:00
Linus Torvalds
45b2e5ad68 perf tools changes for v5.19: 3rd batch
- Synthesize task events for pre-existing threads when using 'perf lock --threads',
   as we need to show task names.
 
 - Fix unwinding with ld.lld (>= version 10.0) linked objects, where
   .eh_frame_hdr and .text are in different PT_LOAD program headers, which makes
   perf record --call-graph dwarf fail with such obkects.
 
 - Check if 'perf record' hangs in the ARM SPE (Statistical Profiling Extensions)
   'perf test' entry when recording a workload with forks.
 
 - Trace physical address for Arm SPE events, needed for 'perf c2c' to locate
   the memory node for samples.
 
 - Fix sorting in percent_rmt_hitm_cmp() in 'perf c2c'.
 
 - Further support for Intel hybrid systems in the evlist and 'perf record' code.
 
 - Update IBM s/390 vendor event JSON tables.
 
 - Add metrics (JSON) for Intel Sapphirerapids.
 
 - Update metrics for Intel Alderlake.
 
 - Correct typo of sysf 'event_source' directory in the documentation.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHQEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYpqK2AAKCRCyPKLppCJ+
 J90fAPdtl123WsK9lERbzVizTUy5/s0Gqgv/DRnW6vG+NAfpAQC8fpD16Pgz2/OS
 igQqTAZ8nJwdPNU4o27I9VvtePOvCA==
 =xIiH
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-for-v5.19-2022-06-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull more perf tools updates from Arnaldo Carvalho de Melo:

 - Synthesize task events for pre-existing threads when using 'perf lock
   --threads', as we need to show task names.

 - Fix unwinding with ld.lld (>= version 10.0) linked objects, where
  .eh_frame_hdr and .text are in different PT_LOAD program headers,
   which makes perf record --call-graph dwarf fail with such obkects.

 - Check if 'perf record' hangs in the ARM SPE (Statistical Profiling
   Extensions) 'perf test' entry when recording a workload with forks.

 - Trace physical address for Arm SPE events, needed for 'perf c2c' to
   locate the memory node for samples.

 - Fix sorting in percent_rmt_hitm_cmp() in 'perf c2c'.

 - Further support for Intel hybrid systems in the evlist and 'perf
   record' code.

 - Update IBM s/390 vendor event JSON tables.

 - Add metrics (JSON) for Intel Sapphirerapids.

 - Update metrics for Intel Alderlake.

 - Correct typo of sysf 'event_source' directory in the documentation.

* tag 'perf-tools-for-v5.19-2022-06-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf vendor events intel: Update metrics for Alderlake
  perf vendor events intel: Add metrics for Sapphirerapids
  perf c2c: Fix sorting in percent_rmt_hitm_cmp()
  perf mem: Trace physical address for Arm SPE events
  perf list: Update event description for IBM zEC12/zBC12 to latest level
  perf list: Update event description for IBM z196/z114 to latest level
  perf list: Update event description for IBM z15 to latest level
  perf list: Update event description for IBM z14 to latest level
  perf list: Update event description for IBM z13 to latest level
  perf list: Update event description for IBM z10 to latest level
  perf list: Add IBM z16 event description for s390
  perf record: Support sample-read topdown metric group for hybrid platforms
  perf lock: Change to synthesize task events
  perf unwind: Fix segbase for ld.lld linked objects
  perf test arm-spe: Check if perf-record hangs when recording workload with forks
  perf docs: Correct typo of event_sources
  perf evlist: Extend arch_evsel__must_be_in_group to support hybrid systems
2022-06-04 13:33:12 -07:00
Hangbin Liu
02f4afebf8 selftests/bpf: Add drv mode testing for xdping
As subject, we only test SKB mode for xdping at present.
Now add DRV mode for xdping.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20220602032507.464453-1-liuhangbin@gmail.com
2022-06-03 14:53:34 -07:00
Yuze Chi
611edf1bac libbpf: Fix is_pow_of_2
Move the correct definition from linker.c into libbpf_internal.h.

Fixes: 0087a681fa ("libbpf: Automatically fix up BPF_MAP_TYPE_RINGBUF size, if necessary")
Reported-by: Yuze Chi <chiyuze@google.com>
Signed-off-by: Yuze Chi <chiyuze@google.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220603055156.2830463-1-irogers@google.com
2022-06-03 14:53:33 -07:00
Martin KaFai Lau
e6ff92f41b selftests/bpf: Fix tc_redirect_dtime
tc_redirect_dtime was reported flaky from time to time.  It
always fails at the udp test and complains about the bpf@tc-ingress
got a skb->tstamp when handling udp packet.  It is unexpected
because the skb->tstamp should have been cleared when crossing
different netns.

The most likely cause is that the skb is actually a tcp packet
from the earlier tcp test.  It could be the final TCP_FIN handling.

This patch tightens the skb->tstamp check in the bpf prog.  It ensures
the skb is the current testing traffic.  First, it checks that skb
matches the IPPROTO of the running test (i.e. tcp vs udp).
Second, it checks the server port (dst_ns_port).  The server
port is unique for each test (50000 + test_enum).

Also fixed a typo in test_udp_dtime(): s/P100/P101/

Fixes: c803475fd8 ("bpf: selftests: test skb->tstamp in redirect_neigh")
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20220601234050.2572671-1-kafai@fb.com
2022-06-03 14:53:33 -07:00
Daniel Müller
9bbdfad8a5 libbpf: Fix a couple of typos
This change fixes a couple of typos that were encountered while studying
the source code.

Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20220601154025.3295035-1-deso@posteo.net
2022-06-03 14:53:33 -07:00
Linus Torvalds
c6f2f3e2c8 Add partial Loongarch architecture code
This is the majority of the loongarch architecture code, including
 the final system call interface and all core functionality.
 
 It still misses three sets of peripheral but vital patches to add
 support for other subsystems, which have yet to pass review:
 
  - The drivers/firmware/efi stub for booting from a standard UEFI
    firmware implementation. Both the original custom boot interface
    and a draft implementation of the EFI stub did not make it, so
    it is currently impossible to boot the kernel, until the
    loongarch specific portions get accepted into the UEFI subsystem
 
  - The drivers/irqchip/irq-loongson-*.c drivers are shared with the
    the MIPS port, but currently lack support for ACPI based booting,
    which will get merged through the irqchip subsystem.
 
  - Similarly, the drivers/pci/controller/pci-loongson.c needs to be
    modified for ACPI support, which will be merged through the
    PCI subsystem.
 
 While the port cannot actually be used before all the above are
 merged, having it in 5.19 helps to establish the user space ABI
 for the libc ports to build on, and to help any treewide changes
 in the mainline kernel get applied here as well. A gcc-12 based
 tool chains for build testing is now included in
 https://mirrors.edge.kernel.org/pub/tools/crosstool/.
 
 Original description from Huacai Chen at
 https://lore.kernel.org/lkml/20220603072053.35005-1-chenhuacai@loongson.cn/:
 
  "LoongArch is a new RISC ISA, which is a bit like MIPS or RISC-V.
   LoongArch includes a reduced 32-bit version (LA32R), a standard 32-bit
   version (LA32S) and a 64-bit version (LA64). LoongArch use ACPI as its
   boot protocol LoongArch-specific interrupt controllers (similar to APIC)
   are already added in the next revision of ACPI Specification (current
   revision is 6.4).
 
   This patchset is adding basic LoongArch support in mainline kernel, we
   can see a complete snapshot here:
   https://github.com/loongson/linux/tree/loongarch-next
   https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson.git/log/?h=loongarch-next
 
   Cross-compile tool chain to build kernel:
   https://github.com/loongson/build-tools/releases/download/2021.12.21/loongarch64-clfs-2022-03-03-cross-tools-gcc-glibc.tar.xz
 
   A CLFS-based Linux distro:
   https://github.com/loongson/build-tools/releases/download/2021.12.21/loongarch64-clfs-system-2022-03-03.tar.bz2
 
   Open-source tool chain which is under review (Binutils and Gcc are already upstream):
   https://github.com/loongson/binutils-gdb/tree/upstream_v3.1
   https://github.com/loongson/gcc/tree/loongarch_upstream_v6.3
   https://github.com/loongson/glibc/tree/loongarch_2_35_dev_v2.2
 
   Loongson and LoongArch documentations:
   https://github.com/loongson/LoongArch-Documentation
 
   LoongArch-specific interrupt controllers:
   https://mantis.uefi.org/mantis/view.php?id=2203
   https://mantis.uefi.org/mantis/view.php?id=2313"
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmKZ/akACgkQmmx57+YA
 GNnIARAAwUsSqjU0o8jyWb3hSTNstoZa5hszrS3NoMIvMHFfIaBbz1rre+D4xnYg
 5weLv/xewJRcrAKG9GvjiqLnbhXadTveTNmYA6a2C5pAKAElLgOEzt2DnljAUs3E
 1Gal9kM56TnA07khpcP/2NjHWOUG5mbTy+hzVcOqJvDJsuxo35Clm2OcS9Zk/Kmk
 gRCQ6bgHtyktEakAXFpyPvzt0ZOJVGVLFA7v1HIqdAZRfX7BH3nBJewSALXtdiQI
 5M/dasqXnAh6aBetLeaDYONQcNICpg/uv6LesowgU/AwepVObO1NUh5Or8zXV5je
 a79TeidaijLPBEUtV7+oa2P4n1HMfKOVsaKL6THWzLbGA6izjMOwWq/E9+iaTkra
 hnaJMPlHuPz0M9c+/Z/8rp1u0K38vz27wle+kn9jyoMI1/q9EFEys1HQYGowJnQf
 0G6RlnKLPQqRX6iNge43E91J8L0NBr8h+d9hu6ETcSBQuaO5d2asq9FLauQ2sK1j
 Qxsu6SpdbuyXVjHoPl/zfbuz8BX2t62tj+twGuRSJp13hcdo0Bf6fn/yoZCnLjfq
 kNRKTfIJwjNB93LCp5E49XEWfOfSy6KgLoeglp6T/B+gjrl2kfddCnmKfW9JBOZI
 4B5N2DEYnyYBwkgumxLMmpKzUfETBD+KqAYj+OMwD3ZfhCYMQg0=
 =hkwN
 -----END PGP SIGNATURE-----

Merge tag 'loongarch-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull initial Loongarch architecture code from Arnd Bergmann:
 "This is the majority of the loongarch architecture code, including the
  final system call interface and all core functionality.

  It still misses three sets of peripheral but vital patches to add
  support for other subsystems, which have yet to pass review:

   - The drivers/firmware/efi stub for booting from a standard UEFI
     firmware implementation. Both the original custom boot interface
     and a draft implementation of the EFI stub did not make it, so it
     is currently impossible to boot the kernel, until the loongarch
     specific portions get accepted into the UEFI subsystem

   - The drivers/irqchip/irq-loongson-*.c drivers are shared with the
     the MIPS port, but currently lack support for ACPI based booting,
     which will get merged through the irqchip subsystem.

   - Similarly, the drivers/pci/controller/pci-loongson.c needs to be
     modified for ACPI support, which will be merged through the PCI
     subsystem.

  While the port cannot actually be used before all the above are
  merged, having it in 5.19 helps to establish the user space ABI for
  the libc ports to build on, and to help any treewide changes in the
  mainline kernel get applied here as well.

  A gcc-12 based tool chains for build testing is now included in

    https://mirrors.edge.kernel.org/pub/tools/crosstool/"

Original description from Huacai Chen:
 "LoongArch is a new RISC ISA, which is a bit like MIPS or RISC-V.
  LoongArch includes a reduced 32-bit version (LA32R), a standard 32-bit
  version (LA32S) and a 64-bit version (LA64). LoongArch use ACPI as its
  boot protocol LoongArch-specific interrupt controllers (similar to APIC)
  are already added in the next revision of ACPI Specification (current
  revision is 6.4).

  This patchset is adding basic LoongArch support in mainline kernel, we
  can see a complete snapshot here:

    https://github.com/loongson/linux/tree/loongarch-next
    https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson.git/log/?h=loongarch-next

  Cross-compile tool chain to build kernel:

    https://github.com/loongson/build-tools/releases/download/2021.12.21/loongarch64-clfs-2022-03-03-cross-tools-gcc-glibc.tar.xz

  A CLFS-based Linux distro:

    https://github.com/loongson/build-tools/releases/download/2021.12.21/loongarch64-clfs-system-2022-03-03.tar.bz2

  Open-source tool chain which is under review (Binutils and Gcc are already upstream):

    https://github.com/loongson/binutils-gdb/tree/upstream_v3.1
    https://github.com/loongson/gcc/tree/loongarch_upstream_v6.3
    https://github.com/loongson/glibc/tree/loongarch_2_35_dev_v2.2

  Loongson and LoongArch documentations:

    https://github.com/loongson/LoongArch-Documentation

  LoongArch-specific interrupt controllers:

    https://mantis.uefi.org/mantis/view.php?id=2203
    https://mantis.uefi.org/mantis/view.php?id=2313"

Link: https://lore.kernel.org/lkml/20220603072053.35005-1-chenhuacai@loongson.cn/

* tag 'loongarch-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (24 commits)
  MAINTAINERS: Add maintainer information for LoongArch
  LoongArch: Add Loongson-3 default config file
  LoongArch: Add Non-Uniform Memory Access (NUMA) support
  LoongArch: Add multi-processor (SMP) support
  LoongArch: Add VDSO and VSYSCALL support
  LoongArch: Add some library functions
  LoongArch: Add misc common routines
  LoongArch: Add ELF and module support
  LoongArch: Add signal handling support
  LoongArch: Add system call support
  LoongArch: Add memory management
  LoongArch: Add process management
  LoongArch: Add exception/interrupt handling
  LoongArch: Add boot and setup routines
  LoongArch: Add other common headers
  LoongArch: Add atomic/locking headers
  LoongArch: Add CPU definition headers
  LoongArch: Add build infrastructure
  LoongArch: Add writecombine support for drm
  LoongArch: Add ELF-related definitions
  ...
2022-06-03 14:09:21 -07:00
Linus Torvalds
21873bd66b arm64 fixes for 5.19-rc1:
- Initialise jump labels before setup_machine_fdt(), needed by commit
   f5bda35fba ("random: use static branch for crng_ready()").
 
 - Sparse warnings: missing prototype, incorrect __user annotation.
 
 - Skip SVE kselftest if not sufficient vector lengths supported.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmKaQ/8ACgkQa9axLQDI
 XvE+eRAAjVQr4ehVgT75SIYq9cv/6A8cAlYzK4hi7cwhFLTii/9flZLmqTfjUwbY
 Ifnm1/c/ka11vAC4JfzhEcAjIm5jJ7Wlw4XqMhGC2woTjmCrDxSlG1qrdt1zZlgI
 H7RXCjCpfstCWFoZy4uDaaWZYgN6c91/IPuuamBUfADXu5NbeD9yfZALYUhiE9FP
 yZ61BLDP0f8l0x5PMmDATidZ/vtIp+Hfiq1iezX0VugIY4+v6nbi6/Ba20Wc419v
 cmDgjRc1ODaWnuWLUTiriUlt2VEtBs3LrGXa81WSrTKR9+DXih4Gwj4orpW7Rqx3
 CFeSgVCDd5V9/XGEExN5GtMtiolDWNG8dA58jwwaWHWgp8AveSGY8AxPT+8/30/S
 FQyVGjxrX1JUkIya3lbVuDLKHdzCKEX0O83dFQSm3bP6lUMZUCYCcFNpVpSYPsQr
 ZZea9dNWhwiI82z19p7igfWKwerBMdUeptxMm247jKmdz4H/i6xGtK6nJc731zCZ
 kjxoI6HCYAMH/r9kgWYldgeov/UNQXi5Q2ftsXkUepPPjpEd5YBJfkL6CzKgQ6u5
 wtOTI8fainjPVWPGbcpHnEjlwG5UMY80jcJpfkD54g9N5WxunCWf7L3OZj77toTp
 Z2kU2b189Di4bNOV9bCKFqrecqajNychMhXcc5anMfdN+284mdg=
 =25eW
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:
 "Most of issues addressed were introduced during this merging window.

   - Initialise jump labels before setup_machine_fdt(), needed by commit
     f5bda35fba ("random: use static branch for crng_ready()").

   - Sparse warnings: missing prototype, incorrect __user annotation.

   - Skip SVE kselftest if not sufficient vector lengths supported"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  kselftest/arm64: signal: Skip SVE signal test if not enough VLs supported
  arm64: Initialize jump labels before setup_machine_fdt()
  arm64: hibernate: Fix syntax errors in comments
  arm64: Remove the __user annotation for the restore_za_context() argument
  ftrace/fgraph: fix increased missing-prototypes warnings
2022-06-03 14:05:34 -07:00
Zhengjun Xing
1bcca2b1bd perf vendor events intel: Update metrics for Alderlake
Update JSON metrics for Alderlake to perf.

It included both P-core and E-core metrics.

P-core metrics based on TMA 4.4 (TMA_Metrics-full.csv)
E-core metrics based on E-core TMA 2.0 (E-core_TMA_Metrics.csv)

https://download.01.org/perfmon/

Signed-off-by: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Tested-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220528095933.1784141-2-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-03 21:45:32 +02:00
Zhengjun Xing
122657820f perf vendor events intel: Add metrics for Sapphirerapids
Add JSON metrics for Sapphirerapids to perf.

Based on TMA4.4 metrics.

https://download.01.org/perfmon/TMA_Metrics-full.csv

Signed-off-by: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Tested-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220528095933.1784141-1-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-03 21:45:15 +02:00
Leo Yan
b24192a173 perf c2c: Fix sorting in percent_rmt_hitm_cmp()
The function percent_rmt_hitm_cmp() wrongly uses local HITMs for
sorting remote HITMs.

Since this function is to sort cache lines for remote HITMs, this patch
changes to use 'rmt_hitm' field for correct sorting.

Fixes: 9cb3500afc ("perf c2c report: Add hitm/store percent related sort keys")
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Joe Mario <jmario@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220530084253.750190-1-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-03 21:40:15 +02:00
Leo Yan
62e6eb8d54 perf mem: Trace physical address for Arm SPE events
Currently, Arm SPE events don't trace physical address, therefore, the
field 'phys_addr' is always zero in synthesized memory samples.  This
leads to perf c2c tool cannot locate the memory node for samples.

This patch enables configuration 'pa_enable' for Arm SPE events, so the
physical address packet can be traced, finally this can allow perf c2c
tool to locate properly for memory node.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20220530083645.253432-1-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-03 21:39:27 +02:00
Thomas Richter
882f54243a perf list: Update event description for IBM zEC12/zBC12 to latest level
Update IBM zEC12/zBC12 event counter description to the latest level
as described in the documents
1. SA23-2260-07:
   "The Load-Program-Parameter and the CPU-Measurement Facilities."
   released on May, 2022
for the following counter sets:
   * Basic counter set
   * Problem counter set
   * Crypto counter set

2. SA23-2261-07:
   "The CPU-Measurement Facility Extended Counters Definition
   for z10, z196/z114, zEC12/zBC12, z13/z13s, z14, z15 and z16"
   released on April 29, 2022
for the following counter sets:
   * Extended counter set
   * MT-Diagnostic counter set

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20220531092706.1931503-7-tmricht@linux.ibm.com
Cc: acme@kernel.org
Cc: gor@linux.ibm.com
Cc: hca@linux.ibm.com
Cc: svens@linux.ibm.com
Cc: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
2022-06-03 21:37:27 +02:00
Thomas Richter
dfeab63acd perf list: Update event description for IBM z196/z114 to latest level
Update IBM z196/z114 event counter description to the latest level
as described in the documents
1. SA23-2260-07:
   "The Load-Program-Parameter and the CPU-Measurement Facilities."
   released on May, 2022
for the following counter sets:
   * Basic counter set
   * Problem counter set
   * Crypto counter set

2. SA23-2261-07:
   "The CPU-Measurement Facility Extended Counters Definition
    for z10, z196/z114, zEC12/zBC12, z13/z13s, z14, z15 and z16"
released on April 29, 2022
for the following counter sets:
   * Extended counter set
   * MT-Diagnostic counter set

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20220531092706.1931503-6-tmricht@linux.ibm.com
Cc: acme@kernel.org
Cc: gor@linux.ibm.com
Cc: hca@linux.ibm.com
Cc: svens@linux.ibm.com
Cc: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
2022-06-03 21:37:27 +02:00