Commit Graph

8476 Commits

Author SHA1 Message Date
Linus Torvalds
224478289c Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
 "ARM fixes:

   - Another state update on exit to userspace fix

   - Prevent the creation of mixed 32/64 VMs

   - Fix regression with irqbypass not restarting the guest on failed
     connect

   - Fix regression with debug register decoding resulting in
     overlapping access

   - Commit exception state on exit to userspace

   - Fix the MMU notifier return values

   - Add missing 'static' qualifiers in the new host stage-2 code

  x86 fixes:

   - fix guest missed wakeup with assigned devices

   - fix WARN reported by syzkaller

   - do not use BIT() in UAPI headers

   - make the kvm_amd.avic parameter bool

  PPC fixes:

   - make halt polling heuristics consistent with other architectures

  selftests:

   - various fixes

   - new performance selftest memslot_perf_test

   - test UFFD minor faults in demand_paging_test"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (44 commits)
  selftests: kvm: fix overlapping addresses in memslot_perf_test
  KVM: X86: Kill off ctxt->ud
  KVM: X86: Fix warning caused by stale emulation context
  KVM: X86: Use kvm_get_linear_rip() in single-step and #DB/#BP interception
  KVM: x86/mmu: Fix comment mentioning skip_4k
  KVM: VMX: update vcpu posted-interrupt descriptor when assigning device
  KVM: rename KVM_REQ_PENDING_TIMER to KVM_REQ_UNBLOCK
  KVM: x86: add start_assignment hook to kvm_x86_ops
  KVM: LAPIC: Narrow the timer latency between wait_lapic_expire and world switch
  selftests: kvm: do only 1 memslot_perf_test run by default
  KVM: X86: Use _BITUL() macro in UAPI headers
  KVM: selftests: add shared hugetlbfs backing source type
  KVM: selftests: allow using UFFD minor faults for demand paging
  KVM: selftests: create alias mappings when using shared memory
  KVM: selftests: add shmem backing source type
  KVM: selftests: refactor vm_mem_backing_src_type flags
  KVM: selftests: allow different backing source types
  KVM: selftests: compute correct demand paging size
  KVM: selftests: simplify setup_demand_paging error handling
  KVM: selftests: Print a message if /dev/kvm is missing
  ...
2021-05-29 06:02:25 -10:00
Paolo Bonzini
000ac42953 selftests: kvm: fix overlapping addresses in memslot_perf_test
vm_create allocates memory and maps it close to GPA.  This memory
is separate from what is allocated in subsequent calls to
vm_userspace_mem_region_add, so it is incorrect to pass the
test memory size to vm_create_default.  Just pass a small
fixed amount of memory which can be used later for page table,
otherwise GPAs are already allocated at MEM_GPA and the
test aborts.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-29 06:28:06 -04:00
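The failure described above comes down to two guest physical address ranges overlapping: with the full test size passed in, the memory set up by vm_create_default() already covers GPAs at MEM_GPA. A minimal sketch of the overlap check, assuming half-open [start, start + size) ranges; the helper name and the example addresses are hypothetical and not taken from memslot_perf_test.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: do the half-open GPA ranges
 * [a_start, a_start + a_size) and [b_start, b_start + b_size) overlap?
 */
static bool gpa_ranges_overlap(uint64_t a_start, uint64_t a_size,
			       uint64_t b_start, uint64_t b_size)
{
	assert(a_size > 0 && b_size > 0);
	return a_start < b_start + b_size && b_start < a_start + a_size;
}
```

With a small fixed size for the first region, the two ranges stay disjoint and the test memslot at MEM_GPA can be added.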
Paolo Abeni
69ca3d29a7 mptcp: update selftest for fallback due to OoO
The previous commit noted that we can have a fallback
scenario due to OoO (or packet drop). Update the self-tests
accordingly.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-28 13:51:40 -07:00
Yonghong Song
fc8c262e0e bpf, docs: Add llvm_reloc.rst to explain llvm bpf relocations
LLVM upstream commit https://reviews.llvm.org/D102712 made some changes
to bpf relocations to make them llvm linker lld friendly. The scope of
existing relocations R_BPF_64_{64,32} is narrowed and new relocations
R_BPF_64_{ABS32,ABS64,NODYLD32} are introduced.

Let us add some documentation about llvm bpf relocations so people can
understand how to resolve them properly in their respective tools.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210526152457.335210-1-yhs@fb.com
2021-05-28 22:12:04 +02:00
Christian Brauner
15845cbcd1 test: add openat2() test for invalid upper 32 bit flag value
Test that openat2() rejects unknown flags in the upper 32 bit range.

Link: https://lore.kernel.org/r/20210528092417.3942079-4-brauner@kernel.org
Cc: Richard Guy Briggs <rgb@redhat.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-05-28 17:44:37 +02:00
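openat2() receives its flags as a 64-bit field in struct open_how, so validation has to look at all 64 bits. The sketch below contrasts a full-width check with one that truncates to 32 bits first and therefore silently accepts an unknown upper bit, which is the case this test exercises. The flag mask and function names are hypothetical; the kernel's real set of valid flags is larger.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>

/* Hypothetical mask of "known" open flags, for illustration only. */
#define SKETCH_VALID_OPEN_FLAGS \
	((uint64_t)(O_RDONLY | O_WRONLY | O_RDWR | O_CREAT | O_EXCL | O_TRUNC))

/* Check the full 64-bit value: any unknown bit, including bits 32..63, fails. */
static int check_open_flags(uint64_t flags)
{
	return (flags & ~SKETCH_VALID_OPEN_FLAGS) ? -EINVAL : 0;
}

/* Buggy variant: truncating to 32 bits first drops the upper bits unseen. */
static int check_open_flags_truncated(uint64_t flags)
{
	uint32_t lower = (uint32_t)flags;

	return (lower & ~SKETCH_VALID_OPEN_FLAGS) ? -EINVAL : 0;
}
```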
Jiri Pirko
557c4d2f78 selftests: devlink_lib: add check for devlink device existence
If user passes devlink handle over DEVLINK_DEV variable, check if the
device exists.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20210527105515.790330-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-27 14:49:07 -07:00
Jakub Kicinski
5ada57a9a6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
cdc-wdm: s/kill_urbs/poison_urbs/ to fix build

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-27 09:55:10 -07:00
Paolo Bonzini
fb0f94794b selftests: kvm: do only 1 memslot_perf_test run by default
The test takes a long time with the current implementation of
memslots, so cut the run time a bit.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-27 07:45:57 -04:00
Axel Rasmussen
33090a884d KVM: selftests: add shared hugetlbfs backing source type
This lets us run the demand paging test on top of a shared
hugetlbfs-backed area. The "shared" is key, as this allows us to
exercise userfaultfd minor faults on hugetlbfs.

Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Message-Id: <20210519200339.829146-11-axelrasmussen@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-27 07:45:57 -04:00
Axel Rasmussen
a4b9722a59 KVM: selftests: allow using UFFD minor faults for demand paging
UFFD handling of MINOR faults is a new feature whose use case is to
speed up demand paging (compared to MISSING faults). So, it's
interesting to let this selftest exercise this new mode.

Modify the demand paging test to have the option of using UFFD minor
faults, as opposed to missing faults. Now, when turning on userfaultfd
with '-u', the desired mode has to be specified ("MISSING" or "MINOR").

If we're in minor mode, before registering, prefault via the *alias*.
This way, the guest will trigger minor faults, instead of missing
faults, and we can UFFDIO_CONTINUE to resolve them.

Modify the page fault handler function to use the right ioctl depending
on the mode we're running in. In MINOR mode, use UFFDIO_CONTINUE.

Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Message-Id: <20210519200339.829146-10-axelrasmussen@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-27 07:45:57 -04:00
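The mode selection described above can be modeled as a small dispatch: the '-u' argument picks MISSING or MINOR, and the handler resolves faults with UFFDIO_COPY (supply a new page) or UFFDIO_CONTINUE (install PTEs for a page already in the page cache). This is a hypothetical, simplified sketch: the enum and the returned ioctl names stand in for the real ioctl calls and are not the selftest's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the two userfaultfd registration modes. */
enum uffd_mode { UFFD_MODE_INVALID, UFFD_MODE_MISSING, UFFD_MODE_MINOR };

/* Parse the argument given to '-u' ("MISSING" or "MINOR"). */
static enum uffd_mode parse_uffd_mode(const char *arg)
{
	if (arg && strcmp(arg, "MISSING") == 0)
		return UFFD_MODE_MISSING;
	if (arg && strcmp(arg, "MINOR") == 0)
		return UFFD_MODE_MINOR;
	return UFFD_MODE_INVALID;
}

/*
 * Name of the ioctl used to resolve a fault in the given mode: MISSING
 * faults need a page to be supplied, MINOR faults only need PTEs for a
 * page that already exists in the page cache.
 */
static const char *resolving_ioctl(enum uffd_mode mode)
{
	switch (mode) {
	case UFFD_MODE_MISSING:
		return "UFFDIO_COPY";
	case UFFD_MODE_MINOR:
		return "UFFDIO_CONTINUE";
	default:
		return NULL;
	}
}
```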
Axel Rasmussen
94f3f2b31a KVM: selftests: create alias mappings when using shared memory
When a memory region is added with a src_type specifying that it should
use some kind of shared memory, also create an alias mapping to the same
underlying physical pages.

And, add an API so tests can get access to these alias addresses.
Basically, for a guest physical address, let us look up the analogous
host *alias* address.

In a future commit, we'll modify the demand paging test to take
advantage of this to exercise UFFD minor faults. The idea is, we
pre-fault the underlying pages *via the alias*. When the *guest*
faults, it gets a "minor" fault (PTEs don't exist yet, but a page is
already in the page cache). Then, the userfaultfd threads can handle the
fault: they could potentially modify the underlying memory *via the
alias* if they wanted to, and then they install the PTEs and let the
guest carry on via a UFFDIO_CONTINUE ioctl.

Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Message-Id: <20210519200339.829146-9-axelrasmussen@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-27 07:45:56 -04:00
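An alias mapping in userspace terms is simply a second MAP_SHARED mapping of the same shared-memory object, so both virtual ranges refer to the same physical pages. A minimal sketch, assuming a memfd as the shmem backing; the helper name and the memfd name are hypothetical, and the selftest's own API instead looks up the alias host address for a guest physical address.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map the same memfd twice with MAP_SHARED. Both mappings refer to the
 * same underlying pages, so a write through the alias is visible through
 * the primary mapping. Returns 0 on success, -1 on error.
 */
static int map_with_alias(size_t size, char **primary, char **alias)
{
	int fd = memfd_create("guest_mem", 0);

	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)size) < 0)
		goto err_close;

	*primary = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (*primary == MAP_FAILED)
		goto err_close;
	*alias = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (*alias == MAP_FAILED) {
		munmap(*primary, size);
		goto err_close;
	}
	/* The mappings keep the memfd alive; the descriptor is no longer needed. */
	close(fd);
	return 0;

err_close:
	close(fd);
	return -1;
}
```

Touching a page through the alias puts it in the page cache without creating PTEs in the primary mapping, which is why a later access through the primary mapping is a minor fault rather than a missing one.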
Axel Rasmussen
c9befd5958 KVM: selftests: add shmem backing source type
This lets us run the demand paging test on top of a shmem-backed area.
In follow-up commits, we'll 1) leverage this new capability to create an
alias mapping, and then 2) use the alias mapping to exercise UFFD minor
faults.

Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Message-Id: <20210519200339.829146-8-axelrasmussen@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-27 07:45:56 -04:00
Axel Rasmussen
b3784bc28c KVM: selftests: refactor vm_mem_backing_src_type flags
Each struct vm_mem_backing_src_alias has a flags field, which denotes
the flags used to mmap() an area of that type. Previously, this field
never included MAP_PRIVATE | MAP_ANONYMOUS, because
vm_userspace_mem_region_add assumed that *all* types would always use
those flags, and so it hardcoded them.

In a follow-up commit, we'll add a new type: shmem. Areas of this type
must not have MAP_PRIVATE | MAP_ANONYMOUS, and instead they must have
MAP_SHARED.

So, refactor things. Make it so that the flags field of
struct vm_mem_backing_src_alias really is a complete set of flags, and
don't add in any extras in vm_userspace_mem_region_add. This will let us
easily tack on shmem.

Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Message-Id: <20210519200339.829146-7-axelrasmussen@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-27 07:45:56 -04:00
Axel Rasmussen
0368c2c1b4 KVM: selftests: allow different backing source types
Add an argument which lets us specify a different backing memory type
for the test. The default is just to use anonymous, matching existing
behavior.

This is in preparation for testing UFFD minor faults. For that, we'll
need to use a new backing memory type which is set up with MAP_SHARED.

Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Message-Id: <20210519200339.829146-6-axelrasmussen@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-27 07:45:56 -04:00
Axel Rasmussen
32ffa4f71e KVM: selftests: compute correct demand paging size
This is a preparatory commit needed before we can use different kinds of
backing pages for guest memory.

Previously, we used perf_test_args.host_page_size, which is the host's
native page size (commonly 4K). For VM_MEM_SRC_ANONYMOUS this turns out
to be okay, but in a follow-up commit we want to allow using different
kinds of backing memory.

Take VM_MEM_SRC_ANONYMOUS_HUGETLB for example. Without this change, if
we used that backing page type, when we issued a UFFDIO_COPY ioctl we'd
only do so with 4K, rather than the full 2M of a backing hugepage. In
this case, UFFDIO_COPY returns -EINVAL (__mcopy_atomic_hugetlb checks
the size).

Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Message-Id: <20210519200339.829146-5-axelrasmussen@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-27 07:45:55 -04:00
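The size fix boils down to resolving each fault in units of the backing page size rather than the host's native page size. A minimal sketch, assuming the backing page size is a power of two; the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: given a faulting host address and the page size of
 * the backing memory (4 KiB for anonymous memory, 2 MiB for a 2 MiB
 * hugetlb page), compute the range a UFFDIO_COPY has to cover.
 */
static void uffd_copy_range(uint64_t fault_addr, uint64_t backing_page_size,
			    uint64_t *start, uint64_t *len)
{
	/* Page sizes are powers of two, so masking rounds down. */
	assert(backing_page_size && !(backing_page_size & (backing_page_size - 1)));
	*start = fault_addr & ~(backing_page_size - 1);
	*len = backing_page_size;
}
```

For a fault at 0x201234 this yields a 4 KiB copy at 0x201000 with anonymous backing, but a full 2 MiB copy at 0x200000 with 2 MiB hugetlb backing, which is the length __mcopy_atomic_hugetlb accepts.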
Axel Rasmussen
25408e5a02 KVM: selftests: simplify setup_demand_paging error handling
A small cleanup. Our caller writes:

  r = setup_demand_paging(...);
  if (r < 0) exit(-r);

Since we're just going to exit anyway, instead of returning an error we
can just re-use TEST_ASSERT. This makes the caller simpler, as well as
the function itself - no need to write out branches, etc.

Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Message-Id: <20210519200339.829146-3-axelrasmussen@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-27 07:45:55 -04:00
David Matlack
2aab4b355c KVM: selftests: Print a message if /dev/kvm is missing
If a KVM selftest is run on a machine without /dev/kvm, it will exit
silently. Make it easy to tell what's happening by printing an error
message.

Opportunistically consolidate all codepaths that open /dev/kvm into a
single function so they all print the same message.

This slightly changes the semantics of vm_is_unrestricted_guest() by
changing a TEST_ASSERT() to exit(KSFT_SKIP). However
vm_is_unrestricted_guest() is only called in one place
(x86_64/mmio_warning_test.c) and that is to determine if the test should
be skipped or not.

Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20210511202120.1371800-1-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-27 07:45:55 -04:00
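The consolidation can be sketched as one helper that every caller uses to open the device: on failure it prints why and exits with the kselftest skip code (4, matching the exit status shown for hardware_disable_test). The function names are hypothetical; the selftest library has its own helper for this.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* kselftest's "skip" exit code. */
#define KSFT_SKIP 4

/* Open @path, returning the fd or -errno on failure. */
static int try_open_path(const char *path, int flags)
{
	int fd = open(path, flags);

	return fd < 0 ? -errno : fd;
}

/* Single entry point: report a missing device instead of exiting silently. */
static int open_kvm_or_skip(const char *path, int flags)
{
	int fd = try_open_path(path, flags);

	if (fd < 0) {
		printf("%s not available, skipping test\n", path);
		exit(KSFT_SKIP);
	}
	return fd;
}
```

Because every codepath goes through the same function, the message and the skip status are identical no matter which test hits the missing device.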
Axel Rasmussen
c887d6a126 KVM: selftests: trivial comment/logging fixes
Some trivial fixes I found while touching related code in this series,
factored out into a separate commit for easier reviewing:

- s/gor/got/ and add a newline in demand_paging_test.c
- s/backing_src/src_type/ in a comment to be consistent with the real
  function signature in kvm_util.c

Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Message-Id: <20210519200339.829146-2-axelrasmussen@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-27 07:45:55 -04:00
David Matlack
a10453c038 KVM: selftests: Fix hang in hardware_disable_test
If /dev/kvm is not available then hardware_disable_test will hang
indefinitely because the child process exits before posting to the
semaphore for which the parent is waiting.

Fix this by making the parent periodically check if the child has
exited. We have to be careful to forward the child's exit status to
preserve a KSFT_SKIP status.

I considered just checking for /dev/kvm before creating the child
process, but there are so many other reasons why the child could exit
early that it seemed better to handle that as the general case.

Tested:

$ ./hardware_disable_test
/dev/kvm not available, skipping test
$ echo $?
4
$ modprobe kvm_intel
$ ./hardware_disable_test
$ echo $?
0

Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20210514230521.2608768-1-dmatlack@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-27 07:45:55 -04:00
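The fix can be sketched as a wait loop that alternates a bounded semaphore wait with a non-blocking waitpid(). This is a sketch of the approach the commit describes, not the test's code: the function names, the one-second period and the use of sem_timedwait() are assumptions. Note that sem_timedwait() takes an absolute CLOCK_REALTIME deadline.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Semaphore in shared memory so parent and forked child can both use it. */
static sem_t *create_shared_sem(void)
{
	sem_t *sem = mmap(NULL, sizeof(*sem), PROT_READ | PROT_WRITE,
			  MAP_SHARED | MAP_ANONYMOUS, -1, 0);

	if (sem == MAP_FAILED)
		return NULL;
	if (sem_init(sem, 1 /* process-shared */, 0) != 0) {
		munmap(sem, sizeof(*sem));
		return NULL;
	}
	return sem;
}

/*
 * Wait for the child to post @sem, checking about once a second whether it
 * exited. Returns -1 once posted, the child's exit status if it exited
 * first (so the caller can forward e.g. KSFT_SKIP), or -2 on other errors.
 */
static int wait_for_child_setup(pid_t pid, sem_t *sem)
{
	for (;;) {
		struct timespec deadline;
		int status;

		clock_gettime(CLOCK_REALTIME, &deadline);
		deadline.tv_sec += 1;

		if (sem_timedwait(sem, &deadline) == 0)
			return -1;
		if (errno != ETIMEDOUT && errno != EINTR)
			return -2;

		/* Child still running: keep waiting. */
		if (waitpid(pid, &status, WNOHANG) != pid)
			continue;

		/* The child may have posted just before exiting. */
		if (sem_trywait(sem) == 0)
			return -1;
		return WIFEXITED(status) ? WEXITSTATUS(status) : -2;
	}
}
```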
David Matlack
50bc913d52 KVM: selftests: Ignore CPUID.0DH.1H in get_cpuid_test
Similar to CPUID.0DH.0H this entry depends on the vCPU's XCR0 register
and IA32_XSS MSR. Since this test does not control for either before
assigning the vCPU's CPUID, these entries will not necessarily match
the supported CPUID exposed by KVM.

This fixes get_cpuid_test on Cascade Lake CPUs.

Suggested-by: Jim Mattson <jmattson@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20210519211345.3944063-1-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-27 07:45:54 -04:00
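Ignoring these entries amounts to a small skip list consulted while comparing the vCPU's CPUID with KVM's supported CPUID. A hypothetical sketch of such a filter (table layout and function name are assumptions):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * CPUID entries whose contents depend on the vCPU's XCR0 and IA32_XSS,
 * which the test does not control, so they are skipped during comparison.
 */
static const struct {
	uint32_t function;
	uint32_t index;
} ignored_cpuids[] = {
	{ 0xd, 0 },	/* CPUID.0DH.0H */
	{ 0xd, 1 },	/* CPUID.0DH.1H */
};

static bool cpuid_entry_ignored(uint32_t function, uint32_t index)
{
	for (size_t i = 0; i < sizeof(ignored_cpuids) / sizeof(ignored_cpuids[0]); i++)
		if (ignored_cpuids[i].function == function &&
		    ignored_cpuids[i].index == index)
			return true;
	return false;
}
```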
David Matlack
ef4c9f4f65 KVM: selftests: Fix 32-bit truncation of vm_get_max_gfn()
vm_get_max_gfn() casts vm->max_gfn from a uint64_t to an unsigned int,
which causes the upper 32-bits of the max_gfn to get truncated.

Nobody noticed until now likely because vm_get_max_gfn() is only used
as a mechanism to create a memslot in an unused region of the guest
physical address space (the top), and the top of the 32-bit physical
address space was always good enough.

This fix reveals a bug in memslot_modification_stress_test which was
trying to create a dummy memslot past the end of guest physical memory.
Fix that by moving the dummy memslot lower.

Fixes: 52200d0d94 ("KVM: selftests: Remove duplicate guest mode handling")
Reviewed-by: Venkatesh Srinivas <venkateshs@chromium.org>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20210521173828.1180619-1-dmatlack@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-27 07:45:54 -04:00
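A worked example of the truncation: assuming a 48-bit guest physical address space and 4 KiB pages, there are 2^48 / 2^12 = 2^36 frames, so max_gfn is 2^36 - 1 = 0xFFFFFFFFF. The two hypothetical helpers below contrast returning it through unsigned int with returning it as uint64_t.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy variant: returning unsigned int silently drops bits 32..63. */
static unsigned int max_gfn_truncated(uint64_t max_gfn)
{
	return (unsigned int)max_gfn;
}

/* Fixed variant: keep the full 64-bit value. */
static uint64_t max_gfn_full(uint64_t max_gfn)
{
	return max_gfn;
}
```

With the truncated value the "top" of guest memory appears to be 0xFFFFFFFF, far below the real limit, which is why placing a memslot there happened to keep working.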
Maciej S. Szmigiero
cad347fab1 KVM: selftests: add a memslot-related performance benchmark
This benchmark contains the following tests:
* Map test, where the host unmaps guest memory while the guest writes to
it (maps it).

The test is designed in a way to make the unmap operation on the host
take a negligible amount of time in comparison with the mapping
operation in the guest.

The test area is actually split in two: the first half is being mapped
by the guest while the second half is being unmapped by the host.
Then a guest <-> host sync happens and the areas are reversed.

* Unmap test which is broadly similar to the above map test, but it is
designed in an opposite way: to make the mapping operation in the guest
take a negligible amount of time in comparison with the unmap operation
on the host.
This test is available in two variants: with per-page unmap operation
or a chunked one (using 2 MiB chunk size).

* Move active area test which involves moving the last (highest gfn)
memslot a bit back and forth on the host while the guest is
concurrently writing around the area being moved (including over the
moved memslot).

* Move inactive area test which is similar to the previous move active
area test, but now guest writes all happen outside of the area being
moved.

* Read / write test in which the guest writes to the beginning of each
page of the test area while the host writes to the middle of each such
page.
Then each side checks the values the other side has written.
This particular test is not expected to give different results depending
on the particular memslots implementation; it is meant as a rough sanity
check and to provide insight into the spread of test results expected.

Each test performs its operation in a loop until a test period ends
(this is 5 seconds by default, but it is configurable).
Then the total count of loops done is divided by the actual elapsed
time to give the test result.

The tests have a configurable memslot cap with the "-s" test option; by
default the system maximum is used.
Each test is repeated a particular number of times (by default 20
times), and the best result achieved is printed.

The test memory area is divided equally between memslots; the remainder
is added to the last memslot.
The test area size does not depend on the number of memslots in use.

The tests also measure the time that it took to add all these memslots.
The best result from the tests that use the whole test area is printed
after all the requested tests are done.

In general, these tests are designed to use as much memory as possible
(within reason) while still doing 100+ loops even on high memslot counts
with the default test length.
Increasing the test runtime makes it increasingly more likely that some
event will happen on the system during the test run, which might lower
the test result.

Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-Id: <8d31bb3d92bc8fa33a9756fa802ee14266ab994e.1618253574.git.maciej.szmigiero@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-27 07:45:54 -04:00
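Two pieces of arithmetic from the description can be sketched directly: splitting the test area across memslots with the remainder going to the last slot, and turning a loop count plus elapsed time into the reported result. Function names are hypothetical; the benchmark's own code may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/*
 * Pages for memslot @slot when @total_pages are divided equally between
 * @nslots memslots; the remainder goes to the last one, so the test area
 * size does not depend on @nslots.
 */
static uint64_t memslot_pages(uint64_t total_pages, uint32_t nslots, uint32_t slot)
{
	assert(nslots > 0 && slot < nslots);
	uint64_t base = total_pages / nslots;

	return slot == nslots - 1 ? base + total_pages % nslots : base;
}

/* Test result: completed loops divided by the actual elapsed time. */
static double loops_per_second(uint64_t loops, struct timespec start,
			       struct timespec end)
{
	double elapsed = (double)(end.tv_sec - start.tv_sec) +
			 (double)(end.tv_nsec - start.tv_nsec) / 1e9;

	assert(elapsed > 0);
	return (double)loops / elapsed;
}
```

For example, 1000 pages over 3 memslots gives 333, 333 and 334 pages; 500 loops in 5 seconds gives a result of 100 loops per second.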
Maciej S. Szmigiero
22721a5610 KVM: selftests: Keep track of memslots more efficiently
The KVM selftest framework was using a simple list for keeping track of
the memslots currently in use.
This resulted in lookups and adding a single memslot being O(n), the
latter due to linear scanning of the existing memslot set to check for
the presence of any conflicting entries.

Before this change, benchmarking high count of memslots was more or less
impossible as pretty much all the benchmark time was spent in the
selftest framework code.

We can simply use an rbtree to keep track of both gfn and hva.
We don't need an interval tree for hva here as we can't have overlapping
memslots because we allocate a completely new memory chunk for each new
memslot.

Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-Id: <b12749d47ee860468240cf027412c91b76dbe3db.1618253574.git.maciej.szmigiero@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-27 07:45:54 -04:00
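The commit uses rbtrees keyed by gfn and hva. The sketch below swaps in binary search over an array sorted by base gfn to show the same O(log n) lookup for non-overlapping memslots; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memslot record covering [base_gfn, base_gfn + npages). */
struct slot {
	uint64_t base_gfn;
	uint64_t npages;
};

/*
 * Find the memslot containing @gfn in @slots, sorted by base_gfn with no
 * overlaps. Returns NULL if @gfn falls outside every slot.
 */
static const struct slot *find_slot(const struct slot *slots, size_t n,
				    uint64_t gfn)
{
	size_t lo = 0, hi = n;

	/* lo ends as the first index whose base_gfn is greater than gfn. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (slots[mid].base_gfn <= gfn)
			lo = mid + 1;
		else
			hi = mid;
	}
	if (lo == 0)
		return NULL;

	/* Candidate: the last slot starting at or below gfn. */
	const struct slot *s = &slots[lo - 1];

	return gfn - s->base_gfn < s->npages ? s : NULL;
}
```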
Paolo Bonzini
a13534d667 selftests: kvm: fix potential issue with ELF loading
vm_vaddr_alloc() sets up the GVA to GPA mapping page by page; therefore, GPAs
may not be contiguous if the same memslot is used for data and page table allocation.

kvm_vm_elf_load() however expects a contiguous range of HVAs (and thus GPAs)
because it does not try to read file data page by page.  Fix this mismatch
by allocating memory in one step.

Reported-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-27 07:45:53 -04:00
Zhenzhong Duan
39fe2fc966 selftests: kvm: make allocation of extra memory take effect
The extra memory pages are not allocated during VM creation, although
perf_test_util and kvm_page_table_test currently use them to allocate
extra memory.

Fix it by adding extra_mem_pages to the total memory calculation before
allocating.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Message-Id: <20210512043107.30076-1-zhenzhong.duan@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-27 07:45:53 -04:00
Linus Torvalds
d7c5303fbc Merge tag 'net-5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Networking fixes for 5.13-rc4, including fixes from bpf, netfilter,
  can and wireless trees. Notably including fixes for the recently
  announced "FragAttacks" WiFi vulnerabilities. Rather large batch,
  touching some core parts of the stack, too, but nothing hair-raising.

  Current release - regressions:

   - tipc: make node link identity publish thread safe

   - dsa: felix: re-enable TAS guard band mode

   - stmmac: correct clocks enabled in stmmac_vlan_rx_kill_vid()

   - stmmac: fix system hang if change mac address after interface
     ifdown

  Current release - new code bugs:

   - mptcp: avoid OOB access in setsockopt()

   - bpf: Fix nested bpf_bprintf_prepare with more per-cpu buffers

   - ethtool: stats: fix a copy-paste error - init correct array size

  Previous releases - regressions:

   - sched: fix packet stuck problem for lockless qdisc

   - net: really orphan skbs tied to closing sk

   - mlx4: fix EEPROM dump support

   - bpf: fix alu32 const subreg bound tracking on bitwise operations

   - bpf: fix mask direction swap upon off reg sign change

   - bpf, offload: reorder offload callback 'prepare' in verifier

   - stmmac: Fix MAC WoL not working if PHY does not support WoL

   - packetmmap: fix only tx timestamp on request

   - tipc: skb_linearize the head skb when reassembling msgs

  Previous releases - always broken:

   - mac80211: address recent "FragAttacks" vulnerabilities

   - mac80211: do not accept/forward invalid EAPOL frames

   - mptcp: avoid potential error message floods

   - bpf, ringbuf: deny reserve of buffers larger than ringbuf to
     prevent out of buffer writes

   - bpf: forbid trampoline attach for functions with variable arguments

   - bpf: add deny list of functions to prevent inf recursion of tracing
     programs

   - tls splice: check SPLICE_F_NONBLOCK instead of MSG_DONTWAIT

   - can: isotp: prevent race between isotp_bind() and
     isotp_setsockopt()

   - netfilter: nft_set_pipapo_avx2: Add irq_fpu_usable() check,
     fallback to non-AVX2 version

  Misc:

   - bpf: add kconfig knob for disabling unpriv bpf by default"

* tag 'net-5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (172 commits)
  net: phy: Document phydev::dev_flags bits allocation
  mptcp: validate 'id' when stopping the ADD_ADDR retransmit timer
  mptcp: avoid error message on infinite mapping
  mptcp: drop unconditional pr_warn on bad opt
  mptcp: avoid OOB access in setsockopt()
  nfp: update maintainer and mailing list addresses
  net: mvpp2: add buffer header handling in RX
  bnx2x: Fix missing error code in bnx2x_iov_init_one()
  net: zero-initialize tc skb extension on allocation
  net: hns: Fix kernel-doc
  sctp: fix the proc_handler for sysctl encap_port
  sctp: add the missing setting for asoc encap_port
  bpf, selftests: Adjust few selftest result_unpriv outcomes
  bpf: No need to simulate speculative domain for immediates
  bpf: Fix mask direction swap upon off reg sign change
  bpf: Wrap aux data inside bpf_sanitize_info container
  bpf: Fix BPF_LSM kconfig symbol dependency
  selftests/bpf: Add test for l3 use of bpf_redirect_peer
  bpftool: Add sock_release help info for cgroup attach/prog load command
  net: dsa: microchip: enable phy errata workaround on 9567
  ...
2021-05-26 17:44:49 -10:00
Florent Revest
d6a6a55518 libbpf: Move BPF_SEQ_PRINTF and BPF_SNPRINTF to bpf_helpers.h
These macros are convenient wrappers around the bpf_seq_printf and
bpf_snprintf helpers. They are currently provided by bpf_tracing.h which
targets low level tracing primitives. bpf_helpers.h is a better fit.

The __bpf_narg and __bpf_apply macros are needed in both files and are
provided twice. __bpf_empty isn't used anywhere and is removed from bpf_tracing.h.

Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210526164643.2881368-1-revest@chromium.org
2021-05-26 10:45:41 -07:00
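The argument-counting trick behind helpers like __bpf_narg can be shown in a few lines: the arguments shift a descending list of numbers so that the count lands in a fixed position. These macros are illustrative, with hypothetical names and a limit of 1 to 4 arguments; libbpf's definitions differ. SUM packs its arguments into an array of unsigned long long values, the same shape of data the bpf_seq_printf and bpf_snprintf helpers take.

```c
#include <assert.h>

/* Picks the sixth argument; callers pad with a descending count. */
#define NTH_(_, _1, _2, _3, _4, N, ...) N
/* Number of arguments, for 1 to 4 arguments. */
#define NARG(...) NTH_(_, __VA_ARGS__, 4, 3, 2, 1, 0)

static unsigned long long sum_args(const unsigned long long *args, unsigned int n)
{
	unsigned long long sum = 0;

	assert(n <= 4);
	for (unsigned int i = 0; i < n; i++)
		sum += args[i];
	return sum;
}

/* Pack the arguments into an array and pass it along with its length. */
#define SUM(...) \
	sum_args((const unsigned long long[]){ __VA_ARGS__ }, NARG(__VA_ARGS__))
```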
Hangbin Liu
d232924762 selftests/bpf: Add xdp_redirect_multi test
Add a bpf selftest for the new helper xdp_redirect_map_multi(). In this
test there are 3 forward groups and 1 exclude group. The test will
redirect each interface's packets to all the interfaces in the forward
group, and exclude the interfaces in the exclude map.

Two maps (DEVMAP, DEVMAP_HASH) and two XDP modes (generic, driver) will
be tested. The XDP egress program will also be tested by setting the packet's
source MAC to the egress interface's MAC address.

More test details can be found in the test script. Here is
the test result:
]# time ./test_xdp_redirect_multi.sh
Pass: xdpgeneric arp(F_BROADCAST) ns1-1
Pass: xdpgeneric arp(F_BROADCAST) ns1-2
Pass: xdpgeneric arp(F_BROADCAST) ns1-3
Pass: xdpgeneric IPv4 (F_BROADCAST|F_EXCLUDE_INGRESS) ns1-1
Pass: xdpgeneric IPv4 (F_BROADCAST|F_EXCLUDE_INGRESS) ns1-2
Pass: xdpgeneric IPv4 (F_BROADCAST|F_EXCLUDE_INGRESS) ns1-3
Pass: xdpgeneric IPv6 (no flags) ns1-1
Pass: xdpgeneric IPv6 (no flags) ns1-2
Pass: xdpdrv arp(F_BROADCAST) ns1-1
Pass: xdpdrv arp(F_BROADCAST) ns1-2
Pass: xdpdrv arp(F_BROADCAST) ns1-3
Pass: xdpdrv IPv4 (F_BROADCAST|F_EXCLUDE_INGRESS) ns1-1
Pass: xdpdrv IPv4 (F_BROADCAST|F_EXCLUDE_INGRESS) ns1-2
Pass: xdpdrv IPv4 (F_BROADCAST|F_EXCLUDE_INGRESS) ns1-3
Pass: xdpdrv IPv6 (no flags) ns1-1
Pass: xdpdrv IPv6 (no flags) ns1-2
Pass: xdpegress mac ns1-2
Pass: xdpegress mac ns1-3
Summary: PASS 18, FAIL 0

real    1m18.321s
user    0m0.123s
sys     0m0.350s

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210519090747.1655268-5-liuhangbin@gmail.com
2021-05-26 09:46:16 +02:00
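The forwarding rule the test checks can be modeled in plain userspace C: a packet arriving on an interface goes to every member of its forward group, minus the interfaces in the exclude group and, when the exclude-ingress behavior (cf. F_EXCLUDE_INGRESS in the output above) is requested, the ingress interface itself. This is a hypothetical model for illustration, not BPF code and not the test's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Write the ifindexes a packet from @ingress is redirected to into @out
 * and return how many there are.
 */
static size_t broadcast_targets(const int *group, size_t ngroup,
				const int *exclude, size_t nexclude,
				int ingress, bool exclude_ingress, int *out)
{
	size_t n = 0;

	for (size_t i = 0; i < ngroup; i++) {
		bool skip = exclude_ingress && group[i] == ingress;

		/* Drop members that appear in the exclude group. */
		for (size_t j = 0; j < nexclude && !skip; j++)
			skip = group[i] == exclude[j];
		if (!skip)
			out[n++] = group[i];
	}
	return n;
}
```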
Andrii Nakryiko
bad2e478af selftests/bpf: Turn on libbpf 1.0 mode and fix all IS_ERR checks
Turn on libbpf 1.0 mode. Fix all the explicit IS_ERR checks that will now be
broken because libbpf returns NULL on error (and sets errno). Fix
ASSERT_OK_PTR and ASSERT_ERR_PTR to work in both the old and the new mode and
use them throughout selftests. This is trivial to do by using the
libbpf_get_error() API, which all libbpf users are supposed to use instead of
IS_ERR checks.

A bunch of checks also did explicit -1 comparisons for various fd-returning
APIs. Such checks are replaced with >= 0 or < 0 checks.

There were also a few misuses of bpf_object__find_map_by_name() in test_maps.
Those are fixed in this patch as well.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210525035935.1461796-3-andrii@kernel.org
2021-05-25 17:32:35 -07:00
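Why a single error accessor works in both modes: legacy mode encodes an error in the pointer value itself (kernel-style, in the top 4095 addresses), while libbpf 1.0 mode returns NULL and sets errno. The helper below is a hypothetical re-implementation of that contract for illustration; real code should call libbpf_get_error() from libbpf.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Kernel-style encoded error pointer: the value lies in the top 4095 addresses. */
static int is_err_ptr(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/*
 * Returns 0 for a valid pointer, otherwise -errno. For an encoded error
 * pointer, errno is derived from the pointer; for NULL, the failing API is
 * expected to have set errno already.
 */
static long get_error(const void *ptr)
{
	if (ptr && !is_err_ptr(ptr))
		return 0;
	if (ptr)
		errno = -(int)(intptr_t)ptr;
	return -errno;
}
```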
David S. Miller
f5d287126f Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2021-05-26

The following pull-request contains BPF updates for your *net* tree.

We've added 14 non-merge commits during the last 14 day(s) which contain
a total of 17 files changed, 513 insertions(+), 231 deletions(-).

The main changes are:

1) Fix bpf_skb_change_head() helper to reset mac_len, from Jussi Maki.

2) Fix masking direction swap upon off-reg sign change, from Daniel Borkmann.

3) Fix BPF offloads in verifier by reordering driver callback, from Yinjun Zhang.

4) BPF selftest for ringbuf mmap ro/rw restrictions, from Andrii Nakryiko.

5) Follow-up fixes to nested bprintf per-cpu buffers, from Florent Revest.

6) Fix bpftool sock_release attach point help info, from Liu Jian.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-25 15:59:24 -07:00
Daniel Borkmann
1bad6fd52b bpf, selftests: Adjust few selftest result_unpriv outcomes
Given we don't need to simulate the speculative domain for registers with
immediates anymore since the verifier uses direct imm-based rewrites instead
of having to mask, we can also lift a few cases that were previously rejected.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
2021-05-25 22:08:53 +02:00
Mark Brown
63ebdb77af kselftest/arm64: Add missing newline to SVE test skipping output
The newline is expected to come from the caller but got missed for this
test.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20210518163331.38268-1-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2021-05-25 19:19:29 +01:00
Jussi Maki
6fd5fb6382 selftests/bpf: Add test for l3 use of bpf_redirect_peer
Add a test case for using bpf_skb_change_head() in combination with
bpf_redirect_peer() to redirect a packet from an L3 device to veth and back.

The test uses a BPF program that adds L2 headers to the packet coming
from an L3 device and then calls bpf_redirect_peer() to redirect the packet
to a veth device. The test fails as skb->mac_len is not set properly and
thus the Ethernet headers are not properly skb_pull'd in cls_bpf_classify(),
causing tcp_v4_rcv() to point the TCP header into the middle of the IP header.

Signed-off-by: Jussi Maki <joamaki@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210525102955.2811090-1-joamaki@gmail.com
2021-05-25 18:00:28 +02:00
Denis Salopek
49c299b694 selftests/bpf: Add bpf_lookup_and_delete_elem tests
Add bpf selftests and extend existing ones for a new function
bpf_lookup_and_delete_elem() for (percpu) hash and (percpu) LRU hash map
types.
In test_lru_map and test_maps we add an element, lookup_and_delete it,
then check whether it's deleted.
The newly added lookup_and_delete prog tests practically do the same
thing but additionally use a BPF program to change the value of the
element for LRU maps.

Signed-off-by: Denis Salopek <denis.salopek@sartura.hr>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/d30d3e0060c1f750e133579623cf1c60ff58f3d9.1620763117.git.denis.salopek@sartura.hr
2021-05-24 13:30:52 -07:00
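The property these tests check (the value is returned and the element is gone afterwards) can be modeled with a toy array map. Everything here is hypothetical and for illustration only; the real tests go through bpf_map_lookup_and_delete_elem() on hash and LRU hash maps.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAP_SIZE 8

/* Hypothetical toy map: keys are indexes, presence tracked per slot. */
struct toy_map {
	bool present[TOY_MAP_SIZE];
	uint64_t value[TOY_MAP_SIZE];
};

static int toy_lookup(const struct toy_map *m, uint32_t key, uint64_t *value)
{
	if (key >= TOY_MAP_SIZE || !m->present[key])
		return -ENOENT;
	*value = m->value[key];
	return 0;
}

/* Copy out the value for @key and remove the element in one step. */
static int toy_lookup_and_delete(struct toy_map *m, uint32_t key, uint64_t *value)
{
	int ret = toy_lookup(m, key, value);

	if (ret == 0)
		m->present[key] = false;
	return ret;
}
```

As in the selftests, the check is: add an element, lookup-and-delete it, then confirm that a plain lookup fails.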
Tejun Heo
c2a1197154 Merge branch 'for-5.13-fixes' into for-5.14 2021-05-24 13:43:56 -04:00
Davide Caratti
3a62fed2fd net/sched: fq_pie: re-factor fix for fq_pie endless loop
The patch that fixed an endless loop in fq_pie_init() was not considering
that 65535 is a valid class id. The correct bugfix for this infinite loop
is to change 'idx' to become a u32, like Colin proposed in the past [1].

Fix this as follows:
 - restore 65536 as the maximum possible value of 'flows_cnt'
 - use u32 'idx' when iterating on 'q->flows'
 - fix the TDC selftest

This reverts commit bb2f930d6d.

[1] https://lore.kernel.org/netdev/20210407163808.499027-1-colin.king@canonical.com/

CC: Colin Ian King <colin.king@canonical.com>
CC: stable@vger.kernel.org
Fixes: bb2f930d6d ("net/sched: fix infinite loop in sch_fq_pie")
Fixes: ec97ecf1eb ("net: sched: add Flow Queue PIE packet scheduler")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-23 17:16:09 -07:00
Linus Torvalds
28ceac6959 Merge tag 'powerpc-5.13-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:

 - Fix breakage of strace (and other ptracers etc.) when using the new
   scv ABI (Power9 or later with glibc >= 2.33).

 - Fix early_ioremap() on 64-bit, which broke booting on some machines.

Thanks to Dmitry V. Levin, Nicholas Piggin, Alexey Kardashevskiy, and
Christophe Leroy.

* tag 'powerpc-5.13-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64s/syscall: Fix ptrace syscall info with scv syscalls
  powerpc/64s/syscall: Use pt_regs.trap to distinguish syscall ABI difference between sc and scv syscalls
  powerpc: Fix early setup to make early_ioremap() work
2021-05-23 06:07:33 -10:00
Jiapeng Chong
c67454615c selftests/powerpc: Fix duplicate included pthread.h
Clean up the following includecheck warning:

./tools/testing/selftests/powerpc/tm/tm-vmx-unavail.c: pthread.h is
included more than once.

No functional change.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1620903820-68213-1-git-send-email-jiapeng.chong@linux.alibaba.com
2021-05-23 20:51:34 +10:00
Yang Yingliang
4d1cd3b2c5 tools/testing/selftests/exec: fix link error
Fix the link error by adding '-static':

  gcc -Wall  -Wl,-z,max-page-size=0x1000 -pie load_address.c -o /home/yang/linux/tools/testing/selftests/exec/load_address_4096
  /usr/bin/ld: /tmp/ccopEGun.o: relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `stderr@@GLIBC_2.17' which may bind externally can not be used when making a shared object; recompile with -fPIC
  /usr/bin/ld: /tmp/ccopEGun.o(.text+0x158): unresolvable R_AARCH64_ADR_PREL_PG_HI21 relocation against symbol `stderr@@GLIBC_2.17'
  /usr/bin/ld: final link failed: bad value
  collect2: error: ld returned 1 exit status
  make: *** [Makefile:25: tools/testing/selftests/exec/load_address_4096] Error 1

Link: https://lkml.kernel.org/r/20210514092422.2367367-1-yangyingliang@huawei.com
Fixes: 206e22f019 ("tools/testing/selftests: add self-test for verifying load alignment")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Cc: Chris Kennelly <ckennelly@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-22 15:09:07 -10:00
Linus Torvalds
a0e31f3a38 Merge branch 'for-v5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull siginfo fix from Eric Biederman:
 "During the merge window an issue with si_perf and the siginfo ABI came
  up. The alpha and sparc siginfo structure layout had changed with the
  addition of SIGTRAP TRAP_PERF and the new field si_perf.

  The reason only alpha and sparc were affected is that they are the
  only architectures that use si_trapno.

  Looking deeper, it was discovered that si_trapno is used for only a few
  select signals on alpha and sparc, and that none of the other
  _sigfault fields past si_addr are used at all, which means there is
  technically no regression on alpha and sparc.

  While the alignment concerns might be dismissed, the abuse of si_errno
  by SIGTRAP TRAP_PERF does have the potential to cause regressions in
  existing userspace.

  While we still have time before userspace starts using and depending
  on the new siginfo definition for SIGTRAP TRAP_PERF, this set of
  changes cleans up siginfo_t.

   - The si_trapno field is demoted from its magic alpha and sparc status
     and made an ordinary union member of the _sigfault member of
     siginfo_t, without moving it, of course.

   - si_perf is replaced with si_perf_data and si_perf_type ending the
     abuse of si_errno.

   - Unnecessary additions to signalfd_siginfo are removed"

* 'for-v5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  signalfd: Remove SIL_PERF_EVENT fields from signalfd_siginfo
  signal: Deliver all of the siginfo perf data in _perf
  signal: Factor force_sig_perf out of perf_sigtrap
  signal: Implement SIL_FAULT_TRAPNO
  siginfo: Move si_trapno inside the union inside _si_fault
2021-05-21 06:12:52 -10:00
Po-Hsu Lin
25173dd409 selftests: net: devlink_port_split.py: skip the test if no devlink device
When there is no devlink device, the following command returns:
  $ devlink -j dev show
  {"dev":{}}

This causes an IndexError when the test tries to access the first
element of "dev" in this JSON data. Use the kselftest framework skip
code to skip this test in this case.

Example output with this change:
  # selftests: net: devlink_port_split.py
  # no devlink device was found, test skipped
  ok 7 selftests: net: devlink_port_split.py # SKIP

Link: https://bugs.launchpad.net/bugs/1928889
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-20 15:46:57 -07:00
Andrii Nakryiko
704e2beba2 selftests/bpf: Test ringbuf mmap read-only and read-write restrictions
Extend the ringbuf selftest to validate read-write and read-only restrictions
on memory mapping consumer/producer/data pages. Ensure that no "escalations"
from PROT_READ to PROT_WRITE/PROT_EXEC are allowed. Also test that mremap()
fails to expand the mmap()'ed area.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210514180726.843157-1-andrii@kernel.org
2021-05-20 23:48:38 +02:00
Stanislav Fomichev
8f1634b821 selftests/bpf: Convert static to global in tc_redirect progs
Both IFINDEX_SRC and IFINDEX_DST are set from userspace, which won't work
once bpf is merged with bpf-next, where static variables are no longer
emitted in the skeleton.

Fixes: 096eccdef0 ("selftests/bpf: Rewrite test_tc_redirect.sh as prog_tests/tc_redirect.c")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210514170528.3750250-1-sdf@google.com
2021-05-20 23:48:37 +02:00
David Matlack
8570e75a55 selftests: Add .gitignore for nci test suite
Building the nci test suite produces a binary, nci_dev, that git then
tries to track. Add a .gitignore file to tell git to ignore this binary.

Signed-off-by: David Matlack <dmatlack@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-20 12:51:24 -07:00
Nicholas Piggin
5665bc35c1 powerpc/64s/syscall: Use pt_regs.trap to distinguish syscall ABI difference between sc and scv syscalls
The sc and scv 0 system calls have different ABI conventions, and
ptracers need to know which system call type is being used if they want
to look at the syscall registers.

Document that pt_regs.trap can be used for this, and fix one in-tree user
to work with scv 0 syscalls.
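A minimal sketch of how a ptracer might use pt_regs.trap to decode a syscall return value. It assumes trap values of 0xc00 for sc and 0x3000 for scv 0, plus the usual conventions (sc: CR0[SO] set and a positive errno in r3 on error; scv: a negative errno in r3). The struct is a hypothetical subset of the registers, not the UAPI pt_regs:

```c
#include <stdint.h>

/* Assumed trap values for the two system call instructions. */
#define TRAP_SC   0xc00UL
#define TRAP_SCV  0x3000UL
/* Summary-overflow bit of CR0 within the 32-bit CCR value. */
#define CR0_SO    0x10000000UL

/* Hypothetical subset of the registers a ptracer reads at syscall exit. */
struct ppc_syscall_regs {
	unsigned long trap;
	unsigned long ccr;
	unsigned long gpr3;
};

/* Return the result in the "negative errno on error" convention. */
static long syscall_result(const struct ppc_syscall_regs *r)
{
	/* Ignore low flag bits in trap (a defensive assumption). */
	if ((r->trap & ~0xfUL) == TRAP_SCV)
		return (long)r->gpr3;	/* scv: r3 already holds -errno */
	if (r->ccr & CR0_SO)
		return -(long)r->gpr3;	/* sc error: r3 holds +errno */
	return (long)r->gpr3;		/* sc success */
}
```

Without checking trap, a ptracer would misread a failed scv call (no SO bit, negative r3) as a successful one returning a large value.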

Fixes: 7fa95f9ada ("powerpc/64s: system call support for scv/rfscv instructions")
Cc: stable@vger.kernel.org # v5.9+
Reported-by: "Dmitry V. Levin" <ldv@altlinux.org>
Suggested-by: "Dmitry V. Levin" <ldv@altlinux.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210520111931.2597127-1-npiggin@gmail.com
2021-05-21 00:58:03 +10:00
H. Peter Anvin (Intel)
795e2a023b selftests/x86/syscall: Add tests under ptrace to syscall_numbering_64
Add tests running under ptrace for syscall_numbering_64. A ptracer that stops
on syscall entry and possibly modifies the syscall number (regs.orig_rax) or
the default return value (regs.rax) can produce different results than the
normal system call path.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210518191303.4135296-4-hpa@zytor.com
2021-05-20 15:19:48 +02:00
H. Peter Anvin (Intel)
c5c39488dc selftests/x86/syscall: Simplify message reporting in syscall_numbering
Reduce some boilerplate in printing and indenting messages.
This makes it easier to produce clean status output.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210518191303.4135296-3-hpa@zytor.com
2021-05-20 15:19:48 +02:00
H. Peter Anvin (Intel)
15c82d98a0 selftests/x86/syscall: Update and extend syscall_numbering_64
Update the syscall_numbering_64 selftest to reflect that a system call
number is to be extended from 32 bits. Add a mix of tests for valid and
invalid system calls in 64-bit and x32 space.
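A minimal model of the numbering rule, assuming that "extended from 32 bits" means only the low 32 bits of the syscall register are significant and are sign-extended. effective_syscall_nr() and is_x32_nr() are hypothetical helpers, not kernel code; the x32 flag is bit 30 (__X32_SYSCALL_BIT):

```c
#include <stdbool.h>
#include <stdint.h>

#define X32_SYSCALL_BIT 0x40000000u

/* Keep bits 31..0 of the raw register value and sign-extend bit 31. */
static int64_t effective_syscall_nr(uint64_t raw)
{
	uint32_t low = (uint32_t)raw;	/* discard bits 63..32 */

	/* Portable sign extension, avoiding implementation-defined casts. */
	if (low & 0x80000000u)
		return (int64_t)low - ((int64_t)1 << 32);
	return (int64_t)low;
}

/* A non-negative number with bit 30 set falls in the x32 space. */
static bool is_x32_nr(int64_t nr)
{
	return nr >= 0 && (nr & X32_SYSCALL_BIT);
}
```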

Use an explicit system call instruction, because the glibc syscall()
wrapper might intercept instructions, extend the system call number
independently, or do something similar.

Use long long instead of long to make it possible to compile this test
on x32 as well as on 64-bit.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210518191303.4135296-2-hpa@zytor.com
2021-05-20 15:19:48 +02:00
David S. Miller
7b16509b29 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2021-05-19

The following pull-request contains BPF updates for your *net-next* tree.

We've added 43 non-merge commits during the last 11 day(s) which contain
a total of 74 files changed, 3717 insertions(+), 578 deletions(-).

The main changes are:

1) syscall program type, fd array, and light skeleton, from Alexei.

2) Stop emitting static variables in skeleton, from Andrii.

3) Low level tc-bpf api, from Kumar.

4) Reduce verifier kmalloc/kfree churn, from Lorenz.
====================
2021-05-19 12:58:29 -07:00
Richard Fitzgerald
ef04d4ff4b selftests: lib: Add wrapper script for test_scanf
Add a wrapper shell script for the test_scanf module.

Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210514161206.30821-4-rf@opensource.cirrus.com
2021-05-19 15:05:11 +02:00