Commit Graph

13929 Commits

Author SHA1 Message Date
Linus Torvalds
66a27abac3 powerpc updates for 6.9
- Add AT_HWCAP3 and AT_HWCAP4 aux vector entries for future use by glibc.
 
  - Add support for recognising the Power11 architected and raw PVRs.
 
  - Add support for nr_cpus=n on the command line where the boot CPU is >= n.
 
  - Add ppcxx_allmodconfig targets for all 32-bit sub-arches.
 
  - Other small features, cleanups and fixes.
 
 Thanks to: Akanksha J N, Brian King, Christophe Leroy, Dawei Li, Geoff Levand,
 Greg Kroah-Hartman, Jan-Benedict Glaw, Kajol Jain, Kunwu Chan, Li zeming,
 Madhavan Srinivasan, Masahiro Yamada, Nathan Chancellor, Nicholas Piggin, Peter
 Bergner, Qiheng Lin, Randy Dunlap, Ricardo B. Marliere, Rob Herring, Sathvika
 Vasireddy, Shrikanth Hegde, Uwe Kleine-König, Vaibhav Jain, Wen Xiong.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmX01vgTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgJ4bEACVsxXXjbjl+WKgWNjHsM7sVwUX/sSV
 z43iVycLPXDqochSkkgKjyIEFowaWhjgWVHFHmUXWxB5FjjFEEoH4FPo3VB0IY48
 VoSFT6PhzqXDrGmt2fWsJ+k6zUyJZa8pNS38DHg1yuuYDAa0KWxd3E/x/r0qzsbr
 vcas1uWcDWgjoUDMBuJpyx0sYTl6+mR9HlZuM4+aNQdzhTFU/jK69hAN0RFvryes
 K2/fLgI0fgLZpQDogCn4HV1/4uixi1eEFlVNXkwvMYDpQVo2FqiBaWLF0hNLWNCk
 kvm/fYIJhdFoNlp38jVKv0KJnBhW7aAs3prF+8B3YL2B23rLnvA6ZLZKHcdBAeLb
 8PJMRrbAbmVxOnVSAG0fgU+0dEdkJQ+0ABqa+usMOV7xIPg9uIui1YrKT1KVq6Fs
 KyGHM5EQuBC/P6bTsKO6X+1beY2QIfwWxaIkoo8pj6d0WU69qU4u+LzQiDO4XR0L
 UQQguB1Qo8yaip3rHXhuv0hlnMNVAVye56Zw63uq1MWGkewRKSkY91Ms02L+pXpF
 r6+96xoFB0ulKZFnyxyBdkj2iC0426fHtTiiJFfQ4R1fiibPKtAx9P59WYnqymVh
 QsSYqlgC2/jWzRgqJTweLp/XQK8fWqmFkNmCGDN1N9Sij9Xjx/8aZb5dvwJkSBnK
 rZ4ObxBoaCPbPA==
 =K9Ok
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Add AT_HWCAP3 and AT_HWCAP4 aux vector entries for future use
   by glibc

 - Add support for recognising the Power11 architected and raw PVRs

 - Add support for nr_cpus=n on the command line where the
   boot CPU is >= n

 - Add ppcxx_allmodconfig targets for all 32-bit sub-arches

 - Other small features, cleanups and fixes

Thanks to Akanksha J N, Brian King, Christophe Leroy, Dawei Li, Geoff
Levand, Greg Kroah-Hartman, Jan-Benedict Glaw, Kajol Jain, Kunwu Chan,
Li zeming, Madhavan Srinivasan, Masahiro Yamada, Nathan Chancellor,
Nicholas Piggin, Peter Bergner, Qiheng Lin, Randy Dunlap, Ricardo B.
Marliere, Rob Herring, Sathvika Vasireddy, Shrikanth Hegde, Uwe
Kleine-König, Vaibhav Jain, and Wen Xiong.

* tag 'powerpc-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (71 commits)
  powerpc/macio: Make remove callback of macio driver void returned
  powerpc/83xx: Fix build failure with FPU=n
  powerpc/64s: Fix get_hugepd_cache_index() build failure
  powerpc/4xx: Fix warp_gpio_leds build failure
  powerpc/amigaone: Make several functions static
  powerpc/embedded6xx: Fix no previous prototype for avr_uart_send() etc.
  macintosh/adb: make adb_dev_class constant
  powerpc: xor_vmx: Add '-mhard-float' to CFLAGS
  powerpc/fsl: Fix mfpmr() asm constraint error
  powerpc: Remove cpu-as-y completely
  powerpc/fsl: Modernise mt/mfpmr
  powerpc/fsl: Fix mfpmr build errors with newer binutils
  powerpc/64s: Use .machine power4 around dcbt
  powerpc/64s: Move dcbt/dcbtst sequence into a macro
  powerpc/mm: Code cleanup for __hash_page_thp
  powerpc/hv-gpci: Fix the H_GET_PERF_COUNTER_INFO hcall return value checks
  powerpc/irq: Allow softirq to hardirq stack transition
  powerpc: Stop using of_root
  powerpc/machdep: Define 'compatibles' property in ppc_md and use it
  of: Reimplement of_machine_is_compatible() using of_machine_compatible_match()
  ...
2024-03-15 17:53:48 -07:00
Linus Torvalds
4f712ee0cb S390:
* Changes to FPU handling came in via the main s390 pull request
 
 * Only deliver to the guest the SCLP events that userspace has
   requested.
 
 * More virtual vs physical address fixes (only a cleanup since
   virtual and physical address spaces are currently the same).
 
 * Fix selftests undefined behavior.
 
 x86:
 
 * Fix a restriction that the guest can't program a PMU event whose
   encoding matches an architectural event that isn't included in the
   guest CPUID.  The enumeration of an architectural event only says
   that if a CPU supports an architectural event, then the event can be
   programmed *using the architectural encoding*.  The enumeration does
   NOT say anything about the encoding when the CPU doesn't report support
   the event *in general*.  It might support it, and it might support it
   using the same encoding that made it into the architectural PMU spec.
 
 * Fix a variety of bugs in KVM's emulation of RDPMC (more details on
   individual commits) and add a selftest to verify KVM correctly emulates
   RDMPC, counter availability, and a variety of other PMC-related
   behaviors that depend on guest CPUID and therefore are easier to
   validate with selftests than with custom guests (aka kvm-unit-tests).
 
 * Zero out PMU state on AMD if the virtual PMU is disabled, it does not
   cause any bug but it wastes time in various cases where KVM would check
   if a PMC event needs to be synthesized.
 
 * Optimize triggering of emulated events, with a nice ~10% performance
   improvement in VM-Exit microbenchmarks when a vPMU is exposed to the
   guest.
 
 * Tighten the check for "PMI in guest" to reduce false positives if an NMI
   arrives in the host while KVM is handling an IRQ VM-Exit.
 
 * Fix a bug where KVM would report stale/bogus exit qualification information
   when exiting to userspace with an internal error exit code.
 
 * Add a VMX flag in /proc/cpuinfo to report 5-level EPT support.
 
 * Rework TDP MMU root unload, free, and alloc to run with mmu_lock held for
   read, e.g. to avoid serializing vCPUs when userspace deletes a memslot.
 
 * Tear down TDP MMU page tables at 4KiB granularity (used to be 1GiB).  KVM
   doesn't support yielding in the middle of processing a zap, and 1GiB
   granularity resulted in multi-millisecond lags that are quite impolite
   for CONFIG_PREEMPT kernels.
 
 * Allocate write-tracking metadata on-demand to avoid the memory overhead when
   a kernel is built with i915 virtualization support but the workloads use
   neither shadow paging nor i915 virtualization.
 
 * Explicitly initialize a variety of on-stack variables in the emulator that
   triggered KMSAN false positives.
 
 * Fix the debugregs ABI for 32-bit KVM.
 
 * Rework the "force immediate exit" code so that vendor code ultimately decides
   how and when to force the exit, which allowed some optimization for both
   Intel and AMD.
 
 * Fix a long-standing bug where kvm_has_noapic_vcpu could be left elevated if
   vCPU creation ultimately failed, causing extra unnecessary work.
 
 * Cleanup the logic for checking if the currently loaded vCPU is in-kernel.
 
 * Harden against underflowing the active mmu_notifier invalidation
   count, so that "bad" invalidations (usually due to bugs elsehwere in the
   kernel) are detected earlier and are less likely to hang the kernel.
 
 x86 Xen emulation:
 
 * Overlay pages can now be cached based on host virtual address,
   instead of guest physical addresses.  This removes the need to
   reconfigure and invalidate the cache if the guest changes the
   gpa but the underlying host virtual address remains the same.
 
 * When possible, use a single host TSC value when computing the deadline for
   Xen timers in order to improve the accuracy of the timer emulation.
 
 * Inject pending upcall events when the vCPU software-enables its APIC to fix
   a bug where an upcall can be lost (and to follow Xen's behavior).
 
 * Fall back to the slow path instead of warning if "fast" IRQ delivery of Xen
   events fails, e.g. if the guest has aliased xAPIC IDs.
 
 RISC-V:
 
 * Support exception and interrupt handling in selftests
 
 * New self test for RISC-V architectural timer (Sstc extension)
 
 * New extension support (Ztso, Zacas)
 
 * Support userspace emulation of random number seed CSRs.
 
 ARM:
 
 * Infrastructure for building KVM's trap configuration based on the
   architectural features (or lack thereof) advertised in the VM's ID
   registers
 
 * Support for mapping vfio-pci BARs as Normal-NC (vaguely similar to
   x86's WC) at stage-2, improving the performance of interacting with
   assigned devices that can tolerate it
 
 * Conversion of KVM's representation of LPIs to an xarray, utilized to
   address serialization some of the serialization on the LPI injection
   path
 
 * Support for _architectural_ VHE-only systems, advertised through the
   absence of FEAT_E2H0 in the CPU's ID register
 
 * Miscellaneous cleanups, fixes, and spelling corrections to KVM and
   selftests
 
 LoongArch:
 
 * Set reserved bits as zero in CPUCFG.
 
 * Start SW timer only when vcpu is blocking.
 
 * Do not restart SW timer when it is expired.
 
 * Remove unnecessary CSR register saving during enter guest.
 
 * Misc cleanups and fixes as usual.
 
 Generic:
 
 * cleanup Kconfig by removing CONFIG_HAVE_KVM, which was basically always
   true on all architectures except MIPS (where Kconfig determines the
   available depending on CPU capabilities).  It is replaced either by
   an architecture-dependent symbol for MIPS, and IS_ENABLED(CONFIG_KVM)
   everywhere else.
 
 * Factor common "select" statements in common code instead of requiring
   each architecture to specify it
 
 * Remove thoroughly obsolete APIs from the uapi headers.
 
 * Move architecture-dependent stuff to uapi/asm/kvm.h
 
 * Always flush the async page fault workqueue when a work item is being
   removed, especially during vCPU destruction, to ensure that there are no
   workers running in KVM code when all references to KVM-the-module are gone,
   i.e. to prevent a very unlikely use-after-free if kvm.ko is unloaded.
 
 * Grab a reference to the VM's mm_struct in the async #PF worker itself instead
   of gifting the worker a reference, so that there's no need to remember
   to *conditionally* clean up after the worker.
 
 Selftests:
 
 * Reduce boilerplate especially when utilize selftest TAP infrastructure.
 
 * Add basic smoke tests for SEV and SEV-ES, along with a pile of library
   support for handling private/encrypted/protected memory.
 
 * Fix benign bugs where tests neglect to close() guest_memfd files.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmX0iP8UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroND7wf+JZoNvwZ+bmwWe/4jn/YwNoYi/C5z
 eypn8M1gsWEccpCpqPBwznVm9T29rF4uOlcMvqLEkHfTpaL1EKUUjP1lXPz/ileP
 6a2RdOGxAhyTiFC9fjy+wkkjtLbn1kZf6YsS0hjphP9+w0chNbdn0w81dFVnXryd
 j7XYI8R/bFAthNsJOuZXSEjCfIHxvTTG74OrTf1B1FEBB+arPmrgUeJftMVhffQK
 Sowgg8L/Ii/x6fgV5NZQVSIyVf1rp8z7c6UaHT4Fwb0+RAMW8p9pYv9Qp1YkKp8y
 5j0V9UzOHP7FRaYimZ5BtwQoqiZXYylQ+VuU/Y2f4X85cvlLzSqxaEMAPA==
 =mqOV
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "S390:

   - Changes to FPU handling came in via the main s390 pull request

   - Only deliver to the guest the SCLP events that userspace has
     requested

   - More virtual vs physical address fixes (only a cleanup since
     virtual and physical address spaces are currently the same)

   - Fix selftests undefined behavior

  x86:

   - Fix a restriction that the guest can't program a PMU event whose
     encoding matches an architectural event that isn't included in the
     guest CPUID. The enumeration of an architectural event only says
     that if a CPU supports an architectural event, then the event can
     be programmed *using the architectural encoding*. The enumeration
     does NOT say anything about the encoding when the CPU doesn't
     report support the event *in general*. It might support it, and it
     might support it using the same encoding that made it into the
     architectural PMU spec

   - Fix a variety of bugs in KVM's emulation of RDPMC (more details on
     individual commits) and add a selftest to verify KVM correctly
     emulates RDMPC, counter availability, and a variety of other
     PMC-related behaviors that depend on guest CPUID and therefore are
     easier to validate with selftests than with custom guests (aka
     kvm-unit-tests)

   - Zero out PMU state on AMD if the virtual PMU is disabled, it does
     not cause any bug but it wastes time in various cases where KVM
     would check if a PMC event needs to be synthesized

   - Optimize triggering of emulated events, with a nice ~10%
     performance improvement in VM-Exit microbenchmarks when a vPMU is
     exposed to the guest

   - Tighten the check for "PMI in guest" to reduce false positives if
     an NMI arrives in the host while KVM is handling an IRQ VM-Exit

   - Fix a bug where KVM would report stale/bogus exit qualification
     information when exiting to userspace with an internal error exit
     code

   - Add a VMX flag in /proc/cpuinfo to report 5-level EPT support

   - Rework TDP MMU root unload, free, and alloc to run with mmu_lock
     held for read, e.g. to avoid serializing vCPUs when userspace
     deletes a memslot

   - Tear down TDP MMU page tables at 4KiB granularity (used to be
     1GiB). KVM doesn't support yielding in the middle of processing a
     zap, and 1GiB granularity resulted in multi-millisecond lags that
     are quite impolite for CONFIG_PREEMPT kernels

   - Allocate write-tracking metadata on-demand to avoid the memory
     overhead when a kernel is built with i915 virtualization support
     but the workloads use neither shadow paging nor i915 virtualization

   - Explicitly initialize a variety of on-stack variables in the
     emulator that triggered KMSAN false positives

   - Fix the debugregs ABI for 32-bit KVM

   - Rework the "force immediate exit" code so that vendor code
     ultimately decides how and when to force the exit, which allowed
     some optimization for both Intel and AMD

   - Fix a long-standing bug where kvm_has_noapic_vcpu could be left
     elevated if vCPU creation ultimately failed, causing extra
     unnecessary work

   - Cleanup the logic for checking if the currently loaded vCPU is
     in-kernel

   - Harden against underflowing the active mmu_notifier invalidation
     count, so that "bad" invalidations (usually due to bugs elsehwere
     in the kernel) are detected earlier and are less likely to hang the
     kernel

  x86 Xen emulation:

   - Overlay pages can now be cached based on host virtual address,
     instead of guest physical addresses. This removes the need to
     reconfigure and invalidate the cache if the guest changes the gpa
     but the underlying host virtual address remains the same

   - When possible, use a single host TSC value when computing the
     deadline for Xen timers in order to improve the accuracy of the
     timer emulation

   - Inject pending upcall events when the vCPU software-enables its
     APIC to fix a bug where an upcall can be lost (and to follow Xen's
     behavior)

   - Fall back to the slow path instead of warning if "fast" IRQ
     delivery of Xen events fails, e.g. if the guest has aliased xAPIC
     IDs

  RISC-V:

   - Support exception and interrupt handling in selftests

   - New self test for RISC-V architectural timer (Sstc extension)

   - New extension support (Ztso, Zacas)

   - Support userspace emulation of random number seed CSRs

  ARM:

   - Infrastructure for building KVM's trap configuration based on the
     architectural features (or lack thereof) advertised in the VM's ID
     registers

   - Support for mapping vfio-pci BARs as Normal-NC (vaguely similar to
     x86's WC) at stage-2, improving the performance of interacting with
     assigned devices that can tolerate it

   - Conversion of KVM's representation of LPIs to an xarray, utilized
     to address serialization some of the serialization on the LPI
     injection path

   - Support for _architectural_ VHE-only systems, advertised through
     the absence of FEAT_E2H0 in the CPU's ID register

   - Miscellaneous cleanups, fixes, and spelling corrections to KVM and
     selftests

  LoongArch:

   - Set reserved bits as zero in CPUCFG

   - Start SW timer only when vcpu is blocking

   - Do not restart SW timer when it is expired

   - Remove unnecessary CSR register saving during enter guest

   - Misc cleanups and fixes as usual

  Generic:

   - Clean up Kconfig by removing CONFIG_HAVE_KVM, which was basically
     always true on all architectures except MIPS (where Kconfig
     determines the available depending on CPU capabilities). It is
     replaced either by an architecture-dependent symbol for MIPS, and
     IS_ENABLED(CONFIG_KVM) everywhere else

   - Factor common "select" statements in common code instead of
     requiring each architecture to specify it

   - Remove thoroughly obsolete APIs from the uapi headers

   - Move architecture-dependent stuff to uapi/asm/kvm.h

   - Always flush the async page fault workqueue when a work item is
     being removed, especially during vCPU destruction, to ensure that
     there are no workers running in KVM code when all references to
     KVM-the-module are gone, i.e. to prevent a very unlikely
     use-after-free if kvm.ko is unloaded

   - Grab a reference to the VM's mm_struct in the async #PF worker
     itself instead of gifting the worker a reference, so that there's
     no need to remember to *conditionally* clean up after the worker

  Selftests:

   - Reduce boilerplate especially when utilize selftest TAP
     infrastructure

   - Add basic smoke tests for SEV and SEV-ES, along with a pile of
     library support for handling private/encrypted/protected memory

   - Fix benign bugs where tests neglect to close() guest_memfd files"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (246 commits)
  selftests: kvm: remove meaningless assignments in Makefiles
  KVM: riscv: selftests: Add Zacas extension to get-reg-list test
  RISC-V: KVM: Allow Zacas extension for Guest/VM
  KVM: riscv: selftests: Add Ztso extension to get-reg-list test
  RISC-V: KVM: Allow Ztso extension for Guest/VM
  RISC-V: KVM: Forward SEED CSR access to user space
  KVM: riscv: selftests: Add sstc timer test
  KVM: riscv: selftests: Change vcpu_has_ext to a common function
  KVM: riscv: selftests: Add guest helper to get vcpu id
  KVM: riscv: selftests: Add exception handling support
  LoongArch: KVM: Remove unnecessary CSR register saving during enter guest
  LoongArch: KVM: Do not restart SW timer when it is expired
  LoongArch: KVM: Start SW timer only when vcpu is blocking
  LoongArch: KVM: Set reserved bits as zero in CPUCFG
  KVM: selftests: Explicitly close guest_memfd files in some gmem tests
  KVM: x86/xen: fix recursive deadlock in timer injection
  KVM: pfncache: simplify locking and make more self-contained
  KVM: x86/xen: remove WARN_ON_ONCE() with false positives in evtchn delivery
  KVM: x86/xen: inject vCPU upcall vector when local APIC is enabled
  KVM: x86/xen: improve accuracy of Xen timers
  ...
2024-03-15 13:03:13 -07:00
Paolo Bonzini
4781179012 selftests: kvm: remove meaningless assignments in Makefiles
$(shell ...) expands to the output of the command. It expands to the
empty string when the command does not print anything to stdout.
Hence, $(shell mkdir ...) is sufficient and does not need any
variable assignment in front of it.

Commit c2bd08ba20 ("treewide: remove meaningless assignments in
Makefiles", 2024-02-23) did this to all of tools/ but ignored in-flight
changes to tools/testing/selftests/kvm/Makefile, so reapply the change.

Cc: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-03-15 06:52:55 -04:00
Linus Torvalds
e5eb28f6d1 - Kuan-Wei Chiu has developed the well-named series "lib min_heap: Min
heap optimizations".
 
 - Kuan-Wei Chiu has also sped up the library sorting code in the series
   "lib/sort: Optimize the number of swaps and comparisons".
 
 - Alexey Gladkov has added the ability for code running within an IPC
   namespace to alter its IPC and MQ limits.  The series is "Allow to
   change ipc/mq sysctls inside ipc namespace".
 
 - Geert Uytterhoeven has contributed some dhrystone maintenance work in
   the series "lib: dhry: miscellaneous cleanups".
 
 - Ryusuke Konishi continues nilfs2 maintenance work in the series
 
 	"nilfs2: eliminate kmap and kmap_atomic calls"
 	"nilfs2: fix kernel bug at submit_bh_wbc()"
 
 - Nathan Chancellor has updated our build tools requirements in the
   series "Bump the minimum supported version of LLVM to 13.0.1".
 
 - Muhammad Usama Anjum continues with the selftests maintenance work in
   the series "selftests/mm: Improve run_vmtests.sh".
 
 - Oleg Nesterov has done some maintenance work against the signal code
   in the series "get_signal: minor cleanups and fix".
 
 Plus the usual shower of singleton patches in various parts of the tree.
 Please see the individual changelogs for details.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZfMnvgAKCRDdBJ7gKXxA
 jjKMAP4/Upq07D4wjkMVPb+QrkipbbLpdcgJ++q3z6rba4zhPQD+M3SFriIJk/Xh
 tKVmvihFxfAhdDthseXcIf1nBjMALwY=
 =8rVc
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2024-03-14-09-36' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - Kuan-Wei Chiu has developed the well-named series "lib min_heap: Min
   heap optimizations".

 - Kuan-Wei Chiu has also sped up the library sorting code in the series
   "lib/sort: Optimize the number of swaps and comparisons".

 - Alexey Gladkov has added the ability for code running within an IPC
   namespace to alter its IPC and MQ limits. The series is "Allow to
   change ipc/mq sysctls inside ipc namespace".

 - Geert Uytterhoeven has contributed some dhrystone maintenance work in
   the series "lib: dhry: miscellaneous cleanups".

 - Ryusuke Konishi continues nilfs2 maintenance work in the series

	"nilfs2: eliminate kmap and kmap_atomic calls"
	"nilfs2: fix kernel bug at submit_bh_wbc()"

 - Nathan Chancellor has updated our build tools requirements in the
   series "Bump the minimum supported version of LLVM to 13.0.1".

 - Muhammad Usama Anjum continues with the selftests maintenance work in
   the series "selftests/mm: Improve run_vmtests.sh".

 - Oleg Nesterov has done some maintenance work against the signal code
   in the series "get_signal: minor cleanups and fix".

Plus the usual shower of singleton patches in various parts of the tree.
Please see the individual changelogs for details.

* tag 'mm-nonmm-stable-2024-03-14-09-36' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (77 commits)
  nilfs2: prevent kernel bug at submit_bh_wbc()
  nilfs2: fix failure to detect DAT corruption in btree and direct mappings
  ocfs2: enable ocfs2_listxattr for special files
  ocfs2: remove SLAB_MEM_SPREAD flag usage
  assoc_array: fix the return value in assoc_array_insert_mid_shortcut()
  buildid: use kmap_local_page()
  watchdog/core: remove sysctl handlers from public header
  nilfs2: use div64_ul() instead of do_div()
  mul_u64_u64_div_u64: increase precision by conditionally swapping a and b
  kexec: copy only happens before uchunk goes to zero
  get_signal: don't initialize ksig->info if SIGNAL_GROUP_EXIT/group_exec_task
  get_signal: hide_si_addr_tag_bits: fix the usage of uninitialized ksig
  get_signal: don't abuse ksig->info.si_signo and ksig->sig
  const_structs.checkpatch: add device_type
  Normalise "name (ad@dr)" MODULE_AUTHORs to "name <ad@dr>"
  dyndbg: replace kstrdup() + strchr() with kstrdup_and_replace()
  list: leverage list_is_head() for list_entry_is_head()
  nilfs2: MAINTAINERS: drop unreachable project mirror site
  smp: make __smp_processor_id() 0-argument macro
  fat: fix uninitialized field in nostale filehandles
  ...
2024-03-14 18:03:09 -07:00
Linus Torvalds
902861e34c - Sumanth Korikkar has taught s390 to allocate hotplug-time page frames
from hotplugged memory rather than only from main memory.  Series
   "implement "memmap on memory" feature on s390".
 
 - More folio conversions from Matthew Wilcox in the series
 
 	"Convert memcontrol charge moving to use folios"
 	"mm: convert mm counter to take a folio"
 
 - Chengming Zhou has optimized zswap's rbtree locking, providing
   significant reductions in system time and modest but measurable
   reductions in overall runtimes.  The series is "mm/zswap: optimize the
   scalability of zswap rb-tree".
 
 - Chengming Zhou has also provided the series "mm/zswap: optimize zswap
   lru list" which provides measurable runtime benefits in some
   swap-intensive situations.
 
 - And Chengming Zhou further optimizes zswap in the series "mm/zswap:
   optimize for dynamic zswap_pools".  Measured improvements are modest.
 
 - zswap cleanups and simplifications from Yosry Ahmed in the series "mm:
   zswap: simplify zswap_swapoff()".
 
 - In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has
   contributed several DAX cleanups as well as adding a sysfs tunable to
   control the memmap_on_memory setting when the dax device is hotplugged
   as system memory.
 
 - Johannes Weiner has added the large series "mm: zswap: cleanups",
   which does that.
 
 - More DAMON work from SeongJae Park in the series
 
 	"mm/damon: make DAMON debugfs interface deprecation unignorable"
 	"selftests/damon: add more tests for core functionalities and corner cases"
 	"Docs/mm/damon: misc readability improvements"
 	"mm/damon: let DAMOS feeds and tame/auto-tune itself"
 
 - In the series "mm/mempolicy: weighted interleave mempolicy and sysfs
   extension" Rakie Kim has developed a new mempolicy interleaving policy
   wherein we allocate memory across nodes in a weighted fashion rather
   than uniformly.  This is beneficial in heterogeneous memory environments
   appearing with CXL.
 
 - Christophe Leroy has contributed some cleanup and consolidation work
   against the ARM pagetable dumping code in the series "mm: ptdump:
   Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute".
 
 - Luis Chamberlain has added some additional xarray selftesting in the
   series "test_xarray: advanced API multi-index tests".
 
 - Muhammad Usama Anjum has reworked the selftest code to make its
   human-readable output conform to the TAP ("Test Anything Protocol")
   format.  Amongst other things, this opens up the use of third-party
   tools to parse and process out selftesting results.
 
 - Ryan Roberts has added fork()-time PTE batching of THP ptes in the
   series "mm/memory: optimize fork() with PTE-mapped THP".  Mainly
   targeted at arm64, this significantly speeds up fork() when the process
   has a large number of pte-mapped folios.
 
 - David Hildenbrand also gets in on the THP pte batching game in his
   series "mm/memory: optimize unmap/zap with PTE-mapped THP".  It
   implements batching during munmap() and other pte teardown situations.
   The microbenchmark improvements are nice.
 
 - And in the series "Transparent Contiguous PTEs for User Mappings" Ryan
   Roberts further utilizes arm's pte's contiguous bit ("contpte
   mappings").  Kernel build times on arm64 improved nicely.  Ryan's series
   "Address some contpte nits" provides some followup work.
 
 - In the series "mm/hugetlb: Restore the reservation" Breno Leitao has
   fixed an obscure hugetlb race which was causing unnecessary page faults.
   He has also added a reproducer under the selftest code.
 
 - In the series "selftests/mm: Output cleanups for the compaction test",
   Mark Brown did what the title claims.
 
 - Kinsey Ho has added the series "mm/mglru: code cleanup and refactoring".
 
 - Even more zswap material from Nhat Pham.  The series "fix and extend
   zswap kselftests" does as claimed.
 
 - In the series "Introduce cpu_dcache_is_aliasing() to fix DAX
   regression" Mathieu Desnoyers has cleaned up and fixed rather a mess in
   our handling of DAX on archiecctures which have virtually aliasing data
   caches.  The arm architecture is the main beneficiary.
 
 - Lokesh Gidra's series "per-vma locks in userfaultfd" provides dramatic
   improvements in worst-case mmap_lock hold times during certain
   userfaultfd operations.
 
 - Some page_owner enhancements and maintenance work from Oscar Salvador
   in his series
 
 	"page_owner: print stacks and their outstanding allocations"
 	"page_owner: Fixup and cleanup"
 
 - Uladzislau Rezki has contributed some vmalloc scalability improvements
   in his series "Mitigate a vmap lock contention".  It realizes a 12x
   improvement for a certain microbenchmark.
 
 - Some kexec/crash cleanup work from Baoquan He in the series "Split
   crash out from kexec and clean up related config items".
 
 - Some zsmalloc maintenance work from Chengming Zhou in the series
 
 	"mm/zsmalloc: fix and optimize objects/page migration"
 	"mm/zsmalloc: some cleanup for get/set_zspage_mapping()"
 
 - Zi Yan has taught the MM to perform compaction on folios larger than
   order=0.  This a step along the path to implementaton of the merging of
   large anonymous folios.  The series is named "Enable >0 order folio
   memory compaction".
 
 - Christoph Hellwig has done quite a lot of cleanup work in the
   pagecache writeback code in his series "convert write_cache_pages() to
   an iterator".
 
 - Some modest hugetlb cleanups and speedups in Vishal Moola's series
   "Handle hugetlb faults under the VMA lock".
 
 - Zi Yan has changed the page splitting code so we can split huge pages
   into sizes other than order-0 to better utilize large folios.  The
   series is named "Split a folio to any lower order folios".
 
 - David Hildenbrand has contributed the series "mm: remove
   total_mapcount()", a cleanup.
 
 - Matthew Wilcox has sought to improve the performance of bulk memory
   freeing in his series "Rearrange batched folio freeing".
 
 - Gang Li's series "hugetlb: parallelize hugetlb page init on boot"
   provides large improvements in bootup times on large machines which are
   configured to use large numbers of hugetlb pages.
 
 - Matthew Wilcox's series "PageFlags cleanups" does that.
 
 - Qi Zheng's series "minor fixes and supplement for ptdesc" does that
   also.  S390 is affected.
 
 - Cleanups to our pagemap utility functions from Peter Xu in his series
   "mm/treewide: Replace pXd_large() with pXd_leaf()".
 
 - Nico Pache has fixed a few things with our hugepage selftests in his
   series "selftests/mm: Improve Hugepage Test Handling in MM Selftests".
 
 - Also, of course, many singleton patches to many things.  Please see
   the individual changelogs for details.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZfJpPQAKCRDdBJ7gKXxA
 joxeAP9TrcMEuHnLmBlhIXkWbIR4+ki+pA3v+gNTlJiBhnfVSgD9G55t1aBaRplx
 TMNhHfyiHYDTx/GAV9NXW84tasJSDgA=
 =TG55
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Sumanth Korikkar has taught s390 to allocate hotplug-time page frames
   from hotplugged memory rather than only from main memory. Series
   "implement "memmap on memory" feature on s390".

 - More folio conversions from Matthew Wilcox in the series

	"Convert memcontrol charge moving to use folios"
	"mm: convert mm counter to take a folio"

 - Chengming Zhou has optimized zswap's rbtree locking, providing
   significant reductions in system time and modest but measurable
   reductions in overall runtimes. The series is "mm/zswap: optimize the
   scalability of zswap rb-tree".

 - Chengming Zhou has also provided the series "mm/zswap: optimize zswap
   lru list" which provides measurable runtime benefits in some
   swap-intensive situations.

 - And Chengming Zhou further optimizes zswap in the series "mm/zswap:
   optimize for dynamic zswap_pools". Measured improvements are modest.

 - zswap cleanups and simplifications from Yosry Ahmed in the series
   "mm: zswap: simplify zswap_swapoff()".

 - In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has
   contributed several DAX cleanups as well as adding a sysfs tunable to
   control the memmap_on_memory setting when the dax device is
   hotplugged as system memory.

 - Johannes Weiner has added the large series "mm: zswap: cleanups",
   which does that.

 - More DAMON work from SeongJae Park in the series

	"mm/damon: make DAMON debugfs interface deprecation unignorable"
	"selftests/damon: add more tests for core functionalities and corner cases"
	"Docs/mm/damon: misc readability improvements"
	"mm/damon: let DAMOS feeds and tame/auto-tune itself"

 - In the series "mm/mempolicy: weighted interleave mempolicy and sysfs
   extension" Rakie Kim has developed a new mempolicy interleaving
   policy wherein we allocate memory across nodes in a weighted fashion
   rather than uniformly. This is beneficial in heterogeneous memory
   environments appearing with CXL.

 - Christophe Leroy has contributed some cleanup and consolidation work
   against the ARM pagetable dumping code in the series "mm: ptdump:
   Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute".

 - Luis Chamberlain has added some additional xarray selftesting in the
   series "test_xarray: advanced API multi-index tests".

 - Muhammad Usama Anjum has reworked the selftest code to make its
   human-readable output conform to the TAP ("Test Anything Protocol")
   format. Amongst other things, this opens up the use of third-party
   tools to parse and process out selftesting results.

 - Ryan Roberts has added fork()-time PTE batching of THP ptes in the
   series "mm/memory: optimize fork() with PTE-mapped THP". Mainly
   targeted at arm64, this significantly speeds up fork() when the
   process has a large number of pte-mapped folios.

 - David Hildenbrand also gets in on the THP pte batching game in his
   series "mm/memory: optimize unmap/zap with PTE-mapped THP". It
   implements batching during munmap() and other pte teardown
   situations. The microbenchmark improvements are nice.

 - And in the series "Transparent Contiguous PTEs for User Mappings"
   Ryan Roberts further utilizes arm's pte's contiguous bit ("contpte
   mappings"). Kernel build times on arm64 improved nicely. Ryan's
   series "Address some contpte nits" provides some followup work.

 - In the series "mm/hugetlb: Restore the reservation" Breno Leitao has
   fixed an obscure hugetlb race which was causing unnecessary page
   faults. He has also added a reproducer under the selftest code.

 - In the series "selftests/mm: Output cleanups for the compaction
   test", Mark Brown did what the title claims.

 - Kinsey Ho has added the series "mm/mglru: code cleanup and
   refactoring".

 - Even more zswap material from Nhat Pham. The series "fix and extend
   zswap kselftests" does as claimed.

 - In the series "Introduce cpu_dcache_is_aliasing() to fix DAX
   regression" Mathieu Desnoyers has cleaned up and fixed rather a mess
   in our handling of DAX on archiecctures which have virtually aliasing
   data caches. The arm architecture is the main beneficiary.

 - Lokesh Gidra's series "per-vma locks in userfaultfd" provides
   dramatic improvements in worst-case mmap_lock hold times during
   certain userfaultfd operations.

 - Some page_owner enhancements and maintenance work from Oscar Salvador
   in his series

	"page_owner: print stacks and their outstanding allocations"
	"page_owner: Fixup and cleanup"

 - Uladzislau Rezki has contributed some vmalloc scalability
   improvements in his series "Mitigate a vmap lock contention". It
   realizes a 12x improvement for a certain microbenchmark.

 - Some kexec/crash cleanup work from Baoquan He in the series "Split
   crash out from kexec and clean up related config items".

 - Some zsmalloc maintenance work from Chengming Zhou in the series

	"mm/zsmalloc: fix and optimize objects/page migration"
	"mm/zsmalloc: some cleanup for get/set_zspage_mapping()"

 - Zi Yan has taught the MM to perform compaction on folios larger than
   order=0. This a step along the path to implementaton of the merging
   of large anonymous folios. The series is named "Enable >0 order folio
   memory compaction".

 - Christoph Hellwig has done quite a lot of cleanup work in the
   pagecache writeback code in his series "convert write_cache_pages()
   to an iterator".

 - Some modest hugetlb cleanups and speedups in Vishal Moola's series
   "Handle hugetlb faults under the VMA lock".

 - Zi Yan has changed the page splitting code so we can split huge pages
   into sizes other than order-0 to better utilize large folios. The
   series is named "Split a folio to any lower order folios".

 - David Hildenbrand has contributed the series "mm: remove
   total_mapcount()", a cleanup.

 - Matthew Wilcox has sought to improve the performance of bulk memory
   freeing in his series "Rearrange batched folio freeing".

 - Gang Li's series "hugetlb: parallelize hugetlb page init on boot"
   provides large improvements in bootup times on large machines which
   are configured to use large numbers of hugetlb pages.

 - Matthew Wilcox's series "PageFlags cleanups" does that.

 - Qi Zheng's series "minor fixes and supplement for ptdesc" does that
   also. S390 is affected.

 - Cleanups to our pagemap utility functions from Peter Xu in his series
   "mm/treewide: Replace pXd_large() with pXd_leaf()".

 - Nico Pache has fixed a few things with our hugepage selftests in his
   series "selftests/mm: Improve Hugepage Test Handling in MM
   Selftests".

 - Also, of course, many singleton patches to many things. Please see
   the individual changelogs for details.

* tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (435 commits)
  mm/zswap: remove the memcpy if acomp is not sleepable
  crypto: introduce: acomp_is_async to expose if comp drivers might sleep
  memtest: use {READ,WRITE}_ONCE in memory scanning
  mm: prohibit the last subpage from reusing the entire large folio
  mm: recover pud_leaf() definitions in nopmd case
  selftests/mm: skip the hugetlb-madvise tests on unmet hugepage requirements
  selftests/mm: skip uffd hugetlb tests with insufficient hugepages
  selftests/mm: dont fail testsuite due to a lack of hugepages
  mm/huge_memory: skip invalid debugfs new_order input for folio split
  mm/huge_memory: check new folio order when split a folio
  mm, vmscan: retry kswapd's priority loop with cache_trim_mode off on failure
  mm: add an explicit smp_wmb() to UFFDIO_CONTINUE
  mm: fix list corruption in put_pages_list
  mm: remove folio from deferred split list before uncharging it
  filemap: avoid unnecessary major faults in filemap_fault()
  mm,page_owner: drop unnecessary check
  mm,page_owner: check for null stack_record before bumping its refcount
  mm: swap: fix race between free_swap_and_cache() and swapoff()
  mm/treewide: align up pXd_leaf() retval across archs
  mm/treewide: drop pXd_large()
  ...
2024-03-14 17:43:30 -07:00
Linus Torvalds
1bbeaf83dd perf tools changes for v6.9
perf stat
 ---------
 * Support new 'cluster' aggregation mode for shared resources depending on the
   hardware configuration.
 
     $ sudo perf stat -a --per-cluster -e cycles,instructions sleep 1
 
      Performance counter stats for 'system wide':
 
     S0-D0-CLS0    2         85,051,822      cycles
     S0-D0-CLS0    2         73,909,908      instructions      #    0.87  insn per cycle
     S0-D0-CLS2    2         93,365,918      cycles
     S0-D0-CLS2    2         83,006,158      instructions      #    0.89  insn per cycle
     S0-D0-CLS4    2        104,157,523      cycles
     S0-D0-CLS4    2         53,234,396      instructions      #    0.51  insn per cycle
     S0-D0-CLS6    2         65,891,079      cycles
     S0-D0-CLS6    2         41,478,273      instructions      #    0.63  insn per cycle
 
            1.002407989 seconds time elapsed
 
 * Various fixes and cleanups for event metrics including NaN handling.
 
 perf script
 -----------
 * Use libcapstone if available to disassemble the instructions.  This enables
   'perf script -F disasm' and 'perf script --insn-trace=disasm' (for Intel-PT).
 
     $ perf script -F event,ip,disasm
     cycles:P:  ffffffffa988d428             wrmsr
     cycles:P:  ffffffffa9839d25             movq %rax, %r14
     cycles:P:  ffffffffa9cdcaf0             endbr64
     cycles:P:  ffffffffa988d428             wrmsr
     cycles:P:  ffffffffa988d428             wrmsr
     cycles:P:  ffffffffaa401f86             iretq
     cycles:P:  ffffffffa99c4de5             movq 0x30(%rcx), %r8
     cycles:P:  ffffffffa988d428             wrmsr
     cycles:P:  ffffffffaa401f86             iretq
     cycles:P:  ffffffffa9907983             movl 0x68(%rbx), %eax
     cycles:P:  ffffffffa988d428             wrmsr
 
 * Expose sample ID / stream ID to python scripts
 
 perf test
 ---------
 * Add more perf test cases from Redhat internal test suites.  This time it adds
   the base infra and a few perf probe tests.  More to come. :)
 
 * Add 'perf test -p' for parallel execution and fix some issues found by the
   parallel test.
 
 * Support symbol test to print symbols in given (active) module:
 
     $ perf test -F -v Symbols --dso /lib/modules/$(uname -r)/kernel/fs/ext4/ext4.ko
     --- start ---
     Testing /lib/modules/6.5.13-1rodete2-amd64/kernel/fs/ext4/ext4.ko
     Overlapping symbols:
      7a990-7a9a0 l __pfx_ext4_exit_fs
      7a990-7a9a0 g __pfx_cleanup_module
     Overlapping symbols:
      7a9a0-7aa1c l ext4_exit_fs
      7a9a0-7aa1c g cleanup_module
     ...
 
 JSON metric updates
 -------------------
 * A new round of Intel metric updates.
 
 * Support Power11 PVR (compatible to Power10).
 
 * Fix cache latency events on Zen 4 to set SliceId properly.
 
 Internal
 --------
 * Fix reference counting for 'map' data structure, tireless work from Ian!
 
 * More memory optimization for struct thread and annotate histogram.  Now,
   'perf report' (TUI) and 'perf annotate' should be much lighter-weight in
   terms of memory footprint.
 
 * Support cross-arch perf register access.  Clean up the build configuration
   so that it can detect arch-register support at runtime.  This can allow to
   parse register data in sample which was recorded in a different arch.
 
 Others
 ------
 * Sync task state in 'perf sched' to kernel using trace event fields.  The
   task states have been changed so tools cannot assume a fixed encoding.
 
 * Clean up 'perf mem' to generalize the arch-specific events.
 
 * Add support for local and global variables to data type profiling.  This
   would increase the success rate of type resolution with DWARF.
 
 * Add short option -H for --hierarchy in 'perf report' and 'perf top'.
 
 Signed-off-by: Namhyung Kim <namhyung@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQSo2x5BnqMqsoHtzsmMstVUGiXMgwUCZfHmfhQcbmFtaHl1bmdA
 a2VybmVsLm9yZwAKCRCMstVUGiXMg5krAP9Es5KEhAHvTHo6y4OX9ktrNGB3j/FB
 YgakrWSuJxJ+UAD8D49wUloO3yVDVOe6MxJrZrHcEDGDV6qVSr0aPwDpyw4=
 =gPPl
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-for-v6.9-2024-03-13' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools

Pull perf tools updates from Namhyung Kim:
 "perf stat:

   - Support new 'cluster' aggregation mode for shared resources
     depending on the hardware configuration:

        $ sudo perf stat -a --per-cluster -e cycles,instructions sleep 1

         Performance counter stats for 'system wide':

        S0-D0-CLS0    2         85,051,822      cycles
        S0-D0-CLS0    2         73,909,908      instructions      #    0.87  insn per cycle
        S0-D0-CLS2    2         93,365,918      cycles
        S0-D0-CLS2    2         83,006,158      instructions      #    0.89  insn per cycle
        S0-D0-CLS4    2        104,157,523      cycles
        S0-D0-CLS4    2         53,234,396      instructions      #    0.51  insn per cycle
        S0-D0-CLS6    2         65,891,079      cycles
        S0-D0-CLS6    2         41,478,273      instructions      #    0.63  insn per cycle

               1.002407989 seconds time elapsed

   - Various fixes and cleanups for event metrics including NaN handling

  perf script:

   - Use libcapstone if available to disassemble the instructions. This
     enables 'perf script -F disasm' and 'perf script --insn-trace=disasm'
     (for Intel-PT):

        $ perf script -F event,ip,disasm
        cycles:P:  ffffffffa988d428             wrmsr
        cycles:P:  ffffffffa9839d25             movq %rax, %r14
        cycles:P:  ffffffffa9cdcaf0             endbr64
        cycles:P:  ffffffffa988d428             wrmsr
        cycles:P:  ffffffffa988d428             wrmsr
        cycles:P:  ffffffffaa401f86             iretq
        cycles:P:  ffffffffa99c4de5             movq 0x30(%rcx), %r8
        cycles:P:  ffffffffa988d428             wrmsr
        cycles:P:  ffffffffaa401f86             iretq
        cycles:P:  ffffffffa9907983             movl 0x68(%rbx), %eax
        cycles:P:  ffffffffa988d428             wrmsr

   - Expose sample ID / stream ID to python scripts

  perf test:

   - Add more perf test cases from Redhat internal test suites. This
     time it adds the base infra and a few perf probe tests. More to
     come. :)

   - Add 'perf test -p' for parallel execution and fix some issues found
     by the parallel test

   - Support symbol test to print symbols in given (active) module:

        $ perf test -F -v Symbols --dso /lib/modules/$(uname -r)/kernel/fs/ext4/ext4.ko
        --- start ---
        Testing /lib/modules/6.5.13-1rodete2-amd64/kernel/fs/ext4/ext4.ko
        Overlapping symbols:
         7a990-7a9a0 l __pfx_ext4_exit_fs
         7a990-7a9a0 g __pfx_cleanup_module
        Overlapping symbols:
         7a9a0-7aa1c l ext4_exit_fs
         7a9a0-7aa1c g cleanup_module
        ...

  JSON metric updates:

   - A new round of Intel metric updates

   - Support Power11 PVR (compatible to Power10)

   - Fix cache latency events on Zen 4 to set SliceId properly

  Internal:

   - Fix reference counting for 'map' data structure, tireless work from
     Ian!

   - More memory optimization for struct thread and annotate histogram.
     Now, 'perf report' (TUI) and 'perf annotate' should be much
     lighter-weight in terms of memory footprint

   - Support cross-arch perf register access. Clean up the build
     configuration so that it can detect arch-register support at
     runtime. This can allow to parse register data in sample which was
     recorded in a different arch

  Others:

   - Sync task state in 'perf sched' to kernel using trace event fields.
     The task states have been changed so tools cannot assume a fixed
     encoding

   - Clean up 'perf mem' to generalize the arch-specific events

   - Add support for local and global variables to data type profiling.
     This would increase the success rate of type resolution with DWARF

   - Add short option -H for --hierarchy in 'perf report' and 'perf top'"

* tag 'perf-tools-for-v6.9-2024-03-13' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (154 commits)
  perf annotate: Add comments in the data structures
  perf annotate: Remove sym_hist.addr[] array
  perf annotate: Calculate instruction overhead using hashmap
  perf annotate: Add a hashmap for symbol histogram
  perf threads: Reduce table size from 256 to 8
  perf threads: Switch from rbtree to hashmap
  perf threads: Move threads to its own files
  perf machine: Move machine's threads into its own abstraction
  perf machine: Move fprintf to for_each loop and a callback
  perf trace: Ignore thread hashing in summary
  perf report: Sort child tasks by tid
  perf vendor events amd: Fix Zen 4 cache latency events
  perf version: Display availability of OpenCSD support
  perf vendor events intel: Add umasks/occ_sel to PCU events.
  perf map: Fix map reference count issues
  libperf evlist: Avoid out-of-bounds access
  perf lock contention: Account contending locks too
  perf metrics: Fix segv for metrics with no events
  perf metrics: Fix metric matching
  perf pmu: Fix a potential memory leak in perf_pmu__lookup()
  ...
2024-03-14 16:31:23 -07:00
Linus Torvalds
01732755ee Probes updates for v6.9:
- x96/kprobes: Use boolean for some function return instead of 0 and 1.
  - x86/kprobes: Prohibit probing on INT/UD. This prevents user to put kprobe on
     INTn/INT1/INT3/INTO and UD0/UD1/UD2 because these are used for a special
     purpose in the kernel.
  - x86/kprobes: Boost Grp instructions. Because a few percent of kernel
     instructions are Grp 2/3/4/5 and those are safe to be executed without
     ip register fixup, allow those to be boosted (direct execution on the
     trampoline buffer with a JMP).
 
  - tracing/probes: Add function argument access from return events (kretprobe
     and fprobe). This allows user to compare how a data structure field is
     changed after executing a function. With BTF, return event also accepts
     function argument access by name. This also includes below patches;
   . Fix a wrong comment (using "Kretprobe" in fprobe)
   . Cleanup a big probe argument parser function into three parts, type
      parser, post-processing function, and main parser.
   . Cleanup to set nr_args field when initializing trace_probe instead of
      counting up it while parsing.
   . Cleanup a redundant #else block from tracefs/README source code.
   . Update selftests to check entry argument access from return probes.
   . Documentation update about entry argument access from return probes.
 -----BEGIN PGP SIGNATURE-----
 
 iQFPBAABCgA5FiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmXwW4kbHG1hc2FtaS5o
 aXJhbWF0c3VAZ21haWwuY29tAAoJENv7B78FKz8bH80H/3H6JENlDAjaSLi4vYrP
 Qyw/cOGIuGu8cDEzkkOaFMol3TY23M7tQZH1lFefvV92gebZ0ttXnrQhSsKeO5XT
 PCZ6Eoift5rwJCY967W4V6O0DrAkOGHlPtlKs47APJnTXwn8RcFTqWlQmhWg1AfD
 g/FCWV7cs3eewZgV9iQcLydOoLLgRMr3G3rtPYQbCXhPzze0WTu4dSOXxCTjFe04
 riHQy7R+ut6Cur8njpoqZl6bCMkQqAylByXf6wK96HjcS0+ZI7Ivi8Ey3l2aAFen
 EeIViMU2Bl02XzBszj7Xq2cT/ebYAgDonFW3/5ZKD1YMO6F7wPoVH5OHrQ518Xuw
 hQ8=
 =O6l5
 -----END PGP SIGNATURE-----

Merge tag 'probes-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull probes updates from Masami Hiramatsu:
 "x86 kprobes:

   - Use boolean for some function return instead of 0 and 1

   - Prohibit probing on INT/UD. This prevents user to put kprobe on
     INTn/INT1/INT3/INTO and UD0/UD1/UD2 because these are used for a
     special purpose in the kernel

   - Boost Grp instructions. Because a few percent of kernel
     instructions are Grp 2/3/4/5 and those are safe to be executed
     without ip register fixup, allow those to be boosted (direct
     execution on the trampoline buffer with a JMP)

  tracing:

   - Add function argument access from return events (kretprobe and
     fprobe). This allows user to compare how a data structure field is
     changed after executing a function. With BTF, return event also
     accepts function argument access by name.

   - Fix a wrong comment (using "Kretprobe" in fprobe)

   - Cleanup a big probe argument parser function into three parts, type
     parser, post-processing function, and main parser

   - Cleanup to set nr_args field when initializing trace_probe instead
     of counting up it while parsing

   - Cleanup a redundant #else block from tracefs/README source code

   - Update selftests to check entry argument access from return probes

   - Documentation update about entry argument access from return
     probes"

* tag 'probes-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  Documentation: tracing: Add entry argument access at function exit
  selftests/ftrace: Add test cases for entry args at function exit
  tracing/probes: Support $argN in return probe (kprobe and fprobe)
  tracing: Remove redundant #else block for BTF args from README
  tracing/probes: cleanup: Set trace_probe::nr_args at trace_probe_init
  tracing/probes: Cleanup probe argument parser
  tracing/fprobe-event: cleanup: Fix a wrong comment in fprobe event
  x86/kprobes: Boost more instructions from grp2/3/4/5
  x86/kprobes: Prohibit kprobing on INT and UD
  x86/kprobes: Refactor can_{probe,boost} return type to bool
2024-03-14 16:16:33 -07:00
Linus Torvalds
c0a614e82e lsm/stable-6.9 PR 20240314
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmXzVowUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNHJw/+LJLFacgvuNv6erCQNJoKpIoUVfwl
 HMEWJv3MICSvG7BvqpWMS29tqms1XWP7IzblmMOJ3PF86h8oOf8hg2KbEBvarSW4
 WT0gVkHa+IBn9aaakUM5wDxgRnQyw5Iq+2P3LRC1rDkGgcgC2ETjcgYqnq3fD7SJ
 K1NpyhodaNEJ6ViW4CTjka/XX4mNpPilGJ2jqlBsNONBlHETafxE19njHxDaB4Xc
 AXPlc0atYW9RZXCnJ3Ot89vUdsNLZomDxLbay71O4PTUY6UpwFJHqrjnqhcKP5bQ
 gieX1Z6qdfi2Rb6recPCyWxOelYhvLsnTHD9bxXZfNHi8XnmQzW8rhCbVoD+nEOE
 xSkSk/pgiVhYcPCnKS8Skhr2p/AB/TSLhcnTAcCAD+w5yawFsVn96O54ntg8ljWW
 YVdtUS69AzqqtImedu2iPHBfVpi2DG2NIWI75Febf6NZeTnQemt2m6cY7eH92Noi
 kZgZBFkqRhBMzXKxQoeHVlbGbHGPQ+f7UUDxjzI24KXoDHHiMW5ecoGSomkLzvdS
 PxFVTfvSlvzdqAfKmbfGPpRNPgtGd7CV1glg7MYaKVt4ln1X1L/0jREiD5I/7uGY
 d60bFdFJcYNvod99YwDrlVdX9yCd1AHjy6PDydC//dfOKOChHzIVNFW9NcNPVNBy
 5H7VjBJO5TQpvWY=
 =ugzm
 -----END PGP SIGNATURE-----

Merge tag 'lsm-pr-20240314' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm

Pull lsm fixes from Paul Moore:
 "Two fixes to address issues with the LSM syscalls that we shipped in
  Linux v6.8. The first patch might be a bit controversial, but the
  second is a rather straightforward fix; more on both below.

  The first fix from Casey addresses a problem that should have been
  caught during the ~16 month (?) review cycle, but sadly was not. The
  good news is that Dmitry caught it very quickly once Linux v6.8 was
  released. The core issue is the use of size_t parameters to pass
  buffer sizes back and forth in the syscall; while we could have solved
  this with a compat syscall definition, given the newness of the
  syscalls I wanted to attempt to just redefine the size_t parameters as
  u32 types and avoid the work associated with a set of compat syscalls.

  However, this is technically a change in the syscall's signature/API
  so I can understand if you're opposed to this, even if the syscalls
  are less than a week old.

   [ Fingers crossed nobody even notices - Linus ]

  The second fix is a rather trivial fix to allow userspace to call into
  the lsm_get_self_attr() syscall with a NULL buffer to quickly
  determine a minimum required size for the buffer. We do have
  kselftests for this very case, I'm not sure why I didn't notice the
  failure; I'm going to guess stupidity, tired eyes, I dunno. My
  apologies we didn't catch this earlier"

* tag 'lsm-pr-20240314' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
  lsm: handle the NULL buffer case in lsm_fill_user_ctx()
  lsm: use 32-bit compatible data types in LSM syscalls
2024-03-14 16:05:20 -07:00
Linus Torvalds
35e886e88c Landlock updates for v6.9-rc1
-----BEGIN PGP SIGNATURE-----
 
 iIYEABYKAC4WIQSVyBthFV4iTW/VU1/l49DojIL20gUCZfHmqxAcbWljQGRpZ2lr
 b2QubmV0AAoJEOXj0OiMgvbSvbABAIUF7nujsgnE9AykjhTKzg+by86mvXK0fdLG
 WVW0cwfgAP49daJb8JyZP9d6PvcgDfH4vV8E7r5PFeaICPdoOwg2Bg==
 =xJV1
 -----END PGP SIGNATURE-----

Merge tag 'landlock-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux

Pull landlock updates from Mickaël Salaün:
 "Some miscellaneous improvements, including new KUnit tests, extended
  documentation and boot help, and some cosmetic cleanups.

  Additional test changes already went through the net tree"

* tag 'landlock-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
  samples/landlock: Don't error out if a file path cannot be opened
  landlock: Use f_cred in security_file_open() hook
  landlock: Rename "ptrace" files to "task"
  landlock: Simplify current_check_access_socket()
  landlock: Warn once if a Landlock action is requested while disabled
  landlock: Extend documentation for kernel support
  landlock: Add support for KUnit tests
  selftests/landlock: Clean up error logs related to capabilities
2024-03-14 16:00:27 -07:00
Linus Torvalds
6d75c6f40a arm64 updates for 6.9:
* Reorganise the arm64 kernel VA space and add support for LPA2 (at
   stage 1, KVM stage 2 was merged earlier) - 52-bit VA/PA address range
   with 4KB and 16KB pages
 
 * Enable Rust on arm64
 
 * Support for the 2023 dpISA extensions (data processing ISA), host only
 
 * arm64 perf updates:
 
   - StarFive's StarLink (integrates one or more CPU cores with a shared
     L3 memory system) PMU support
 
   - Enable HiSilicon Erratum 162700402 quirk for HIP09
 
   - Several updates for the HiSilicon PCIe PMU driver
 
   - Arm CoreSight PMU support
 
   - Convert all drivers under drivers/perf/ to use .remove_new()
 
 * Miscellaneous:
 
   - Don't enable workarounds for "rare" errata by default
 
   - Clean up the DAIF flags handling for EL0 returns (in preparation for
     NMI support)
 
   - Kselftest update for ptrace()
 
   - Update some of the sysreg field definitions
 
   - Slight improvement in the code generation for inline asm I/O
     accessors to permit offset addressing
 
   - kretprobes: acquire regs via a BRK exception (previously done via a
     trampoline handler)
 
   - SVE/SME cleanups, comment updates
 
   - Allow CALL_OPS+CC_OPTIMIZE_FOR_SIZE with clang (previously disabled
     due to gcc silently ignoring -falign-functions=N)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmXxiSgACgkQa9axLQDI
 XvHd7hAAjQrQqxJogPT2ahM5/gxct8qTrXpIgX0B1Y7bb5R8ztvOUN9MJNuDyRsj
 0s28SSZw387LReM5OUu+U6G/iahcuNAyP/8d9qeac32Tidd255fV3KPEh4C4eC+u
 0HeOqLBZ+stmNoa71tBC2K6SmchizhYyYduvRnri8km8K4OMDawHWqWRTXl0PNRT
 RMVJvZTDJMPfMBFeD4+B7EnSFOoP14tKCw9MZvlbpT2PEV0kINjhCQiojW2jJgqv
 w36vm/dhwsg1avSzT1xhy3KE+m+7n28+IC/wr1HB7c1WumvYKv7Z84ieCp3PlO3Z
 owvVO7dKJC6X3RkoY6Kge5p2RHU6poDerDVHYiAvG+Zi57nrDmHyAubskThsGTGR
 AibSEeJ5nQ0yM6hx7zAIQa5XEo4l0svD1ZM7NynY+5JR44W9cdAH3SnEsvIBMGIf
 /ja+iZ1W4ZQnIESQXD5uDPSxILfqQ8Ebhdorpw+Qg3rB7OhdTdGSSGQCi6V2PcJH
 d/ErFO+i0lFRBPJtBbUAN4EEu3HJcVYEoEnVJYQahC+6KyNGLxO+7L6sH0YO7Pag
 P1LRa6h8ktuBMrbCrOPWdmJYNDYCbb5rRtmcCwO0ItZ4g5tYWp9djFc8pyctCaNB
 MZxxRrUCNwXTOcFTDiYzyk+JCvpf3EvXfvj8AH+P8BMjFWgqHqw=
 =KTD/
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:
 "The major features are support for LPA2 (52-bit VA/PA with 4K and 16K
  pages), the dpISA extension and Rust enabled on arm64. The changes are
  mostly contained within the usual arch/arm64/, drivers/perf, the arm64
  Documentation and kselftests. The exception is the Rust support which
  touches some generic build files.

  Summary:

   - Reorganise the arm64 kernel VA space and add support for LPA2 (at
     stage 1, KVM stage 2 was merged earlier) - 52-bit VA/PA address
     range with 4KB and 16KB pages

   - Enable Rust on arm64

   - Support for the 2023 dpISA extensions (data processing ISA), host
     only

   - arm64 perf updates:

      - StarFive's StarLink (integrates one or more CPU cores with a
        shared L3 memory system) PMU support

      - Enable HiSilicon Erratum 162700402 quirk for HIP09

      - Several updates for the HiSilicon PCIe PMU driver

      - Arm CoreSight PMU support

      - Convert all drivers under drivers/perf/ to use .remove_new()

   - Miscellaneous:

      - Don't enable workarounds for "rare" errata by default

      - Clean up the DAIF flags handling for EL0 returns (in preparation
        for NMI support)

      - Kselftest update for ptrace()

      - Update some of the sysreg field definitions

      - Slight improvement in the code generation for inline asm I/O
        accessors to permit offset addressing

      - kretprobes: acquire regs via a BRK exception (previously done
        via a trampoline handler)

      - SVE/SME cleanups, comment updates

      - Allow CALL_OPS+CC_OPTIMIZE_FOR_SIZE with clang (previously
        disabled due to gcc silently ignoring -falign-functions=N)"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (134 commits)
  Revert "mm: add arch hook to validate mmap() prot flags"
  Revert "arm64: mm: add support for WXN memory translation attribute"
  Revert "ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512"
  ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512
  kselftest/arm64: Add 2023 DPISA hwcap test coverage
  kselftest/arm64: Add basic FPMR test
  kselftest/arm64: Handle FPMR context in generic signal frame parser
  arm64/hwcap: Define hwcaps for 2023 DPISA features
  arm64/ptrace: Expose FPMR via ptrace
  arm64/signal: Add FPMR signal handling
  arm64/fpsimd: Support FEAT_FPMR
  arm64/fpsimd: Enable host kernel access to FPMR
  arm64/cpufeature: Hook new identification registers up to cpufeature
  docs: perf: Fix build warning of hisi-pcie-pmu.rst
  perf: starfive: Only allow COMPILE_TEST for 64-bit architectures
  MAINTAINERS: Add entry for StarFive StarLink PMU
  docs: perf: Add description for StarFive's StarLink PMU
  dt-bindings: perf: starfive: Add JH8100 StarLink PMU
  perf: starfive: Add StarLink PMU support
  docs: perf: Update usage for target filter of hisi-pcie-pmu
  ...
2024-03-14 15:35:42 -07:00
Paolo Bonzini
17193ced2d - Memop selftest rotate fix
- SCLP event bits over indication fix
 - Missing virt_to_phys for the CRYCB fix
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEwGNS88vfc9+v45Yq41TmuOI4ufgFAmXcfYoACgkQ41TmuOI4
 ufhPExAAxdcg3WjnTe/EYe+GnyjKo3nZs4y9dhZk9gf06qEYEawhg0ug5akzRZIH
 SDKeFqOXzl/ZRuL5hvfYBzxpy+IR3rWAYhBKUyxR6aJBl+RZKlf+Xn7l8iIKbNDq
 vAtLh9Hqza5IJiw/jtorw90TmiHDKvMlvft4UMG3t1IppyktUuuH0aujaVpeKtMR
 8qVyGsaTmNHip6Pi7w3WUnvYPkMNLoM7UIPhBAvWrJyYrLxao8pKEGWHaKwbMNHL
 Om4bjykfFCZ1Cs9aLZDLEasuD61Fpp41DnvImYm77yuDOdI4WalIlV7F5NbjQhhd
 IrQdsmlZc+N+HKcYvia6MnzAChTpo25pynvW7xXFQIfl/9VxcMFAfSLqLZMGMKFC
 IwzwI+BA3+bgw6zbN2z2uBShIom7Zzr689U8mbt5q7JborOH38qd5+IX6QFwUtTv
 IPHrgcULdWHWT5TRaIp61cB9YzCx2YU1QrMWEUVehldQqGEt8ANdZU5Ov0KG1BVl
 L9ULBIEnJ2ib1pGA7Xlxl2U0Lr2w/dg/p7EAdnOGes50GfEwEjtBzb7VO9Xfrz/Z
 j927hQO354Y8OYRFjKDjTceENynCiYsbNEhTHE6qFRIwAmeSVk4PT+vIXO6wZlZi
 Ee3LxsvVUnhYuC7sZbBUNhyiEjNn6GG3LxAtPeDoD+HvhqXI2Qg=
 =bwck
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-next-6.9-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

- Memop selftest rotate fix
- SCLP event bits over indication fix
- Missing virt_to_phys for the CRYCB fix
2024-03-14 14:47:56 -04:00
Casey Schaufler
a5a858f622 lsm: use 32-bit compatible data types in LSM syscalls
Change the size parameters in lsm_list_modules(), lsm_set_self_attr()
and lsm_get_self_attr() from size_t to u32. This avoids the need to
have different interfaces for 32 and 64 bit systems.

Cc: stable@vger.kernel.org
Fixes: a04a119808 ("LSM: syscalls for current process attributes")
Fixes: ad4aff9ec2 ("LSM: Create lsm_list_modules system call")
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Reported-and-reviewed-by: Dmitry V. Levin <ldv@strace.io>
[PM: subject and metadata tweaks, syscall.h fixes]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-03-14 11:31:26 -04:00
Linus Torvalds
69afef4af4 gpio updates for v6.9
Serialization rework:
 - use SRCU to serialize access to the global GPIO device list, to GPIO device
   structs themselves and to GPIO descriptors
 - make the GPIO subsystem resilient to the GPIO providers being unbound while
   the API calls are in progress
 - don't dereference the SRCU-protected chip pointer if the information we need
   can be obtained from the GPIO device structure
 - move some of the information contained in struct gpio_chip to struct
   gpio_device to further reduce the need to dereference the former
 - pass the GPIO device struct instead of the GPIO chip to sysfs callback to,
   again, reduce the need for accessing the latter
 - get GPIO descriptors from the GPIO device, not from the chip for the same
   reason
 - allow for mostly lockless operation of the GPIO driver API: assure
   consistency with SRCU and atomic operations
 - remove the global GPIO spinlock
 - remove the character device RW semaphore
 
 Core GPIOLIB:
 - constify pointers in GPIO API where applicable
 - unify the GPIO counting APIs for ACPI and OF
 - provide a macro for iterating over all GPIOs, not only the ones that are
   requested
 - remove leftover typedefs
 - pass the consumer device to GPIO core in devm_fwnode_gpiod_get_index() for
   improved logging
 - constify the GPIO bus type
 - don't warn about removing GPIO chips with descriptors still held by users as
   we can now handle this situation gracefully
 - remove unused logging helpers
 - unexport functions that are only used internally in the GPIO subsystem
 - set the device type (assign the relevant struct device_type) for GPIO devices
 
 New drivers:
 - add the ChromeOS EC GPIO driver
 
 Driver improvements:
 - allow building gpio-vf610 with COMPILE_TEST as well as disabling it in
   menuconfig (before it was always built for i.MX cofigs)
 - count the number of EICs using the device properties instead of hard-coding
   it in gpio-eic-sprd
 - improve the device naming, extend the debugfs output and add lockdep asserts
   to gpio-sim
 
 DT bindings:
 - document the 'label' property for gpio-pca9570
 - convert aspeed,ast2400-gpio bindings to DT schema
 - disallow unevaluated properties for gpio-mvebu
 - document a new model in renesas,rcar-gpio
 
 Documentation:
 - improve the character device kerneldocs in user-space headers
 - add proper documentation for the character device uAPI (both v1 and v2)
 - move the sysfs and gpio-mockup docs into the "obsolete" section
 - improve naming consistency for GPIO terms
 - clarify the line values description for sysfs
 - minor docs improvements
 - improve the driver API contract for setting GPIO direction
 - mark unsafe APIs as deprecated in kerneldocs and suggest replacements
 
 Other:
 - remove an obsolete test from selftests
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEFp3rbAvDxGAT0sefEacuoBRx13IFAmXu1ecACgkQEacuoBRx
 13JR7w//R3TswZ1uC9qkRjat9eA2KZUaI2QChlS7V/yXVcDHynuTlO/ZQmnnMdYL
 ch7T2cjPcW0OCt0UhcjamUmYtWaxe1e5GU3E42EosWUsojEzgGs0iNKe0R4SHYzv
 whlkFqO8+8IctYhiMpAU1PzP9N4YBqypwgCrTqHIrYuhz3MbPQxtCMkr7g0LTo8u
 Z3K0D3Y0LuwISWNYYhA20Bwemn1fEHXJ9f3pTeNaGh2dGZek9k9xd0zWcCxwhaYD
 CBTBiZXf57TUTJ2u+JG+au1ghEmmvBPlMpza+fazypbcvsiQxdGvv5QH1bTwyt4B
 woGq+biLLvlwfJ8BT7+09uni7gUyNL3wWkixlx/8Slkyti4xWqgZQ3WnhwN8yS4Y
 DbkTtzH/PIsjr1dZw6rnGoXi80lBEaok7LeI0QhybopTXQI+CnIbE/RBhzly8Mf8
 1cAVFjrF2gPuaTuheakRBw4LOhegf4a485fadJVEUeEpeF7/p9rDQWAbgohYUnCE
 gHPwkTOJuOZp+BlsTOyspnqxWnDVMtCnmi+q1o7JvEgXqAtzU7+1gz/wDpfsHgHQ
 oze6V2JvD2R3JkHmdqcIzq5yNwk1rOguOY3saNiwSk95JY+A8vhAe/gVykklKDXX
 oX/DPwlVd/0OR+0HCQ3r0pXK8BSRQ9qm/nUZNnLB+Rts9K1peIU=
 =LX+L
 -----END PGP SIGNATURE-----

Merge tag 'gpio-updates-for-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio updates from Bartosz Golaszewski:
 "The biggest feature is the locking overhaul. Up until now the
  synchronization in the GPIO subsystem was broken. There was a single
  spinlock "protecting" multiple data structures but doing it wrong (as
  evidenced by several places where it would be released when a sleeping
  function was called and then reacquired without checking the protected
  state).

  We tried to use an RW semaphore before but the main issue with GPIO is
  that we have drivers implementing the interfaces in both sleeping and
  non-sleeping ways as well as user-facing interfaces that can be called
  both from process as well as atomic contexts. Both ends converge in
  the same code paths that can use neither spinlocks nor mutexes. The
  only reasonable way out is to use SRCU and go mostly lockless. To that
  end: we add several SRCU structs in relevant places and use them to
  assure consistency between API calls together with atomic reads and
  writes of GPIO descriptor flags where it makes sense.

  This code has spent several weeks in next and has received several
  fixes in the first week or two after which it stabilized nicely. The
  GPIO subsystem is now resilient to providers being suddenly unbound.
  We managed to also remove the existing character device RW semaphore
  and the obsolete global spinlock.

  Other than the locking rework we have one new driver (for Chromebook
  EC), much appreciated documentation improvements from Kent and the
  regular driver improvements, DT-bindings updates and GPIOLIB core
  tweaks.

  Serialization rework:
   - use SRCU to serialize access to the global GPIO device list, to
     GPIO device structs themselves and to GPIO descriptors
   - make the GPIO subsystem resilient to the GPIO providers being
     unbound while the API calls are in progress
   - don't dereference the SRCU-protected chip pointer if the
     information we need can be obtained from the GPIO device structure
   - move some of the information contained in struct gpio_chip to
     struct gpio_device to further reduce the need to dereference the
     former
   - pass the GPIO device struct instead of the GPIO chip to sysfs
     callback to, again, reduce the need for accessing the latter
   - get GPIO descriptors from the GPIO device, not from the chip for
     the same reason
   - allow for mostly lockless operation of the GPIO driver API: assure
     consistency with SRCU and atomic operations
   - remove the global GPIO spinlock
   - remove the character device RW semaphore

  Core GPIOLIB:
   - constify pointers in GPIO API where applicable
   - unify the GPIO counting APIs for ACPI and OF
   - provide a macro for iterating over all GPIOs, not only the ones
     that are requested
   - remove leftover typedefs
   - pass the consumer device to GPIO core in
     devm_fwnode_gpiod_get_index() for improved logging
   - constify the GPIO bus type
   - don't warn about removing GPIO chips with descriptors still held by
     users as we can now handle this situation gracefully
   - remove unused logging helpers
   - unexport functions that are only used internally in the GPIO
     subsystem
   - set the device type (assign the relevant struct device_type) for
     GPIO devices

  New drivers:
   - add the ChromeOS EC GPIO driver

  Driver improvements:
   - allow building gpio-vf610 with COMPILE_TEST as well as disabling it
     in menuconfig (before it was always built for i.MX cofigs)
   - count the number of EICs using the device properties instead of
     hard-coding it in gpio-eic-sprd
   - improve the device naming, extend the debugfs output and add
     lockdep asserts to gpio-sim

  DT bindings:
   - document the 'label' property for gpio-pca9570
   - convert aspeed,ast2400-gpio bindings to DT schema
   - disallow unevaluated properties for gpio-mvebu
   - document a new model in renesas,rcar-gpio

  Documentation:
   - improve the character device kerneldocs in user-space headers
   - add proper documentation for the character device uAPI (both v1 and v2)
   - move the sysfs and gpio-mockup docs into the "obsolete" section
   - improve naming consistency for GPIO terms
   - clarify the line values description for sysfs
   - minor docs improvements
   - improve the driver API contract for setting GPIO direction
   - mark unsafe APIs as deprecated in kerneldocs and suggest
     replacements

  Other:
   - remove an obsolete test from selftests"

* tag 'gpio-updates-for-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: (79 commits)
  gpio: sysfs: repair export returning -EPERM on 1st attempt
  selftest: gpio: remove obsolete gpio-mockup test
  gpiolib: Deduplicate cleanup for-loop in gpiochip_add_data_with_key()
  dt-bindings: gpio: aspeed,ast2400-gpio: Convert to DT schema
  gpio: acpi: Make acpi_gpio_count() take firmware node as a parameter
  gpio: of: Make of_gpio_get_count() take firmware node as a parameter
  gpiolib: Pass consumer device through to core in devm_fwnode_gpiod_get_index()
  gpio: sim: use for_each_hwgpio()
  gpio: provide for_each_hwgpio()
  gpio: don't warn about removing GPIO chips with active users anymore
  gpio: sim: delimit the fwnode name with a ":" when generating labels
  gpio: sim: add lockdep asserts
  gpio: Add ChromeOS EC GPIO driver
  gpio: constify of_phandle_args in of_find_gpio_device_by_xlate()
  gpio: fix memory leak in gpiod_request_commit()
  gpio: constify opaque pointer "data" in gpio_device_find()
  gpio: cdev: fix a NULL-pointer dereference with DEBUG enabled
  gpio: uapi: clarify default_values being logical
  gpio: sysfs: fix inverted pointer logic
  gpio: don't let lockdep complain about inherently dangerous RCU usage
  ...
2024-03-13 11:14:55 -07:00
Linus Torvalds
cc4a875cf3 lsm/stable-6.9 PR 20240312
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmXwt3cUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXObOhAAqldn1nbYS/t1D/k/9ZN/PtSQetK4
 S58D8+gB59Sg0daWFaRhCwwShIbXS/6XzhqaVb3iAPptJs0YDFMbWLAW2d+dd69K
 /7C8diguHbuJdEnCJtFYQIVinavaYVRlyoQcO8uwTz8uvTgXPOhr2P9NcOApJXcR
 xqttuADVo/9Zn0O9/+GUPCH0ROL0SMnuUjwdVP3bpPHj9zEk8F1/A6chzTeSLJru
 Y4+cRrN/r0JTkvRqPdnF9LSvxK7mtAEaHkKGeLQbw0O5pv3r3w0EWMJvq+uonGU2
 WX0eR5VMfevkFMUdw8FKOTa+OZ0HJ2KKIb4sB4wDMgeGyov7Z6SxgvFeQiSyD3aB
 QnyfLDzeEuPfousxUd45dUDnsWNnSgFF+JAdi0LSzm5hMuLeQDozTsFmh0orQcX1
 L5A6VtAbSPP0ffl+tuPi48q3P3LlSjMP0B8W20NXFYhXukKXCgXVMr/dEvpwpu1m
 o1glviGIXeLQQSnX3lMWb7Ds2igmCtXPrqkdu2vpRhMp0od6n4R4jH73Aj5MeSQn
 n3sP73dg5sAaMjtI2NOisMeFUp09MMlOumCCM+AIplPXremm1kwgKRTIp0rKsLW9
 VoQPXa43LQc3hAgPrpGuE+4yBfaBUq7Z8I37IFER/2y4K8b9YkduW4kDh7OdRz+d
 iQ4Nnu2lR/+CCH0=
 =0mTM
 -----END PGP SIGNATURE-----

Merge tag 'lsm-pr-20240312' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm

Pull lsm updates from Paul Moore:

 - Promote IMA/EVM to a proper LSM

   This is the bulk of the diffstat, and the source of all the changes
   in the VFS code. Prior to the start of the LSM stacking work it was
   important that IMA/EVM were separate from the rest of the LSMs,
   complete with their own hooks, infrastructure, etc. as it was the
   only way to enable IMA/EVM at the same time as a LSM.

   However, now that the bulk of the LSM infrastructure supports
   multiple simultaneous LSMs, we can simplify things greatly by
   bringing IMA/EVM into the LSM infrastructure as proper LSMs. This is
   something I've wanted to see happen for quite some time and Roberto
   was kind enough to put in the work to make it happen.

 - Use the LSM hook default values to simplify the call_int_hook() macro

   Previously the call_int_hook() macro required callers to supply a
   default return value, despite a default value being specified when
   the LSM hook was defined.

   This simplifies the macro by using the defined default return value
   which makes life easier for callers and should also reduce the number
   of return value bugs in the future (we've had a few pop up recently,
   hence this work).

 - Use the KMEM_CACHE() macro instead of kmem_cache_create()

   The guidance appears to be to use the KMEM_CACHE() macro when
   possible and there is no reason why we can't use the macro, so let's
   use it.

 - Fix a number of comment typos in the LSM hook comment blocks

   Not much to say here, we fixed some questionable grammar decisions in
   the LSM hook comment blocks.

* tag 'lsm-pr-20240312' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: (28 commits)
  cred: Use KMEM_CACHE() instead of kmem_cache_create()
  lsm: use default hook return value in call_int_hook()
  lsm: fix typos in security/security.c comment headers
  integrity: Remove LSM
  ima: Make it independent from 'integrity' LSM
  evm: Make it independent from 'integrity' LSM
  evm: Move to LSM infrastructure
  ima: Move IMA-Appraisal to LSM infrastructure
  ima: Move to LSM infrastructure
  integrity: Move integrity_kernel_module_request() to IMA
  security: Introduce key_post_create_or_update hook
  security: Introduce inode_post_remove_acl hook
  security: Introduce inode_post_set_acl hook
  security: Introduce inode_post_create_tmpfile hook
  security: Introduce path_post_mknod hook
  security: Introduce file_release hook
  security: Introduce file_post_open hook
  security: Introduce inode_post_removexattr hook
  security: Introduce inode_post_setattr hook
  security: Align inode_setattr hook definition with EVM
  ...
2024-03-12 20:03:34 -07:00
Linus Torvalds
9187210eee Networking changes for 6.9.
Core & protocols
 ----------------
 
  - Large effort by Eric to lower rtnl_lock pressure and remove locks:
 
    - Make commonly used parts of rtnetlink (address, route dumps etc.)
      lockless, protected by RCU instead of rtnl_lock.
 
    - Add a netns exit callback which already holds rtnl_lock,
      allowing netns exit to take rtnl_lock once in the core
      instead of once for each driver / callback.
 
    - Remove locks / serialization in the socket diag interface.
 
    - Remove 6 calls to synchronize_rcu() while holding rtnl_lock.
 
    - Remove the dev_base_lock, depend on RCU where necessary.
 
  - Support busy polling on a per-epoll context basis. Poll length
    and budget parameters can be set independently of system defaults.
 
  - Introduce struct net_hotdata, to make sure read-mostly global config
    variables fit in as few cache lines as possible.
 
  - Add optional per-nexthop statistics to ease monitoring / debug
    of ECMP imbalance problems.
 
  - Support TCP_NOTSENT_LOWAT in MPTCP.
 
  - Ensure that IPv6 temporary addresses' preferred lifetimes are long
    enough, compared to other configured lifetimes, and at least 2 sec.
 
  - Support forwarding of ICMP Error messages in IPSec, per RFC 4301.
 
  - Add support for the independent control state machine for bonding
    per IEEE 802.1AX-2008 5.4.15 in addition to the existing coupled
    control state machine.
 
  - Add "network ID" to MCTP socket APIs to support hosts with multiple
    disjoint MCTP networks.
 
  - Re-use the mono_delivery_time skbuff bit for packets which user
    space wants to be sent at a specified time. Maintain the timing
    information while traversing veth links, bridge etc.
 
  - Take advantage of MSG_SPLICE_PAGES for RxRPC DATA and ACK packets.
 
  - Simplify many places iterating over netdevs by using an xarray
    instead of a hash table walk (hash table remains in place, for
    use on fastpaths).
 
  - Speed up scanning for expired routes by keeping a dedicated list.
 
  - Speed up "generic" XDP by trying harder to avoid large allocations.
 
  - Support attaching arbitrary metadata to netconsole messages.
 
 Things we sprinkled into general kernel code
 --------------------------------------------
 
  - Enforce VM_IOREMAP flag and range in ioremap_page_range and introduce
    VM_SPARSE kind and vm_area_[un]map_pages (used by bpf_arena).
 
  - Rework selftest harness to enable the use of the full range of
    ksft exit code (pass, fail, skip, xfail, xpass).
 
 Netfilter
 ---------
 
  - Allow userspace to define a table that is exclusively owned by a daemon
    (via netlink socket aliveness) without auto-removing this table when
    the userspace program exits. Such table gets marked as orphaned and
    a restarting management daemon can re-attach/regain ownership.
 
  - Speed up element insertions to nftables' concatenated-ranges set type.
    Compact a few related data structures.
 
 BPF
 ---
 
  - Add BPF token support for delegating a subset of BPF subsystem
    functionality from privileged system-wide daemons such as systemd
    through special mount options for userns-bound BPF fs to a trusted
    & unprivileged application.
 
  - Introduce bpf_arena which is sparse shared memory region between BPF
    program and user space where structures inside the arena can have
    pointers to other areas of the arena, and pointers work seamlessly
    for both user-space programs and BPF programs.
 
  - Introduce may_goto instruction that is a contract between the verifier
    and the program. The verifier allows the program to loop assuming it's
    behaving well, but reserves the right to terminate it.
 
  - Extend the BPF verifier to enable static subprog calls in spin lock
    critical sections.
 
  - Support registration of struct_ops types from modules which helps
    projects like fuse-bpf that seeks to implement a new struct_ops type.
 
  - Add support for retrieval of cookies for perf/kprobe multi links.
 
  - Support arbitrary TCP SYN cookie generation / validation in the TC
    layer with BPF to allow creating SYN flood handling in BPF firewalls.
 
  - Add code generation to inline the bpf_kptr_xchg() helper which
    improves performance when stashing/popping the allocated BPF objects.
 
 Wireless
 --------
 
  - Add SPP (signaling and payload protected) AMSDU support.
 
  - Support wider bandwidth OFDMA, as required for EHT operation.
 
 Driver API
 ----------
 
  - Major overhaul of the Energy Efficient Ethernet internals to support
    new link modes (2.5GE, 5GE), share more code between drivers
    (especially those using phylib), and encourage more uniform behavior.
    Convert and clean up drivers.
 
  - Define an API for querying per netdev queue statistics from drivers.
 
  - IPSec: account in global stats for fully offloaded sessions.
 
  - Create a concept of Ethernet PHY Packages at the Device Tree level,
    to allow parameterizing the existing PHY package code.
 
  - Enable Rx hashing (RSS) on GTP protocol fields.
 
 Misc
 ----
 
  - Improvements and refactoring all over networking selftests.
 
  - Create uniform module aliases for TC classifiers, actions,
    and packet schedulers to simplify creating modprobe policies.
 
  - Address all missing MODULE_DESCRIPTION() warnings in networking.
 
  - Extend the Netlink descriptions in YAML to cover message encapsulation
    or "Netlink polymorphism", where interpretation of nested attributes
    depends on link type, classifier type or some other "class type".
 
 Drivers
 -------
 
  - Ethernet high-speed NICs:
    - Add a new driver for Marvell's Octeon PCI Endpoint NIC VF.
    - Intel (100G, ice, idpf):
      - support E825-C devices
    - nVidia/Mellanox:
      - support devices with one port and multiple PCIe links
    - Broadcom (bnxt):
      - support n-tuple filters
      - support configuring the RSS key
    - Wangxun (ngbe/txgbe):
      - implement irq_domain for TXGBE's sub-interrupts
    - Pensando/AMD:
      - support XDP
      - optimize queue submission and wakeup handling (+17% bps)
      - optimize struct layout, saving 28% of memory on queues
 
  - Ethernet NICs embedded and virtual:
    - Google cloud vNIC:
      - refactor driver to perform memory allocations for new queue
        config before stopping and freeing the old queue memory
    - Synopsys (stmmac):
      - obey queueMaxSDU and implement counters required by 802.1Qbv
    - Renesas (ravb):
      - support packet checksum offload
      - suspend to RAM and runtime PM support
 
  - Ethernet switches:
    - nVidia/Mellanox:
      - support for nexthop group statistics
    - Microchip:
      - ksz8: implement PHY loopback
      - add support for KSZ8567, a 7-port 10/100Mbps switch
 
  - PTP:
    - New driver for RENESAS FemtoClock3 Wireless clock generator.
    - Support OCP PTP cards designed and built by Adva.
 
  - CAN:
    - Support recvmsg() flags for own, local and remote traffic
      on CAN BCM sockets.
    - Support for esd GmbH PCIe/402 CAN device family.
    - m_can:
      - Rx/Tx submission coalescing
      - wake on frame Rx
 
  - WiFi:
    - Intel (iwlwifi):
      - enable signaling and payload protected A-MSDUs
      - support wider-bandwidth OFDMA
      - support for new devices
      - bump FW API to 89 for AX devices; 90 for BZ/SC devices
    - MediaTek (mt76):
      - mt7915: newer ADIE version support
      - mt7925: radio temperature sensor support
    - Qualcomm (ath11k):
      - support 6 GHz station power modes: Low Power Indoor (LPI),
        Standard Power) SP and Very Low Power (VLP)
      - QCA6390 & WCN6855: support 2 concurrent station interfaces
      - QCA2066 support
    - Qualcomm (ath12k):
      - refactoring in preparation for Multi-Link Operation (MLO) support
      - 1024 Block Ack window size support
      - firmware-2.bin support
      - support having multiple identical PCI devices (firmware needs to
        have ATH12K_FW_FEATURE_MULTI_QRTR_ID)
      - QCN9274: support split-PHY devices
      - WCN7850: enable Power Save Mode in station mode
      - WCN7850: P2P support
    - RealTek:
      - rtw88: support for more rtw8811cu and rtw8821cu devices
      - rtw89: support SCAN_RANDOM_SN and SET_SCAN_DWELL
      - rtlwifi: speed up USB firmware initialization
      - rtwl8xxxu:
        - RTL8188F: concurrent interface support
        - Channel Switch Announcement (CSA) support in AP mode
    - Broadcom (brcmfmac):
      - per-vendor feature support
      - per-vendor SAE password setup
      - DMI nvram filename quirk for ACEPC W5 Pro
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmXv0mgACgkQMUZtbf5S
 IrtgMxAAuRd+WJW++SENr4KxIWhYO1q6Xcxnai43wrNkan9swD24icG8TYALt4f3
 yoT6idQvWReAb5JNlh9rUQz8R7E0nJXlvEFn5MtJwcthx2C6wFo/XkJlddlRrT+j
 c2xGILwLjRhW65LaC0MZ2ECbEERkFz8xcGfK2SWzUgh6KYvPjcRfKFxugpM7xOQK
 P/Wnqhs4fVRS/Mj/bCcXcO+yhwC121Q3qVeQVjGS0AzEC65hAW87a/kc2BfgcegD
 EyI9R7mf6criQwX+0awubjfoIdr4oW/8oDVNvUDczkJkbaEVaLMQk9P5x/0XnnVS
 UHUchWXyI80Q8Rj12uN1/I0h3WtwNQnCRBuLSmtm6GLfCAwbLvp2nGWDnaXiqryW
 DVKUIHGvqPKjkOOMOVfSvfB3LvkS3xsFVVYiQBQCn0YSs/gtu4CoF2Nty9CiLPbK
 tTuxUnLdPDZDxU//l0VArZmP8p2JM7XQGJ+JH8GFH4SBTyBR23e0iyPSoyaxjnYn
 RReDnHMVsrS1i7GPhbqDJWn+uqMSs7N149i0XmmyeqwQHUVSJN3J2BApP2nCaDfy
 H2lTuYly5FfEezt61NvCE4qr/VsWeEjm1fYlFQ9dFn4pGn+HghyCpw+xD1ZN56DN
 lujemau5B3kk1UTtAT4ypPqvuqjkRFqpNV2LzsJSk/Js+hApw8Y=
 =oY52
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core & protocols:

   - Large effort by Eric to lower rtnl_lock pressure and remove locks:

      - Make commonly used parts of rtnetlink (address, route dumps
        etc) lockless, protected by RCU instead of rtnl_lock.

      - Add a netns exit callback which already holds rtnl_lock,
        allowing netns exit to take rtnl_lock once in the core instead
        of once for each driver / callback.

      - Remove locks / serialization in the socket diag interface.

      - Remove 6 calls to synchronize_rcu() while holding rtnl_lock.

      - Remove the dev_base_lock, depend on RCU where necessary.

   - Support busy polling on a per-epoll context basis. Poll length and
     budget parameters can be set independently of system defaults.

   - Introduce struct net_hotdata, to make sure read-mostly global
     config variables fit in as few cache lines as possible.

   - Add optional per-nexthop statistics to ease monitoring / debug of
     ECMP imbalance problems.

   - Support TCP_NOTSENT_LOWAT in MPTCP.

   - Ensure that IPv6 temporary addresses' preferred lifetimes are long
     enough, compared to other configured lifetimes, and at least 2 sec.

   - Support forwarding of ICMP Error messages in IPSec, per RFC 4301.

   - Add support for the independent control state machine for bonding
     per IEEE 802.1AX-2008 5.4.15 in addition to the existing coupled
     control state machine.

   - Add "network ID" to MCTP socket APIs to support hosts with multiple
     disjoint MCTP networks.

   - Re-use the mono_delivery_time skbuff bit for packets which user
     space wants to be sent at a specified time. Maintain the timing
     information while traversing veth links, bridge etc.

   - Take advantage of MSG_SPLICE_PAGES for RxRPC DATA and ACK packets.

   - Simplify many places iterating over netdevs by using an xarray
     instead of a hash table walk (hash table remains in place, for use
     on fastpaths).

   - Speed up scanning for expired routes by keeping a dedicated list.

   - Speed up "generic" XDP by trying harder to avoid large allocations.

   - Support attaching arbitrary metadata to netconsole messages.

  Things we sprinkled into general kernel code:

   - Enforce VM_IOREMAP flag and range in ioremap_page_range and
     introduce VM_SPARSE kind and vm_area_[un]map_pages (used by
     bpf_arena).

   - Rework selftest harness to enable the use of the full range of ksft
     exit code (pass, fail, skip, xfail, xpass).

  Netfilter:

   - Allow userspace to define a table that is exclusively owned by a
     daemon (via netlink socket aliveness) without auto-removing this
     table when the userspace program exits. Such table gets marked as
     orphaned and a restarting management daemon can re-attach/regain
     ownership.

   - Speed up element insertions to nftables' concatenated-ranges set
     type. Compact a few related data structures.

  BPF:

   - Add BPF token support for delegating a subset of BPF subsystem
     functionality from privileged system-wide daemons such as systemd
     through special mount options for userns-bound BPF fs to a trusted
     & unprivileged application.

   - Introduce bpf_arena which is sparse shared memory region between
     BPF program and user space where structures inside the arena can
     have pointers to other areas of the arena, and pointers work
     seamlessly for both user-space programs and BPF programs.

   - Introduce may_goto instruction that is a contract between the
     verifier and the program. The verifier allows the program to loop
     assuming it's behaving well, but reserves the right to terminate
     it.

   - Extend the BPF verifier to enable static subprog calls in spin lock
     critical sections.

   - Support registration of struct_ops types from modules which helps
     projects like fuse-bpf that seeks to implement a new struct_ops
     type.

   - Add support for retrieval of cookies for perf/kprobe multi links.

   - Support arbitrary TCP SYN cookie generation / validation in the TC
     layer with BPF to allow creating SYN flood handling in BPF
     firewalls.

   - Add code generation to inline the bpf_kptr_xchg() helper which
     improves performance when stashing/popping the allocated BPF
     objects.

  Wireless:

   - Add SPP (signaling and payload protected) AMSDU support.

   - Support wider bandwidth OFDMA, as required for EHT operation.

  Driver API:

   - Major overhaul of the Energy Efficient Ethernet internals to
     support new link modes (2.5GE, 5GE), share more code between
     drivers (especially those using phylib), and encourage more
     uniform behavior. Convert and clean up drivers.

   - Define an API for querying per netdev queue statistics from
     drivers.

   - IPSec: account in global stats for fully offloaded sessions.

   - Create a concept of Ethernet PHY Packages at the Device Tree level,
     to allow parameterizing the existing PHY package code.

   - Enable Rx hashing (RSS) on GTP protocol fields.

  Misc:

   - Improvements and refactoring all over networking selftests.

   - Create uniform module aliases for TC classifiers, actions, and
     packet schedulers to simplify creating modprobe policies.

   - Address all missing MODULE_DESCRIPTION() warnings in networking.

   - Extend the Netlink descriptions in YAML to cover message
     encapsulation or "Netlink polymorphism", where interpretation of
     nested attributes depends on link type, classifier type or some
     other "class type".

  Drivers:

   - Ethernet high-speed NICs:
      - Add a new driver for Marvell's Octeon PCI Endpoint NIC VF.
      - Intel (100G, ice, idpf):
         - support E825-C devices
      - nVidia/Mellanox:
         - support devices with one port and multiple PCIe links
      - Broadcom (bnxt):
         - support n-tuple filters
         - support configuring the RSS key
      - Wangxun (ngbe/txgbe):
         - implement irq_domain for TXGBE's sub-interrupts
      - Pensando/AMD:
         - support XDP
         - optimize queue submission and wakeup handling (+17% bps)
         - optimize struct layout, saving 28% of memory on queues

   - Ethernet NICs embedded and virtual:
      - Google cloud vNIC:
         - refactor driver to perform memory allocations for new queue
           config before stopping and freeing the old queue memory
      - Synopsys (stmmac):
         - obey queueMaxSDU and implement counters required by 802.1Qbv
      - Renesas (ravb):
         - support packet checksum offload
         - suspend to RAM and runtime PM support

   - Ethernet switches:
      - nVidia/Mellanox:
         - support for nexthop group statistics
      - Microchip:
         - ksz8: implement PHY loopback
         - add support for KSZ8567, a 7-port 10/100Mbps switch

   - PTP:
      - New driver for RENESAS FemtoClock3 Wireless clock generator.
      - Support OCP PTP cards designed and built by Adva.

   - CAN:
      - Support recvmsg() flags for own, local and remote traffic on CAN
        BCM sockets.
      - Support for esd GmbH PCIe/402 CAN device family.
      - m_can:
         - Rx/Tx submission coalescing
         - wake on frame Rx

   - WiFi:
      - Intel (iwlwifi):
         - enable signaling and payload protected A-MSDUs
         - support wider-bandwidth OFDMA
         - support for new devices
         - bump FW API to 89 for AX devices; 90 for BZ/SC devices
      - MediaTek (mt76):
         - mt7915: newer ADIE version support
         - mt7925: radio temperature sensor support
      - Qualcomm (ath11k):
         - support 6 GHz station power modes: Low Power Indoor (LPI),
           Standard Power) SP and Very Low Power (VLP)
         - QCA6390 & WCN6855: support 2 concurrent station interfaces
         - QCA2066 support
      - Qualcomm (ath12k):
         - refactoring in preparation for Multi-Link Operation (MLO)
           support
         - 1024 Block Ack window size support
         - firmware-2.bin support
         - support having multiple identical PCI devices (firmware needs
           to have ATH12K_FW_FEATURE_MULTI_QRTR_ID)
         - QCN9274: support split-PHY devices
         - WCN7850: enable Power Save Mode in station mode
         - WCN7850: P2P support
      - RealTek:
         - rtw88: support for more rtw8811cu and rtw8821cu devices
         - rtw89: support SCAN_RANDOM_SN and SET_SCAN_DWELL
         - rtlwifi: speed up USB firmware initialization
         - rtwl8xxxu:
             - RTL8188F: concurrent interface support
             - Channel Switch Announcement (CSA) support in AP mode
      - Broadcom (brcmfmac):
         - per-vendor feature support
         - per-vendor SAE password setup
         - DMI nvram filename quirk for ACEPC W5 Pro"

* tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2255 commits)
  nexthop: Fix splat with CONFIG_DEBUG_PREEMPT=y
  nexthop: Fix out-of-bounds access during attribute validation
  nexthop: Only parse NHA_OP_FLAGS for dump messages that require it
  nexthop: Only parse NHA_OP_FLAGS for get messages that require it
  bpf: move sleepable flag from bpf_prog_aux to bpf_prog
  bpf: hardcode BPF_PROG_PACK_SIZE to 2MB * num_possible_nodes()
  selftests/bpf: Add kprobe multi triggering benchmarks
  ptp: Move from simple ida to xarray
  vxlan: Remove generic .ndo_get_stats64
  vxlan: Do not alloc tstats manually
  devlink: Add comments to use netlink gen tool
  nfp: flower: handle acti_netdevs allocation failure
  net/packet: Add getsockopt support for PACKET_COPY_THRESH
  net/netlink: Add getsockopt support for NETLINK_LISTEN_ALL_NSID
  selftests/bpf: Add bpf_arena_htab test.
  selftests/bpf: Add bpf_arena_list test.
  selftests/bpf: Add unit tests for bpf_arena_alloc/free_pages
  bpf: Add helper macro bpf_addr_space_cast()
  libbpf: Recognize __arena global variables.
  bpftool: Recognize arena map type
  ...
2024-03-12 17:44:08 -07:00
Linus Torvalds
7f1a277409 seccomp updates for v6.9-rc1
- Improve reliability of selftests (Terry Tritton, Kees Cook)
 
 - Fix strict-aliasing warning in samples (Arnd Bergmann)
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmXvlj8WHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJs6fD/9jcb5/9RNObVjSb19XSDfmlm6t
 +0u+aHKBZCuIu8WgrlU2rtMUJ8BiiWKVQIOViIJJJ6xz4uXIo+cRV3o5oRNKRrbj
 TDBoX3aiaFN8AcJL8TXHRMWpjOod99VLAazMD5ZbMI97/kWS1eFyZiyewea24NpS
 uMSBQFbfZUs44nvYLv+6zIVsBZ1AOhUJwnjgY/3cV2MleVPyb81EJRtJBiOlmTBY
 i2Vhk61Vgb4Ab/NruMmsBMCEMZzGNKeO1HaatmFBPhEZYh9vkeynCQC9MciBgVBB
 jzdhsxWVbBhkYc1GJUzXNGDavGuay/OIuuihA58JDQGHFUxJGR3hkDXSaXLCmGxt
 dPln+WFItfQHCJStfa+9m/muuoCJkKCZu6TkCVZC+n8fbaCPNSvvEjGfi2b7jL/x
 QKYto9AgWA/FJU+Z572davaMcCk84gcm8FNpQzm0KoMkfRVqz6XCSoYQ8sxNXrQp
 9+XzwAkLrqsVRZQK8rTzfGJ7G+7yMShBlyCgM8BhvUmGE6KS2c1J3AEe+ejumK1V
 a3jlmj6cO4IXVDZdE585NYTO6dMKxQ659VMErqYt0tzvmyPh6BqNyJkga9l/CHTA
 LHpYGImmQLtWx5uuh5jM+YLFSNktBhDeVMaqJOmalUfqdD8NeMuub7GciPxF6qAz
 u13iYmRsw4KZ66tE/w==
 =lqxR
 -----END PGP SIGNATURE-----

Merge tag 'seccomp-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull seccomp updates from Kees Cook:
 "There are no core kernel changes here; it's entirely selftests and
  samples:

   - Improve reliability of selftests (Terry Tritton, Kees Cook)

   - Fix strict-aliasing warning in samples (Arnd Bergmann)"

* tag 'seccomp-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  samples: user-trap: fix strict-aliasing warning
  selftests/seccomp: Pin benchmark to single CPU
  selftests/seccomp: user_notification_addfd check nextfd is available
  selftests/seccomp: Change the syscall used in KILL_THREAD test
  selftests/seccomp: Handle EINVAL on unshare(CLONE_NEWPID)
2024-03-12 15:05:27 -07:00
Linus Torvalds
216532e147 hardening updates for v6.9-rc1
- string.h and related header cleanups (Tanzir Hasan, Andy Shevchenko)
 
 - VMCI memcpy() usage and struct_size() cleanups (Vasiliy Kovalev, Harshit
   Mogalapalli)
 
 - selftests/powerpc: Fix load_unaligned_zeropad build failure (Michael
   Ellerman)
 
 - hardened Kconfig fragment updates (Marco Elver, Lukas Bulwahn)
 
 - Handle tail call optimization better in LKDTM (Douglas Anderson)
 
 - Use long form types in overflow.h (Andy Shevchenko)
 
 - Add flags param to string_get_size() (Andy Shevchenko)
 
 - Add Coccinelle script for potential struct_size() use (Jacob Keller)
 
 - Fix objtool corner case under KCFI (Josh Poimboeuf)
 
 - Drop 13 year old backward compat CAP_SYS_ADMIN check (Jingzi Meng)
 
 - Add str_plural() helper (Michal Wajdeczko, Kees Cook)
 
 - Ignore relocations in .notes section
 
 - Add comments to explain how __is_constexpr() works
 
 - Fix m68k stack alignment expectations in stackinit Kunit test
 
 - Convert string selftests to KUnit
 
 - Add KUnit tests for fortified string functions
 
 - Improve reporting during fortified string warnings
 
 - Allow non-type arg to type_max() and type_min()
 
 - Allow strscpy() to be called with only 2 arguments
 
 - Add binary mode to leaking_addresses scanner
 
 - Various small cleanups to leaking_addresses scanner
 
 - Adding wrapping_*() arithmetic helper
 
 - Annotate initial signed integer wrap-around in refcount_t
 
 - Add explicit UBSAN section to MAINTAINERS
 
 - Fix UBSAN self-test warnings
 
 - Simplify UBSAN build via removal of CONFIG_UBSAN_SANITIZE_ALL
 
 - Reintroduce UBSAN's signed overflow sanitizer
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmXvm5kWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJiQqD/4mM6SWZpYHKlR1nEiqIyz7Hqr9
 g4oguuw6HIVNJXLyeBI5Hd43CTeHPA0e++EETqhUAt7HhErxfYJY+JB221nRYmu+
 zhhQ7N/xbTMV/Je7AR03kQjhiMm8LyEcM2X4BNrsAcoCieQzmO3g0zSp8ISzLUE0
 PEEmf1lOzMe3gK2KOFCPt5Hiz9sGWyN6at+BQubY18tQGtjEXYAQNXkpD5qhGn4a
 EF693r/17wmc8hvSsjf4AGaWy1k8crG0WfpMCZsaqftjj0BbvOC60IDyx4eFjpcy
 tGyAJKETq161AkCdNweIh2Q107fG3tm0fcvw2dv8Wt1eQCko6M8dUGCBinQs/thh
 TexjJFS/XbSz+IvxLqgU+C5qkOP23E0M9m1dbIbOFxJAya/5n16WOBlGr3ae2Wdq
 /+t8wVSJw3vZiku5emWdFYP1VsdIHUjVa5QizFaaRhzLGRwhxVV49SP4IQC/5oM5
 3MAgNOFTP6yRQn9Y9wP+SZs+SsfaIE7yfKa9zOi4S+Ve+LI2v4YFhh8NCRiLkeWZ
 R1dhp8Pgtuq76f/v0qUaWcuuVeGfJ37M31KOGIhi1sI/3sr7UMrngL8D1+F8UZMi
 zcLu+x4GtfUZCHl6znx1rNUBqE5S/5ndVhLpOqfCXKaQ+RAm7lkOJ3jXE2VhNkhp
 yVEmeSOLnlCaQjZvXQ==
 =OP+o
 -----END PGP SIGNATURE-----

Merge tag 'hardening-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull hardening updates from Kees Cook:
 "As is pretty normal for this tree, there are changes all over the
  place, especially for small fixes, selftest improvements, and improved
  macro usability.

  Some header changes ended up landing via this tree as they depended on
  the string header cleanups. Also, a notable set of changes is the work
  for the reintroduction of the UBSAN signed integer overflow sanitizer
  so that we can continue to make improvements on the compiler side to
  make this sanitizer a more viable future security hardening option.

  Summary:

   - string.h and related header cleanups (Tanzir Hasan, Andy
     Shevchenko)

   - VMCI memcpy() usage and struct_size() cleanups (Vasiliy Kovalev,
     Harshit Mogalapalli)

   - selftests/powerpc: Fix load_unaligned_zeropad build failure
     (Michael Ellerman)

   - hardened Kconfig fragment updates (Marco Elver, Lukas Bulwahn)

   - Handle tail call optimization better in LKDTM (Douglas Anderson)

   - Use long form types in overflow.h (Andy Shevchenko)

   - Add flags param to string_get_size() (Andy Shevchenko)

   - Add Coccinelle script for potential struct_size() use (Jacob
     Keller)

   - Fix objtool corner case under KCFI (Josh Poimboeuf)

   - Drop 13 year old backward compat CAP_SYS_ADMIN check (Jingzi Meng)

   - Add str_plural() helper (Michal Wajdeczko, Kees Cook)

   - Ignore relocations in .notes section

   - Add comments to explain how __is_constexpr() works

   - Fix m68k stack alignment expectations in stackinit Kunit test

   - Convert string selftests to KUnit

   - Add KUnit tests for fortified string functions

   - Improve reporting during fortified string warnings

   - Allow non-type arg to type_max() and type_min()

   - Allow strscpy() to be called with only 2 arguments

   - Add binary mode to leaking_addresses scanner

   - Various small cleanups to leaking_addresses scanner

   - Adding wrapping_*() arithmetic helper

   - Annotate initial signed integer wrap-around in refcount_t

   - Add explicit UBSAN section to MAINTAINERS

   - Fix UBSAN self-test warnings

   - Simplify UBSAN build via removal of CONFIG_UBSAN_SANITIZE_ALL

   - Reintroduce UBSAN's signed overflow sanitizer"

* tag 'hardening-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (51 commits)
  selftests/powerpc: Fix load_unaligned_zeropad build failure
  string: Convert helpers selftest to KUnit
  string: Convert selftest to KUnit
  sh: Fix build with CONFIG_UBSAN=y
  compiler.h: Explain how __is_constexpr() works
  overflow: Allow non-type arg to type_max() and type_min()
  VMCI: Fix possible memcpy() run-time warning in vmci_datagram_invoke_guest_handler()
  lib/string_helpers: Add flags param to string_get_size()
  x86, relocs: Ignore relocations in .notes section
  objtool: Fix UNWIND_HINT_{SAVE,RESTORE} across basic blocks
  overflow: Use POD in check_shl_overflow()
  lib: stackinit: Adjust target string to 8 bytes for m68k
  sparc: vdso: Disable UBSAN instrumentation
  kernel.h: Move lib/cmdline.c prototypes to string.h
  leaking_addresses: Provide mechanism to scan binary files
  leaking_addresses: Ignore input device status lines
  leaking_addresses: Use File::Temp for /tmp files
  MAINTAINERS: Update LEAKING_ADDRESSES details
  fortify: Improve buffer overflow reporting
  fortify: Add KUnit tests for runtime overflows
  ...
2024-03-12 14:49:30 -07:00
Linus Torvalds
b32273ee89 execve updates for v6.9-rc1
- Drop needless error path code in remove_arg_zero() (Li kunyu, Kees Cook)
 
 - binfmt_elf_efpic: Don't use missing interpreter's properties (Max Filippov)
 
 - Use /bin/bash for execveat selftests
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmXvlWUWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJueMEACVrxXuXlpozupTtixMzWkvoUjo
 bDmsyuX55PEmKwZXppD7cyxzHM0cdOzQmwMTBB8RWlMzZDMB/U6A8vxwKdoqGNT6
 8nQ7/+GkeZLL32BSf8rtMsCrnFx58elOzEuiogkUwz73G/fBe+tbbZAFsR7q5cvr
 6sHT9gP2Topycr01fHUwL41yDLZReCasxWdR+kYfn2akmpBGHpw12auHmZcVmWCc
 /uJTF4FUBt6Fa2h2OmQ3IByNZ50UoORfFkpP93ZaL1MUlILWMXo3DHOAM9vhowut
 PMa/9Blw86hZBIjKEkeeCIU83LSnI5PQCd7V+zCJmaslxkNPvoeH09rqHfGL37Pv
 DAOPpTEEm0l6ifunIAruSRmislBzQgO6n5ALPmMp4PcdBi5bbsk9PCLDEFwaTCeV
 9H4kZnPl00Q7yyEXwHSJi1FFF3/DM0ntXVND2KQJVzqrszB51lALkI8fypWvTb9h
 POmU7PrYEXdjiTcMsWarajHYeV/VjmY7vwzjl8lXiw5nWnLJYQua8TAx4dEhpM3z
 qwa5K2L724ncsgKkwDZPDA3DsUAN9jYK+eqRRi6kD5zWdTkBHVvdLQrBjkUhndw/
 DL2FkcLDewbHInEdbbIFOJUUmBxbRLcXEqb2nzQtiYIBQm4VqZFKTQqZVDWHF1UP
 +VeLTdDf6piwoP0cvQ==
 =MLV7
 -----END PGP SIGNATURE-----

Merge tag 'execve-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull execve updates from Kees Cook:

 - Drop needless error path code in remove_arg_zero() (Li kunyu, Kees
   Cook)

 - binfmt_elf_efpic: Don't use missing interpreter's properties (Max
   Filippov)

 - Use /bin/bash for execveat selftests

* tag 'execve-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  exec: Simplify remove_arg_zero() error path
  selftests/exec: Perform script checks with /bin/bash
  exec: Delete unnecessary statements in remove_arg_zero()
  fs: binfmt_elf_efpic: don't use missing interpreter's properties
2024-03-12 14:45:12 -07:00
Nico Pache
84d147df13 selftests/mm: skip the hugetlb-madvise tests on unmet hugepage requirements
Now that run_vmtests.sh does not guarantee that the correct hugepage count
is available, skip the hugetlb-madvise test if the requirements are not
met rather than failing.

Link: https://lkml.kernel.org/r/20240306223714.320681-4-npache@redhat.com
Signed-off-by: Nico Pache <npache@redhat.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-12 13:07:18 -07:00
Nico Pache
5a6aa60d18 selftests/mm: skip uffd hugetlb tests with insufficient hugepages
Now that run_vmtests.sh does not guarantee that the correct hugepage count
is available, add a check inside the userfaultfd hugetlb test to verify
the nr_hugepages count before continuing.

Link: https://lkml.kernel.org/r/20240306223714.320681-3-npache@redhat.com
Signed-off-by: Nico Pache <npache@redhat.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-12 13:07:18 -07:00
Nico Pache
2fd570c1d8 selftests/mm: dont fail testsuite due to a lack of hugepages
Patch series "selftests/mm: Improve Hugepage Test Handling in MM
Selftests", v2.

This series addresses issues related to hugepage requirements in the MM
selftests, ensuring tests are skipped rather than failing when the
necessary hugepage count is not met.

This adjustment allows for a more graceful handling for systems with
insufficient hugepages, preventing unnecessary test failures and improving
the overall robustness of the test suite.


This patch (of 3):

On systems that have large core counts and large page sizes, but limited
memory, the userfaultfd test hugepage requirement is too large.

Exiting early due to missing one test's requirements is a rather
aggressive strategy, and prevents a lot of other tests from running. 
Remove the early exit to prevent this.

Link: https://lkml.kernel.org/r/20240306223714.320681-1-npache@redhat.com
Link: https://lkml.kernel.org/r/20240306223714.320681-2-npache@redhat.com
Fixes: ee00479d67 ("selftests: vm: Try harder to allocate huge pages")
Signed-off-by: Nico Pache <npache@redhat.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-12 13:07:18 -07:00
Linus Torvalds
691632f0e8 s390 updates for 6.9 merge window
- Various virtual vs physical address usage fixes
 
 - Fix error handling in Processor Activity Instrumentation device driver, and
   export number of counters with a sysfs file
 
 - Allow for multiple events when Processor Activity Instrumentation counters
   are monitored in system wide sampling
 
 - Change multiplier and shift values of the Time-of-Day clock source to improve
   steering precision
 
 - Remove a couple of unneeded GFP_DMA flags from allocations
 
 - Disable mmap alignment if randomize_va_space is also disabled, to avoid a too
   small heap
 
 - Various changes to allow s390 to be compiled with LLVM=1, since ld.lld and
   llvm-objcopy will have proper s390 support witch clang 19
 
 - Add __uninitialized macro to Compiler Attributes. This is helpful with s390's
   FPU code where some users have up to 520 byte stack frames. Clearing such
   stack frames (if INIT_STACK_ALL_PATTERN or INIT_STACK_ALL_ZERO is enabled)
   before they are used contradicts the intention (performance improvement) of
   such code sections.
 
 - Convert switch_to() to an out-of-line function, and use the generic switch_to
   header file
 
 - Replace the usage of s390's debug feature with pr_debug() calls within the
   zcrypt device driver
 
 - Improve hotplug support of the Adjunct Processor device driver
 
 - Improve retry handling in the zcrypt device driver
 
 - Various changes to the in-kernel FPU code:
 
   - Make in-kernel FPU sections preemptible
 
   - Convert various larger inline assemblies and assembler files to C, mainly
     by using singe instruction inline assemblies. This increases readability,
     but also allows makes it easier to add proper instrumentation hooks
 
   - Cleanup of the header files
 
 - Provide fast variants of csum_partial() and csum_partial_copy_nocheck() based
   on vector instructions
 
 - Introduce and use a lock to synchronize accesses to zpci device data
   structures to avoid inconsistent states caused by concurrent accesses
 
 - Compile the kernel without -fPIE. This addresses the following problems if
   the kernel is compiled with -fPIE:
 
   - It uses dynamic symbols (.dynsym), for which the linker refuses to allow
     more than 64k sections. This can break features which use
     '-ffunction-sections' and '-fdata-sections', including kpatch-build and
     function granular KASLR
 
   - It unnecessarily uses GOT relocations, adding an extra layer of indirection
     for many memory accesses
 
 - Fix shared_cpu_list for CPU private L2 caches, which incorrectly were
   reported as globally shared
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAmXu3jEACgkQIg7DeRsp
 bsJC8A/9Gi9JSMKWpIDR4WE2MQGwP/PnYdEamtK6c9ewOjIR/UzRIyIM3J1pyV0L
 RwL8k7EBuv3f7shTcwfPzZWlnAwNwqr1UdcafjFNtHTig50YtdP5fBL33frKHBrm
 ATedlCjagojOuVbh1gB45WUgzjSSkPyn0vqwjjo4h6uEAQ35zMEWwCs5Hpajlkhi
 GCdJaiBLJcnhT96QGurQdke+MsrpGCzeBVBnA0qopQEWaQo8OdiAJ1uMD2WKbgPR
 817kNzvmE6nXnfd5JevYbaiLjK/HQUSw2dZUS6/fjuIrzTsZEUhSg4ECaprKXDg7
 5qiVVPNg4WbJAp0SsB+w7c4U99VxhbS7IVHXju18GrXw6SSAupdxIo7R7YiaT8vC
 YIXZ1uIQ4Vbts3w/UqWUczIl/ooQt2DdrWT5NDNA+84OlOM42rthzA3vznTWuPTb
 U21R7cZmN++hAUjR6s4aO2LfS7HQdnKL8nvJW2y99qSfrOXm+M973W2pDhYEVXQh
 ixQ/lxfQpbBT1yUGlquIErokCPB85VY6ZTdGu6Erziywf4CWGsT5CspyaQnX2KTJ
 s4CpFPnilrW3OnxmIkrM+pNJDun1nnkGA388Xq1NEKX8Oe65OMXEFNCb0kAHQ1ua
 vb6534Ib/iuPnxsGpz1sX9iRqtUd06aBovPcbwIvatHCSfkWws8=
 =KZ31
 -----END PGP SIGNATURE-----

Merge tag 's390-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Heiko Carstens:

 - Various virtual vs physical address usage fixes

 - Fix error handling in Processor Activity Instrumentation device
   driver, and export number of counters with a sysfs file

 - Allow for multiple events when Processor Activity Instrumentation
   counters are monitored in system wide sampling

 - Change multiplier and shift values of the Time-of-Day clock source to
   improve steering precision

 - Remove a couple of unneeded GFP_DMA flags from allocations

 - Disable mmap alignment if randomize_va_space is also disabled, to
   avoid a too small heap

 - Various changes to allow s390 to be compiled with LLVM=1, since
   ld.lld and llvm-objcopy will have proper s390 support witch clang 19

 - Add __uninitialized macro to Compiler Attributes. This is helpful
   with s390's FPU code where some users have up to 520 byte stack
   frames. Clearing such stack frames (if INIT_STACK_ALL_PATTERN or
   INIT_STACK_ALL_ZERO is enabled) before they are used contradicts the
   intention (performance improvement) of such code sections.

 - Convert switch_to() to an out-of-line function, and use the generic
   switch_to header file

 - Replace the usage of s390's debug feature with pr_debug() calls
   within the zcrypt device driver

 - Improve hotplug support of the Adjunct Processor device driver

 - Improve retry handling in the zcrypt device driver

 - Various changes to the in-kernel FPU code:

     - Make in-kernel FPU sections preemptible

     - Convert various larger inline assemblies and assembler files to
       C, mainly by using singe instruction inline assemblies. This
       increases readability, but also allows makes it easier to add
       proper instrumentation hooks

     - Cleanup of the header files

 - Provide fast variants of csum_partial() and
   csum_partial_copy_nocheck() based on vector instructions

 - Introduce and use a lock to synchronize accesses to zpci device data
   structures to avoid inconsistent states caused by concurrent accesses

 - Compile the kernel without -fPIE. This addresses the following
   problems if the kernel is compiled with -fPIE:

     - It uses dynamic symbols (.dynsym), for which the linker refuses
       to allow more than 64k sections. This can break features which
       use '-ffunction-sections' and '-fdata-sections', including
       kpatch-build and function granular KASLR

     - It unnecessarily uses GOT relocations, adding an extra layer of
       indirection for many memory accesses

 - Fix shared_cpu_list for CPU private L2 caches, which incorrectly were
   reported as globally shared

* tag 's390-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (117 commits)
  s390/tools: handle rela R_390_GOTPCDBL/R_390_GOTOFF64
  s390/cache: prevent rebuild of shared_cpu_list
  s390/crypto: remove retry loop with sleep from PAES pkey invocation
  s390/pkey: improve pkey retry behavior
  s390/zcrypt: improve zcrypt retry behavior
  s390/zcrypt: introduce retries on in-kernel send CPRB functions
  s390/ap: introduce mutex to lock the AP bus scan
  s390/ap: rework ap_scan_bus() to return true on config change
  s390/ap: clarify AP scan bus related functions and variables
  s390/ap: rearm APQNs bindings complete completion
  s390/configs: increase number of LOCKDEP_BITS
  s390/vfio-ap: handle hardware checkstop state on queue reset operation
  s390/pai: change sampling event assignment for PMU device driver
  s390/boot: fix minor comment style damages
  s390/boot: do not check for zero-termination relocation entry
  s390/boot: make type of __vmlinux_relocs_64_start|end consistent
  s390/boot: sanitize kaslr_adjust_relocs() function prototype
  s390/boot: simplify GOT handling
  s390: vmlinux.lds.S: fix .got.plt assertion
  s390/boot: workaround current 'llvm-objdump -t -j ...' behavior
  ...
2024-03-12 10:14:22 -07:00
Ido Schimmel
d8a21070b6 nexthop: Fix out-of-bounds access during attribute validation
Passing a maximum attribute type to nlmsg_parse() that is larger than
the size of the passed policy will result in an out-of-bounds access [1]
when the attribute type is used as an index into the policy array.

Fix by setting the maximum attribute type according to the policy size,
as is already done for RTM_NEWNEXTHOP messages. Add a test case that
triggers the bug.

No regressions in fib nexthops tests:

 # ./fib_nexthops.sh
 [...]
 Tests passed: 236
 Tests failed:   0

[1]
BUG: KASAN: global-out-of-bounds in __nla_validate_parse+0x1e53/0x2940
Read of size 1 at addr ffffffff99ab4d20 by task ip/610

CPU: 3 PID: 610 Comm: ip Not tainted 6.8.0-rc7-custom-gd435d6e3e161 #9
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x8f/0xe0
 print_report+0xcf/0x670
 kasan_report+0xd8/0x110
 __nla_validate_parse+0x1e53/0x2940
 __nla_parse+0x40/0x50
 rtm_del_nexthop+0x1bd/0x400
 rtnetlink_rcv_msg+0x3cc/0xf20
 netlink_rcv_skb+0x170/0x440
 netlink_unicast+0x540/0x820
 netlink_sendmsg+0x8d3/0xdb0
 ____sys_sendmsg+0x31f/0xa60
 ___sys_sendmsg+0x13a/0x1e0
 __sys_sendmsg+0x11c/0x1f0
 do_syscall_64+0xc5/0x1d0
 entry_SYSCALL_64_after_hwframe+0x63/0x6b
[...]

The buggy address belongs to the variable:
 rtm_nh_policy_del+0x20/0x40

Fixes: 2118f9390d ("net: nexthop: Adjust netlink policy parsing for a new attribute")
Reported-by: Eric Dumazet <edumazet@google.com>
Closes: https://lore.kernel.org/netdev/CANn89i+UNcG0PJMW5X7gOMunF38ryMh=L1aeZUKH3kL4UdUqag@mail.gmail.com/
Reported-by: syzbot+65bb09a7208ce3d4a633@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/00000000000088981b06133bc07b@google.com/
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240311162307.545385-4-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 20:35:20 -07:00
Jakub Kicinski
5f20e6ab1f for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmXvm7IACgkQ6rmadz2v
 bTqdMA//VMHNHVLb4oROoXyQD9fw2mCmIUEKzP88RXfqcxsfEX7HF+k8B5ZTk0ro
 CHXTAnc79+Qqg0j24bkQKxup/fKBQVw9D+Ia4b3ytlm1I2MtyU/16xNEzVhAPU2D
 iKk6mVBsEdCbt/GjpWORy/VVnZlZpC7BOpZLxsbbxgXOndnCegyjXzSnLGJGxdvi
 zkrQTn2SrFzLi6aNpVLqrv6Nks6HJusfCKsIrtlbkQ85dulasHOtwK9s6GF60nte
 aaho+MPx3L+lWEgapsm8rR779pHaYIB/GbZUgEPxE/xUJ/V8BzDgFNLMzEiIBRMN
 a0zZam11BkBzCfcO9gkvDRByaei/dZz2jdqfU4GlHklFj1WFfz8Q7fRLEPINksvj
 WXLgJADGY5mtGbjG21FScThxzj+Ruqwx0a13ddlyI/W+P3y5yzSWsLwJG5F9p0oU
 6nlkJ4U8yg+9E1ie5ae0TibqvRJzXPjfOERZGwYDSVvfQGzv1z+DGSOPMmgNcWYM
 dIaO+A/+NS3zdbk8+1PP2SBbhHPk6kWyCUByWc7wMzCPTiwriFGY/DD2sN+Fsufo
 zorzfikUQOlTfzzD5jbmT49U8hUQUf6QIWsu7BijSiHaaC7am4S8QB2O6ibJMqdv
 yNiwvuX+ThgVIY3QKrLLqL0KPGeKMR5mtfq6rrwSpfp/b4g27FE=
 =eFgA
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Alexei Starovoitov says:

====================
pull-request: bpf-next 2024-03-11

We've added 59 non-merge commits during the last 9 day(s) which contain
a total of 88 files changed, 4181 insertions(+), 590 deletions(-).

The main changes are:

1) Enforce VM_IOREMAP flag and range in ioremap_page_range and introduce
   VM_SPARSE kind and vm_area_[un]map_pages to be used in bpf_arena,
   from Alexei.

2) Introduce bpf_arena which is sparse shared memory region between bpf
   program and user space where structures inside the arena can have
   pointers to other areas of the arena, and pointers work seamlessly for
   both user-space programs and bpf programs, from Alexei and Andrii.

3) Introduce may_goto instruction that is a contract between the verifier
   and the program. The verifier allows the program to loop assuming it's
   behaving well, but reserves the right to terminate it, from Alexei.

4) Use IETF format for field definitions in the BPF standard
   document, from Dave.

5) Extend struct_ops libbpf APIs to allow specify version suffixes for
   stuct_ops map types, share the same BPF program between several map
   definitions, and other improvements, from Eduard.

6) Enable struct_ops support for more than one page in trampolines,
   from Kui-Feng.

7) Support kCFI + BPF on riscv64, from Puranjay.

8) Use bpf_prog_pack for arm64 bpf trampoline, from Puranjay.

9) Fix roundup_pow_of_two undefined behavior on 32-bit archs, from Toke.
====================

Link: https://lore.kernel.org/r/20240312003646.8692-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 18:06:04 -07:00
Jiri Olsa
379b97bbf0 selftests/bpf: Add kprobe multi triggering benchmarks
Adding kprobe multi triggering benchmarks. It's useful now to bench
new fprobe implementation and might be useful later as well.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240311211023.590321-1-jolsa@kernel.org
2024-03-11 16:06:48 -07:00
Alexei Starovoitov
8df839ae23 selftests/bpf: Add bpf_arena_htab test.
bpf_arena_htab.h - hash table implemented as bpf program

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240308010812.89848-15-alexei.starovoitov@gmail.com
2024-03-11 15:43:43 -07:00
Alexei Starovoitov
9f2c156f90 selftests/bpf: Add bpf_arena_list test.
bpf_arena_alloc.h - implements page_frag allocator as a bpf program.
bpf_arena_list.h - doubly linked link list as a bpf program.

Compiled as a bpf program and as native C code.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240308010812.89848-14-alexei.starovoitov@gmail.com
2024-03-11 15:43:43 -07:00
Alexei Starovoitov
80a4129fcf selftests/bpf: Add unit tests for bpf_arena_alloc/free_pages
Add unit tests for bpf_arena_alloc/free_pages() functionality
and bpf_arena_common.h with a set of common helpers and macros that
is used in this test and the following patches.

Also modify test_loader that didn't support running bpf_prog_type_syscall
programs.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240308010812.89848-13-alexei.starovoitov@gmail.com
2024-03-11 15:43:43 -07:00
Alexei Starovoitov
204c628730 bpf: Add helper macro bpf_addr_space_cast()
Introduce helper macro bpf_addr_space_cast() that emits:
rX = rX
instruction with off = BPF_ADDR_SPACE_CAST
and encodes dest and src address_space-s into imm32.

It's useful with older LLVM that doesn't emit this insn automatically.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/bpf/20240308010812.89848-12-alexei.starovoitov@gmail.com
2024-03-11 15:43:42 -07:00
Geliang Tang
8f7a69a8e7 selftests: mptcp: use KSFT_SKIP/KSFT_PASS/KSFT_FAIL
This patch uses the public var KSFT_SKIP in mptcp_lib.sh instead of
ksft_skip, and drop 'ksft_skip=4' in mptcp_join.sh.

Use KSFT_PASS and KSFT_FAIL macros instead of 0 and 1 after 'exit '
and 'ret=' in all scripts:

        exit 0 -> exit ${KSFT_PASS}
        exit 1 -> exit ${KSFT_FAIL}
         ret=0 ->  ret=${KSFT_PASS}
         ret=1 ->  ret=${KSFT_FAIL}

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240308-upstream-net-next-20240308-selftests-mptcp-unification-v1-15-4f42c347b653@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 15:07:27 -07:00
Geliang Tang
23a0485d1c selftests: mptcp: declare event macros in mptcp_lib
MPTCP event macros (SUB_ESTABLISHED, LISTENER_CREATED, LISTENER_CLOSED),
and the protocol family macros (AF_INET, AF_INET6) are defined in both
mptcp_join.sh and userspace_pm.sh. In order not to duplicate code, this
patch declares them all in mptcp_lib.sh with MPTCP_LIB_ prefixs.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240308-upstream-net-next-20240308-selftests-mptcp-unification-v1-14-4f42c347b653@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 15:07:27 -07:00
Geliang Tang
7f0782ca1c selftests: mptcp: add mptcp_lib_verify_listener_events
To avoid duplicated code in different MPTCP selftests, we can add and use
helpers defined in mptcp_lib.sh.

The helper verify_listener_events() is defined both in mptcp_join.sh and
userspace_pm.sh, export it into mptcp_lib.sh and rename it with mptcp_lib_
prefix. Use this new helper in both scripts.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240308-upstream-net-next-20240308-selftests-mptcp-unification-v1-13-4f42c347b653@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 15:07:27 -07:00
Geliang Tang
8ebb441965 selftests: mptcp: print_test out of verify_listener_events
verify_listener_events() helper will be exported into mptcp_lib.sh as a
public function, but print_test() is invoked in it, which is a private
function in userspace_pm.sh only. So this patch moves print_test() out of
verify_listener_events().

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240308-upstream-net-next-20240308-selftests-mptcp-unification-v1-12-4f42c347b653@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 15:07:27 -07:00
Geliang Tang
663260e146 selftests: mptcp: extract mptcp_lib_check_expected
Extract the main part of check_expected() in userspace_pm.sh to a new
function mptcp_lib_check_expected() in mptcp_lib.sh. It will be used
in both mptcp_john.sh and userspace_pm.sh. check_expected_one() is
moved into mptcp_lib.sh too as mptcp_lib_check_expected_one().

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240308-upstream-net-next-20240308-selftests-mptcp-unification-v1-11-4f42c347b653@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 15:07:26 -07:00
Geliang Tang
339c225e2e selftests: mptcp: call test_fail without argument
This patch modifies test_fail() to call mptcp_lib_pr_fail() only if there
are arguments (if [ ${#} -gt 0 ]) in userspace_pm.sh, add arguments
"unexpected type: ${type}" when calling test_fail() from test_remove().
Then mptcp_lib_pr_fail() can be used in check_expected_one() instead of
test_fail().

The same in mptcp_join.sh, calling fail_test() without argument, and adapt
this helper not to call print_fail() in this case.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240308-upstream-net-next-20240308-selftests-mptcp-unification-v1-10-4f42c347b653@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 15:07:26 -07:00
Geliang Tang
747ba8783a selftests: mptcp: print test results with colors
To unify the output formats of all test scripts, this patch adds
four more helpers:

	mptcp_lib_pr_ok()
	mptcp_lib_pr_skip()
	mptcp_lib_pr_fail()
	mptcp_lib_pr_info()

to print out [ OK ], [SKIP], [FAIL] and 'INFO: ' with colors. Use them
in all scripts to print the "ok/skip/fail/info' using the same 'format'.

Having colors helps to quickly identify issues when looking at a long
list of output logs and results.

Note that now all print the same keywords, which was not the case
before, but it is good to uniform that.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240308-upstream-net-next-20240308-selftests-mptcp-unification-v1-9-4f42c347b653@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 15:07:26 -07:00
Geliang Tang
e7c42bf4d3 selftests: mptcp: use += operator to append strings
This patch uses addition assignment operator (+=) to append strings
instead of duplicating the variable name in mptcp_connect.sh and
mptcp_join.sh.

This can make the statements shorter.

Note: in mptcp_connect.sh, add a local variable extra in do_transfer to
save the various extra warning logs, using += to append it. And add a
new variable tc_info to save various tc info, also using += to append it.
This can make the code more readable and prepare for the next commit.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240308-upstream-net-next-20240308-selftests-mptcp-unification-v1-8-4f42c347b653@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 15:07:26 -07:00
Geliang Tang
aa7694766f selftests: mptcp: print test results with counters
This patch adds a new helper mptcp_lib_print_title(), a wrapper of
mptcp_lib_inc_test_counter() and mptcp_lib_pr_title_counter(), to
print out test counter in each test result and increase the counter.
Use this helper to print out test counters for every tests in diag.sh,
mptcp_connect.sh, mptcp_sockopt.sh, pm_netlink.sh, simult_flows.sh,
and userspace_pm.sh.

diag.sh:

01 no msk on netns creation                          [  ok  ]
02 listen match for dport 10000                      [  ok  ]
03 listen match for sport 10000                      [  ok  ]
04 listen match for saddr and sport                  [  ok  ]
05 all listen sockets                                [  ok  ]

mptcp_connect.sh:

01 New MPTCP socket can be blocked via sysctl                       [ OK ]
02 Validating network environment with pings                        [ OK ]
INFO: Using loss of 0.85% delay 31 ms reorder .. with delay 7ms on ns3eth4
03 ns1 MPTCP -> ns1 (10.0.1.1:10000  ) MPTCP     (duration    69ms) [ OK ]
04 ns1 MPTCP -> ns1 (10.0.1.1:10001  ) TCP       (duration    20ms) [ OK ]
05 ns1 TCP   -> ns1 (10.0.1.1:10002  ) MPTCP     (duration    16ms) [ OK ]

mptcp_sockopt.sh:

01 Transfer v4                                       [ OK ]
02 Mark v4                                           [ OK ]
03 Transfer v6                                       [ OK ]
04 Mark v6                                           [ OK ]
05 SOL_MPTCP sockopt v4                              [ OK ]

pm_netlink.sh:

01 defaults addr list                                [ OK ]
02 simple add/get addr                               [ OK ]
03 dump addrs                                        [ OK ]
04 simple del addr                                   [ OK ]
05 dump addrs after del                              [ OK ]

simult_flows.sh:

01 balanced bwidth                                     7391 max 8456 [ OK ]
02 balanced bwidth - reverse direction                 7403 max 8456 [ OK ]
03 balanced bwidth with unbalanced delay               7429 max 8456 [ OK ]
04 balanced bwidth with unbalanced delay - reverse ... 7485 max 8456 [ OK ]
05 unbalanced bwidth                                   7549 max 8456 [ OK ]

userspace_pm.sh:

01 Created network namespaces ns1, ns2                               [ OK ]
INFO: Make connections
02 Established IPv4 MPTCP Connection ns2 => ns1                      [ OK ]
03 Established IPv6 MPTCP Connection ns2 => ns1                      [ OK ]
INFO: Announce tests
04 ADD_ADDR 10.0.2.2 (ns2) => ns1, invalid token                     [ OK ]
05 ADD_ADDR id:67 10.0.2.2 (ns2) => ns1, reuse port                  [ OK ]

Having test counters helps to quickly identify issues when looking at a
long list of output logs and results.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240308-upstream-net-next-20240308-selftests-mptcp-unification-v1-7-4f42c347b653@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 15:07:26 -07:00
Geliang Tang
3382bb0970 selftests: mptcp: add print_title in mptcp_lib
This patch adds a new variable MPTCP_LIB_TEST_FORMAT as the test title
printing format. Also add a helper mptcp_lib_print_title() to use this
format to print the test title with test counters. They are used in
mptcp_join.sh first.

Each MPTCP selftest is having subtests, and it helps to give them a
number to quickly identify them. This can be managed by mptcp_lib.sh,
reusing what has been done here. The following commit will use these
new helpers in the other tests.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240308-upstream-net-next-20240308-selftests-mptcp-unification-v1-6-4f42c347b653@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 15:07:26 -07:00
Geliang Tang
9e6a39ecb9 selftests: mptcp: export TEST_COUNTER variable
Variable TEST_COUNT are used in mptcp_connect.sh and mptcp_join.sh as
test counters, which are initialized to 0, while variable test_cnt are used
in diag.sh and simult_flows.sh, which are initialized to 1. To maintain
consistency, this patch renames them all as MPTCP_LIB_TEST_COUNTER,
initializes it to 1, and exports it into mptcp_lib.sh.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240308-upstream-net-next-20240308-selftests-mptcp-unification-v1-5-4f42c347b653@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 15:07:26 -07:00
Geliang Tang
fd959262c1 selftests: mptcp: sockopt: print every test result
Only total test results are printed out in mptcp_sockopt.sh:

PASS: all packets had packet mark set
PASS: SOL_MPTCP getsockopt has expected information
PASS: TCP_INQ cmsg/ioctl -t tcp
PASS: TCP_INQ cmsg/ioctl -6 -t tcp
PASS: TCP_INQ cmsg/ioctl -r tcp
PASS: TCP_INQ cmsg/ioctl -6 -r tcp
PASS: TCP_INQ cmsg/ioctl -r tcp -t tcp

They mismatch with the test results:

ok 1 - mptcp_sockopt: mark ipv4
ok 2 - mptcp_sockopt: transfer ipv4
ok 3 - mptcp_sockopt: mark ipv6
ok 4 - mptcp_sockopt: transfer ipv6
ok 5 - mptcp_sockopt: sockopt v4
ok 6 - mptcp_sockopt: sockopt v6
ok 7 - mptcp_sockopt: TCP_INQ: -t tcp
ok 8 - mptcp_sockopt: TCP_INQ: -6 -t tcp
ok 9 - mptcp_sockopt: TCP_INQ: -r tcp
ok 10 - mptcp_sockopt: TCP_INQ: -6 -r tcp
ok 11 - mptcp_sockopt: TCP_INQ: -r tcp -t tcp

'mptcp_sockopt.sh' now display more detailed results + why (what you had
in a former patch from v6, merged here). It no longer displays 'PASS:',
because it is duplicated info now that the detailed are displayed:

Transfer v4                                       [ OK ]
Mark v4                                           [ OK ]
Transfer v6                                       [ OK ]
Mark v6                                           [ OK ]
SOL_MPTCP sockopt v4                              [ OK ]
SOL_MPTCP sockopt v6                              [ OK ]
TCP_INQ cmsg/ioctl -t tcp                         [ OK ]
TCP_INQ cmsg/ioctl -6 -t tcp                      [ OK ]
TCP_INQ cmsg/ioctl -r tcp                         [ OK ]
TCP_INQ cmsg/ioctl -6 -r tcp                      [ OK ]
TCP_INQ cmsg/ioctl -r tcp -t tcp                  [ OK ]

Also fix the TAP output:

ok 1 - mptcp_sockopt: transfer ipv4
ok 2 - mptcp_sockopt: mark ipv4
ok 3 - mptcp_sockopt: transfer ipv6
ok 4 - mptcp_sockopt: mark ipv6
ok 5 - mptcp_sockopt: sockopt v4
ok 6 - mptcp_sockopt: sockopt v6
ok 7 - mptcp_sockopt: TCP_INQ: -t tcp
ok 8 - mptcp_sockopt: TCP_INQ: -6 -t tcp
ok 9 - mptcp_sockopt: TCP_INQ: -r tcp
ok 10 - mptcp_sockopt: TCP_INQ: -6 -r tcp
ok 11 - mptcp_sockopt: TCP_INQ: -r tcp -t tcp

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240308-upstream-net-next-20240308-selftests-mptcp-unification-v1-4-4f42c347b653@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 15:07:25 -07:00
Geliang Tang
c9161a0f8f selftests: mptcp: connect: fix misaligned output
The first [ OK ] in the output of mptcp_connect.sh misaligns with the
others:

New MPTCP socket can be blocked via sysctl              [ OK ]
INFO: validating network environment with pings
INFO: Using loss of 0.85% delay 16 ms reorder 95% 70% with delay 4ms on
ns1 MPTCP -> ns1 (10.0.1.1:10000      ) MPTCP   (duration   184ms) [ OK ]
ns1 MPTCP -> ns1 (10.0.1.1:10001      ) TCP     (duration    50ms) [ OK ]
ns1 TCP   -> ns1 (10.0.1.1:10002      ) MPTCP   (duration    55ms) [ OK ]

This patch aligns them by using 69 chars to display the first two lines,
and 50 chars for the other. Since 19 chars are used to display duration
time. Also print out a [ OK ] at the end of the 2nd line for consistency.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240308-upstream-net-next-20240308-selftests-mptcp-unification-v1-3-4f42c347b653@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 15:07:25 -07:00
Geliang Tang
01ed983810 selftests: mptcp: connect: add dedicated port counter
This patch adds a new dedicated counter 'port' instead of TEST_COUNT
to increase port numbers in mptcp_connect.sh.

This can avoid outputting discontinuous test counters.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240308-upstream-net-next-20240308-selftests-mptcp-unification-v1-2-4f42c347b653@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 15:07:25 -07:00
Geliang Tang
6215df11b9 selftests: mptcp: print all error messages to stdout
Some error messages are printed to stderr while the others are printed
to 'stdout'. As part of the unification, this patch drop "1>&2" to let
all errors messages are printed to 'stdout'.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240308-upstream-net-next-20240308-selftests-mptcp-unification-v1-1-4f42c347b653@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 15:07:25 -07:00
Linus Torvalds
d08c407f71 A large set of updates and features for timers and timekeeping:
- The hierarchical timer pull model
 
     When timer wheel timers are armed they are placed into the timer wheel
     of a CPU which is likely to be busy at the time of expiry. This is done
     to avoid wakeups on potentially idle CPUs.
 
     This is wrong in several aspects:
 
      1) The heuristics to select the target CPU are wrong by
         definition as the chance to get the prediction right is close
         to zero.
 
      2) Due to #1 it is possible that timers are accumulated on a
         single target CPU
 
      3) The required computation in the enqueue path is just overhead for
      	dubious value especially under the consideration that the vast
      	majority of timer wheel timers are either canceled or rearmed
      	before they expire.
 
     The timer pull model avoids the above by removing the target
     computation on enqueue and queueing timers always on the CPU on which
     they get armed.
 
     This is achieved by having separate wheels for CPU pinned timers and
     global timers which do not care about where they expire.
 
     As long as a CPU is busy it handles both the pinned and the global
     timers which are queued on the CPU local timer wheels.
 
     When a CPU goes idle it evaluates its own timer wheels:
 
       - If the first expiring timer is a pinned timer, then the global
       	timers can be ignored as the CPU will wake up before they expire.
 
       - If the first expiring timer is a global timer, then the expiry time
         is propagated into the timer pull hierarchy and the CPU makes sure
         to wake up for the first pinned timer.
 
     The timer pull hierarchy organizes CPUs in groups of eight at the
     lowest level and at the next levels groups of eight groups up to the
     point where no further aggregation of groups is required, i.e. the
     number of levels is log8(NR_CPUS). The magic number of eight has been
     established by experimention, but can be adjusted if needed.
 
     In each group one busy CPU acts as the migrator. It's only one CPU to
     avoid lock contention on remote timer wheels.
 
     The migrator CPU checks in its own timer wheel handling whether there
     are other CPUs in the group which have gone idle and have global timers
     to expire. If there are global timers to expire, the migrator locks the
     remote CPU timer wheel and handles the expiry.
 
     Depending on the group level in the hierarchy this handling can require
     to walk the hierarchy downwards to the CPU level.
 
     Special care is taken when the last CPU goes idle. At this point the
     CPU is the systemwide migrator at the top of the hierarchy and it
     therefore cannot delegate to the hierarchy. It needs to arm its own
     timer device to expire either at the first expiring timer in the
     hierarchy or at the first CPU local timer, which ever expires first.
 
     This completely removes the overhead from the enqueue path, which is
     e.g. for networking a true hotpath and trades it for a slightly more
     complex idle path.
 
     This has been in development for a couple of years and the final series
     has been extensively tested by various teams from silicon vendors and
     ran through extensive CI.
 
     There have been slight performance improvements observed on network
     centric workloads and an Intel team confirmed that this allows them to
     power down a die completely on a mult-die socket for the first time in
     a mostly idle scenario.
 
     There is only one outstanding ~1.5% regression on a specific overloaded
     netperf test which is currently investigated, but the rest is either
     positive or neutral performance wise and positive on the power
     management side.
 
   - Fixes for the timekeeping interpolation code for cross-timestamps:
 
     cross-timestamps are used for PTP to get snapshots from hardware timers
     and interpolated them back to clock MONOTONIC. The changes address a
     few corner cases in the interpolation code which got the math and logic
     wrong.
 
   - Simplifcation of the clocksource watchdog retry logic to automatically
     adjust to handle larger systems correctly instead of having more
     incomprehensible command line parameters.
 
   - Treewide consolidation of the VDSO data structures.
 
   - The usual small improvements and cleanups all over the place.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmXuAN0THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoVKXEADIR45rjR1Xtz32js7B53Y65O4WNoOQ
 6/ycWcswuGzg/h4QUpPSJ6gOGVmKSWwZi4n0P/VadCiXGSPPm0aUKsoRUt9DZsPY
 mtj2wjCSXKXiyhTl9OtrZME86ZAIGO1dQXa/sOHsiP5PCjgQkD0b5CYi1+B6eHDt
 1/Uo2Tb9g8VAPppq20V5Uo93GrPf642oyi3FCFrR1M112Uuak5DmqHJYiDpreNcG
 D5SgI+ykSiaUaVyHifvqijoJk0rYXkqEC6evl02477lJ/X0vVo2/M8XPS95BxHST
 s5Iruo4rP+qeAy8QvhZpoPX59fO0m/AgA7cf77XXAtOpVdLH+bs4ILsEbouAIOtv
 lsmRkcYt+TpvrZFHPAxks+6g3afuROiDtxD5sXXpVWxvofi8FwWqubdlqdsbw9MP
 ZCTNyzNyKL47QeDwBfSynYUL1RSyqsphtIwk4oeQklH9rwMAnW21hi30z15hQ0pQ
 FOVkmcwi79JNvl/G+jRkDzw7r8/zcHshWdSjyUM04CDjjnCDjQOFWSIjEPwbQjjz
 S4HXpJKJW963dBgs9Z84/Ctw1GwoBk1qedDWDJE1257Qvmo/Wpe/7GddWcazOGnN
 RRFMzGPbOqBDbjtErOKGU+iCisgNEvz2XK+TI16uRjWde7DxZpiTVYgNDrZ+/Pyh
 rQ23UBms6ZRR+A==
 =iQlu
 -----END PGP SIGNATURE-----

Merge tag 'timers-core-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer updates from Thomas Gleixner:
 "A large set of updates and features for timers and timekeeping:

   - The hierarchical timer pull model

     When timer wheel timers are armed they are placed into the timer
     wheel of a CPU which is likely to be busy at the time of expiry.
     This is done to avoid wakeups on potentially idle CPUs.

     This is wrong in several aspects:

       1) The heuristics to select the target CPU are wrong by
          definition as the chance to get the prediction right is
          close to zero.

       2) Due to #1 it is possible that timers are accumulated on
          a single target CPU

       3) The required computation in the enqueue path is just overhead
          for dubious value especially under the consideration that the
          vast majority of timer wheel timers are either canceled or
          rearmed before they expire.

     The timer pull model avoids the above by removing the target
     computation on enqueue and queueing timers always on the CPU on
     which they get armed.

     This is achieved by having separate wheels for CPU pinned timers
     and global timers which do not care about where they expire.

     As long as a CPU is busy it handles both the pinned and the global
     timers which are queued on the CPU local timer wheels.

     When a CPU goes idle it evaluates its own timer wheels:

       - If the first expiring timer is a pinned timer, then the global
         timers can be ignored as the CPU will wake up before they
         expire.

       - If the first expiring timer is a global timer, then the expiry
         time is propagated into the timer pull hierarchy and the CPU
         makes sure to wake up for the first pinned timer.

     The timer pull hierarchy organizes CPUs in groups of eight at the
     lowest level and at the next levels groups of eight groups up to
     the point where no further aggregation of groups is required, i.e.
     the number of levels is log8(NR_CPUS). The magic number of eight
     has been established by experimention, but can be adjusted if
     needed.

     In each group one busy CPU acts as the migrator. It's only one CPU
     to avoid lock contention on remote timer wheels.

     The migrator CPU checks in its own timer wheel handling whether
     there are other CPUs in the group which have gone idle and have
     global timers to expire. If there are global timers to expire, the
     migrator locks the remote CPU timer wheel and handles the expiry.

     Depending on the group level in the hierarchy this handling can
     require to walk the hierarchy downwards to the CPU level.

     Special care is taken when the last CPU goes idle. At this point
     the CPU is the systemwide migrator at the top of the hierarchy and
     it therefore cannot delegate to the hierarchy. It needs to arm its
     own timer device to expire either at the first expiring timer in
     the hierarchy or at the first CPU local timer, which ever expires
     first.

     This completely removes the overhead from the enqueue path, which
     is e.g. for networking a true hotpath and trades it for a slightly
     more complex idle path.

     This has been in development for a couple of years and the final
     series has been extensively tested by various teams from silicon
     vendors and ran through extensive CI.

     There have been slight performance improvements observed on network
     centric workloads and an Intel team confirmed that this allows them
     to power down a die completely on a mult-die socket for the first
     time in a mostly idle scenario.

     There is only one outstanding ~1.5% regression on a specific
     overloaded netperf test which is currently investigated, but the
     rest is either positive or neutral performance wise and positive on
     the power management side.

   - Fixes for the timekeeping interpolation code for cross-timestamps:

     cross-timestamps are used for PTP to get snapshots from hardware
     timers and interpolated them back to clock MONOTONIC. The changes
     address a few corner cases in the interpolation code which got the
     math and logic wrong.

   - Simplifcation of the clocksource watchdog retry logic to
     automatically adjust to handle larger systems correctly instead of
     having more incomprehensible command line parameters.

   - Treewide consolidation of the VDSO data structures.

   - The usual small improvements and cleanups all over the place"

* tag 'timers-core-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (62 commits)
  timer/migration: Fix quick check reporting late expiry
  tick/sched: Fix build failure for CONFIG_NO_HZ_COMMON=n
  vdso/datapage: Quick fix - use asm/page-def.h for ARM64
  timers: Assert no next dyntick timer look-up while CPU is offline
  tick: Assume timekeeping is correctly handed over upon last offline idle call
  tick: Shut down low-res tick from dying CPU
  tick: Split nohz and highres features from nohz_mode
  tick: Move individual bit features to debuggable mask accesses
  tick: Move got_idle_tick away from common flags
  tick: Assume the tick can't be stopped in NOHZ_MODE_INACTIVE mode
  tick: Move broadcast cancellation up to CPUHP_AP_TICK_DYING
  tick: Move tick cancellation up to CPUHP_AP_TICK_DYING
  tick: Start centralizing tick related CPU hotplug operations
  tick/sched: Don't clear ts::next_tick again in can_stop_idle_tick()
  tick/sched: Rename tick_nohz_stop_sched_tick() to tick_nohz_full_stop_tick()
  tick: Use IS_ENABLED() whenever possible
  tick/sched: Remove useless oneshot ifdeffery
  tick/nohz: Remove duplicate between lowres and highres handlers
  tick/nohz: Remove duplicate between tick_nohz_switch_to_nohz() and tick_setup_sched_timer()
  hrtimer: Select housekeeping CPU during migration
  ...
2024-03-11 14:38:26 -07:00
Petr Machata
a22b042660 selftests: forwarding: Add a test for NH group stats
Add to lib.sh support for fetching NH stats, and a new library,
router_mpath_nh_lib.sh, with the common code for testing NH stats.
Use the latter from router_mpath_nh.sh and router_mpath_nh_res.sh.

The test works by sending traffic through a NH group, and checking that the
reported values correspond to what the link that ultimately receives the
traffic reports having seen.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/2a424c54062a5f1efd13b9ec5b2b0e29c6af2574.1709901020.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 14:14:10 -07:00
Linus Torvalds
b5683a37c8 vfs-6.9.pidfd
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZem4/wAKCRCRxhvAZXjc
 opnBAQCaQWwxjT0VLHebPniw6tel/KYlZ9jH9kBQwLrk1pembwEA+BsCY2C8YS4a
 75v9jOPxr+Z8j1SjxwwubcONPyqYXwQ=
 =+Wa3
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.9.pidfd' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull pdfd updates from Christian Brauner:

 - Until now pidfds could only be created for thread-group leaders but
   not for threads. There was no technical reason for this. We simply
   had no users that needed support for this. Now we do have users that
   need support for this.

   This introduces a new PIDFD_THREAD flag for pidfd_open(). If that
   flag is set pidfd_open() creates a pidfd that refers to a specific
   thread.

   In addition, we now allow clone() and clone3() to be called with
   CLONE_PIDFD | CLONE_THREAD which wasn't possible before.

   A pidfd that refers to an individual thread differs from a pidfd that
   refers to a thread-group leader:

    (1) Pidfds are pollable. A task may poll a pidfd and get notified
        when the task has exited.

        For thread-group leader pidfds the polling task is woken if the
        thread-group is empty. In other words, if the thread-group
        leader task exits when there are still threads alive in its
        thread-group the polling task will not be woken when the
        thread-group leader exits but rather when the last thread in the
        thread-group exits.

        For thread-specific pidfds the polling task is woken if the
        thread exits.

    (2) Passing a thread-group leader pidfd to pidfd_send_signal() will
        generate thread-group directed signals like kill(2) does.

        Passing a thread-specific pidfd to pidfd_send_signal() will
        generate thread-specific signals like tgkill(2) does.

        The default scope of the signal is thus determined by the type
        of the pidfd.

        Since use-cases exist where the default scope of the provided
        pidfd needs to be overriden the following flags are added to
        pidfd_send_signal():

         - PIDFD_SIGNAL_THREAD
           Send a thread-specific signal.

         - PIDFD_SIGNAL_THREAD_GROUP
           Send a thread-group directed signal.

         - PIDFD_SIGNAL_PROCESS_GROUP
           Send a process-group directed signal.

        The scope change will only work if the struct pid is actually
        used for this scope.

        For example, in order to send a thread-group directed signal the
        provided pidfd must be used as a thread-group leader and
        similarly for PIDFD_SIGNAL_PROCESS_GROUP the struct pid must be
        used as a process group leader.

 - Move pidfds from the anonymous inode infrastructure to a tiny pseudo
   filesystem. This will unblock further work that we weren't able to do
   simply because of the very justified limitations of anonymous inodes.
   Moving pidfds to a tiny pseudo filesystem allows for statx on pidfds
   to become useful for the first time. They can now be compared by
   inode number which are unique for the system lifetime.

   Instead of stashing struct pid in file->private_data we can now stash
   it in inode->i_private. This makes it possible to introduce concepts
   that operate on a process once all file descriptors have been closed.
   A concrete example is kill-on-last-close. Another side-effect is that
   file->private_data is now freed up for per-file options for pidfds.

   Now, each struct pid will refer to a different inode but the same
   struct pid will refer to the same inode if it's opened multiple
   times. In contrast to now where each struct pid refers to the same
   inode.

   The tiny pseudo filesystem is not visible anywhere in userspace
   exactly like e.g., pipefs and sockfs. There's no lookup, there's no
   complex inode operations, nothing. Dentries and inodes are always
   deleted when the last pidfd is closed.

   We allocate a new inode and dentry for each struct pid and we reuse
   that inode and dentry for all pidfds that refer to the same struct
   pid. The code is entirely optional and fairly small. If it's not
   selected we fallback to anonymous inodes. Heavily inspired by nsfs.

   The dentry and inode allocation mechanism is moved into generic
   infrastructure that is now shared between nsfs and pidfs. The
   path_from_stashed() helper must be provided with a stashing location,
   an inode number, a mount, and the private data that is supposed to be
   used and it will provide a path that can be passed to dentry_open().

   The helper will try retrieve an existing dentry from the provided
   stashing location. If a valid dentry is found it is reused. If not a
   new one is allocated and we try to stash it in the provided location.
   If this fails we retry until we either find an existing dentry or the
   newly allocated dentry could be stashed. Subsequent openers of the
   same namespace or task are then able to reuse it.

 - Currently it is only possible to get notified when a task has exited,
   i.e., become a zombie and userspace gets notified with EPOLLIN. We
   now also support waiting until the task has been reaped, notifying
   userspace with EPOLLHUP.

 - Ensure that ESRCH is reported for getfd if a task is exiting instead
   of the confusing EBADF.

 - Various smaller cleanups to pidfd functions.

* tag 'vfs-6.9.pidfd' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (23 commits)
  libfs: improve path_from_stashed()
  libfs: add stashed_dentry_prune()
  libfs: improve path_from_stashed() helper
  pidfs: convert to path_from_stashed() helper
  nsfs: convert to path_from_stashed() helper
  libfs: add path_from_stashed()
  pidfd: add pidfs
  pidfd: move struct pidfd_fops
  pidfd: allow to override signal scope in pidfd_send_signal()
  pidfd: change pidfd_send_signal() to respect PIDFD_THREAD
  signal: fill in si_code in prepare_kill_siginfo()
  selftests: add ESRCH tests for pidfd_getfd()
  pidfd: getfd should always report ESRCH if a task is exiting
  pidfd: clone: allow CLONE_THREAD | CLONE_PIDFD together
  pidfd: exit: kill the no longer used thread_group_exited()
  pidfd: change do_notify_pidfd() to use __wake_up(poll_to_key(EPOLLIN))
  pid: kill the obsolete PIDTYPE_PID code in transfer_pid()
  pidfd: kill the no longer needed do_notify_pidfd() in de_thread()
  pidfd_poll: report POLLHUP when pid_task() == NULL
  pidfd: implement PIDFD_THREAD flag for pidfd_open()
  ...
2024-03-11 10:21:06 -07:00
Linus Torvalds
7ea65c89d8 vfs-6.9.misc
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZem3wQAKCRCRxhvAZXjc
 otRMAQDeo8qsuuIAcS2KUicKqZR5yMVvrY9r4sQzf7YRcJo5HQD+NQXkKwQuv1VO
 OUeScsic/+I+136AgdjWnlEYO5dp0go=
 =4WKU
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.9.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
 "Misc features, cleanups, and fixes for vfs and individual filesystems.

  Features:

   - Support idmapped mounts for hugetlbfs.

   - Add RWF_NOAPPEND flag for pwritev2(). This allows us to fix a bug
     where the passed offset is ignored if the file is O_APPEND. The new
     flag allows a caller to enforce that the offset is honored to
     conform to posix even if the file was opened in append mode.

   - Move i_mmap_rwsem in struct address_space to avoid false sharing
     between i_mmap and i_mmap_rwsem.

   - Convert efs, qnx4, and coda to use the new mount api.

   - Add a generic is_dot_dotdot() helper that's used by various
     filesystems and the VFS code instead of open-coding it multiple
     times.

   - Recently we've added stable offsets which allows stable ordering
     when iterating directories exported through NFS on e.g., tmpfs
     filesystems. Originally an xarray was used for the offset map but
     that caused slab fragmentation issues over time. This switches the
     offset map to the maple tree which has a dense mode that handles
     this scenario a lot better. Includes tests.

   - Finally merge the case-insensitive improvement series Gabriel has
     been working on for a long time. This cleanly propagates case
     insensitive operations through ->s_d_op which in turn allows us to
     remove the quite ugly generic_set_encrypted_ci_d_ops() operations.
     It also improves performance by trying a case-sensitive comparison
     first and then fallback to case-insensitive lookup if that fails.
     This also fixes a bug where overlayfs would be able to be mounted
     over a case insensitive directory which would lead to all sort of
     odd behaviors.

  Cleanups:

   - Make file_dentry() a simple accessor now that ->d_real() is
     simplified because of the backing file work we did the last two
     cycles.

   - Use the dedicated file_mnt_idmap helper in ntfs3.

   - Use smp_load_acquire/store_release() in the i_size_read/write
     helpers and thus remove the hack to handle i_size reads in the
     filemap code.

   - The SLAB_MEM_SPREAD is a nop now. Remove it from various places in
     fs/

   - It's no longer necessary to perform a second built-in initramfs
     unpack call because we retain the contents of the previous
     extraction. Remove it.

   - Now that we have removed various allocators kfree_rcu() always
     works with kmem caches and kmalloc(). So simplify various places
     that only use an rcu callback in order to handle the kmem cache
     case.

   - Convert the pipe code to use a lockdep comparison function instead
     of open-coding the nesting making lockdep validation easier.

   - Move code into fs-writeback.c that was located in a header but can
     be made static as it's only used in that one file.

   - Rewrite the alignment checking iterators for iovec and bvec to be
     easier to read, and also significantly more compact in terms of
     generated code. This saves 270 bytes of text on x86-64 (with
     clang-18) and 224 bytes on arm64 (with gcc-13). In profiles it also
     saves a bit of time for the same workload.

   - Switch various places to use KMEM_CACHE instead of
     kmem_cache_create().

   - Use inode_set_ctime_to_ts() in inode_set_ctime_current()

   - Use kzalloc() in name_to_handle_at() to avoid kernel infoleak.

   - Various smaller cleanups for eventfds.

  Fixes:

   - Fix various comments and typos, and unneeded initializations.

   - Fix stack allocation hack for clang in the select code.

   - Improve dump_mapping() debug code on a best-effort basis.

   - Fix build errors in various selftests.

   - Avoid wrap-around instrumentation in various places.

   - Don't allow user namespaces without an idmapping to be used for
     idmapped mounts.

   - Fix sysv sb_read() call.

   - Fix fallback implementation of the get_name() export operation"

* tag 'vfs-6.9.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (70 commits)
  hugetlbfs: support idmapped mounts
  qnx4: convert qnx4 to use the new mount api
  fs: use inode_set_ctime_to_ts to set inode ctime to current time
  libfs: Drop generic_set_encrypted_ci_d_ops
  ubifs: Configure dentry operations at dentry-creation time
  f2fs: Configure dentry operations at dentry-creation time
  ext4: Configure dentry operations at dentry-creation time
  libfs: Add helper to choose dentry operations at mount-time
  libfs: Merge encrypted_ci_dentry_ops and ci_dentry_ops
  fscrypt: Drop d_revalidate once the key is added
  fscrypt: Drop d_revalidate for valid dentries during lookup
  fscrypt: Factor out a helper to configure the lookup dentry
  ovl: Always reject mounting over case-insensitive directories
  libfs: Attempt exact-match comparison first during casefolded lookup
  efs: remove SLAB_MEM_SPREAD flag usage
  jfs: remove SLAB_MEM_SPREAD flag usage
  minix: remove SLAB_MEM_SPREAD flag usage
  openpromfs: remove SLAB_MEM_SPREAD flag usage
  proc: remove SLAB_MEM_SPREAD flag usage
  qnx6: remove SLAB_MEM_SPREAD flag usage
  ...
2024-03-11 09:38:17 -07:00
Linus Torvalds
d451b075f7 linux_kselftest-next-6.9-rc1
This kselftest next update for Linux 6.9-rc1 consists of:
 
 -- livepatch restructuring to move the module out of lib to be
    built as a out-of-tree modules during kselftest build. This
    change makes it easier change, debug and rebuild the tests by
    running make on the selftests/livepatch directory, which is not
    currently possible since the modules on lib/livepatch are build
    and installed using the main makefile modules target.
 
 -- livepatch restructuring fixes for problems found by kernel test
    robot. The change skips the test if kernel-devel isn't installed
    (default value of KDIR), or if KDIR variable passed doesn't exists.
 
 -- resctrl test restructuring and new non-contiguous CBMs CAT test
 
 -- new ktap_helpers to print diagnostic messages, pass/fail tests
    based on exit code, abort test, and finish the test.
 
 -- a new test verify power supply properties.
 
 -- a new ftrace to exercise function tracer across cpu hotplug.
 
 -- timeout increase for mqueue test to allow the test to run on
    i3.metal AWS instances.
 
 -- minor spelling corrections in several tests.
 
 -- missing gitignore files and changes to existing gitignore files.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmXo/kUACgkQCwJExA0N
 Qxy0aBAAk0SLA1ZIAdlNjo5B13C7GC7rFRrtaai9ReXSvU/X5TX9sD5T9DIULKdj
 Mcqi+oaP88GPSUZS+bn7DyVxKyuvHg/f4jWQwqZ34WxK4K1K+yt+3YhTnHZx7ezU
 6WIbUsD1Zs7tXXI2v76riHFbD3pfxZ+AXQaf/1cXDi4SpIpLkiqyeYWoWN5Z2rtJ
 BwMzrI2RBiLMox4g8F3Ey4BX+bOIYiiJq5bdl7gJVKcp74VdU3S7IyOuXFbSdcFR
 xxmFMxWGFOgRzexW0fmDWLudD2dII0XQAExSsl5xMnR/lmSh+lHWheoNgphQl050
 VcLmrPugWVJSioe0fHEgmDQXe3lPqDtepUg921tIlWvCmtR3Ur6+GpILTbSvQ4qp
 SK+2pt7nGSAT2UkRO/6/TYFG3mELADvj6tglj0b1SkIXmNiF+7OZ+hJ2XqyM7peo
 Z7gtmSmpbAotxp64Jj8HsNZLpCX0xdaxoTMEWPoG09fwTXY7Hy03yoWDKBKB4MZ9
 jBtNXDolhpEQ/ppSGFnRPzXuNVapYX28UY0cwBBVgke5jwB8SUnBEr2dbNnVU1q0
 y5uxtj/EFQzxSynB3eM1us2OuXvr5TfAWmKVpyE/cNC3WreHeA+Y2kN1dzv8hgpw
 o4NbltdF8F+a9qQF9B1XvjVhqa5By1esS1jOg96cJgGseAVWiQs=
 =G+DO
 -----END PGP SIGNATURE-----

Merge tag 'linux_kselftest-next-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest update from Shuah Khan:

 - livepatch restructuring to move the module out of lib to be built as
   a out-of-tree modules during kselftest build. This makes it easier
   change, debug and rebuild the tests by running make on the
   selftests/livepatch directory, which is not currently possible since
   the modules on lib/livepatch are build and installed using the main
   makefile modules target.

 - livepatch restructuring fixes for problems found by kernel test
   robot. The change skips the test if kernel-devel isn't installed
   (default value of KDIR), or if KDIR variable passed doesn't exists.

 - resctrl test restructuring and new non-contiguous CBMs CAT test

 - new ktap_helpers to print diagnostic messages, pass/fail tests based
   on exit code, abort test, and finish the test.

 - a new test verify power supply properties.

 - a new ftrace to exercise function tracer across cpu hotplug.

 - timeout increase for mqueue test to allow the test to run on i3.metal
   AWS instances.

 - minor spelling corrections in several tests.

 - missing gitignore files and changes to existing gitignore files.

* tag 'linux_kselftest-next-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (57 commits)
  kselftest: Add basic test for probing the rust sample modules
  selftests: lib.mk: Do not process TEST_GEN_MODS_DIR
  selftests: livepatch: Avoid running the tests if kernel-devel is missing
  selftests: livepatch: Add initial .gitignore
  selftests/resctrl: Add non-contiguous CBMs CAT test
  selftests/resctrl: Add resource_info_file_exists()
  selftests/resctrl: Split validate_resctrl_feature_request()
  selftests/resctrl: Add a helper for the non-contiguous test
  selftests/resctrl: Add test groups and name L3 CAT test L3_CAT
  selftests: sched: Fix spelling mistake "hiearchy" -> "hierarchy"
  selftests/mqueue: Set timeout to 180 seconds
  selftests/ftrace: Add test to exercize function tracer across cpu hotplug
  selftest: ftrace: fix minor typo in log
  selftests: thermal: intel: workload_hint: add missing gitignore
  selftests: thermal: intel: power_floor: add missing gitignore
  selftests: uevent: add missing gitignore
  selftests: Add test to verify power supply properties
  selftests: ktap_helpers: Add a helper to finish the test
  selftests: ktap_helpers: Add a helper to abort the test
  selftests: ktap_helpers: Add helper to pass/fail test based on exit code
  ...
2024-03-11 09:25:33 -07:00
Andrii Nakryiko
365c2b3279 selftests/bpf: Add fexit and kretprobe triggering benchmarks
We already have kprobe and fentry benchmarks. Let's add kretprobe and
fexit ones for completeness.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20240309005124.3004446-1-andrii@kernel.org
2024-03-11 17:00:00 +01:00