linux/tools/testing/selftests
Linus Torvalds 8cb1ae19bf x86/fpu updates:
- Cleanup of extable fixup handling to be more robust, which in turn
    allows to make the FPU exception fixups more robust as well.
 
  - Change the return code for signal frame related failures from explicit
    error codes to a boolean fail/success as that's all what the calling
    code evaluates.
 
  - A large refactoring of the FPU code to prepare for adding AMX support:
 
    - Distangle the public header maze and remove especially the misnomed
      kitchen sink internal.h which is despite it's name included all over
      the place.
 
    - Add a proper abstraction for the register buffer storage (struct
      fpstate) which allows to dynamically size the buffer at runtime by
      flipping the pointer to the buffer container from the default
      container which is embedded in task_struct::tread::fpu to a
      dynamically allocated container with a larger register buffer.
 
    - Convert the code over to the new fpstate mechanism.
 
    - Consolidate the KVM FPU handling by moving the FPU related code into
      the FPU core which removes the number of exports and avoids adding
      even more export when AMX has to be supported in KVM. This also
      removes duplicated code which was of course unnecessary different and
      incomplete in the KVM copy.
 
    - Simplify the KVM FPU buffer handling by utilizing the new fpstate
      container and just switching the buffer pointer from the user space
      buffer to the KVM guest buffer when entering vcpu_run() and flipping
      it back when leaving the function. This cuts the memory requirements
      of a vCPU for FPU buffers in half and avoids pointless memory copy
      operations.
 
      This also solves the so far unresolved problem of adding AMX support
      because the current FPU buffer handling of KVM inflicted a circular
      dependency between adding AMX support to the core and to KVM.  With
      the new scheme of switching fpstate AMX support can be added to the
      core code without affecting KVM.
 
    - Replace various variables with proper data structures so the extra
      information required for adding dynamically enabled FPU features (AMX)
      can be added in one place
 
  - Add AMX (Advanved Matrix eXtensions) support (finally):
 
     AMX is a large XSTATE component which is going to be available with
     Saphire Rapids XEON CPUs. The feature comes with an extra MSR (MSR_XFD)
     which allows to trap the (first) use of an AMX related instruction,
     which has two benefits:
 
     1) It allows the kernel to control access to the feature
 
     2) It allows the kernel to dynamically allocate the large register
        state buffer instead of burdening every task with the the extra 8K
        or larger state storage.
 
     It would have been great to gain this kind of control already with
     AVX512.
 
     The support comes with the following infrastructure components:
 
     1) arch_prctl() to
        - read the supported features (equivalent to XGETBV(0))
        - read the permitted features for a task
        - request permission for a dynamically enabled feature
 
        Permission is granted per process, inherited on fork() and cleared
        on exec(). The permission policy of the kernel is restricted to
        sigaltstack size validation, but the syscall obviously allows
        further restrictions via seccomp etc.
 
     2) A stronger sigaltstack size validation for sys_sigaltstack(2) which
        takes granted permissions and the potentially resulting larger
        signal frame into account. This mechanism can also be used to
        enforce factual sigaltstack validation independent of dynamic
        features to help with finding potential victims of the 2K
        sigaltstack size constant which is broken since AVX512 support was
        added.
 
     3) Exception handling for #NM traps to catch first use of a extended
        feature via a new cause MSR. If the exception was caused by the use
        of such a feature, the handler checks permission for that
        feature. If permission has not been granted, the handler sends a
        SIGILL like the #UD handler would do if the feature would have been
        disabled in XCR0. If permission has been granted, then a new fpstate
        which fits the larger buffer requirement is allocated.
 
        In the unlikely case that this allocation fails, the handler sends
        SIGSEGV to the task. That's not elegant, but unavoidable as the
        other discussed options of preallocation or full per task
        permissions come with their own set of horrors for kernel and/or
        userspace. So this is the lesser of the evils and SIGSEGV caused by
        unexpected memory allocation failures is not a fundamentally new
        concept either.
 
        When allocation succeeds, the fpstate properties are filled in to
        reflect the extended feature set and the resulting sizes, the
        fpu::fpstate pointer is updated accordingly and the trap is disarmed
        for this task permanently.
 
     4) Enumeration and size calculations
 
     5) Trap switching via MSR_XFD
 
        The XFD (eXtended Feature Disable) MSR is context switched with the
        same life time rules as the FPU register state itself. The mechanism
        is keyed off with a static key which is default disabled so !AMX
        equipped CPUs have zero overhead. On AMX enabled CPUs the overhead
        is limited by comparing the tasks XFD value with a per CPU shadow
        variable to avoid redundant MSR writes. In case of switching from a
        AMX using task to a non AMX using task or vice versa, the extra MSR
        write is obviously inevitable.
 
        All other places which need to be aware of the variable feature sets
        and resulting variable sizes are not affected at all because they
        retrieve the information (feature set, sizes) unconditonally from
        the fpstate properties.
 
     6) Enable the new AMX states
 
   Note, this is relatively new code despite the fact that AMX support is in
   the works for more than a year now.
 
   The big refactoring of the FPU code, which allowed to do a proper
   integration has been started exactly 3 weeks ago. Refactoring of the
   existing FPU code and of the original AMX patches took a week and has
   been subject to extensive review and testing. The only fallout which has
   not been caught in review and testing right away was restricted to AMX
   enabled systems, which is completely irrelevant for anyone outside Intel
   and their early access program. There might be dragons lurking as usual,
   but so far the fine grained refactoring has held up and eventual yet
   undetected fallout is bisectable and should be easily addressable before
   the 5.16 release. Famous last words...
 
   Many thanks to Chang Bae and Dave Hansen for working hard on this and
   also to the various test teams at Intel who reserved extra capacity to
   follow the rapid development of this closely which provides the
   confidence level required to offer this rather large update for inclusion
   into 5.16-rc1.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmF/NkITHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYodDkEADH4+/nN/QoSUHIuuha5Zptj3g2b16a
 /3TxT9fhwPen/kzMGsUk70s3iWJMA+I5dCfkSZexJ2hfhcRe9cBzZIa1HCawKwf3
 YCISTsO/M+LpeORuZ+TpfFLJKnxNr1SEOl+EYffGhq0AkCjifb9Cnr0JZuoMUzGU
 jpfJZ2bj28ri5lG812DtzSMBM9E3SAwgJv+GNjmZbxZKb9mAfhbAMdBUXHirX7Ej
 jmx6koQjYOKwYIW8w1BrdC270lUKQUyJTbQgdRkN9Mh/HnKyFixQ18JqGlgaV2cT
 EtYePUfTEdaHdAhUINLIlEug1MfOslHU+HyGsdywnoChNB4GHPQuePC5Tz60VeFN
 RbQ9aKcBUu8r95rjlnKtAtBijNMA4bjGwllVxNwJ/ZoA9RPv1SbDZ07RX3qTaLVY
 YhVQl8+shD33/W24jUTJv1kMMexpHXIlv0gyfMryzpwI7uzzmGHRPAokJdbYKctC
 dyMPfdE90rxTiMUdL/1IQGhnh3awjbyfArzUhHyQ++HyUyzCFh0slsO0CD18vUy8
 FofhCugGBhjuKw3XwLNQ+KsWURz5qHctSzBc3qMOSyqFHbAJCVRANkhsFvWJo2qL
 75+Z7OTRebtsyOUZIdq26r4roSxHrps3dupWTtN70HWx2NhQG1nLEw986QYiQu1T
 hcKvDmehQLrUvg==
 =x3WL
 -----END PGP SIGNATURE-----

Merge tag 'x86-fpu-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fpu updates from Thomas Gleixner:

 - Cleanup of extable fixup handling to be more robust, which in turn
   allows to make the FPU exception fixups more robust as well.

 - Change the return code for signal frame related failures from
   explicit error codes to a boolean fail/success as that's all what the
   calling code evaluates.

 - A large refactoring of the FPU code to prepare for adding AMX
   support:

      - Distangle the public header maze and remove especially the
        misnomed kitchen sink internal.h which is despite it's name
        included all over the place.

      - Add a proper abstraction for the register buffer storage (struct
        fpstate) which allows to dynamically size the buffer at runtime
        by flipping the pointer to the buffer container from the default
        container which is embedded in task_struct::tread::fpu to a
        dynamically allocated container with a larger register buffer.

      - Convert the code over to the new fpstate mechanism.

      - Consolidate the KVM FPU handling by moving the FPU related code
        into the FPU core which removes the number of exports and avoids
        adding even more export when AMX has to be supported in KVM.
        This also removes duplicated code which was of course
        unnecessary different and incomplete in the KVM copy.

      - Simplify the KVM FPU buffer handling by utilizing the new
        fpstate container and just switching the buffer pointer from the
        user space buffer to the KVM guest buffer when entering
        vcpu_run() and flipping it back when leaving the function. This
        cuts the memory requirements of a vCPU for FPU buffers in half
        and avoids pointless memory copy operations.

        This also solves the so far unresolved problem of adding AMX
        support because the current FPU buffer handling of KVM inflicted
        a circular dependency between adding AMX support to the core and
        to KVM. With the new scheme of switching fpstate AMX support can
        be added to the core code without affecting KVM.

      - Replace various variables with proper data structures so the
        extra information required for adding dynamically enabled FPU
        features (AMX) can be added in one place

 - Add AMX (Advanced Matrix eXtensions) support (finally):

   AMX is a large XSTATE component which is going to be available with
   Saphire Rapids XEON CPUs. The feature comes with an extra MSR
   (MSR_XFD) which allows to trap the (first) use of an AMX related
   instruction, which has two benefits:

    1) It allows the kernel to control access to the feature

    2) It allows the kernel to dynamically allocate the large register
       state buffer instead of burdening every task with the the extra
       8K or larger state storage.

   It would have been great to gain this kind of control already with
   AVX512.

   The support comes with the following infrastructure components:

    1) arch_prctl() to
        - read the supported features (equivalent to XGETBV(0))
        - read the permitted features for a task
        - request permission for a dynamically enabled feature

       Permission is granted per process, inherited on fork() and
       cleared on exec(). The permission policy of the kernel is
       restricted to sigaltstack size validation, but the syscall
       obviously allows further restrictions via seccomp etc.

    2) A stronger sigaltstack size validation for sys_sigaltstack(2)
       which takes granted permissions and the potentially resulting
       larger signal frame into account. This mechanism can also be used
       to enforce factual sigaltstack validation independent of dynamic
       features to help with finding potential victims of the 2K
       sigaltstack size constant which is broken since AVX512 support
       was added.

    3) Exception handling for #NM traps to catch first use of a extended
       feature via a new cause MSR. If the exception was caused by the
       use of such a feature, the handler checks permission for that
       feature. If permission has not been granted, the handler sends a
       SIGILL like the #UD handler would do if the feature would have
       been disabled in XCR0. If permission has been granted, then a new
       fpstate which fits the larger buffer requirement is allocated.

       In the unlikely case that this allocation fails, the handler
       sends SIGSEGV to the task. That's not elegant, but unavoidable as
       the other discussed options of preallocation or full per task
       permissions come with their own set of horrors for kernel and/or
       userspace. So this is the lesser of the evils and SIGSEGV caused
       by unexpected memory allocation failures is not a fundamentally
       new concept either.

       When allocation succeeds, the fpstate properties are filled in to
       reflect the extended feature set and the resulting sizes, the
       fpu::fpstate pointer is updated accordingly and the trap is
       disarmed for this task permanently.

    4) Enumeration and size calculations

    5) Trap switching via MSR_XFD

       The XFD (eXtended Feature Disable) MSR is context switched with
       the same life time rules as the FPU register state itself. The
       mechanism is keyed off with a static key which is default
       disabled so !AMX equipped CPUs have zero overhead. On AMX enabled
       CPUs the overhead is limited by comparing the tasks XFD value
       with a per CPU shadow variable to avoid redundant MSR writes. In
       case of switching from a AMX using task to a non AMX using task
       or vice versa, the extra MSR write is obviously inevitable.

       All other places which need to be aware of the variable feature
       sets and resulting variable sizes are not affected at all because
       they retrieve the information (feature set, sizes) unconditonally
       from the fpstate properties.

    6) Enable the new AMX states

   Note, this is relatively new code despite the fact that AMX support
   is in the works for more than a year now.

   The big refactoring of the FPU code, which allowed to do a proper
   integration has been started exactly 3 weeks ago. Refactoring of the
   existing FPU code and of the original AMX patches took a week and has
   been subject to extensive review and testing. The only fallout which
   has not been caught in review and testing right away was restricted
   to AMX enabled systems, which is completely irrelevant for anyone
   outside Intel and their early access program. There might be dragons
   lurking as usual, but so far the fine grained refactoring has held up
   and eventual yet undetected fallout is bisectable and should be
   easily addressable before the 5.16 release. Famous last words...

   Many thanks to Chang Bae and Dave Hansen for working hard on this and
   also to the various test teams at Intel who reserved extra capacity
   to follow the rapid development of this closely which provides the
   confidence level required to offer this rather large update for
   inclusion into 5.16-rc1

* tag 'x86-fpu-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (110 commits)
  Documentation/x86: Add documentation for using dynamic XSTATE features
  x86/fpu: Include vmalloc.h for vzalloc()
  selftests/x86/amx: Add context switch test
  selftests/x86/amx: Add test cases for AMX state management
  x86/fpu/amx: Enable the AMX feature in 64-bit mode
  x86/fpu: Add XFD handling for dynamic states
  x86/fpu: Calculate the default sizes independently
  x86/fpu/amx: Define AMX state components and have it used for boot-time checks
  x86/fpu/xstate: Prepare XSAVE feature table for gaps in state component numbers
  x86/fpu/xstate: Add fpstate_realloc()/free()
  x86/fpu/xstate: Add XFD #NM handler
  x86/fpu: Update XFD state where required
  x86/fpu: Add sanity checks for XFD
  x86/fpu: Add XFD state to fpstate
  x86/msr-index: Add MSRs for XFD
  x86/cpufeatures: Add eXtended Feature Disabling (XFD) feature bit
  x86/fpu: Reset permission and fpstate on exec()
  x86/fpu: Prepare fpu_clone() for dynamically enabled features
  x86/fpu/signal: Prepare for variable sigframe length
  x86/signal: Use fpu::__state_user_size for sigalt stack validation
  ...
2021-11-01 14:03:56 -07:00
..
arm64 kselftest/arm64: signal: Skip tests if required features are missing 2021-09-21 18:12:03 +01:00
bpf selftests/bpf: Use recv_timeout() instead of retries 2021-10-26 12:29:33 -07:00
breakpoints selftests: breakpoints: Use correct error messages in breakpoint_test_arm64.c 2021-02-08 17:04:41 -07:00
capabilities
cgroup tests/cgroup: test cgroup.kill 2021-05-10 10:41:11 -04:00
clone3
core selftests/core: add regression test for CLOSE_RANGE_UNSHARE | CLOSE_RANGE_CLOEXEC 2020-12-19 16:23:19 +01:00
cpu-hotplug
cpufreq selftests/cpufreq: Rename DEBUG_PI_LIST to DEBUG_PLIST 2021-08-31 11:00:02 -06:00
damon mm/damon: add user space selftests 2021-09-08 11:50:25 -07:00
dma dma-mapping: benchmark: Add support for multi-pages map/unmap 2021-04-02 16:41:08 +02:00
dmabuf-heaps kselftests: dmabuf-heaps: Add extra checking that allocated buffers are zeroed 2021-02-08 16:25:53 -07:00
drivers linux-kselftest-fixes-5.15-rc5 2021-10-04 14:33:30 -07:00
efivarfs
exec tools/testing/selftests/exec: fix link error 2021-05-22 15:09:07 -10:00
filesystems selftests/binderfs: add test for feature files 2021-07-21 13:46:36 +02:00
firmware selftests: firmware: Fix ignored return val of asprintf() warn 2021-07-21 16:11:42 +02:00
fpu
ftrace selftests/ftrace: Update test for more eprobe removal process 2021-10-13 19:27:53 -04:00
futex selftests: futex: Test sys_futex_waitv() wouldblock 2021-10-07 13:51:12 +02:00
gpio selftests: gpio: update .gitignore 2021-03-08 11:59:16 +01:00
ia64
intel_pstate
ipc selftests/ipc: remove unneeded semicolon 2021-02-08 16:32:43 -07:00
ir
kcmp
kexec
kmod
kselftest
kvm Small x86 fixes. 2021-10-01 11:08:07 -07:00
landlock landlock: Enable user space to infer supported features 2021-04-22 12:22:11 -07:00
lib selftests: lib: Add wrapper script for test_scanf 2021-05-19 15:05:11 +02:00
livepatch
lkdtm lkdtm/fortify: Consolidate FORTIFY_SOURCE tests 2021-08-18 22:28:51 +02:00
locking
media_tests
membarrier
memfd selftests/memfd: remove unused variable 2021-09-08 11:50:28 -07:00
memory-hotplug selftests: memory-hotplug: avoid spamming logs with dump_page(), ratio limit hot-remove error test 2021-07-12 14:20:01 -06:00
mincore selftests: remove duplicate include 2021-05-07 00:26:33 -07:00
mount
mount_setattr tests: test MOUNT_ATTR_NOSYMFOLLOW with mount_setattr() 2021-06-01 15:06:51 +02:00
move_mount_set_group tests: add move_mount(MOVE_MOUNT_SET_GROUP) selftest 2021-07-26 14:45:19 +02:00
mqueue
nci selftests: nci: replace unsigned int with int 2021-09-16 13:55:51 +01:00
net fcnal-test: kill hanging ping/nettest binaries on cleanup 2021-10-22 14:03:18 -07:00
netfilter selftests: netfilter: remove stray bash debug line 2021-10-14 23:08:35 +02:00
nsfs
ntb
openat2 selftests: openat2: Fix testing failure for O_LARGEFILE flag 2021-08-25 13:46:13 -06:00
perf_events signal: Deliver all of the siginfo perf data in _perf 2021-05-18 16:20:54 -05:00
pid_namespace
pidfd
powerpc selftests/powerpc: Add scv versions of the basic TM syscall tests 2021-09-13 22:34:11 +10:00
prctl
proc proc: add .gitignore for proc-subset-pid selftest 2021-06-05 08:58:11 -07:00
pstore
ptp
ptrace
rcutorture torture: Make kvm-test-1-run-qemu.sh check for reboot loops 2021-07-27 11:41:33 -07:00
resctrl selftests/resctrl: Fix incorrect parsing of option "-t" 2021-06-07 18:38:58 -06:00
rlimits kselftests: Add test to check for rlimit changes in different user namespaces 2021-04-30 14:14:03 -05:00
rseq
rtc
safesetid selftests: safesetid: Fix spelling mistake "cant" -> "can't" 2021-08-26 15:15:24 -06:00
sched kselftests/sched: cleanup the child processes 2021-10-05 15:51:43 +02:00
seccomp seccomp updates for v5.14-rc1 2021-06-28 19:49:37 -07:00
sgx selftests/sgx: Fix Q1 and Q2 calculation in sigstruct.c 2021-07-30 17:20:01 -06:00
sigaltstack selftest/sigaltstack: Use the AT_MINSIGSTKSZ aux vector if available 2021-05-19 12:38:17 +02:00
size
sparc64
splice selftests: splice: Adjust for handler fallback removal 2021-06-07 18:39:43 -06:00
static_keys
sync selftests/sync: Remove the deprecated config SYNC 2021-08-31 10:58:00 -06:00
syscall_user_dispatch entry: Use different define for selector variable in SUD 2021-02-06 00:21:42 +01:00
sysctl
tc-testing tc-testing: Add control-plane selftests for sch_mq 2021-08-04 12:42:27 +01:00
timens selftests/timens: Fix gettime_perf to work on powerpc 2021-04-21 22:52:32 +10:00
timers selftests: timers: rtcpie: skip test if default RTC device does not exist 2021-06-07 19:18:52 -06:00
tmpfs
tpm2
uevent
user
vDSO selftests/vDSO: fix ABI selftest on riscv 2021-02-08 16:38:34 -07:00
vm tools/testing/selftests/vm/split_huge_page_test.c: fix application of sizeof to pointer 2021-10-28 17:18:55 -07:00
watchdog
wireguard wireguard: selftests: make sure rp_filter is disabled on vethc 2021-06-04 14:25:14 -07:00
x86 selftests/x86/amx: Add context switch test 2021-10-28 14:35:27 +02:00
zram
.gitignore
gen_kselftest_tar.sh
kselftest_deps.sh selftests: remove obsolete gpio references from kselftest_deps.sh 2021-02-15 11:43:28 +01:00
kselftest_harness.h selftests: kselftest_harness.h: partially fix kernel-doc markups 2021-01-21 14:06:00 -07:00
kselftest_install.sh
kselftest_module.h kselftest: add support for skipped tests 2021-02-15 11:07:42 +01:00
kselftest.h
lib.mk selftests: be sure to make khdr before other targets 2021-09-15 10:34:21 -06:00
Makefile Core: 2021-08-31 16:43:06 -07:00
run_kselftest.sh