Commit Graph

399 Commits

Author SHA1 Message Date
Linus Torvalds
a0d445f70c * Allow clearcpuid= to accept multiple bits, by Arvind Sankar.
* Move clearcpuid= parameter handling earlier in the boot, away from the
 FPU init code and to a generic location, by Mike Hommey.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl+ENF4ACgkQEsHwGGHe
 VUqZ/BAAvRNhqmvYznS7V1FGJka7noumkxSg/E3ecUnQ9mLXuJlp/k4HVDVq/sEa
 8tOXF10snObOy9BlUhpWsT+jUyPz5kNe1h67x9wYOIWq2wO2rkw54Bi7/ZKar0Ik
 E93TfiBf8xeas/96Cb6/JgwPzleeLMA6GOcd0cCmFjp++DBw1ydU2ouH2/llKhdT
 76eff4ZDvPMwuIaDRSQHY9XHX9mr55TmkVXoxkIilt3YOVE5Ap/Pz+sI0AiD6fJu
 DS82EubaCCtaMe2K+lFpq9v3l9sVzR0TTS9hf2uG+M8YpdQdzEU63WPZlXr5tVTH
 VReDJEo23Vp3Yy+4u+Ph4CPCa24vjyG5bg1eAYVwyAperelLhsW81tnz6DMaYT3u
 NBK5dL0qFZJkIM07rn5Cg/1lGARmidovvYcojwroq/bfNUK5Xxu7Dh+5IIW865Jr
 RrqwZkeWR1xXj4G97ICeKZsr1tGRHiLqPow/BXVeFRQxu7qHAQbdyF6tGFJpB/Nh
 QaVVeeUr2r+Qyhghd/vg8d3EQDbSYgfxx/8nLB2G4vyNpIIBGeNUo9OptgMnNgr1
 44nkQ3qX8VXSz9scTGtW5M2FrO7OfwD5V2V737gsmAb63MHCSORxcFMhKahd63l1
 teji0jtpZuOaguCwn+ZAjaDb8FqIOn26vybtd8jC4tzD9Qngt3w=
 =lvdt
 -----END PGP SIGNATURE-----

Merge tag 'x86_fpu_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fpu updates from Borislav Petkov:

 - Allow clearcpuid= to accept multiple bits (Arvind Sankar)

 - Move clearcpuid= parameter handling earlier in the boot, away from
   the FPU init code and to a generic location (Mike Hommey)

* tag 'x86_fpu_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu: Handle FPU-related and clearcpuid command line arguments earlier
  x86/fpu: Allow multiple bits in clearcpuid= parameter
2020-10-12 10:49:46 -07:00
Mike Hommey
1ef5423a55 x86/fpu: Handle FPU-related and clearcpuid command line arguments earlier
FPU initialization handles them currently. However, in the case
of clearcpuid=, some other early initialization code may check for
features before the FPU initialization code is called. Handling the
argument earlier allows the command line to influence those early
initializations.

Signed-off-by: Mike Hommey <mh@glandium.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200921215638.37980-1-mh@glandium.org
2020-09-22 00:24:27 +02:00
Fenghua Yu
20f0afd1fb x86/mmu: Allocate/free a PASID
A PASID is allocated for an "mm" the first time any thread binds to an
SVA-capable device and is freed from the "mm" when the SVA is unbound
by the last thread. It's possible for the "mm" to have different PASID
values in different binding/unbinding SVA cycles.

The mm's PASID (non-zero for valid PASID or 0 for invalid PASID) is
propagated to a per-thread PASID MSR for all threads within the mm
through IPI, context switch, or inherited. This is done to ensure that a
running thread has the right PASID in the MSR matching the mm's PASID.

 [ bp: s/SVM/SVA/g; massage. ]

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/1600187413-163670-10-git-send-email-fenghua.yu@intel.com
2020-09-17 20:22:15 +02:00
Yu-cheng Yu
b454feb9ab x86/fpu/xstate: Add supervisor PASID state for ENQCMD
The ENQCMD instruction reads a PASID from the IA32_PASID MSR. The
MSR is stored in the task's supervisor XSAVE* PASID state and is
context-switched by XSAVES/XRSTORS.

 [ bp: Add (in-)definite articles and massage. ]

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/1600187413-163670-6-git-send-email-fenghua.yu@intel.com
2020-09-17 20:22:10 +02:00
Arvind Sankar
0a4bb5e550 x86/fpu: Allow multiple bits in clearcpuid= parameter
Commit

  0c2a3913d6 ("x86/fpu: Parse clearcpuid= as early XSAVE argument")

changed clearcpuid parsing from __setup() to cmdline_find_option().
While the __setup() function would have been called for each clearcpuid=
parameter on the command line, cmdline_find_option() will only return
the last one, so the change effectively made it impossible to disable
more than one bit.

Allow a comma-separated list of bit numbers as the argument for
clearcpuid to allow multiple bits to be disabled again. Log the bits
being disabled for informational purposes.

Also fix the check on the return value of cmdline_find_option(). It
returns -1 when the option is not found, so testing as a boolean is
incorrect.

Fixes: 0c2a3913d6 ("x86/fpu: Parse clearcpuid= as early XSAVE argument")
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200907213919.2423441-1-nivedita@alum.mit.edu
2020-09-10 18:32:05 +02:00
Linus Torvalds
50f6c7dbd9 Misc fixes and small updates all around the place:
- Fix mitigation state sysfs output
  - Fix an FPU xstate/sxave code assumption bug triggered by Architectural LBR support
  - Fix Lightning Mountain SoC TSC frequency enumeration bug
  - Fix kexec debug output
  - Fix kexec memory range assumption bug
  - Fix a boundary condition in the crash kernel code
 
  - Optimize porgatory.ro generation a bit
  - Enable ACRN guests to use X2APIC mode
  - Reduce a __text_poke() IRQs-off critical section for the benefit of PREEMPT_RT
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl83ybgRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iJnQ/+OAkE5hiQ+F1ikQ4rKyjaT6FjvynReNUA
 ysQjcCypGB4x+slR8o3k5yrzYJ9WbDfOz7a0uekZtNHvJ80+3yheV5Yvf+Uz3EYM
 Jj/OubCNMNnvS5cJMNXs196SGd/ELLWBbCjwUWPsiWJ0ZMTgKmpZz1LgB1QZjhyw
 fbAc1WgTLVO+emE5FwBrmFzvgBxn5EtiFoLhegFtACHadNcJLiKpXpiK3NKkEirO
 owF1/Qg6mn6MowKDBDkWgmwi0HVYbraqu0hXRrCq9o105CVwgwUdORTwjK3rnUNs
 et10Zz2UmSpjXJOhKZdZLFCtYOmrADmS4pnoXF6W6cLLFvkq4b2ducnlFBtNKqMh
 ljPkIT04sF99gIKijEYWsru+MgS4qO1VNHtJxkr/ZCUjqahsa1nN9F0lP0QOXjwf
 hbK4h1NrML3UiCGAe2hjIh9zY2c8s2Q90PyCvZkKNKquSQ1E011hzcEE2RIoBBYB
 mc1d6lgfCFWVkbgRA5sx1CVtgnAvHk2wu9w/8N9XTGjPgiQJRr3I8cNUZw59gaMH
 43auWyvpVAA4vdfbKJrPVrTLhTTnQYv0A966l7/i0d8MkGN4u09sAiB3ZevZMEK9
 45b7IXWluCi0ikBAmCvQ+qEzhg7pApCziVKuaZ/4j+qPLTDAutGwz7YuaXyOKrUX
 Aj/uCev6D6c=
 =fvpv
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
 "Misc fixes and small updates all around the place:

   - Fix mitigation state sysfs output

   - Fix an FPU xstate/sxave code assumption bug triggered by
     Architectural LBR support

   - Fix Lightning Mountain SoC TSC frequency enumeration bug

   - Fix kexec debug output

   - Fix kexec memory range assumption bug

   - Fix a boundary condition in the crash kernel code

   - Optimize porgatory.ro generation a bit

   - Enable ACRN guests to use X2APIC mode

   - Reduce a __text_poke() IRQs-off critical section for the benefit of
     PREEMPT_RT"

* tag 'x86-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/alternatives: Acquire pte lock with interrupts enabled
  x86/bugs/multihit: Fix mitigation reporting when VMX is not in use
  x86/fpu/xstate: Fix an xstate size check warning with architectural LBRs
  x86/purgatory: Don't generate debug info for purgatory.ro
  x86/tsr: Fix tsc frequency enumeration bug on Lightning Mountain SoC
  kexec_file: Correctly output debugging information for the PT_LOAD ELF header
  kexec: Improve & fix crash_exclude_mem_range() to handle overlapping ranges
  x86/crash: Correct the address boundary of function parameters
  x86/acrn: Remove redundant chars from ACRN signature
  x86/acrn: Allow ACRN guest to use X2APIC mode
2020-08-15 10:38:03 -07:00
Linus Torvalds
19b39c38ab Merge branch 'work.regset' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull ptrace regset updates from Al Viro:
 "Internal regset API changes:

   - regularize copy_regset_{to,from}_user() callers

   - switch to saner calling conventions for ->get()

   - kill user_regset_copyout()

  The ->put() side of things will have to wait for the next cycle,
  unfortunately.

  The balance is about -1KLoC and replacements for ->get() instances are
  a lot saner"

* 'work.regset' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (41 commits)
  regset: kill user_regset_copyout{,_zero}()
  regset(): kill ->get_size()
  regset: kill ->get()
  csky: switch to ->regset_get()
  xtensa: switch to ->regset_get()
  parisc: switch to ->regset_get()
  nds32: switch to ->regset_get()
  nios2: switch to ->regset_get()
  hexagon: switch to ->regset_get()
  h8300: switch to ->regset_get()
  openrisc: switch to ->regset_get()
  riscv: switch to ->regset_get()
  c6x: switch to ->regset_get()
  ia64: switch to ->regset_get()
  arc: switch to ->regset_get()
  arm: switch to ->regset_get()
  sh: convert to ->regset_get()
  arm64: switch to ->regset_get()
  mips: switch to ->regset_get()
  sparc: switch to ->regset_get()
  ...
2020-08-07 09:29:25 -07:00
Kan Liang
76d10256a9 x86/fpu/xstate: Fix an xstate size check warning with architectural LBRs
An xstate size check warning is triggered on machines which support
Architectural LBRs.

    XSAVE consistency problem, dumping leaves
    WARNING: CPU: 0 PID: 0 at arch/x86/kernel/fpu/xstate.c:649 fpu__init_system_xstate+0x4d4/0xd0e
    Modules linked in:
    CPU: 0 PID: 0 Comm: swapper Not tainted intel-arch_lbr+
    RIP: 0010:fpu__init_system_xstate+0x4d4/0xd0e

The xstate size check routine, init_xstate_size(), compares the size
retrieved from the hardware with the size of task->fpu, which is
calculated by the software.

The size from the hardware is the total size of the enabled xstates in
XCR0 | IA32_XSS. Architectural LBR state is a dynamic supervisor
feature, which sets the corresponding bit in the IA32_XSS at boot time.
The size from the hardware includes the size of the Architectural LBR
state.

However, a dynamic supervisor feature doesn't allocate a buffer in the
task->fpu. The size of task->fpu doesn't include the size of the
Architectural LBR state. The mismatch will trigger the warning.

Three options as below were considered to fix the issue:

- Correct the size from the hardware by subtracting the size of the
  dynamic supervisor features.
  The purpose of the check is to compare the size CPU told with the size
  of the XSAVE buffer, which is calculated by the software. If the
  software mucks with the number from hardware, it removes the value of
  the check.
  This option is not a good option.

- Prevent the hardware from counting the size of the dynamic supervisor
  feature by temporarily removing the corresponding bits in IA32_XSS.
  Two extra MSR writes are required to flip the IA32_XSS. The option is
  not pretty, but it is workable. The check is only called once at early
  boot time. The synchronization or context-switching doesn't need to be
  worried.
  This option is implemented here.

- Remove the check entirely, because the check hasn't found any real
  problems. The option may be an alternative as option 2.
  This option is not implemented here.

Add a new function, get_xsaves_size_no_dynamic(), which retrieves the
total size without the dynamic supervisor features from the hardware.
The size will be used to compare with the size of task->fpu.

Fixes: f0dccc9da4 ("x86/fpu/xstate: Support dynamic supervisor feature for LBR")
Reported-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Link: https://lore.kernel.org/r/1595253051-75374-1-git-send-email-kan.liang@linux.intel.com
2020-08-07 01:32:00 +02:00
Ingo Molnar
e89d4ca1df Linux 5.8-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl8d8h4eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGd0sH/2iktYhMwPxzzpnb
 eI3OuTX/mRn4vUFOfpx9dmGVleMfKkpbvnn3IY7wA62Qfv7J7lkFRa1Bd1DlqXfW
 yyGTGDSKG5chiRCOU3s9ni92M4xIzFlrojyt/dIK2lUGMzUPI9FGlZRGQLKqqwLh
 2syOXRWbcQ7e52IHtDSy3YBNveKRsP4NyqV+GxGiex18SMB/M3Pw9EMH614eDPsE
 QAGQi5uGv4hPJtFHgXgUyBPLFHIyFAiVxhFRIj7u2DSEKY79+wO1CGWFiFvdTY4B
 CbqKXLffY3iQdFsLJkj9Dl8cnOQnoY44V0EBzhhORxeOp71StUVaRwQMFa5tp48G
 171s5Hs=
 =BQIl
 -----END PGP SIGNATURE-----

Merge tag 'v5.8-rc7' into perf/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-07-28 13:18:01 +02:00
Al Viro
0557d64d98 x86: switch to ->regset_get()
All instances of ->get() in arch/x86 switched; that might or might
not be worth splitting up.  Notes:

	* for xstateregs_get() the amount we want to store is determined at
the boot time; see init_xstate_size() and update_regset_xstate_info() for
details.  task->thread.fpu.state.xsave ends with a flexible array member and
the amount of data in it depends upon the FPU features supported/enabled.

	* fpregs_get() writes slightly less than full ->thread.fpu.state.fsave
(the last word is not copied); we pass the full size of state.fsave and let
membuf_write() trim to the amount declared by regset - __regset_get() will
make sure that the space in buffer is no more than that.

	* copy_xstate_to_user() and its helpers are gone now.

	* fpregs_soft_get() was getting user_regset_copyout() arguments
wrong.  Since "x86: x86 user_regset math_emu" back in 2008...  I really
doubt that it's worth splitting out for -stable, though - you need
a 486SX box for that to trigger...

[Kevin's braino fix for copy_xstate_to_kernel() essentially duplicated here]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-07-27 14:31:07 -04:00
Kevin Buettner
5714ee50bb copy_xstate_to_kernel: Fix typo which caused GDB regression
This fixes a regression encountered while running the
gdb.base/corefile.exp test in GDB's test suite.

In my testing, the typo prevented the sw_reserved field of struct
fxregs_state from being output to the kernel XSAVES area.  Thus the
correct mask corresponding to XCR0 was not present in the core file for
GDB to interrogate, resulting in the following behavior:

   [kev@f32-1 gdb]$ ./gdb -q testsuite/outputs/gdb.base/corefile/corefile testsuite/outputs/gdb.base/corefile/corefile.core
   Reading symbols from testsuite/outputs/gdb.base/corefile/corefile...
   [New LWP 232880]

   warning: Unexpected size of section `.reg-xstate/232880' in core file.

With the typo fixed, the test works again as expected.

Signed-off-by: Kevin Buettner <kevinb@redhat.com>
Fixes: 9e46365459 ("copy_xstate_to_kernel(): don't leave parts of destination uninitialized")
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Airlie <airlied@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-07-19 17:09:10 -07:00
Kan Liang
ce711ea3ca perf/x86/intel/lbr: Support XSAVES/XRSTORS for LBR context switch
In the LBR call stack mode, LBR information is used to reconstruct a
call stack. To get the complete call stack, perf has to save/restore
all LBR registers during a context switch. Due to a large number of the
LBR registers, this process causes a high CPU overhead. To reduce the
CPU overhead during a context switch, use the XSAVES/XRSTORS
instructions.

Every XSAVE area must follow a canonical format: the legacy region, an
XSAVE header and the extended region. Although the LBR information is
only kept in the extended region, a space for the legacy region and
XSAVE header is still required. Add a new dedicated structure for LBR
XSAVES support.

Before enabling XSAVES support, the size of the LBR state has to be
sanity checked, because:
- the size of the software structure is calculated from the max number
of the LBR depth, which is enumerated by the CPUID leaf for Arch LBR.
The size of the LBR state is enumerated by the CPUID leaf for XSAVE
support of Arch LBR. If the values from the two CPUID leaves are not
consistent, it may trigger a buffer overflow. For example, a hypervisor
may unconsciously set inconsistent values for the two emulated CPUID.
- unlike other state components, the size of an LBR state depends on the
max number of LBRs, which may vary from generation to generation.

Expose the function xfeature_size() for the sanity check.
The LBR XSAVES support will be disabled if the size of the LBR state
enumerated by CPUID doesn't match with the size of the software
structure.

The XSAVE instruction requires 64-byte alignment for state buffers. A
new macro is added to reflect the alignment requirement. A 64-byte
aligned kmem_cache is created for architecture LBR.

Currently, the structure for each state component is maintained in
fpu/types.h. The structure for the new LBR state component should be
maintained in the same place. Move structure lbr_entry to fpu/types.h as
well for broader sharing.

Add dedicated lbr_save/lbr_restore functions for LBR XSAVES support,
which invokes the corresponding xstate helpers to XSAVES/XRSTORS LBR
information at the context switch when the call stack mode is enabled.
Since the XSAVES/XRSTORS instructions will be eventually invoked, the
dedicated functions is named with '_xsaves'/'_xrstors' postfix.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Link: https://lkml.kernel.org/r/1593780569-62993-23-git-send-email-kan.liang@linux.intel.com
2020-07-08 11:38:56 +02:00
Kan Liang
50f408d96d x86/fpu/xstate: Add helpers for LBR dynamic supervisor feature
The perf subsystem will only need to save/restore the LBR state.
However, the existing helpers save all supported supervisor states to a
kernel buffer, which will be unnecessary. Two helpers are introduced to
only save/restore requested dynamic supervisor states. The supervisor
features in XFEATURE_MASK_SUPERVISOR_SUPPORTED and
XFEATURE_MASK_SUPERVISOR_UNSUPPORTED mask cannot be saved/restored using
these helpers.

The helpers will be used in the following patch.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Link: https://lkml.kernel.org/r/1593780569-62993-22-git-send-email-kan.liang@linux.intel.com
2020-07-08 11:38:56 +02:00
Kan Liang
f0dccc9da4 x86/fpu/xstate: Support dynamic supervisor feature for LBR
Last Branch Records (LBR) registers are used to log taken branches and
other control flows. In perf with call stack mode, LBR information is
used to reconstruct a call stack. To get the complete call stack, perf
has to save/restore all LBR registers during a context switch. Due to
the large number of the LBR registers, e.g., the current platform has
96 LBR registers, this process causes a high CPU overhead. To reduce
the CPU overhead during a context switch, an LBR state component that
contains all the LBR related registers is introduced in hardware. All
LBR registers can be saved/restored together using one XSAVES/XRSTORS
instruction.

However, the kernel should not save/restore the LBR state component at
each context switch, like other state components, because of the
following unique features of LBR:
- The LBR state component only contains valuable information when LBR
  is enabled in the perf subsystem, but for most of the time, LBR is
  disabled.
- The size of the LBR state component is huge. For the current
  platform, it's 808 bytes.
If the kernel saves/restores the LBR state at each context switch, for
most of the time, it is just a waste of space and cycles.

To efficiently support the LBR state component, it is desired to have:
- only context-switch the LBR when the LBR feature is enabled in perf.
- only allocate an LBR-specific XSAVE buffer on demand.
  (Besides the LBR state, a legacy region and an XSAVE header have to be
   included in the buffer as well. There is a total of (808+576) byte
   overhead for the LBR-specific XSAVE buffer. The overhead only happens
   when the perf is actively using LBRs. There is still a space-saving,
   on average, when it replaces the constant 808 bytes of overhead for
   every task, all the time on the systems that support architectural
   LBR.)
- be able to use XSAVES/XRSTORS for accessing LBR at run time.
  However, the IA32_XSS should not be adjusted at run time.
  (The XCR0 | IA32_XSS are used to determine the requested-feature
  bitmap (RFBM) of XSAVES.)

A solution, called dynamic supervisor feature, is introduced to address
this issue, which
- does not allocate a buffer in each task->fpu;
- does not save/restore a state component at each context switch;
- sets the bit corresponding to the dynamic supervisor feature in
  IA32_XSS at boot time, and avoids setting it at run time.
- dynamically allocates a specific buffer for a state component
  on demand, e.g. only allocates LBR-specific XSAVE buffer when LBR is
  enabled in perf. (Note: The buffer has to include the LBR state
  component, a legacy region and a XSAVE header space.)
  (Implemented in a later patch)
- saves/restores a state component on demand, e.g. manually invokes
  the XSAVES/XRSTORS instruction to save/restore the LBR state
  to/from the buffer when perf is active and a call stack is required.
  (Implemented in a later patch)

A new mask XFEATURE_MASK_DYNAMIC and a helper xfeatures_mask_dynamic()
are introduced to indicate the dynamic supervisor feature. For the
systems which support the Architecture LBR, LBR is the only dynamic
supervisor feature for now. For the previous systems, there is no
dynamic supervisor feature available.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Link: https://lkml.kernel.org/r/1593780569-62993-21-git-send-email-kan.liang@linux.intel.com
2020-07-08 11:38:56 +02:00
Kan Liang
a063bf249b x86/fpu: Use proper mask to replace full instruction mask
When saving xstate to a kernel/user XSAVE area with the XSAVE family of
instructions, the current code applies the 'full' instruction mask (-1),
which tries to XSAVE all possible features. This method relies on
hardware to trim 'all possible' down to what is enabled in the
hardware. The code works well for now. However, there will be a
problem, if some features are enabled in hardware, but are not suitable
to be saved into all kernel XSAVE buffers, like task->fpu, due to
performance consideration.

One such example is the Last Branch Records (LBR) state. The LBR state
only contains valuable information when LBR is explicitly enabled by
the perf subsystem, and the size of an LBR state is large (808 bytes
for now). To avoid both CPU overhead and space overhead at each context
switch, the LBR state should not be saved into task->fpu like other
state components. It should be saved/restored on demand when LBR is
enabled in the perf subsystem. Current copy_xregs_to_* will trigger a
buffer overflow for such cases.

Three sites use the '-1' instruction mask which must be updated.

Two are saving/restoring the xstate to/from a kernel-allocated XSAVE
buffer and can use 'xfeatures_mask_all', which will save/restore all of
the features present in a normal task FPU buffer.

The last one saves the register state directly to a user buffer. It
could
also use 'xfeatures_mask_all'. Just as it was with the '-1' argument,
any supervisor states in the mask will be filtered out by the hardware
and not saved to the buffer.  But, to be more explicit about what is
expected to be saved, use xfeatures_mask_user() for the instruction
mask.

KVM includes the header file fpu/internal.h. To avoid 'undefined
xfeatures_mask_all' compiling issue, move copy_fpregs_to_fpstate() to
fpu/core.c and export it, because:
- The xfeatures_mask_all is indirectly used via copy_fpregs_to_fpstate()
  by KVM. The function which is directly used by other modules should be
  exported.
- The copy_fpregs_to_fpstate() is a function, while xfeatures_mask_all
  is a variable for the "internal" FPU state. It's safer to export a
  function than a variable, which may be implicitly changed by others.
- The copy_fpregs_to_fpstate() is a big function with many checks. The
  removal of the inline keyword should not impact the performance.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Link: https://lkml.kernel.org/r/1593780569-62993-20-git-send-email-kan.liang@linux.intel.com
2020-07-08 11:38:56 +02:00
Petteri Aimonen
7ad816762f x86/fpu: Reset MXCSR to default in kernel_fpu_begin()
Previously, kernel floating point code would run with the MXCSR control
register value last set by userland code by the thread that was active
on the CPU core just before kernel call. This could affect calculation
results if rounding mode was changed, or a crash if a FPU/SIMD exception
was unmasked.

Restore MXCSR to the kernel's default value.

 [ bp: Carve out from a bigger patch by Petteri, add feature check, add
   FNINIT call too (amluto). ]

Signed-off-by: Petteri Aimonen <jpa@git.mail.kapsi.fi>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=207979
Link: https://lkml.kernel.org/r/20200624114646.28953-2-bp@alien8.de
2020-06-29 10:02:00 +02:00
Al Viro
4dfa103e82 x86: kill dump_fpu()
dead since the removal of aout coredump support...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-06-26 01:01:33 -04:00
Al Viro
36c8673f90 x86: copy_fpstate_to_sigframe(): have fpregs_soft_get() use kernel buffer
... then copy_to_user() the results

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-06-26 01:01:33 -04:00
Linus Torvalds
0a319ef75d Most of the changes here related to 'XSAVES supervisor state' support,
which is a feature that allows kernel-only data to be automatically
 saved/restored by the FPU context switching code.
 
 CPU features that can be supported this way are Intel PT, 'PASID' and
 CET features.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl7VMZgRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jmAQ/7BJpyAHUjFJdChtkvUmLcBgI2qnxP7rc8
 Eh/tSo4PKh484Uqb4WY6XAHIAPBzEt3rHJG3fdaavzlUl98YJCdD9tstfwMPcCQ4
 L4c2Ru+h+mPQCMOZUctOphPjDzGWPzR4IhceH6gqhoS4vg9EqgN4o158x4jW6KFN
 Jlocp9CMfIaGSmaMlRrIUZ4Dj3mgboqqHsuCaibtaKAMK6LqZQDViTEal4mNbESX
 KQPOFpKrhoq6Jtzzer7fLPY2qb6kkLrL03X5IUGFP5UxigSejnfrI9SZpAuPP9S0
 kdN04Jo0T2aBIAikBTVhDWdLMJk19qeu7YXBrFEVbyhZHl1HdDqOhMdWPOp1GH9W
 CtGUalbIvz/5FbXuUImiiNh/bw2FxYjHsrDguW96IvMVFteucrFg9QyL+taYb1cV
 WqWdpIC0VoMuQxQI5FBWu4Bb/cLNV9VCxWAZjZQ806kwmyDxldsw5mucMGmH3+bO
 LD6bwRShSMRzI9bzcJSG+Z3y7Fe8b5IGNjCjzgPb88ezffBEFHzIEKdCL6QTNlRF
 6UgSGbRs41SqXwNw5tdQQNwPpDO73p+KVRGoEzyMJvojLKRGTcOHHUDriGZ30MNX
 3oHvLf5+dNrLC/frbOqUmQ7doBQOplR5VxlZVwwqkdpPw13Jf5zn4ewzriTOmKCq
 mEHMQmbkyi4=
 =M+BC
 -----END PGP SIGNATURE-----

Merge tag 'x86-fpu-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 FPU updates from Ingo Molnar:
 "Most of the changes here related to 'XSAVES supervisor state' support,
  which is a feature that allows kernel-only data to be automatically
  saved/restored by the FPU context switching code.

  CPU features that can be supported this way are Intel PT, 'PASID' and
  CET features"

* tag 'x86-fpu-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu/xstate: Restore supervisor states for signal return
  x86/fpu/xstate: Preserve supervisor states for the slow path in __fpu__restore_sig()
  x86/fpu: Introduce copy_supervisor_to_kernel()
  x86/fpu/xstate: Update copy_kernel_to_xregs_err() for supervisor states
  x86/fpu/xstate: Update sanitize_restored_xstate() for supervisor xstates
  x86/fpu/xstate: Define new functions for clearing fpregs and xstates
  x86/fpu/xstate: Introduce XSAVES supervisor states
  x86/fpu/xstate: Separate user and supervisor xfeatures mask
  x86/fpu/xstate: Define new macros for supervisor and user xstates
  x86/fpu/xstate: Rename validate_xstate_header() to validate_user_xstate_header()
2020-06-01 14:09:26 -07:00
Al Viro
9e46365459 copy_xstate_to_kernel(): don't leave parts of destination uninitialized
copy the corresponding pieces of init_fpstate into the gaps instead.

Cc: stable@kernel.org
Tested-by: Alexander Potapenko <glider@google.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-05-27 17:06:31 -04:00
Yu-cheng Yu
55e00fb66f x86/fpu/xstate: Restore supervisor states for signal return
The signal return fast path directly restores user states from the user
buffer. Once that succeeds, restore supervisor states (but only when
they are not yet restored).

For the slow path, save supervisor states to preserve them across context
switches, and restore after the user states are restored.

The previous version has the overhead of an XSAVES in both the fast and the
slow paths.  It is addressed as the following:

- In the fast path, only do an XRSTORS.
- In the slow path, do a supervisor-state-only XSAVES, and relocate the
  buffer contents.

Some thoughts in the implementation:

- In the slow path, can any supervisor state become stale between
  save/restore?

  Answer: set_thread_flag(TIF_NEED_FPU_LOAD) protects the xstate buffer.

- In the slow path, can any code reference a stale supervisor state
  register between save/restore?

  Answer: In the current lazy-restore scheme, any reference to xstate
  registers needs fpregs_lock()/fpregs_unlock() and __fpregs_load_activate().

- Are there other options?

  One other option is eagerly restoring all supervisor states.

  Currently, CET user-mode states and ENQCMD's PASID do not need to be
  eagerly restored.  The upcoming CET kernel-mode states (24 bytes) need
  to be eagerly restored.  To me, eagerly restoring all supervisor states
  adds more overhead then benefit at this point.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20200512145444.15483-11-yu-cheng.yu@intel.com
2020-05-16 12:20:50 +02:00
Yu-cheng Yu
98265c17ef x86/fpu/xstate: Preserve supervisor states for the slow path in __fpu__restore_sig()
The signal return code is responsible for taking an XSAVE buffer
present in user memory and loading it into the hardware registers. This
operation only affects user XSAVE state and never affects supervisor
state.

The fast path through this code simply points XRSTOR directly at the
user buffer. However, since user memory is not guaranteed to be always
mapped, this XRSTOR can fail. If it fails, the signal return code falls
back to a slow path which can tolerate page faults.

That slow path copies the xfeatures one by one out of the user buffer
into the task's fpu state area. However, by being in a context where it
can handle page faults, the code can also schedule.

The lazy-fpu-load code would think it has an up-to-date fpstate and
would fail to save the supervisor state when scheduling the task out.
When scheduling back in, it would likely restore stale supervisor state.

To fix that, preserve supervisor state before the slow path.  Modify
copy_user_to_fpregs_zeroing() so that if it fails, fpregs are not zeroed,
and there is no need for fpregs_deactivate() and supervisor states are
preserved.

Move set_thread_flag(TIF_NEED_FPU_LOAD) to the slow path.  Without doing
this, the fast path also needs supervisor states to be saved first.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200512145444.15483-10-yu-cheng.yu@intel.com
2020-05-16 12:09:11 +02:00
Yu-cheng Yu
eeedf15336 x86/fpu: Introduce copy_supervisor_to_kernel()
The XSAVES instruction takes a mask and saves only the features specified
in that mask.  The kernel normally specifies that all features be saved.

XSAVES also unconditionally uses the "compacted format" which means that
all specified features are saved next to each other in memory.  If a
feature is removed from the mask, all the features after it will "move
up" into earlier locations in the buffer.

Introduce copy_supervisor_to_kernel(), which saves only supervisor states
and then moves those states into the standard location where they are
normally found.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200512145444.15483-9-yu-cheng.yu@intel.com
2020-05-16 11:24:14 +02:00
Yu-cheng Yu
5d6b6a6f9b x86/fpu/xstate: Update sanitize_restored_xstate() for supervisor xstates
The function sanitize_restored_xstate() sanitizes user xstates of an XSAVE
buffer by clearing bits not in the input 'xfeatures' from the buffer's
header->xfeatures, effectively resetting those features back to the init
state.

When supervisor xstates are introduced, it is necessary to make sure only
user xstates are sanitized.  Ensure supervisor bits in header->xfeatures
stay set and supervisor states are not modified.

To make names clear, also:

- Rename the function to sanitize_restored_user_xstate().
- Rename input parameter 'xfeatures' to 'user_xfeatures'.
- In __fpu__restore_sig(), rename 'xfeatures' to 'user_xfeatures'.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20200512145444.15483-7-yu-cheng.yu@intel.com
2020-05-13 20:11:08 +02:00
Fenghua Yu
b860eb8dce x86/fpu/xstate: Define new functions for clearing fpregs and xstates
Currently, fpu__clear() clears all fpregs and xstates.  Once XSAVES
supervisor states are introduced, supervisor settings (e.g. CET xstates)
must remain active for signals; It is necessary to have separate functions:

- Create fpu__clear_user_states(): clear only user settings for signals;
- Create fpu__clear_all(): clear both user and supervisor settings in
   flush_thread().

Also modify copy_init_fpstate_to_fpregs() to take a mask from above two
functions.

Remove obvious side-comment in fpu__clear(), while at it.

 [ bp: Make the second argument of fpu__clear() bool after requesting it
   a bunch of times during review.
  - Add a comment about copy_init_fpstate_to_fpregs() locking needs. ]

Co-developed-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200512145444.15483-6-yu-cheng.yu@intel.com
2020-05-13 13:41:50 +02:00
Yu-cheng Yu
71581eefd7 x86/fpu/xstate: Introduce XSAVES supervisor states
Enable XSAVES supervisor states by setting MSR_IA32_XSS bits according
to CPUID enumeration results. Also revise comments at various places.

Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200512145444.15483-5-yu-cheng.yu@intel.com
2020-05-13 12:16:47 +02:00
Yu-cheng Yu
524bb73bc1 x86/fpu/xstate: Separate user and supervisor xfeatures mask
Before the introduction of XSAVES supervisor states, 'xfeatures_mask' is
used at various places to determine XSAVE buffer components and XCR0 bits.
It contains only user xstates.  To support supervisor xstates, it is
necessary to separate user and supervisor xstates:

- First, change 'xfeatures_mask' to 'xfeatures_mask_all', which represents
  the full set of bits that should ever be set in a kernel XSAVE buffer.
- Introduce xfeatures_mask_supervisor() and xfeatures_mask_user() to
  extract relevant xfeatures from xfeatures_mask_all.

Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200512145444.15483-4-yu-cheng.yu@intel.com
2020-05-13 10:31:07 +02:00
Fenghua Yu
8ab22804ef x86/fpu/xstate: Define new macros for supervisor and user xstates
XCNTXT_MASK is 'all supported xfeatures' before introducing supervisor
xstates.  Rename it to XFEATURE_MASK_USER_SUPPORTED to make clear that
these are user xstates.

Replace XFEATURE_MASK_SUPERVISOR with the following:
- XFEATURE_MASK_SUPERVISOR_SUPPORTED: Currently nothing.  ENQCMD and
  Control-flow Enforcement Technology (CET) will be introduced in separate
  series.
- XFEATURE_MASK_SUPERVISOR_UNSUPPORTED: Currently only Processor Trace.
- XFEATURE_MASK_SUPERVISOR_ALL: the combination of above.

Co-developed-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200512145444.15483-3-yu-cheng.yu@intel.com
2020-05-12 20:34:38 +02:00
Fenghua Yu
5274e6c172 x86/fpu/xstate: Rename validate_xstate_header() to validate_user_xstate_header()
The function validate_xstate_header() validates an xstate header coming
from userspace (PTRACE or sigreturn). To make it clear, rename it to
validate_user_xstate_header().

Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200512145444.15483-2-yu-cheng.yu@intel.com
2020-05-12 20:20:32 +02:00
Dave Hansen
16171bffc8 x86/pkeys: Add check for pkey "overflow"
Alex Shi reported the pkey macros above arch_set_user_pkey_access()
to be unused.  They are unused, and even refer to a nonexistent
CONFIG option.

But, they might have served a good use, which was to ensure that
the code does not try to set values that would not fit in the
PKRU register.  As it stands, a too-large 'pkey' value would
be likely to silently overflow the u32 new_pkru_bits.

Add a check to look for overflows.  Also add a comment to remind
any future developer to closely examine the types used to store
pkey values if arch_max_pkey() ever changes.

This boots and passes the x86 pkey selftests.

Reported-by: Alex Shi <alex.shi@linux.alibaba.com>
Signed-off-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200122165346.AD4DA150@viggo.jf.intel.com
2020-02-24 20:25:21 +01:00
Yu-cheng Yu
e70b100806 x86/fpu/xstate: Warn when checking alignment of disabled xfeatures
An XSAVES component's alignment/offset is meaningful only when the
feature is enabled. Return zero and WARN_ONCE on checking alignment of
disabled features.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20200109211452.27369-4-yu-cheng.yu@intel.com
2020-02-12 15:43:34 +01:00
Yu-cheng Yu
49a91d61ae x86/fpu/xstate: Fix XSAVES offsets in setup_xstate_comp()
In setup_xstate_comp(), each XSAVES component offset starts from the
end of its preceding component plus alignment. A disabled feature does
not take space and its offset should be set to the end of its preceding
one with no alignment. However, in this case, alignment is incorrectly
added to the offset, which can cause the next component to have a wrong
offset.

This problem has not been visible because currently there is no xfeature
requiring alignment.

Fix it by tracking the next starting offset only from enabled
xfeatures. To make it clear, also change the function name to
setup_xstate_comp_offsets().

 [ bp: Fix a typo in the comment above it, while at it. ]

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20200109211452.27369-3-yu-cheng.yu@intel.com
2020-02-12 15:43:31 +01:00
Yu-cheng Yu
c12e13dcd8 x86/fpu/xstate: Fix last_good_offset in setup_xstate_features()
The function setup_xstate_features() uses CPUID to find each xfeature's
standard-format offset and size.  Since XSAVES always uses the compacted
format, supervisor xstates are *NEVER* in the standard-format and their
offsets are left as -1's.  However, they are still being tracked as
last_good_offset.

Fix it by tracking only user xstate offsets.

 [ bp: Use xfeature_is_supervisor() and save an indentation level. Drop
   now unused xfeature_is_user(). ]

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20200109211452.27369-2-yu-cheng.yu@intel.com
2020-02-11 19:54:04 +01:00
Sebastian Andrzej Siewior
bbc55341b9 x86/fpu: Deactivate FPU state after failure during state load
In __fpu__restore_sig(), fpu_fpregs_owner_ctx needs to be reset if the
FPU state was not fully restored. Otherwise the following may happen (on
the same CPU):

  Task A                     Task B               fpu_fpregs_owner_ctx
  *active*                                        A.fpu
  __fpu__restore_sig()
                             ctx switch           load B.fpu
                             *active*             B.fpu
  fpregs_lock()
  copy_user_to_fpregs_zeroing()
    copy_kernel_to_xregs() *modify*
    copy_user_to_xregs() *fails*
  fpregs_unlock()
                            ctx switch            skip loading B.fpu,
                            *active*              B.fpu

In the success case, fpu_fpregs_owner_ctx is set to the current task.

In the failure case, the FPU state might have been modified by loading
the init state.

In this case, fpu_fpregs_owner_ctx needs to be reset in order to ensure
that the FPU state of the following task is loaded from saved state (and
not skipped because it was the previous state).

Reset fpu_fpregs_owner_ctx after a failure during restore occurred, to
ensure that the FPU state for the next task is always loaded.

The problem was debugged-by Yu-cheng Yu <yu-cheng.yu@intel.com>.

 [ bp: Massage commit message. ]

Fixes: 5f409e20b7 ("x86/fpu: Defer FPU state load until return to userspace")
Reported-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191220195906.plk6kpmsrikvbcfn@linutronix.de
2020-01-07 13:44:42 +01:00
Yu-cheng Yu
158e2ee61f x86/fpu/xstate: Make xfeature_is_supervisor()/xfeature_is_user() return bool
Have both xfeature_is_supervisor()/xfeature_is_user() return bool
because they are used only in boolean context.

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191212210855.19260-3-yu-cheng.yu@intel.com
2020-01-06 19:08:40 +01:00
Yu-cheng Yu
8c9e607376 x86/fpu/xstate: Fix small issues
In response to earlier comments, fix small issues before introducing
XSAVES supervisor states:

- Fix comments of xfeature_is_supervisor().
- Replace ((u64)1 << 63) with XCOMP_BV_COMPACTED_FORMAT.

No functional changes.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191212210855.19260-2-yu-cheng.yu@intel.com
2020-01-06 17:31:11 +01:00
Pankaj Bharadiya
c593642c8b treewide: Use sizeof_field() macro
Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except
at places where these are defined. Later patches will remove the unused
definition of FIELD_SIZEOF().

This patch is generated using following script:

EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h"

git grep -l -e "\bFIELD_SIZEOF\b" | while read file;
do

	if [[ "$file" =~ $EXCLUDE_FILES ]]; then
		continue
	fi
	sed -i  -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file;
done

Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com>
Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: David Miller <davem@davemloft.net> # for net
2019-12-09 10:36:44 -08:00
Cyrill Gorcunov
446e693ca3 x86/fpu: Use XFEATURE_FP/SSE enum values instead of hardcoded numbers
When setting up sizes and offsets for legacy header entries the code uses
hardcoded 0/1 instead of the corresponding enum values XFEATURE_FP and
XFEATURE_SSE.

Replace the hardcoded numbers which enhances readability of the code and
also makes this code the first user of those enum values..

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191101130153.GG1615@uranus.lan
2019-11-04 22:10:07 +01:00
Cyrill Gorcunov
c08550510c x86/fpu: Shrink space allocated for xstate_comp_offsets
commit 8ff925e10f ("x86/xsaves: Clean up code in xstate offsets
computation in xsave area") introduced an allocation of 64 entries for
xstate_comp_offsets while the code only handles up to XFEATURE_MAX entries.

For this reason xstate_offsets and xstate_sizes are already defined with
the explicit XFEATURE_MAX limit.

Do the same for compressed format for consistency sake.

As the changelog of that commit is not giving any information it's assumed
that the main idea was to cover all possible bits in xfeatures_mask, but
this doesn't explain why other variables such as the non-compacted offsets
and sizes are explicitely limited to XFEATURE_MAX.

For consistency it's better to use the XFEATURE_MAX limit everywhere and
extend it on demand when new features get implemented at the hardware
level and subsequently supported by the kernel.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191101124228.GF1615@uranus.lan
2019-11-04 22:04:19 +01:00
Cyrill Gorcunov
58db103784 x86/fpu: Update stale variable name in comment
When the fpu code was reworked pcntxt_mask was renamed to
xfeatures_mask. Reflect it in the comment as well.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191101123850.GE1615@uranus.lan
2019-11-04 22:04:19 +01:00
Linus Torvalds
3431a940bb Merge branch 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 AVX512 status update from Ingo Molnar:
 "This adds a new ABI that the main scheduler probably doesn't want to
  deal with but HPC job schedulers might want to use: the
  AVX512_elapsed_ms field in the new /proc/<pid>/arch_status task status
  file, which allows the user-space job scheduler to cluster such tasks,
  to avoid turbo frequency drops"

* 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Documentation/filesystems/proc.txt: Add arch_status file
  x86/process: Add AVX-512 usage elapsed time to /proc/pid/arch_status
  proc: Add /proc/<pid>/arch_status
2019-07-08 17:28:57 -07:00
Sebastian Andrzej Siewior
7891bc0ab7 x86/fpu: Inline fpu__xstate_clear_all_cpu_caps()
All fpu__xstate_clear_all_cpu_caps() does is to invoke one simple
function since commit

  73e3a7d2a7 ("x86/fpu: Remove the explicit clearing of XSAVE dependent features")

so invoke that function directly and remove the wrapper.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190704060743.rvew4yrjd6n33uzx@linutronix.de
2019-07-07 12:01:47 +02:00
Sebastian Andrzej Siewior
9838e3bff0 x86/fpu: Make 'no387' and 'nofxsr' command line options useful
The command line option `no387' is designed to disable the FPU
entirely. This only 'works' with CONFIG_MATH_EMULATION enabled.

But on 64bit this cannot work because user space expects SSE to work which
required basic FPU support. MATH_EMULATION does not help because SSE is not
emulated.

The command line option `nofxsr' should also be limited to 32bit because
FXSR is part of the required flags on 64bit so turning it off is not
possible.

Clearing X86_FEATURE_FPU without emulation enabled will not work anyway and
hang in fpu__init_system_early_generic() before the console is enabled.

Setting additioal dependencies, ensures that the CPU still boots on a
modern CPU. Otherwise, dropping FPU will leave FXSR enabled causing the
kernel to crash early in fpu__init_system_mxcsr().

With XSAVE support it will crash in fpu__init_cpu_xstate(). The problem is
that xsetbv() with XMM set and SSE cleared is not allowed.  That means
XSAVE has to be disabled. The XSAVE support is disabled in
fpu__init_system_xstate_size_legacy() but it is too late. It can be
removed, it has been added in commit

  1f999ab5a1 ("x86, xsave: Disable xsave in i387 emulation mode")

to use `no387' on a CPU with XSAVE support.

All this happens before console output.

After hat, the next possible crash is in RAID6 detect code because MMX
remained enabled. With a 3DNOW enabled config it will explode in memcpy()
for instance due to kernel_fpu_begin() but this is unconditionally enabled.

This is enough to boot a Debian Wheezy on a 32bit qemu "host" CPU which
supports everything up to XSAVES, AVX2 without 3DNOW. Later, Debian
increased the minimum requirements to i686 which means it does not boot
userland atleast due to CMOV.

After masking the additional features it still keeps SSE4A and 3DNOW*
enabled (if present on the host) but those are unused in the kernel.

Restrict `no387' and `nofxsr' otions to 32bit only. Add dependencies for
FPU, FXSR to additionaly mask CMOV, MMX, XSAVE if FXSR or FPU is cleared.

Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190703083247.57kjrmlxkai3vpw3@linutronix.de
2019-07-07 12:01:46 +02:00
Christoph Hellwig
466329bf40 x86/fpu: Remove the fpu__save() export
This function is only use by the core FPU code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190604071524.12835-4-hch@lst.de
2019-06-17 12:21:26 +02:00
Christoph Hellwig
6d79d86f96 x86/fpu: Simplify kernel_fpu_begin()
Merge two helpers into the main function, remove a pointless local
variable and flatten a conditional.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190604071524.12835-3-hch@lst.de
2019-06-17 12:19:49 +02:00
Christoph Hellwig
b78ea19ac2 x86/fpu: Simplify kernel_fpu_end()
Remove two little helpers and merge them into kernel_fpu_end() to
streamline the function.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190604071524.12835-2-hch@lst.de
2019-06-17 10:43:43 +02:00
Christoph Hellwig
8d3289f2fa x86/fpu: Don't use current->mm to check for a kthread
current->mm can be non-NULL if a kthread calls use_mm(). Check for
PF_KTHREAD instead to decide when to store user mode FP state.

Fixes: 2722146eb7 ("x86/fpu: Remove fpu->initialized")
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Aubrey Li <aubrey.li@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190604175411.GA27477@lst.de
2019-06-13 20:57:49 +02:00
Aubrey Li
0c608dad2a x86/process: Add AVX-512 usage elapsed time to /proc/pid/arch_status
AVX-512 components usage can result in turbo frequency drop. So it's useful
to expose AVX-512 usage elapsed time as a heuristic hint for user space job
schedulers to cluster the AVX-512 using tasks together.

Examples:
$ while [ 1 ]; do cat /proc/tid/arch_status | grep AVX512; sleep 1; done
AVX512_elapsed_ms:      4
AVX512_elapsed_ms:      8
AVX512_elapsed_ms:      4

This means that 4 milliseconds have elapsed since the tsks AVX512 usage was
detected when the task was scheduled out.

$ cat /proc/tid/arch_status | grep AVX512
AVX512_elapsed_ms:      -1

'-1' indicates that no AVX512 usage was recorded for this task.

The time exposed is not necessarily accurate when the arch_status file is
read as the AVX512 usage is only evaluated when a task is scheduled
out. Accurate usage information can be obtained with performance counters.

[ tglx: Massaged changelog ]

Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: peterz@infradead.org
Cc: hpa@zytor.com
Cc: ak@linux.intel.com
Cc: tim.c.chen@linux.intel.com
Cc: dave.hansen@intel.com
Cc: arjan@linux.intel.com
Cc: adobriyan@gmail.com
Cc: aubrey.li@intel.com
Cc: linux-api@vger.kernel.org
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linux API <linux-api@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190606012236.9391-2-aubrey.li@linux.intel.com
2019-06-12 11:42:13 +02:00
Sebastian Andrzej Siewior
aab8445c4e x86/fpu: Update kernel's FPU state before using for the fsave header
In commit

  39388e80f9 ("x86/fpu: Don't save fxregs for ia32 frames in copy_fpstate_to_sigframe()")

I removed the statement

|       if (ia32_fxstate)
|               copy_fxregs_to_kernel(fpu);

and argued that it was wrongly merged because the content was already
saved in kernel's state.

This was wrong: It is required to write it back because it is only
saved on the user-stack and save_fsave_header() reads it from task's
FPU-state. I missed that part…

Save x87 FPU state unless thread's FPU registers are already up to date.

Fixes: 39388e80f9 ("x86/fpu: Don't save fxregs for ia32 frames in copy_fpstate_to_sigframe()")
Reported-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Eric Biggers <ebiggers@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: kvm ML <kvm@vger.kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190607142915.y52mfmgk5lvhll7n@linutronix.de
2019-06-08 11:45:15 +02:00
Hugh Dickins
b81ff1013e x86/fpu: Use fault_in_pages_writeable() for pre-faulting
Since commit

   d9c9ce34ed ("x86/fpu: Fault-in user stack if copy_fpstate_to_sigframe() fails")

get_user_pages_unlocked() pre-faults user's memory if a write generates
a page fault while the handler is disabled.

This works in general and uncovered a bug as reported by Mike
Rapoport¹. It has been pointed out that this function may be fragile
and a simple pre-fault as in fault_in_pages_writeable() would be a
better solution. Better as in taste and simplicity: that write (as
performed by the alternative function) performs exactly the same
faulting of memory as before. This was suggested by Hugh Dickins and
Andrew Morton.

Use fault_in_pages_writeable() for pre-faulting user's stack.

  [ bigeasy: Write commit message. ]
  [ bp: Massage some. ]

¹ https://lkml.kernel.org/r/1557844195-18882-1-git-send-email-rppt@linux.ibm.com

Fixes: d9c9ce34ed ("x86/fpu: Fault-in user stack if copy_fpstate_to_sigframe() fails")
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: linux-mm <linux-mm@kvack.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190529072540.g46j4kfeae37a3iu@linutronix.de
Link: https://lkml.kernel.org/r/1557844195-18882-1-git-send-email-rppt@linux.ibm.com
2019-06-06 19:15:17 +02:00