linux/Documentation
Dave Hansen 30d02551ba x86/fpu: Optimize out sigframe xfeatures when in init state
tl;dr: AMX state is ~8k.  Signal frames can have space for this
~8k and each signal entry writes out all 8k even if it is zeros.
Skip writing zeros for AMX to speed up signal delivery by about
4% overall when AMX is in its init state.

This is a user-visible change to the sigframe ABI.

== Hardware XSAVE Background ==

XSAVE state components may be tracked by the processor as being
in their initial configuration.  Software can detect which
features are in this configuration by looking at the XSTATE_BV
field in an XSAVE buffer or with the XGETBV(1) instruction.

Both the XSAVE and XSAVEOPT instructions enumerate features as
being in the initial configuration via the XSTATE_BV field in the
XSAVE header.  However, XSAVEOPT declines to actually write
features in their initial configuration to the buffer.  XSAVE
writes the feature unconditionally, regardless of whether it is
in the initial configuration or not.

Basically, XSAVE users never need to inspect XSTATE_BV to
determine if the feature has been written to the buffer.
XSAVEOPT users *do* need to inspect XSTATE_BV.  They might also
need to clear out the buffer if they want to make an isolated
change to the state, like modifying one register.
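
As a rough illustration of the XSTATE_BV inspection described above,
the sketch below reads the field from a raw XSAVE buffer and tests one
feature bit.  The helper names are hypothetical and are not kernel or
libc APIs.  The sketch assumes the architectural layout (a 512-byte
legacy region followed by the XSAVE header, with XSTATE_BV in the
header's first 8 bytes) and the architectural component numbers 17 and
18 for the AMX tile configuration and tile data.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* The XSAVE header follows the 512-byte legacy (FXSAVE) region. */
#define XSAVE_HDR_OFFSET     512

/* Architectural XSAVE component numbers for the AMX state. */
#define XFEATURE_XTILE_CFG   17
#define XFEATURE_XTILE_DATA  18

/* Hypothetical helper: fetch XSTATE_BV from a raw XSAVE buffer. */
static uint64_t xsave_xstate_bv(const void *xsave_buf)
{
	uint64_t bv;

	/* memcpy avoids alignment and aliasing assumptions. */
	memcpy(&bv, (const uint8_t *)xsave_buf + XSAVE_HDR_OFFSET, sizeof(bv));
	return bv;
}

/*
 * Hypothetical helper: true if the feature's XSTATE_BV bit is set.
 * A clear bit means the feature was in its initial configuration, so
 * its area in an XSAVEOPT-style buffer may hold stale bytes.
 * feature_nr is assumed to be below 64.
 */
static bool xsave_feature_present(const void *xsave_buf, unsigned int feature_nr)
{
	return (xsave_xstate_bv(xsave_buf) >> feature_nr) & 1;
}
```

A consumer of an XSAVEOPT-style buffer would call something like
xsave_feature_present(buf, XFEATURE_XTILE_DATA) before reading the
tile data, and treat the feature as all zeros when it returns false.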

== Software Signal / XSAVE Background ==

Signal frames have historically been written with XSAVE itself.
Each state is written in its entirety, regardless of being in its
initial configuration.

In other words, the signal frame ABI uses the XSAVE behavior, not
the XSAVEOPT behavior.

== Problem ==

This means that any application which has acquired permission to
use AMX via ARCH_REQ_XCOMP_PERM will write 8k of state to the
signal frame.  This 8k write will occur even when AMX was in its
initial configuration and software *knows* this because of
XSTATE_BV.

This problem also exists to a lesser degree with AVX-512 and its
2k of state.  However, AVX-512 use does not require
ARCH_REQ_XCOMP_PERM and is more likely to have existing users
which would be impacted by any change in behavior.

== Solution ==

Stop writing out AMX xfeatures which are in their initial state
to the signal frame.  This effectively makes the signal frame
XSAVE buffer look as if it were written with a combination of
XSAVEOPT and XSAVE behavior.  Userspace which handles XSAVEOPT-
style buffers should be able to handle this naturally.

For now, include only the AMX xfeatures, XTILE and XTILEDATA, in
this new behavior.  These require new ABI to use anyway, which
makes their users very unlikely to be broken.  This XSAVEOPT-like
behavior should be expected for all future dynamic xfeatures.  It
may also be extended to legacy features like AVX-512 in the
future.
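
The selection described above could be sketched roughly as follows.
This is an illustration under stated assumptions, not the code from
the patch: the function name, the mask names and the parameters are
hypothetical, "in_use" stands for the XGETBV(1) result, "supported"
stands for the set of user xfeatures the kernel would otherwise write,
and "in_use" is assumed to be a subset of "supported".

```c
#include <stdint.h>

/* Architectural component bits for the two AMX xfeatures. */
#define XFEATURE_MASK_XTILE_CFG   (1ULL << 17)
#define XFEATURE_MASK_XTILE_DATA  (1ULL << 18)

/*
 * Hypothetical mask: xfeatures that may be left out of the signal
 * frame while they are in their initial configuration.
 */
#define SIGFRAME_INITOPT_MASK \
	(XFEATURE_MASK_XTILE_CFG | XFEATURE_MASK_XTILE_DATA)

/* Hypothetical helper: which xfeatures must be written to the frame. */
static uint64_t sigframe_write_mask(uint64_t supported, uint64_t in_use)
{
	/* Anything not in its initial configuration is always written. */
	uint64_t to_write = in_use;

	/*
	 * Features outside the optimizable set keep the XSAVE behavior
	 * and are written even when they are in their init state.
	 */
	to_write |= supported & ~SIGFRAME_INITOPT_MASK;

	return to_write;
}
```

With x87, SSE, AVX, AVX-512 and AMX supported and only x87/SSE in
use, the result covers everything except the two AMX components, so
the ~8k tile data write is skipped.  Once the tile data is in use, its
bit is set in the result and the state is written as before.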

Only attempt this optimization on systems with dynamic features.
Disable dynamic feature support (XFD) if XGETBV1 is unavailable
by adding a CPUID dependency.

This has been measured to reduce the *overall* cycle cost of
signal delivery by about 4%.

Fixes: 2308ee57d9 ("x86/fpu/amx: Enable the AMX feature in 64-bit mode")
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: "Chang S. Bae" <chang.seok.bae@intel.com>
Link: https://lore.kernel.org/r/20211102224750.FA412E26@davehans-spike.ostc.intel.com
2021-11-03 22:42:35 +01:00
ABI topology: Represent clusters of CPUs within a die 2021-10-15 11:25:15 +02:00
accounting
admin-guide x86/fpu updates: 2021-11-01 14:03:56 -07:00
arm Documentation: arm: marvell: Add 88F6825 model into list 2021-08-24 13:26:32 -06:00
arm64 Merge remote-tracking branch 'tip/sched/arm64' into for-next/core 2021-08-31 09:10:00 +01:00
block fscrypt updates for 5.16 2021-11-01 11:36:35 -07:00
bpf libbpf: Rename libbpf documentation index file 2021-08-18 08:45:25 -07:00
cdrom drivers/cdrom: improved ioctl for media change detection 2021-09-14 20:05:26 -06:00
core-api Updates for the interrupt subsystem: 2021-11-01 13:09:10 -07:00
cpu-freq cpufreq: Remove ready() callback 2021-09-02 18:04:17 +02:00
crypto
dev-tools Merge branch 'akpm' (patches from Andrew) 2021-09-08 12:55:35 -07:00
devicetree Updates for the interrupt subsystem: 2021-11-01 13:09:10 -07:00
doc-guide
driver-api cxl for v5.15 2021-09-09 11:48:27 -07:00
fault-injection Char / Misc driver changes for 5.15-rc1 2021-09-01 08:35:06 -07:00
fb
features RISC-V Patches for the 5.15 Merge Window, Part 2 2021-09-11 14:29:42 -07:00
filesystems Changes since last update: 2021-11-01 11:39:22 -07:00
firmware_class
firmware-guide docs: firmware-guide: acpi: dsd: graph.rst: replace some characters 2021-07-25 14:35:46 -06:00
fpga
gpu Merge tag 'amd-drm-fixes-5.15-2021-10-06' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes 2021-10-08 11:40:21 +10:00
hid
hwmon hwmon: (k10temp) Remove residues of current and voltage 2021-09-12 17:56:36 -07:00
i2c Documentation: i2c: add i2c-sysfs into index 2021-08-10 22:58:32 +02:00
ia64
ide
iio
infiniband
input
isdn
kbuild Merge branch 'akpm' (patches from Andrew) 2021-09-08 12:55:35 -07:00
kernel-hacking docs: futex: Fix kernel-doc references 2021-10-19 17:27:05 +02:00
leds Documentation: leds: standartizing LED names 2021-08-20 10:26:24 +02:00
litmus-tests
livepatch
locking Documentation: locking: fix references 2021-08-24 13:20:39 -06:00
m68k
maintainer
mhi
mips
misc-devices
netlabel
networking mctp: unify sockaddr_mctp types 2021-10-18 13:47:09 +01:00
nios2
nvdimm
openrisc
parisc
PCI pci-v5.15-changes 2021-09-07 19:13:42 -07:00
pcmcia
power Documentation: power: include kernel-doc in Energy Model doc 2021-09-07 21:17:28 +02:00
powerpc powerpc/doc: Fix htmldocs errors 2021-08-27 00:56:34 +10:00
process Merge branch 'gcc-min-version-5.1' (make gcc-5.1 the minimum version) 2021-09-13 10:43:04 -07:00
RCU
riscv
s390
scheduler sched/fair: Add document for burstable CFS bandwidth 2021-10-05 15:51:41 +02:00
scsi
security
sh
sound Yet another set of documentation changes: 2021-09-01 18:49:47 -07:00
sparc
sphinx docs: sphinx-requirements: Move sphinx_rtd_theme to top 2021-08-12 09:15:38 -06:00
sphinx-static
spi
staging
target
timers
trace Tracing updates for 5.15: 2021-09-05 11:50:41 -07:00
translations Locking updates: 2021-11-01 13:15:36 -07:00
usb docs: usb: fix malformed table 2021-08-05 12:31:51 +02:00
userspace-api Locking updates: 2021-11-01 13:15:36 -07:00
virt ARM: 2021-09-07 13:40:51 -07:00
vm Merge branch 'akpm' (patches from Andrew) 2021-09-08 12:55:35 -07:00
w1
watchdog
x86 x86/fpu: Optimize out sigframe xfeatures when in init state 2021-11-03 22:42:35 +01:00
xtensa
.gitignore
arch.rst
asm-annotations.rst
atomic_bitops.txt
atomic_t.txt Documentation/atomic_t: Document forward progress expectations 2021-08-04 15:16:47 +02:00
Changes
CodingStyle
conf.py docs: pdfdocs: Fix typo in CJK-language specific font settings 2021-09-06 16:53:39 -06:00
COPYING-logo
docutils.conf
dontdiff
index.rst
Kconfig
logo.gif
Makefile
memory-barriers.txt
SubmittingPatches
watch_queue.rst