- Add new P-state driver for AMD processors (Huang Rui).
- Fix initialization of min and max frequency QoS requests in the
cpufreq core (Rafael Wysocki).
- Fix EPP handling on Alder Lake in intel_pstate (Srinivas Pandruvada).
- Make intel_pstate update cpuinfo.max_freq when notified of HWP
capabilities changes and drop a redundant function call from that
driver (Rafael Wysocki).
- Improve IRQ support in the Qcom cpufreq driver (Ard Biesheuvel,
Stephen Boyd, Vladimir Zapolskiy).
- Fix double devm_remap() in the Mediatek cpufreq driver (Hector Yuan).
- Introduce thermal pressure helpers for cpufreq CPU cooling (Lukasz
Luba).
- Make cpufreq use default_groups in kobj_type (Greg Kroah-Hartman).
- Make cpuidle use default_groups in kobj_type (Greg Kroah-Hartman).
- Fix two comments in cpuidle code (Jason Wang, Yang Li).
- Allow model-specific normal EPB value to be used in the intel_epb
sysfs attribute handling code (Srinivas Pandruvada).
- Simplify locking in pm_runtime_put_suppliers() (Rafael Wysocki).
- Add safety net to supplier device release in the runtime PM core
code (Rafael Wysocki).
- Capture device status before disabling runtime PM for it (Rafael
Wysocki).
- Add new macros for declaring PM operations to allow drivers to
avoid guarding them with CONFIG_PM #ifdefs or __maybe_unused and
update some drivers to use these macros (Paul Cercueil).
- Allow ACPI hardware signature to be honoured during restore from
hibernation (David Woodhouse).
- Update outdated operating performance points (OPP) documentation
(Tang Yizhou).
- Reduce log severity for informative message regarding frequency
transition failures in devfreq (Tzung-Bi Shih).
- Add DRAM frequency controller devfreq driver for Allwinner sunXi
SoCs (Samuel Holland).
- Add missing COMMON_CLK dependency to sun8i devfreq driver (Arnd
Bergmann).
- Add support for new layout of Psys PowerLimit Register on SPR to
the Intel RAPL power capping driver (Zhang Rui).
- Fix typo in a comment in idle_inject.c (Jason Wang).
- Remove unused function definition from the DTPM (Dynamit Thermal
Power Management) power capping framework (Daniel Lezcano).
- Reduce DTPM trace verbosity (Daniel Lezcano).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmHcgkgSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxs34P/3kFhRk7qrwEekx6F11im6caLKT9+Qap
PuGVqfTbK7TupVQDVGFBEjTjgKY7Ph7Fcr4bqn6wvNOp96cjXyOSk/c1fcpS3Bpr
b1PYsFsb9diNKE462sGGYClyCT3X5qQqtpxzOl3g4I1PWKTC1mKFm4Jm2m6S6cFq
DKhsgYKFzQSZNb1wJM4JjHS9c3BRygqp4nfEAmifu5b9tLZf7stWnFHhbGq63M9m
OwHOrEEnzhf4pOXGZTvIXeczgE6IcuDdlGkIg7XMHnmKSNvj1HqhEgi2lfSRb98z
5eI4S6JymCJGVK+gr8iVCq1iJ+LKqV3YPXRqvI35/+NqIKYxMt2ZivQQf5s3aQLe
26gUulD3O6Pz5tMlwcDElD4/tcClfg35PCD/VzpRR8TAo8vLBb63kZ5v6+HM34ZJ
6QbLTNZJTnGmEqxMccUxP+HhZz8ssqpLAC+R2sE5yXbNpIZq8CbPiGb65RGiX3SG
CmRKqH/xQVNKBYP0ChjmUyhKcBxOnx1Xu8AhsN7gRAy0aht7j7OdjTnJuGiX6gu3
Q5WxvVvkekyfhuFQ5TST9y/fzvMJWzeaA6GhVIr6RoBmshNQGTb0H4HXARxS3Ah5
qjd7ao7BFLa898FCHaHIpmFWp0wF5iljwCJQVP3I2qUpPvDJxEtsxc4CF/AZzyNR
VudoFqLoIV5C
=1egI
-----END PGP SIGNATURE-----
Merge tag 'pm-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"The most signigicant change here is the addition of a new cpufreq
'P-state' driver for AMD processors as a better replacement for the
venerable acpi-cpufreq driver.
There are also other cpufreq updates (in the core, intel_pstate, ARM
drivers), PM core updates (mostly related to adding new macros for
declaring PM operations which should make the lives of driver
developers somewhat easier), and a bunch of assorted fixes and
cleanups.
Summary:
- Add new P-state driver for AMD processors (Huang Rui).
- Fix initialization of min and max frequency QoS requests in the
cpufreq core (Rafael Wysocki).
- Fix EPP handling on Alder Lake in intel_pstate (Srinivas
Pandruvada).
- Make intel_pstate update cpuinfo.max_freq when notified of HWP
capabilities changes and drop a redundant function call from that
driver (Rafael Wysocki).
- Improve IRQ support in the Qcom cpufreq driver (Ard Biesheuvel,
Stephen Boyd, Vladimir Zapolskiy).
- Fix double devm_remap() in the Mediatek cpufreq driver (Hector
Yuan).
- Introduce thermal pressure helpers for cpufreq CPU cooling (Lukasz
Luba).
- Make cpufreq use default_groups in kobj_type (Greg Kroah-Hartman).
- Make cpuidle use default_groups in kobj_type (Greg Kroah-Hartman).
- Fix two comments in cpuidle code (Jason Wang, Yang Li).
- Allow model-specific normal EPB value to be used in the intel_epb
sysfs attribute handling code (Srinivas Pandruvada).
- Simplify locking in pm_runtime_put_suppliers() (Rafael Wysocki).
- Add safety net to supplier device release in the runtime PM core
code (Rafael Wysocki).
- Capture device status before disabling runtime PM for it (Rafael
Wysocki).
- Add new macros for declaring PM operations to allow drivers to
avoid guarding them with CONFIG_PM #ifdefs or __maybe_unused and
update some drivers to use these macros (Paul Cercueil).
- Allow ACPI hardware signature to be honoured during restore from
hibernation (David Woodhouse).
- Update outdated operating performance points (OPP) documentation
(Tang Yizhou).
- Reduce log severity for informative message regarding frequency
transition failures in devfreq (Tzung-Bi Shih).
- Add DRAM frequency controller devfreq driver for Allwinner sunXi
SoCs (Samuel Holland).
- Add missing COMMON_CLK dependency to sun8i devfreq driver (Arnd
Bergmann).
- Add support for new layout of Psys PowerLimit Register on SPR to
the Intel RAPL power capping driver (Zhang Rui).
- Fix typo in a comment in idle_inject.c (Jason Wang).
- Remove unused function definition from the DTPM (Dynamit Thermal
Power Management) power capping framework (Daniel Lezcano).
- Reduce DTPM trace verbosity (Daniel Lezcano)"
* tag 'pm-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (53 commits)
x86, sched: Fix undefined reference to init_freq_invariance_cppc() build error
cpufreq: amd-pstate: Fix Kconfig dependencies for AMD P-State
cpufreq: amd-pstate: Fix struct amd_cpudata kernel-doc comment
cpuidle: use default_groups in kobj_type
x86: intel_epb: Allow model specific normal EPB value
MAINTAINERS: Add AMD P-State driver maintainer entry
Documentation: amd-pstate: Add AMD P-State driver introduction
cpufreq: amd-pstate: Add AMD P-State performance attributes
cpufreq: amd-pstate: Add AMD P-State frequencies attributes
cpufreq: amd-pstate: Add boost mode support for AMD P-State
cpufreq: amd-pstate: Add trace for AMD P-State module
cpufreq: amd-pstate: Introduce the support for the processors with shared memory solution
cpufreq: amd-pstate: Add fast switch function for AMD P-State
cpufreq: amd-pstate: Introduce a new AMD P-State driver to support future processors
ACPI: CPPC: Add CPPC enable register function
ACPI: CPPC: Check present CPUs for determining _CPC is valid
ACPI: CPPC: Implement support for SystemIO registers
x86/msr: Add AMD CPPC MSR definitions
x86/cpufeatures: Add AMD Collaborative Processor Performance Control feature flag
cpufreq: use default_groups in kobj_type
...
Pull random number generator updates from Jason Donenfeld:
"These a bit more numerous than usual for the RNG, due to folks
resubmitting patches that had been pending prior and generally renewed
interest.
There are a few categories of patches in here:
1) Dominik Brodowski and I traded a series back and forth for a some
weeks that fixed numerous issues related to seeds being provided
at extremely early boot by the firmware, before other parts of the
kernel or of the RNG have been initialized, both fixing some
crashes and addressing correctness around early boot randomness.
One of these is marked for stable.
2) I replaced the RNG's usage of SHA-1 with BLAKE2s in the entropy
extractor, and made the construction a bit safer and more
standard. This was sort of a long overdue low hanging fruit, as we
were supposed to have phased out SHA-1 usage quite some time ago
(even if all we needed here was non-invertibility). Along the way
it also made extraction 131% faster. This required a bit of
Kconfig and symbol plumbing to make things work well with the
crypto libraries, which is one of the reasons why I'm sending you
this pull early in the cycle.
3) I got rid of a truly superfluous call to RDRAND in the hot path,
which resulted in a whopping 370% increase in performance.
4) Sebastian Andrzej Siewior sent some patches regarding PREEMPT_RT,
the full series of which wasn't ready yet, but the first two
preparatory cleanups were good on their own. One of them touches
files in kernel/irq/, which is the other reason why I'm sending
you this pull early in the cycle.
5) Other assorted correctness fixes from Eric Biggers, Jann Horn,
Mark Brown, Dominik Brodowski, and myself"
* 'random-5.17-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
random: don't reset crng_init_cnt on urandom_read()
random: avoid superfluous call to RDRAND in CRNG extraction
random: early initialization of ChaCha constants
random: use IS_ENABLED(CONFIG_NUMA) instead of ifdefs
random: harmonize "crng init done" messages
random: mix bootloader randomness into pool
random: do not throw away excess input to crng_fast_load
random: do not re-init if crng_reseed completes before primary init
random: fix crash on multiple early calls to add_bootloader_randomness()
random: do not sign extend bytes for rotation when mixing
random: use BLAKE2s instead of SHA1 in extraction
lib/crypto: blake2s: include as built-in
random: fix data race on crng init time
random: fix data race on crng_node_pool
irq: remove unused flags argument from __handle_irq_event_percpu()
random: remove unused irq_flags argument from add_interrupt_randomness()
random: document add_hwgenerator_randomness() with other input functions
MAINTAINERS: add git tree for random.c
arch/x86/ to amd64_edac as that is its only user anyway
- Some MCE error injection improvements to the AMD side
- Reorganization of the #MC handler code and the facilities it calls to
make it noinstr-safe
- Add support for new AMD MCA bank types and non-uniform banks layout
- The usual set of cleanups and fixes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHcGZ4ACgkQEsHwGGHe
VUr6Zw//WBvNvfV/akQGsvVo94G0DaF+buYB+Tl1p0goMd7QfKA5iHxjB1alEJC2
dTchIr7pjiiE3nr4svuWLLQZamx8kMwQNqipioBHXg3YThj0wD4PbUOC9TlIceBR
3yxVbvwlD7Y7sb2PII6IMlagzTiIeW0/ps29DHFr5vqDBvEanNdAHoV/h2vQi+76
Ma96psIxzTMSk11yGB6l9k66EASCdDGBU7sODjup7wuQmuRaQ/1oJAWY0wIJvJez
frjpaz/YKmlTwTf9bxoJbky2FkeBsD4yXXUGwjDgMq0EyUUaeSbvaQkm8gSHX9Yr
VDDv1WvT6QIw6x7Wc4skS8lWmZghNBbAHOoNS31BPJ2IDmFWkF5Q2bNEuHrtU4EC
0mkNeyN6x48L/F8j/1aE/tm+SjiGexZX4zhi6MNWReTV140I1zqQq/r7CCu5+MEa
PAB1YH/96k2dMPT6mbFrRIFJmkDuBuZOAkuwYWEjO/XjPl2SGBGj1jKolWW3qjRR
Po7vBJnDt7wgigWFh6+R4rJv+fh87XfB7B2wEOt4Yn37jUkK6dNRIy0zFmDaC1J2
bHgsJbWC+Sgs1G57gnYABJYzLj7RRdDyCu1/UUVyBBP7/WfZJw0kjABE7p3AaYTd
15JV1L0c/Ypuv05LJf40LkyF2F5w2fnP5QM2Rr8U4xW/GumEyWs=
=8Hu7
-----END PGP SIGNATURE-----
Merge tag 'ras_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS updates from Borislav Petkov:
"A relatively big amount of movements in RAS-land this time around:
- First part of a series to move the AMD address translation code
from arch/x86/ to amd64_edac as that is its only user anyway
- Some MCE error injection improvements to the AMD side
- Reorganization of the #MC handler code and the facilities it calls
to make it noinstr-safe
- Add support for new AMD MCA bank types and non-uniform banks layout
- The usual set of cleanups and fixes"
* tag 'ras_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
x86/mce: Reduce number of machine checks taken during recovery
x86/mce/inject: Avoid out-of-bounds write when setting flags
x86/MCE/AMD, EDAC/mce_amd: Support non-uniform MCA bank type enumeration
x86/MCE/AMD, EDAC/mce_amd: Add new SMCA bank types
x86/mce: Check regs before accessing it
x86/mce: Mark mce_start() noinstr
x86/mce: Mark mce_timed_out() noinstr
x86/mce: Move the tainting outside of the noinstr region
x86/mce: Mark mce_read_aux() noinstr
x86/mce: Mark mce_end() noinstr
x86/mce: Mark mce_panic() noinstr
x86/mce: Prevent severity computation from being instrumented
x86/mce: Allow instrumentation during task work queueing
x86/mce: Remove noinstr annotation from mce_setup()
x86/mce: Use mce_rdmsrl() in severity checking code
x86/mce: Remove function-local cpus variables
x86/mce: Do not use memset to clear the banks bitmaps
x86/mce/inject: Set the valid bit in MCA_STATUS before error injection
x86/mce/inject: Check if a bank is populated before injecting
x86/mce: Get rid of cpu_missing
...
copy_user_enhanced_fast_string()
- Avoid writing MSR_CSTAR on Intel due to TDX guests raising a #VE trap
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHcFRcACgkQEsHwGGHe
VUrVYRAAg8hJS/aIMnqr+CDX+iOlx2hxJ2TA2bA45NwWc1A4VTt9kwRB0+NIKkjj
F3uJbidZjxSch9Oza6O5KyjJK8QtOfqxyYcx8TLjSleqJRoJWxl1Ub1/yAfKIX/0
QsqXVc/OuMzgwVGYLUwGSWifJOWMYKy03vSczmXK74zp9vZ56fdot8rOhDm3Xb/R
QSfT5nKlgCvxbvAqgFfbXKoEu/EqT43sTXq4o1C6yDX/G6JOGe6nXZIAvIVm3iKZ
utOqO+tBOmektF/yg3EHZL/7paFgtfETcI1YpmPYqKhG3KvvZgm7yyU6SqrcctSx
vMSPTcgcuZl2I5OF+eesUGfGGhHSfSPBAhkxpCTOb6lHf73PYRC3BnQtlQkQt6g/
UOtm3fQwrVJcKlMu7nem46iDCgbSyvASFa5ZyuOGcrAiFLhJzQNRDlXLpxp/q615
yOYTRgj4YS6vomzc6bL3zNCcF5aJUwAPNVghe3l2zwKXetoOPvtWX8sKlYjiN3GW
DTtEi117IAiWkosDIYY+aFNxLeOqxpNMcOkwd5eHHdpR3rkeFkjOtBctll/eHzPi
NYx++cV5yYW0z4S2uRr6o4k4hdgAQU/p7xhdO28Z+yzWpmXQ//79HhiOf2nNd1iI
dpQAx9roo8vbR3JYLxGYFuJrZsHna+/f6Gqf5teUy7SjVL5M95U=
=zbYM
-----END PGP SIGNATURE-----
Merge tag 'x86_cpu_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpuid updates from Borislav Petkov:
- Enable the short string copies for CPUs which support them, in
copy_user_enhanced_fast_string()
- Avoid writing MSR_CSTAR on Intel due to TDX guests raising a #VE trap
* tag 'x86_cpu_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/lib: Add fast-short-rep-movs check to copy_user_enhanced_fast_string()
x86/cpu: Don't write CSTAR MSR on Intel CPUs
pagetable to prevent any stale entries' presence
- Flush global mappings from the TLB, in addition to the CR3-write,
after switching off of the trampoline_pgd during boot to clear the
identity mappings
- Prevent instrumentation issues resulting from the above changes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHcEGIACgkQEsHwGGHe
VUp14RAAgo6BbW9J82Pyl55egIhcQDdGsa16Gdm9S/AFIIW/NhwYo9ydrgtzr/70
3XKpJYX7nH7PUKYRmoca/m3NnzUU+wnjSGS1XMyB3bJvn2/8S1qeuwBty2VP2dYM
iS2eGRLjVjbMWwQUSK7tPJa5wi11zUqLIyCe3t0YiWso6TK7xKaVJTQ3/19Xc+/a
zVQ5VpmzglUTxA6xGCvTDn5IUViUb8QmIuw7Ty6QtQEoI6T3qQvPkdJNXOxDcHNy
9gDGf4O+5YlPCxYsNEkWDDa02zSZ2aWFSq76b98VyMiOK0xts+ktnAwq6oes+as9
ZLIipOu5aIkj8te7he0FelyvPhZAVzrFvvmMf1U+EV3PqbyVkabhk5SBeP5v8CZy
bM4eYNuJ2FLvFpUCC9zQ/MNVQ6ZtxN15rrrsTqk46KLPBHmHp/Aj9W/DP4zpCcNg
Wwh4xbnGNIN8jZBiBJG6R6q7oM/lZt/loEicxm2QFZHtAIYMsiUmE99HnIREjUHd
+0mwo2rHniie9zh6GoybX8OcbZCLYGdfe3iPvlO9fQpyDTn8IUIlnruDlUiTBMDM
fX4J2dynh7xXRH1WW+MwxDv4n400+C08SG9zTD0qPCbGhYwNscMlZhA2JN6mlPep
spuRPOzzwUUxqjXkDloeDDJNUQ8r032OB2LMhWSbLApJrJM9/QA=
=cM+z
-----END PGP SIGNATURE-----
Merge tag 'x86_mm_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Borislav Petkov:
- Flush *all* mappings from the TLB after switching to the trampoline
pagetable to prevent any stale entries' presence
- Flush global mappings from the TLB, in addition to the CR3-write,
after switching off of the trampoline_pgd during boot to clear the
identity mappings
- Prevent instrumentation issues resulting from the above changes
* tag 'x86_mm_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Prevent early boot triple-faults with instrumentation
x86/mm: Include spinlock_t definition in pgtable.
x86/mm: Flush global TLB when switching to trampoline page-table
x86/mm/64: Flush global TLB on boot and AP bringup
x86/realmode: Add comment for Global bit usage in trampoline_pgd
x86/mm: Add missing <asm/cpufeatures.h> dependency to <asm/page_64.h>
from poison memory and error injection into SGX pages
- A bunch of changes to the SGX selftests to simplify and allow of SGX
features testing without the need of a whole SGX software stack
- Add a sysfs attribute which is supposed to show the amount of SGX
memory in a NUMA node, similar to what /proc/meminfo is to normal
memory
- The usual bunch of fixes and cleanups too
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHcDQMACgkQEsHwGGHe
VUq42xAAjWM0AFpIxgUBpbE0swV3ZMulnndl3/vA5XN+9Yn7Q52+AFyPRE0s7Zam
Ap+cInh2Il7d/sv54rZ4x/j7+TH4i7s8fWPVU/XiPALQuOuw0/B1wJJ+jmMiPFiU
3jr7DkUPyWjWTHduMY/tk+xMOpkx1XsxJheYnKvsKVW+fjJ0vPuftAZtfu2z2VOh
3JLcp5cAXPxW0UK9gdoF5bCBQhBu0NRguTbhHhbByAixQO2GyVSKLSRovUdj0a+y
QRrQ6hgcvpTOsVHJoWJ7yIX4SBzQTe9Bg6dT9DghOxE4Sc2GH89hu7wRztGawBJO
nLyzWgiW9ttjQutDpBvZANNVcFAPAdtDWczrzZpREbrGKkzT+kOBnIIL1LWITWOy
2YWTO3ytW0KNIK85GzMjSVOKRMgaHJeBaGuYZ7Z0kb3GuUPJ9zRlaRxNapKQFuzA
0PGoA4IDT+2Afy7VYBBNUA2d/WverFQuXKusSxK6b5zJ173o5/DXL2q0d3gn/j8Z
hhxJUJyVOsfRXSG4NKrj4se4FiA0n/RL4oyUZR9iJ8kWzzZTd0eZTAn468bpGIp5
yiOlPOLgsmu0xzVmAtG1+4d2+S2x+Ec5YE0sP1V/JLNciYk3Ebp7UyfnS3tn33Xc
cpdWjELvD1LJVpMEURnbjRrwU6OiiAekYJCP/9lmK9zfOGpwRHc=
=vFTM
-----END PGP SIGNATURE-----
Merge tag 'x86_sgx_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SGX updates from Borislav Petkov:
- Add support for handling hw errors in SGX pages: poisoning,
recovering from poison memory and error injection into SGX pages
- A bunch of changes to the SGX selftests to simplify and allow of SGX
features testing without the need of a whole SGX software stack
- Add a sysfs attribute which is supposed to show the amount of SGX
memory in a NUMA node, similar to what /proc/meminfo is to normal
memory
- The usual bunch of fixes and cleanups too
* tag 'x86_sgx_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
x86/sgx: Fix NULL pointer dereference on non-SGX systems
selftests/sgx: Fix corrupted cpuid macro invocation
x86/sgx: Add an attribute for the amount of SGX memory in a NUMA node
x86/sgx: Fix minor documentation issues
selftests/sgx: Add test for multiple TCS entry
selftests/sgx: Enable multiple thread support
selftests/sgx: Add page permission and exception test
selftests/sgx: Rename test properties in preparation for more enclave tests
selftests/sgx: Provide per-op parameter structs for the test enclave
selftests/sgx: Add a new kselftest: Unclobbered_vdso_oversubscribed
selftests/sgx: Move setup_test_encl() to each TEST_F()
selftests/sgx: Encpsulate the test enclave creation
selftests/sgx: Dump segments and /proc/self/maps only on failure
selftests/sgx: Create a heap for the test enclave
selftests/sgx: Make data measurement for an enclave segment optional
selftests/sgx: Assign source for each segment
selftests/sgx: Fix a benign linker warning
x86/sgx: Add check for SGX pages to ghes_do_memory_failure()
x86/sgx: Add hook to error injection address validation
x86/sgx: Hook arch_memory_failure() into mainline code
...
== Problem ==
Nathan Chancellor reported an oops when aceessing the
'sgx_total_bytes' sysfs file:
https://lore.kernel.org/all/YbzhBrimHGGpddDM@archlinux-ax161/
The sysfs output code accesses the sgx_numa_nodes[] array
unconditionally. However, this array is allocated during SGX
initialization, which only occurs on systems where SGX is
supported.
If the sysfs file is accessed on systems without SGX support,
sgx_numa_nodes[] is NULL and an oops occurs.
== Solution ==
To fix this, hide the entire nodeX/x86/ attribute group on
systems without SGX support using the ->is_visible attribute
group callback.
Unfortunately, SGX is initialized via a device_initcall() which
occurs _after_ the ->is_visible() callback. Instead of moving
SGX initialization earlier, call sysfs_update_group() during
SGX initialization to update the group visiblility.
This update requires moving the SGX sysfs code earlier in
sgx/main.c. There are no code changes other than the addition of
arch_update_sysfs_visibility() and a minor whitespace fixup to
arch_node_attr_is_visible() which checkpatch caught.
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-sgx@vger.kernel.org
Cc: x86@kernel.org
Fixes: 50468e4313 ("x86/sgx: Add an attribute for the amount of SGX memory in a NUMA node")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lkml.kernel.org/r/20220104171527.5E8416A8@davehans-spike.ostc.intel.com
Since commit
ee3e00e9e7 ("random: use registers from interrupted code for CPU's w/o a cycle counter")
the irq_flags argument is no longer used.
Remove unused irq_flags.
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: linux-hyperv@vger.kernel.org
Cc: x86@kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
The current EPB "normal" is defined as 6 and set whenever power-up EPB
value is 0. This setting resulted in the desired out of box power and
performance for several CPU generations. But this value is not suitable
for AlderLake mobile CPUs, as this resulted in higher uncore power.
Since EPB is model specific, this is not unreasonable to have different
behavior.
Allow a capability where "normal" EPB can be redefined. For AlderLake
mobile CPUs this desired normal value is 7.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
A contrived zero-length write, for example, by using write(2):
...
ret = write(fd, str, 0);
...
to the "flags" file causes:
BUG: KASAN: stack-out-of-bounds in flags_write
Write of size 1 at addr ffff888019be7ddf by task writefile/3787
CPU: 4 PID: 3787 Comm: writefile Not tainted 5.16.0-rc7+ #12
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
due to accessing buf one char before its start.
Prevent such out-of-bounds access.
[ bp: Productize into a proper patch. Link below is the next best
thing because the original mail didn't get archived on lore. ]
Fixes: 0451d14d05 ("EDAC, mce_amd_inj: Modify flags attribute to use string arguments")
Signed-off-by: Zhang Zixun <zhang133010@icloud.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/linux-edac/YcnePfF1OOqoQwrX@zn.tnic/
AMD systems currently lay out MCA bank types such that the type of bank
number "i" is either the same across all CPUs or is Reserved/Read-as-Zero.
For example:
Bank # | CPUx | CPUy
0 LS LS
1 RAZ UMC
2 CS CS
3 SMU RAZ
Future AMD systems will lay out MCA bank types such that the type of
bank number "i" may be different across CPUs.
For example:
Bank # | CPUx | CPUy
0 LS LS
1 RAZ UMC
2 CS NBIO
3 SMU RAZ
Change the structures that cache MCA bank types to be per-CPU and update
smca_get_bank_type() to handle this change.
Move some SMCA-specific structures to amd.c from mce.h, since they no
longer need to be global.
Break out the "count" for bank types from struct smca_hwid, since this
should provide a per-CPU count rather than a system-wide count.
Apply the "const" qualifier to the struct smca_hwid_mcatypes array. The
values in this array should not change at runtime.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211216162905.4132657-3-yazen.ghannam@amd.com
Add HWID and McaType values for new SMCA bank types, and add their error
descriptions to edac_mce_amd.
The "PHY" bank types all have the same error descriptions, and the NBIF
and SHUB bank types have the same error descriptions. So reuse the same
arrays where appropriate.
[ bp: Remove useless comments over hwid types. ]
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211216162905.4132657-2-yazen.ghannam@amd.com
Commit in Fixes added a global TLB flush on the early boot path, after
the kernel switches off of the trampoline page table.
Compiler profiling options enabled with GCOV_PROFILE add additional
measurement code on clang which needs to be initialized prior to
use. The global flush in x86_64_start_kernel() happens before those
initializations can happen, leading to accessing invalid memory.
GCOV_PROFILE builds with gcc are still ok so this is clang-specific.
The second issue this fixes is with KASAN: for a similar reason,
kasan_early_init() needs to have happened before KASAN-instrumented
functions are called.
Therefore, reorder the flush to happen after the KASAN early init
and prevent the compilers from adding profiling instrumentation to
native_write_cr4().
Fixes: f154f29085 ("x86/mm/64: Flush global TLB on boot and AP bringup")
Reported-by: "J. Bruce Fields" <bfields@fieldses.org>
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Carel Si <beibei.si@intel.com>
Tested-by: "J. Bruce Fields" <bfields@fieldses.org>
Link: https://lore.kernel.org/r/20211209144141.GC25654@xsang-OptiPlex-9020
Commit in Fixes accesses pt_regs before checking whether it is NULL or
not. Make sure the NULL pointer check happens first.
Fixes: 0a5b288e85 ("x86/mce: Prevent severity computation from being instrumented")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20211217102029.GA29708@kili
add_taint() is yet another external facility which the #MC handler
calls. Move that tainting call into the instrumentation-allowed part of
the handler.
Fixes
vmlinux.o: warning: objtool: do_machine_check()+0x617: call to add_taint() leaves .noinstr.text section
While at it, allow instrumentation around the mce_log() call.
Fixes
vmlinux.o: warning: objtool: do_machine_check()+0x690: call to mce_log() leaves .noinstr.text section
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211208111343.8130-11-bp@alien8.de
It is called by the #MC handler which is noinstr.
Fixes
vmlinux.o: warning: objtool: do_machine_check()+0xbd6: call to memset() leaves .noinstr.text section
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211208111343.8130-9-bp@alien8.de
And allow instrumentation inside it because it does calls to other
facilities which will not be tagged noinstr.
Fixes
vmlinux.o: warning: objtool: do_machine_check()+0xc73: call to mce_panic() leaves .noinstr.text section
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211208111343.8130-8-bp@alien8.de
Mark all the MCE severity computation logic noinstr and allow
instrumentation when it "calls out".
Fixes
vmlinux.o: warning: objtool: do_machine_check()+0xc5d: call to mce_severity() leaves .noinstr.text section
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211208111343.8130-7-bp@alien8.de
Instead, sandwitch around the call which is done in noinstr context and
mark the caller - mce_gather_info() - as noinstr.
Also, document what the whole instrumentation strategy with #MC is going
to be in the future and where it all is supposed to be going to.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211208111343.8130-5-bp@alien8.de
The variable chunks is being shifted right and re-assinged the shifted
value which is then returned. Since chunks is not being read afterwards
the assignment is redundant and the >>= operator can be replaced with a
shift >> operator instead.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lkml.kernel.org/r/20211207223735.35173-1-colin.i.king@gmail.com
== Problem ==
The amount of SGX memory on a system is determined by the BIOS and it
varies wildly between systems. It can be as small as dozens of MB's
and as large as many GB's on servers. Just like how applications need
to know how much regular RAM is available, enclave builders need to
know how much SGX memory an enclave can consume.
== Solution ==
Introduce a new sysfs file:
/sys/devices/system/node/nodeX/x86/sgx_total_bytes
to enumerate the amount of SGX memory available in each NUMA node.
This serves the same function for SGX as /proc/meminfo or
/sys/devices/system/node/nodeX/meminfo does for normal RAM.
'sgx_total_bytes' is needed today to help drive the SGX selftests.
SGX-specific swap code is exercised by creating overcommitted enclaves
which are larger than the physical SGX memory on the system. They
currently use a CPUID-based approach which can diverge from the actual
amount of SGX memory available. 'sgx_total_bytes' ensures that the
selftests can work efficiently and do not attempt stupid things like
creating a 100,000 MB enclave on a system with 128 MB of SGX memory.
== Implementation Details ==
Introduce CONFIG_HAVE_ARCH_NODE_DEV_GROUP opt-in flag to expose an
arch specific attribute group, and add an attribute for the amount of
SGX memory in bytes to each NUMA node:
== ABI Design Discussion ==
As opposed to the per-node ABI, a single, global ABI was considered.
However, this would prevent enclaves from being able to size
themselves so that they fit on a single NUMA node. Essentially, a
single value would rule out NUMA optimizations for enclaves.
Create a new "x86/" directory inside each "nodeX/" sysfs directory.
'sgx_total_bytes' is expected to be the first of at least a few
sgx-specific files to be placed in the new directory. Just scanning
/proc/meminfo, these are the no-brainers that we have for RAM, but we
need for SGX:
MemTotal: xxxx kB // sgx_total_bytes (implemented here)
MemFree: yyyy kB // sgx_free_bytes
SwapTotal: zzzz kB // sgx_swapped_bytes
So, at *least* three. I think we will eventually end up needing
something more along the lines of a dozen. A new directory (as
opposed to being in the nodeX/ "root") directory avoids cluttering the
root with several "sgx_*" files.
Place the new file in a new "nodeX/x86/" directory because SGX is
highly x86-specific. It is very unlikely that any other architecture
(or even non-Intel x86 vendor) will ever implement SGX. Using "sgx/"
as opposed to "x86/" was also considered. But, there is a real chance
this can get used for other arch-specific purposes.
[ dhansen: rewrite changelog ]
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211116162116.93081-2-jarkko@kernel.org
MCA handlers check the valid bit in each status register
(MCA_STATUS[Val]) and continue processing the error only if the valid
bit is set.
Set the valid bit unconditionally in the corresponding MCA_STATUS
register and correct any Val=0 injections made by the user as such
errors will get ignored and such injections will be largely pointless.
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211104215846.254012-3-Smita.KoralahalliChannabasappa@amd.com
The MCA_IPID register uniquely identifies a bank's type on Scalable MCA
(SMCA) systems. When an MCA bank is not populated, the MCA_IPID register
will read as zero and writes to it will be ignored.
On a hw-type error injection (injection which writes the actual MCA
registers in an attempt to cause a real MCE) check the value of this
register before trying to inject the error.
Do not impose any limitations on a sw injection and allow the user to
test out all the decoding paths without relying on the available hardware,
as its purpose is to just test the code.
[ bp: Heavily massage. ]
Link: https://lkml.kernel.org/r/20211019233641.140275-2-Smita.KoralahalliChannabasappa@amd.com
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211104215846.254012-2-Smita.KoralahalliChannabasappa@amd.com
Intel CPUs do not support SYSCALL in 32-bit mode, but the kernel
initializes MSR_CSTAR unconditionally. That MSR write is normally
ignored by the CPU, but in a TDX guest it raises a #VE trap.
Exclude Intel CPUs from the MSR_CSTAR initialization.
[ tglx: Fixed the subject line and removed the redundant comment. ]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20211119035803.4012145-1-sathyanarayanan.kuppuswamy@linux.intel.com
- Move the command line preparation and the early command line parsing
earlier so that the command line parameters which affect
early_reserve_memory(), e.g. efi=nosftreserve, are taken into
account. This was broken when the invocation of early_reserve_memory()
was moved recently.
- Use an atomic type for the SGX page accounting, which is read and
written lockless, to plug various race conditions related to it.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmGaYPoTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoTM7D/9bivpPDiNzfjUV7kNKx6aTUwPjdFer
G0RuuDZqkpJm9j7+51VnQNFssIfAFtzKMJn/DuGVoXF0ERxXMEhJVHiSTeOlCjJU
u1760qFYlAQ1mwvKVNLk2SenWuNZwwgUneY3VvvS4qYsSq7PsbYlekuddPeX0Nws
AJ1llOoCoBkm5vNZ5c3/CmhY6iPSRQQkDmbA11cZZUyWl2uouSk21+ax24IDCvW3
E8Aq9QqB6ND2uukB32kQ7Wp7/UZ4inJHTUXF9UF/8P+N1ftDWeKDjQz6y9U19Tsd
ivuMr6NqqAos/Fpo9PhlGns07C8HeKGf4ronnt9cUMqjzYWfdS+pRT+0pQR+vIPa
M8+jyHQplzeOX9/nKOkpV+u0tYP2zgx8e7yeu5Sion8TqsKqiNOy9+D0D2utUDmw
1x3DzuzGx/mK2OX5gjGSx4ZbS4u0DIAWnF8vB9YfgEfcnqpxr6KdbrY0bLatIbKv
ip9mh0rRYeTkTZ4FGmvy3hFgAmadCODWxva/7AhzbWVZoM+AShwnTDsipkRaaj3V
nMdgcVix8qVDg9YIAn9ziZbxkXKQUXFJn7lZj3KBeWjKcV2svA89S/9YL6JTaSeW
TJ4X6wK8EoApKhEasZhufXBNAl9EmQlBS1k9pHiIjKVuRgBGlzMuEhzvrqZM2+rA
KaUQSwBN6Ij6Dg==
=CJUK
-----END PGP SIGNATURE-----
Merge tag 'x86-urgent-2021-11-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
- Move the command line preparation and the early command line parsing
earlier so that the command line parameters which affect
early_reserve_memory(), e.g. efi=nosftreserve, are taken into
account. This was broken when the invocation of
early_reserve_memory() was moved recently.
- Use an atomic type for the SGX page accounting, which is read and
written locklessly, to plug various race conditions related to it.
* tag 'x86-urgent-2021-11-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sgx: Fix free page accounting
x86/boot: Pull up cmdline preparation and early param parsing
Get rid of cpu_missing because
7bb39313cd ("x86/mce: Make mce_timed_out() identify holdout CPUs")
provides a more detailed message about which CPUs are missing.
Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Zhaolong Zhang <zhangzl2013@126.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211109112345.2673403-1-zhangzl2013@126.com
The SGX driver maintains a single global free page counter,
sgx_nr_free_pages, that reflects the number of free pages available
across all NUMA nodes. Correspondingly, a list of free pages is
associated with each NUMA node and sgx_nr_free_pages is updated
every time a page is added or removed from any of the free page
lists. The main usage of sgx_nr_free_pages is by the reclaimer
that runs when it (sgx_nr_free_pages) goes below a watermark
to ensure that there are always some free pages available to, for
example, support efficient page faults.
With sgx_nr_free_pages accessed and modified from a few places
it is essential to ensure that these accesses are done safely but
this is not the case. sgx_nr_free_pages is read without any
protection and updated with inconsistent protection by any one
of the spin locks associated with the individual NUMA nodes.
For example:
CPU_A CPU_B
----- -----
spin_lock(&nodeA->lock); spin_lock(&nodeB->lock);
... ...
sgx_nr_free_pages--; /* NOT SAFE */ sgx_nr_free_pages--;
spin_unlock(&nodeA->lock); spin_unlock(&nodeB->lock);
Since sgx_nr_free_pages may be protected by different spin locks
while being modified from different CPUs, the following scenario
is possible:
CPU_A CPU_B
----- -----
{sgx_nr_free_pages = 100}
spin_lock(&nodeA->lock); spin_lock(&nodeB->lock);
sgx_nr_free_pages--; sgx_nr_free_pages--;
/* LOAD sgx_nr_free_pages = 100 */ /* LOAD sgx_nr_free_pages = 100 */
/* sgx_nr_free_pages-- */ /* sgx_nr_free_pages-- */
/* STORE sgx_nr_free_pages = 99 */ /* STORE sgx_nr_free_pages = 99 */
spin_unlock(&nodeA->lock); spin_unlock(&nodeB->lock);
In the above scenario, sgx_nr_free_pages is decremented from two CPUs
but instead of sgx_nr_free_pages ending with a value that is two less
than it started with, it was only decremented by one while the number
of free pages were actually reduced by two. The consequence of
sgx_nr_free_pages not being protected is that its value may not
accurately reflect the actual number of free pages on the system,
impacting the availability of free pages in support of many flows.
The problematic scenario is when the reclaimer does not run because it
believes there to be sufficient free pages while any attempt to allocate
a page fails because there are no free pages available. In the SGX driver
the reclaimer's watermark is only 32 pages so after encountering the
above example scenario 32 times a user space hang is possible when there
are no more free pages because of repeated page faults caused by no
free pages made available.
The following flow was encountered:
asm_exc_page_fault
...
sgx_vma_fault()
sgx_encl_load_page()
sgx_encl_eldu() // Encrypted page needs to be loaded from backing
// storage into newly allocated SGX memory page
sgx_alloc_epc_page() // Allocate a page of SGX memory
__sgx_alloc_epc_page() // Fails, no free SGX memory
...
if (sgx_should_reclaim(SGX_NR_LOW_PAGES)) // Wake reclaimer
wake_up(&ksgxd_waitq);
return -EBUSY; // Return -EBUSY giving reclaimer time to run
return -EBUSY;
return -EBUSY;
return VM_FAULT_NOPAGE;
The reclaimer is triggered in above flow with the following code:
static bool sgx_should_reclaim(unsigned long watermark)
{
return sgx_nr_free_pages < watermark &&
!list_empty(&sgx_active_page_list);
}
In the problematic scenario there were no free pages available yet the
value of sgx_nr_free_pages was above the watermark. The allocation of
SGX memory thus always failed because of a lack of free pages while no
free pages were made available because the reclaimer is never started
because of sgx_nr_free_pages' incorrect value. The consequence was that
user space kept encountering VM_FAULT_NOPAGE that caused the same
address to be accessed repeatedly with the same result.
Change the global free page counter to an atomic type that
ensures simultaneous updates are done safely. While doing so, move
the updating of the variable outside of the spin lock critical
section to which it does not belong.
Cc: stable@vger.kernel.org
Fixes: 901ddbb9ec ("x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()")
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lkml.kernel.org/r/a95a40743bbd3f795b465f30922dde7f1ea9e0eb.1637004094.git.reinette.chatre@intel.com
Provide a recovery function sgx_memory_failure(). If the poison was
consumed synchronously then send a SIGBUS. Note that the virtual
address of the access is not included with the SIGBUS as is the case
for poison outside of SGX enclaves. This doesn't matter as addresses
of code/data inside an enclave is of little to no use to code executing
outside the (now dead) enclave.
Poison found in a free page results in the page being moved from the
free list to the per-node poison page list.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/20211026220050.697075-5-tony.luck@intel.com
A memory controller patrol scrubber can report poison in a page
that isn't currently being used.
Add "poison" field in the sgx_epc_page that can be set for an
sgx_epc_page. Check for it:
1) When sanitizing dirty pages
2) When freeing epc pages
Poison is a new field separated from flags to avoid having to make all
updates to flags atomic, or integrate poison state changes into some
other locking scheme to protect flags (Currently just sgx_reclaimer_lock
which protects the SGX_EPC_PAGE_RECLAIMER_TRACKED bit in page->flags).
In both cases place the poisoned page on a per-node list of poisoned
epc pages to make sure it will not be reallocated.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/20211026220050.697075-4-tony.luck@intel.com
X86 machine check architecture reports a physical address when there
is a memory error. Handling that error requires a method to determine
whether the physical address reported is in any of the areas reserved
for EPC pages by BIOS.
SGX EPC pages do not have Linux "struct page" associated with them.
Keep track of the mapping from ranges of EPC pages to the sections
that contain them using an xarray. N.B. adds CONFIG_XARRAY_MULTI to
the SGX dependecies. So "select" that in arch/x86/Kconfig for X86/SGX.
Create a function arch_is_platform_page() that simply reports whether an
address is an EPC page for use elsewhere in the kernel. The ACPI error
injection code needs this function and is typically built as a module,
so export it.
Note that arch_is_platform_page() will be slower than other similar
"what type is this page" functions that can simply check bits in the
"struct page". If there is some future performance critical user of
this function it may need to be implemented in a more efficient way.
Note also that the current implementation of xarray allocates a few
hundred kilobytes for this usage on a system with 4GB of SGX EPC memory
configured. This isn't ideal, but worth it for the code simplicity.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/20211026220050.697075-3-tony.luck@intel.com
SGX EPC pages go through the following life cycle:
DIRTY ---> FREE ---> IN-USE --\
^ |
\-----------------/
Recovery action for poison for a DIRTY or FREE page is simple. Just
make sure never to allocate the page. IN-USE pages need some extra
handling.
Add a new flag bit SGX_EPC_PAGE_IS_FREE that is set when a page
is added to a free list and cleared when the page is allocated.
Notes:
1) These transitions are made while holding the node->lock so that
future code that checks the flags while holding the node->lock
can be sure that if the SGX_EPC_PAGE_IS_FREE bit is set, then the
page is on the free list.
2) Initially while the pages are on the dirty list the
SGX_EPC_PAGE_IS_FREE bit is cleared.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/20211026220050.697075-2-tony.luck@intel.com
Explicitly check for MSR_HYPERCALL and MSR_VP_INDEX support when probing
for running as a Hyper-V guest instead of waiting until hyperv_init() to
detect the bogus configuration. Add messages to give the admin a heads
up that they are likely running on a broken virtual machine setup.
At best, silently disabling Hyper-V is confusing and difficult to debug,
e.g. the kernel _says_ it's using all these fancy Hyper-V features, but
always falls back to the native versions. At worst, the half baked setup
will crash/hang the kernel.
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20211104182239.1302956-3-seanjc@google.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
- Do not log spurious corrected MCEs on SKL too, due to an erratum
- Clarify the path of paravirt ops patches upstream
- Add an optimization to avoid writing out AMX components to sigframes
when former are in init state
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmGQ3CgACgkQEsHwGGHe
VUoLAA/+NXRvcBHYkLaByT9f4OI6B79HzyguIBSfipYiw8ir0H7uEdV5FUCCUgCz
egBRVFpOsXWt1teeuu6ViO+WBHncUxG/ryZ0ka35lri/3kuVYnugZExWDs4MrGR5
vehRXehOxYNRaYc3oLYjubSbxqF1nWz3WWfGfhiBKk0jT/S1T9tX6lsRXlKsJCgj
M4x5aqBWP8HTbFQfqjdHwagNitmSKzgjZvMcC4UWcql33ZCycbjvRdrAzBtw7WRI
UBvgxWVmeMoagu5fqEOoph1oSoFxWuFrweFUjnxJmT6uZrTsfF7BVgXkxdG6eYUy
2Xogcd4bPDBiRgbs0vPEog1tyyrKHOQ6p1pvksySKMPq6ULcSZ6hBpEZRpgr6Y9u
0jB3P6weQgCckx5Hd+iwvX1a+GvEuHSEqAE+j160wFyrsBS5Cir3P1WqthWaPd5I
3nH3h955PokUHPUioUhdf+8cfuP6h6K0nz1gdYI8GR8+fJHhEceT+pLLeyIxj/VM
yr+bq+V7D6Cg62w3z3s9Dzg2XKpxStu1R9L1N/K8MtIGf6Uc7paL6xR27XxhmBp5
Y6bGZw0mxxFhp6AEsFWo3rwLL9Dl5DmFcfgUHHpPK5VP0pVWp48Uapx2Hi2/JzAo
c1o4UkPQa/EZJBPTklmGkS1JNp/2TsEL4Fw7sew+j7DWtsJpCfk=
=Ge2T
-----END PGP SIGNATURE-----
Merge tag 'x86_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Add the model number of a new, Raptor Lake CPU, to intel-family.h
- Do not log spurious corrected MCEs on SKL too, due to an erratum
- Clarify the path of paravirt ops patches upstream
- Add an optimization to avoid writing out AMX components to sigframes
when former are in init state
* tag 'x86_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Add Raptor Lake to Intel family
x86/mce: Add errata workaround for Skylake SKX37
MAINTAINERS: Add some information to PARAVIRT_OPS entry
x86/fpu: Optimize out sigframe xfeatures when in init state
Errata SKX37 is word-for-word identical to the other errata listed in
this workaround. I happened to notice this after investigating a CMCI
storm on a Skylake host. While I can't confirm this was the root cause,
spurious corrected errors does sound like a likely suspect.
Fixes: 2976908e41 ("x86/mce: Do not log spurious corrected mce errors")
Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20211029205759.GA7385@codemonkey.org.uk
Here is the big set of driver core changes for 5.16-rc1.
All of these have been in linux-next for a while now with no reported
problems.
Included in here are:
- big update and cleanup of the sysfs abi documentation files
and scripts from Mauro. We are almost at the place where we
can properly check that the running kernel's sysfs abi is
documented fully.
- firmware loader updates
- dyndbg updates
- kernfs cleanups and fixes from Christoph
- device property updates
- component fix
- other minor driver core cleanups and fixes
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYYPbjQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ync9gCfXKMUI1GAnCfJWAwTdTcd18q5akoAoMw32/AH
0yh5TjAWFyFd7xz5d7qs
=itsC
-----END PGP SIGNATURE-----
Merge tag 'driver-core-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the big set of driver core changes for 5.16-rc1.
All of these have been in linux-next for a while now with no reported
problems.
Included in here are:
- big update and cleanup of the sysfs abi documentation files and
scripts from Mauro. We are almost at the place where we can
properly check that the running kernel's sysfs abi is documented
fully.
- firmware loader updates
- dyndbg updates
- kernfs cleanups and fixes from Christoph
- device property updates
- component fix
- other minor driver core cleanups and fixes"
* tag 'driver-core-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (122 commits)
device property: Drop redundant NULL checks
x86/build: Tuck away built-in firmware under FW_LOADER
vmlinux.lds.h: wrap built-in firmware support under FW_LOADER
firmware_loader: move struct builtin_fw to the only place used
x86/microcode: Use the firmware_loader built-in API
firmware_loader: remove old DECLARE_BUILTIN_FIRMWARE()
firmware_loader: formalize built-in firmware API
component: do not leave master devres group open after bind
dyndbg: refine verbosity 1-4 summary-detail
gpiolib: acpi: Replace custom code with device_match_acpi_handle()
i2c: acpi: Replace custom function with device_match_acpi_handle()
driver core: Provide device_match_acpi_handle() helper
dyndbg: fix spurious vNpr_info change
dyndbg: no vpr-info on empty queries
dyndbg: vpr-info on remove-module complete, not starting
device property: Add missed header in fwnode.h
Documentation: dyndbg: Improve cli param examples
dyndbg: Remove support for ddebug_query param
dyndbg: make dyndbg a known cli param
dyndbg: show module in vpr-info in dd-exec-queries
...
tl;dr: AMX state is ~8k. Signal frames can have space for this
~8k and each signal entry writes out all 8k even if it is zeros.
Skip writing zeros for AMX to speed up signal delivery by about
4% overall when AMX is in its init state.
This is a user-visible change to the sigframe ABI.
== Hardware XSAVE Background ==
XSAVE state components may be tracked by the processor as being
in their initial configuration. Software can detect which
features are in this configuration by looking at the XSTATE_BV
field in an XSAVE buffer or with the XGETBV(1) instruction.
Both the XSAVE and XSAVEOPT instructions enumerate features s
being in the initial configuration via the XSTATE_BV field in the
XSAVE header, However, XSAVEOPT declines to actually write
features in their initial configuration to the buffer. XSAVE
writes the feature unconditionally, regardless of whether it is
in the initial configuration or not.
Basically, XSAVE users never need to inspect XSTATE_BV to
determine if the feature has been written to the buffer.
XSAVEOPT users *do* need to inspect XSTATE_BV. They might also
need to clear out the buffer if they want to make an isolated
change to the state, like modifying one register.
== Software Signal / XSAVE Background ==
Signal frames have historically been written with XSAVE itself.
Each state is written in its entirety, regardless of being in its
initial configuration.
In other words, the signal frame ABI uses the XSAVE behavior, not
the XSAVEOPT behavior.
== Problem ==
This means that any application which has acquired permission to
use AMX via ARCH_REQ_XCOMP_PERM will write 8k of state to the
signal frame. This 8k write will occur even when AMX was in its
initial configuration and software *knows* this because of
XSTATE_BV.
This problem also exists to a lesser degree with AVX-512 and its
2k of state. However, AVX-512 use does not require
ARCH_REQ_XCOMP_PERM and is more likely to have existing users
which would be impacted by any change in behavior.
== Solution ==
Stop writing out AMX xfeatures which are in their initial state
to the signal frame. This effectively makes the signal frame
XSAVE buffer look as if it were written with a combination of
XSAVEOPT and XSAVE behavior. Userspace which handles XSAVEOPT-
style buffers should be able to handle this naturally.
For now, include only the AMX xfeatures: XTILE and XTILEDATA in
this new behavior. These require new ABI to use anyway, which
makes their users very unlikely to be broken. This XSAVEOPT-like
behavior should be expected for all future dynamic xfeatures. It
may also be extended to legacy features like AVX-512 in the
future.
Only attempt this optimization on systems with dynamic features.
Disable dynamic feature support (XFD) if XGETBV1 is unavailable
by adding a CPUID dependency.
This has been measured to reduce the *overall* cycle cost of
signal delivery by about 4%.
Fixes: 2308ee57d9 ("x86/fpu/amx: Enable the AMX feature in 64-bit mode")
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: "Chang S. Bae" <chang.seok.bae@intel.com>
Link: https://lore.kernel.org/r/20211102224750.FA412E26@davehans-spike.ostc.intel.com
core:
- improve dma_fence, lease and resv documentation
- shmem-helpers: allocate WC pages on x86, use vmf_insert_pin
- sched fixes/improvements
- allow empty drm leases
- add dma resv iterator
- add more DP 2.0 headers
- DP MST helper improvements for DP2.0
dma-buf:
- avoid warnings, remove fence trace macros
bridge:
- new helper to get rid of panels
- probe improvements for it66121
- enable DSI EOTP for anx7625
fbdev:
- efifb: release runtime PM on destroy
ttm:
- kerneldoc switch
- helper to clear all DMA mappings
- pool shrinker optimizaton
- remove ttm_tt_destroy_common
- update ttm_move_memcpy for async use
panel:
- add new panel-edp driver
amdgpu:
- Initial DP 2.0 support
- Initial USB4 DP tunnelling support
- Aldebaran MCE support
- Modifier support for DCC image stores for GFX 10.3
- Display rework for better FP code handling
- Yellow Carp/Cyan Skillfish updates
- Cyan Skillfish display support
- convert vega/navi to IP discovery asic enumeration
- validate IP discovery table
- RAS improvements
- Lots of fixes
i915:
- DG1 PCI IDs + LMEM discovery/placement
- DG1 GuC submission by default
- ADL-S PCI IDs updated + enabled by default
- ADL-P (XE_LPD) fixed and updates
- DG2 display fixes
- PXP protected object support for Gen12 integrated
- expose multi-LRC submission interface for GuC
- export logical engine instance to user
- Disable engine bonding on Gen12+
- PSR cleanup
- PSR2 selective fetch by default
- DP 2.0 prep work
- VESA vendor block + MSO use of it
- FBC refactor
- try again to fix fast-narrow vs slow-wide eDP training
- use THP when IOMMU enabled
- LMEM backup/restore for suspend/resume
- locking simplification
- GuC major reworking
- async flip VT-D workaround changes
- DP link training improvements
- misc display refactorings
bochs:
- new PCI ID
rcar-du:
- Non-contiguious buffer import support for rcar-du
- r8a779a0 support prep
omapdrm:
- COMPILE_TEST fixes
sti:
- COMPILE_TEST fixes
msm:
- fence ordering improvements
- eDP support in DP sub-driver
- dpu irq handling cleanup
- CRC support for making igt happy
- NO_CONNECTOR bridge support
- dsi: 14nm phy support for msm8953
- mdp5: msm8x53, sdm450, sdm632 support
stm:
- layer alpha + zpo support
v3d:
- fix Vulkan CTS failure
- support multiple sync objects
gud:
- add R8/RGB332/RGB888 pixel formats
vc4:
- convert to new bridge helpers
vgem:
- use shmem helpers
virtio:
- support mapping exported vram
zte:
- remove obsolete driver
rockchip:
- use bridge attach no connector for LVDS/RGB
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmGByPYACgkQDHTzWXnE
hr6fxA//cXUvTHlEtF7UJDBRAYv+9lXH39NbGYU4aLJuBNlZztCuUi5JOSyDFDH1
N9VI5biVseev2PEnCzJUubWxTqbUO7FBQTw0TyvZ4Eqn+UZMuFeo0dvdKZRAkvjV
VHSUc0fm0+WSYanKUK7XK0fwG8aE6JVyYngzgKPSjifhszTdiiRsbU21iTinFhkS
rgh3HEVELp+LqfoG4qzAYqFUjYqUjvCjd/hX/UkzCII8ZXKr38/4127e95443WOk
+jes0gWGJe9TvSDrqo9TMx4qukcOniINFUvnzoD2RhOS+Jzr/i5rBh51Xy92g3NO
Q7hy6byZdk/ZO/MXCDQ2giUOkBiqn5fQjlRGQp4iAZYw9pb3HU+/xrTq0BWVWd8o
/vmzZYEKKU/sCGpxVDMZxsHV3mXIuVBvuZq6bjmSGcybgOBCiDx5F/Rum4nY2yHp
lr3cuc0HP3m3f4b/HVvACO4tGd1nDDpVcon7CuhBB7HB7t6Zl9u18qc/qFw0tCTh
3sgAhno6XFXtPFcSX2KAeeg0mhKDKKrsOnq5y3bDRr05Z0jLocJk95aXEKs6em4j
gbyHwNaX3CHtiCnFn2/5169+n1K7zqHBtVSGmQlmFDv55rcdx7L3Spk7tCahQeSQ
ur24r+sEggm8d5Wjl+MYq6wW3oP31s04JFaeV6oCkaSp1wS+alg=
=jdhH
-----END PGP SIGNATURE-----
Merge tag 'drm-next-2021-11-03' of git://anongit.freedesktop.org/drm/drm
Pull drm updates from Dave Airlie:
"Summary below. i915 starts to add support for DG2 GPUs, enables DG1
and ADL-S support by default, lots of work to enable DisplayPort 2.0
across drivers. Lots of documentation updates and fixes across the
board.
core:
- improve dma_fence, lease and resv documentation
- shmem-helpers: allocate WC pages on x86, use vmf_insert_pin
- sched fixes/improvements
- allow empty drm leases
- add dma resv iterator
- add more DP 2.0 headers
- DP MST helper improvements for DP2.0
dma-buf:
- avoid warnings, remove fence trace macros
bridge:
- new helper to get rid of panels
- probe improvements for it66121
- enable DSI EOTP for anx7625
fbdev:
- efifb: release runtime PM on destroy
ttm:
- kerneldoc switch
- helper to clear all DMA mappings
- pool shrinker optimizaton
- remove ttm_tt_destroy_common
- update ttm_move_memcpy for async use
panel:
- add new panel-edp driver
amdgpu:
- Initial DP 2.0 support
- Initial USB4 DP tunnelling support
- Aldebaran MCE support
- Modifier support for DCC image stores for GFX 10.3
- Display rework for better FP code handling
- Yellow Carp/Cyan Skillfish updates
- Cyan Skillfish display support
- convert vega/navi to IP discovery asic enumeration
- validate IP discovery table
- RAS improvements
- Lots of fixes
i915:
- DG1 PCI IDs + LMEM discovery/placement
- DG1 GuC submission by default
- ADL-S PCI IDs updated + enabled by default
- ADL-P (XE_LPD) fixed and updates
- DG2 display fixes
- PXP protected object support for Gen12 integrated
- expose multi-LRC submission interface for GuC
- export logical engine instance to user
- Disable engine bonding on Gen12+
- PSR cleanup
- PSR2 selective fetch by default
- DP 2.0 prep work
- VESA vendor block + MSO use of it
- FBC refactor
- try again to fix fast-narrow vs slow-wide eDP training
- use THP when IOMMU enabled
- LMEM backup/restore for suspend/resume
- locking simplification
- GuC major reworking
- async flip VT-D workaround changes
- DP link training improvements
- misc display refactorings
bochs:
- new PCI ID
rcar-du:
- Non-contiguious buffer import support for rcar-du
- r8a779a0 support prep
omapdrm:
- COMPILE_TEST fixes
sti:
- COMPILE_TEST fixes
msm:
- fence ordering improvements
- eDP support in DP sub-driver
- dpu irq handling cleanup
- CRC support for making igt happy
- NO_CONNECTOR bridge support
- dsi: 14nm phy support for msm8953
- mdp5: msm8x53, sdm450, sdm632 support
stm:
- layer alpha + zpo support
v3d:
- fix Vulkan CTS failure
- support multiple sync objects
gud:
- add R8/RGB332/RGB888 pixel formats
vc4:
- convert to new bridge helpers
vgem:
- use shmem helpers
virtio:
- support mapping exported vram
zte:
- remove obsolete driver
rockchip:
- use bridge attach no connector for LVDS/RGB"
* tag 'drm-next-2021-11-03' of git://anongit.freedesktop.org/drm/drm: (1259 commits)
drm/amdgpu/gmc6: fix DMA mask from 44 to 40 bits
drm/amd/display: MST support for DPIA
drm/amdgpu: Fix even more out of bound writes from debugfs
drm/amdgpu/discovery: add SDMA IP instance info for soc15 parts
drm/amdgpu/discovery: add UVD/VCN IP instance info for soc15 parts
drm/amdgpu/UAPI: rearrange header to better align related items
drm/amd/display: Enable dpia in dmub only for DCN31 B0
drm/amd/display: Fix USB4 hot plug crash issue
drm/amd/display: Fix deadlock when falling back to v2 from v3
drm/amd/display: Fallback to clocks which meet requested voltage on DCN31
drm/amd/display: move FPU associated DCN301 code to DML folder
drm/amd/display: fix link training regression for 1 or 2 lane
drm/amd/display: add two lane settings training options
drm/amd/display: decouple hw_lane_settings from dpcd_lane_settings
drm/amd/display: implement decide lane settings
drm/amd/display: adopt DP2.0 LT SCR revision 8
drm/amd/display: FEC configuration for dpia links in MST mode
drm/amd/display: FEC configuration for dpia links
drm/amd/display: Add workaround flag for EDID read on certain docks
drm/amd/display: Set phy_mux_sel bit in dmub scratch register
...
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmGBMQUTHHdlaS5saXVA
a2VybmVsLm9yZwAKCRB2FHBfkEGgXmE5B/9MK3Ju+tc6C8eyR3Ic4XBYHJ3voEKO
M+R90gggBriDOgkz4B8vF+k0aD8wevXAUtmCSXonDzCh5H7GoyfrVZmJEVkwlioH
ZMSMlFHcjGhCPIXhLbNtfo/NsAYEtT/lRM2lLGCSbdGuKabylXKujVdhuSIcRPdj
Rj5innUgcAywOoxG6WzFt3JBzM33UQErCGfUF2b7Rvp9E+Zii4vIMxkMzUpnkEHH
F8WMEdL0DqH5ThOs0MslNgy03pUC9wk1d5DNd9ytYHqiSQtcQZhFHw/P6dxzUFlW
OptWv31PXUIsiJf4Zi9hmfjgUl+KZHeacZ2hXtidAo86VPcIjVs25OQW
=40fn
-----END PGP SIGNATURE-----
Merge tag 'hyperv-next-signed-20211102' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv updates from Wei Liu:
- Initial patch set for Hyper-V isolation VM support (Tianyu Lan)
- Fix a warning on preemption (Vitaly Kuznetsov)
- A bunch of misc cleanup patches
* tag 'hyperv-next-signed-20211102' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
x86/hyperv: Protect set_hv_tscchange_cb() against getting preempted
Drivers: hv : vmbus: Adding NULL pointer check
x86/hyperv: Remove duplicate include
x86/hyperv: Remove duplicated include in hv_init
Drivers: hv: vmbus: Remove unused code to check for subchannels
Drivers: hv: vmbus: Initialize VMbus ring buffer for Isolation VM
Drivers: hv: vmbus: Add SNP support for VMbus channel initiate message
x86/hyperv: Add ghcb hvcall support for SNP VM
x86/hyperv: Add Write/Read MSR registers via ghcb page
Drivers: hv: vmbus: Mark vmbus ring buffer visible to host in Isolation VM
x86/hyperv: Add new hvcall guest address host visibility support
x86/hyperv: Initialize shared memory boundary in the Isolation VM.
x86/hyperv: Initialize GHCB page in Isolation VM
- set spec_store_bypass_disable & spectre_v2_user to prctl (Andrea Arcangeli)
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmGAGAkWHGtlZXNjb29r
QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJqOWD/4mMFp84IMa/VdCmD6PS+BhisyI
i7+Hyfisg8AWpjgW4+JihU/6hfsDgs/hNNKbiIopcwc/12KV4M0QIQyF7vmceSwB
uMsAX7pkobNUUisnrQVbw6boK4hrBvrV3STlVdRHvNlLeQLQIu4UN3+9UMj/qsmh
46ltdxR489oDDLXFgMkKq9auVP2t5t4fbyRmgBPLSKIXaOxIhWck3kUQwt/Rbr44
M87/Xr4iQ0w4ddiBFJz9GOHQ5Iz08ms4dBfO+e5FSl6I69Nt6q836el35c/6j4y8
r7C21WU088MSkjk75RCa3v2sq8db2CjLe+wBugq+yYC29qGgxtTiUZaoiNQCN5bL
DIRfl1iU5Ge1wEKorpr3DR6DksmfJO4MNPdMo4CcVZT3Gkdi7udLHfrEI82xgdDl
lh1UiJlRx4YNEcDbGBnxCzKGwauqHa2TgPNWulUPdH7OGhUL86FAV49L84uz9lCD
C/+PKxDqc2XKjbgqMsbuyQ7hzB2KQK/ieEXzduoHxTxIr5vO/viENrbkUiSL8bsO
6msCVbCIjtFDvW4Ac16IOwGoflJ7vLAIuXIdAYCeN+JXqOVV+FG/MN447Y674FeH
R84G6JCT82ULEXrKlwuoSSVJEwA5lzP4IwoWm/ujeUbzi1s+7m+7WRpuJe2jZm6c
zPsCVkNPUrvp82L/wA==
=NAsc
-----END PGP SIGNATURE-----
Merge tag 'seccomp-v5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull seccomp updates from Kees Cook:
"These are x86-specific, but I carried these since they're also
seccomp-specific.
This flips the defaults for spec_store_bypass_disable and
spectre_v2_user from "seccomp" to "prctl", as enough time has passed
to allow system owners to have updated the defensive stances of their
various workloads, and it's long overdue to unpessimize seccomp
threads.
Extensive rationale and details are in Andrea's main patch.
Summary:
- set spec_store_bypass_disable & spectre_v2_user to prctl (Andrea Arcangeli)"
* tag 'seccomp-v5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
x86: deduplicate the spectre_v2_user documentation
x86: change default to spec_store_bypass_disable=prctl spectre_v2_user=prctl
which EPC pages can be put back into their uninitialized state without
having to reopen /dev/sgx_vepc, which could not be possible anymore
after startup due to security policies.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmF/x7AACgkQEsHwGGHe
VUqHXA//YWrukmJ5PQZwWqkXGo6h42JWhIdNfSC2c1SVdz1cioGUCCswALTX4g8l
MYYf3eN12GJ296jPh7m9bz8JvlYjdavSm3Y1yzHIjuQ3q6qywHIuYTbsrMD7waUD
PkcY1TTYgNJ2+f0AgsC4GZhlcpf9g5DqiftW6wvExx5tLUNsVu3Y3gZy/+fajP4f
s/TMjcdr2QmPsjun00KfoIY4/z0u8LkyRMSwyoxSV6wYdL6rRtfYFWsbEUS+W6Nw
/VJ0IKl+aBQ1ztsDc4M5h1uy9II2M/Row5k6JjyrdG4X8D6ACSG7cho6qcMjXgcP
Gac7Im5IyjPEorxqXAgJiMoAl9lU9a2JMVZqPtihYsQW/ygMTdpzP9sBpcZPMevc
gxQD4gyixwzUa3cyVDzTPBdk/DEuGc2nwn2k9nPvmNxKMonX1oLEiP7hu265mvet
56DtwKJF9ddtpepO2zFCg1qX+eZnTuhuZNCPsm/pmdGgzI8cyLznho33OgUSZEQY
c1UisT7HXNRVC/1Q8VBDTU/D9LtIk+2+Q5lQkcNeftI5PYKTXIVddkOkqJ4GhGWJ
9EasA4UtnhvsLzJ76gxxuUf677ns+1TCo65e7Hu1+X0eTmBJK3boe3aMHvJeHEWH
Asd+SMkYWfxAlW/arAYhR2JgT9wgEH3pSx4eXnpGwpeValxBPRs=
=1UYy
-----END PGP SIGNATURE-----
Merge tag 'x86_sgx_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SGX updates from Borislav Petkov:
"Add a SGX_IOC_VEPC_REMOVE ioctl to the /dev/sgx_vepc virt interface
with which EPC pages can be put back into their uninitialized state
without having to reopen /dev/sgx_vepc, which could not be possible
anymore after startup due to security policies"
* tag 'x86_sgx_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sgx/virt: implement SGX_IOC_VEPC_REMOVE ioctl
x86/sgx/virt: extract sgx_vepc_remove_page
clears the segment base when a null selector is written. Do the explicit
detection on older CPUs, zen2 and hygon specifically, which have the
functionality but do not advertize the CPUID bit. Factor in the presence
of a hypervisor underneath the kernel and avoid doing the explicit check
there which the HV might've decided to not advertize for migration
safety reasons, a.o.
- Add support for a new X86 CPU vendor: VORTEX. Needed for whitelisting
those CPUs in the hardware vulnerabilities detection
- Force the compiler to use rIP-relative addressing in the fallback path of
static_cpu_has(), in order to avoid unnecessary register pressure
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmF/wRgACgkQEsHwGGHe
VUoGQBAAk9V9//FMoENuGFGul/IK8+VBibTfztYgaPvm7vjMDYaYuRBCQiZg5Y8U
D14pwkg7CuRa6iwZmrk/X/y6FVjo5BJA//ROk/n/9JNvV5QUp3/o00uLiziv80K3
H6Wm3PUyGgkpBuJg+/K8SLE9UQ6uSh4nsykS+70Dcd45DtkC/vH8pkDs5Q1fVQwb
7AuOuWTCWKUYOMFYWFI3a9D8tZYhg99ABREbXBaJGiGdIlZKNVe/7W8qQw5s6cVA
cD5Q2ILY2RCGP55ZQiWoFy3XNP3/ygvZ7Zm1ARYUvUMR2Y5X2XJWN/B6oMbc0oEu
OZsDDA/ILYcah9eBV/zk4ON/1djksp1iWNXNxjct0cNBPAKxi6T/HhHuIHBtzvW+
zDyBWUMLlv1m2i1oW4J4NuNJJi9Gaz+7PesmI7C0OQPgywR8UqqfMD+TzlEHWya1
YqYqI0f3aiyC/sLjUp3GSA7a9sWSd3BZfyAlLBJZCxyXAxX92tXX5BRPh/KYbnJn
c/NaYA6X4m4Rdvr0gKKtCklaC6w4GLzVak6wIvftzHlUYsWX21BhnTkQrciKbqc+
AKWed41AO+4pDHROePxc409x3UZolti+1RandikrztIVAolVJ6W/OkHWxXfy28Fg
iSrtl4M3omv8fCHDaJ26STrXqxH8pIK8noVolwQoXKyAFVyvXTk=
=rlVy
-----END PGP SIGNATURE-----
Merge tag 'x86_cpu_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu updates from Borislav Petkov:
- Start checking a CPUID bit on AMD Zen3 which states that the CPU
clears the segment base when a null selector is written. Do the
explicit detection on older CPUs, zen2 and hygon specifically, which
have the functionality but do not advertize the CPUID bit. Factor in
the presence of a hypervisor underneath the kernel and avoid doing
the explicit check there which the HV might've decided to not
advertize for migration safety reasons, or similar.
- Add support for a new X86 CPU vendor: VORTEX. Needed for whitelisting
those CPUs in the hardware vulnerabilities detection
- Force the compiler to use rIP-relative addressing in the fallback
path of static_cpu_has(), in order to avoid unnecessary register
pressure
* tag 'x86_cpu_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Fix migration safety with X86_BUG_NULL_SEL
x86/CPU: Add support for Vortex CPUs
x86/umip: Downgrade warning messages to debug loglevel
x86/asm: Avoid adding register pressure for the init case in static_cpu_has()
x86/asm: Add _ASM_RIP() macro for x86-64 (%rip) suffix