Pull x86/pti updates from Thomas Gleixner:
"Another set of melted spectrum related changes:
- Code simplifications and cleanups for RSB and retpolines.
- Make the indirect calls in KVM speculation safe.
- Whitelist CPUs which are known not to speculate from Meltdown and
prepare for the new CPUID flag which tells the kernel that a CPU is
not affected.
- A less rigorous variant of the module retpoline check which merily
warns when a non-retpoline protected module is loaded and reflects
that fact in the sysfs file.
- Prepare for Indirect Branch Prediction Barrier support.
- Prepare for exposure of the Speculation Control MSRs to guests, so
guest OSes which depend on those "features" can use them. Includes
a blacklist of the broken microcodes. The actual exposure of the
MSRs through KVM is still being worked on"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/speculation: Simplify indirect_branch_prediction_barrier()
x86/retpoline: Simplify vmexit_fill_RSB()
x86/cpufeatures: Clean up Spectre v2 related CPUID flags
x86/cpu/bugs: Make retpoline module warning conditional
x86/bugs: Drop one "mitigation" from dmesg
x86/nospec: Fix header guards names
x86/alternative: Print unadorned pointers
x86/speculation: Add basic IBPB (Indirect Branch Prediction Barrier) support
x86/cpufeature: Blacklist SPEC_CTRL/PRED_CMD on early Spectre v2 microcodes
x86/pti: Do not enable PTI on CPUs which are not vulnerable to Meltdown
x86/msr: Add definitions for new speculation control MSRs
x86/cpufeatures: Add AMD feature bits for Speculation Control
x86/cpufeatures: Add Intel feature bits for Speculation Control
x86/cpufeatures: Add CPUID_7_EDX CPUID leaf
module/retpoline: Warn about missing retpoline in module
KVM: VMX: Make indirect call speculation safe
KVM: x86: Make indirect calls in emulator speculation safe
Pull x86 mm update from Thomas Gleixner:
"A single patch which excludes the GART aperture from vmcore as
accessing that area from a dump kernel can crash the kernel.
Not necessarily the nicest way to fix this, but curing this from
ground up requires a more thorough rewrite of the whole kexec/kdump
magic"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/gart: Exclude GART aperture from vmcore
Pull x86 timer updates from Thomas Gleixner:
"A small set of updates for x86 specific timers:
- Mark TSC invariant on a subset of Centaur CPUs
- Allow TSC calibration without PIT on mobile platforms which lack
legacy devices"
* 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/centaur: Mark TSC invariant
x86/tsc: Introduce early tsc clocksource
x86/time: Unconditionally register legacy timer interrupt
x86/tsc: Allow TSC calibration without PIT
Pull x86 platform updates from Thomas Gleixner:
"The platform support for x86 contains the following updates:
- A set of updates for the UV platform to support new CPUs and to fix
some of the UV4A BAU MRRs
- The initial platform support for the jailhouse hypervisor to allow
native Linux guests (inmates) in non-root cells.
- A fix for the PCI initialization on Intel MID platforms"
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
x86/jailhouse: Respect pci=lastbus command line settings
x86/jailhouse: Set X86_FEATURE_TSC_KNOWN_FREQ
x86/platform/intel-mid: Move PCI initialization to arch_init()
x86/platform/uv/BAU: Replace hard-coded values with MMR definitions
x86/platform/UV: Fix UV4A BAU MMRs
x86/platform/UV: Fix GAM MMR references in the UV x2apic code
x86/platform/UV: Fix GAM MMR changes in UV4A
x86/platform/UV: Add references to access fixed UV4A HUB MMRs
x86/platform/UV: Fix UV4A support on new Intel Processors
x86/platform/UV: Update uv_mmrs.h to prepare for UV4A fixes
x86/jailhouse: Add PCI dependency
x86/jailhouse: Hide x2apic code when CONFIG_X86_X2APIC=n
x86/jailhouse: Initialize PCI support
x86/jailhouse: Wire up IOAPIC for legacy UART ports
x86/jailhouse: Halt instead of failing to restart
x86/jailhouse: Silence ACPI warning
x86/jailhouse: Avoid access of unsupported platform resources
x86/jailhouse: Set up timekeeping
x86/jailhouse: Enable PMTIMER
x86/jailhouse: Enable APIC and SMP support
...
Pull x86/cache updates from Thomas Gleixner:
"A set of patches which add support for L2 cache partitioning to the
Intel RDT facility"
* 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/intel_rdt: Add command line parameter to control L2_CDP
x86/intel_rdt: Enable L2 CDP in MSR IA32_L2_QOS_CFG
x86/intel_rdt: Add two new resources for L2 Code and Data Prioritization (CDP)
x86/intel_rdt: Enumerate L2 Code and Data Prioritization (CDP) feature
x86/intel_rdt: Add L2CDP support in documentation
x86/intel_rdt: Update documentation
- Update the ACPICA kernel code to upstream revision 20171215 including:
* Support for ACPI 6.0A changes in the NFIT table (Bob Moore).
* Local 64-bit divide in string conversions (Bob Moore).
* Fix for a regression in acpi_evaluate_object_type() (Bob Moore).
* Fixes for memory leaks during package object resolution (Bob Moore).
* Deployment of safe version of strncpy() (Bob Moore).
* Debug and messaging updates (Bob Moore).
* Support for PDTT, SDEV, TPM2 tables in iASL and tools (Bob Moore).
* Null pointer dereference avoidance in Op and cleanups (Colin Ian King).
* Fix for memory leak from building prefixed pathname (Erik Schmauss).
* Coding style fixes, disassembler and compiler updates (Hanjun Guo,
Erik Schmauss).
* Additional PPTT flags from ACPI 6.2 (Jeremy Linton).
* Fix for an off-by-one error in acpi_get_timer_duration() (Jung-uk Kim).
* Infinite loop detection timeout and utilities cleanups (Lv Zheng).
* Windows 10 version 1607 and 1703 OSI strings (Mario Limonciello).
- Update ACPICA information in MAINTAINERS to reflect the current
status of ACPICA maintenance and rename a local variable in one
function to match the corresponding upstream code (Rafael Wysocki).
- Clean up ACPI-related initialization on x86 (Andy Shevchenko).
- Add support for Intel Merrifield to the ACPI GPIO code (Andy
Shevchenko).
- Clean up ACPI PMIC drivers (Andy Shevchenko, Arvind Yadav).
- Fix the ACPI Generic Event Device (GED) driver to free IRQs on
shutdown and clean up the PCI IRQ Link driver (Sinan Kaya).
- Make the GHES code call into the AER driver on all errors and
clean up the ACPI APEI code (Colin Ian King, Tyler Baicar).
- Make the IA64 ACPI NUMA code parse all SRAT entries (Ganapatrao
Kulkarni).
- Add a lid switch blacklist to the ACPI button driver and make it
print extra debug messages on lid events (Hans de Goede).
- Add quirks for Asus GL502VSK and UX305LA to the ACPI battery
driver and clean it up somewhat (Bjørn Mork, Kai-Heng Feng).
- Add device link for CHT SD card dependency on I2C to the ACPI
LPSS (Intel SoCs) driver and make it avoid creating platform
device objects for devices without MMIO resources (Adrian Hunter,
Hans de Goede).
- Fix the ACPI GPE mask kernel command line parameter handling
(Prarit Bhargava).
- Fix the handling of (incorrectly exposed) backlight interfaces
without LCD (Hans de Goede).
- Fix the usage of debugfs_create_*() in the ACPI EC driver (Geert
Uytterhoeven).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJaY/BrAAoJEILEb/54YlRxR10P/1dVxfhLiGBrwKzA1urr71Vg
LH6ZdlIlihyu9a1PZHjfO72IuZCMSkSnoJUPJPFK6FNA0hIDqsP+hC8gcknCxnAU
i6r2ZzQesOzzjGblpASvdDg0GkYe9r6sHpUQ0xW/hnijamforflGveW1bagbnFuI
gvT6m6+lMJwBd0NrWhQiTJmTuSTwgJBXDA+HhlDnGd6ziVfHPaCxon4L9GQfVhsb
jbOI/kBjnEKoN1dBbEAcSpgzklVUXUj4x2NHUMCyvKOJyKG/F7Ycbghux9t3C+ej
1T0XJAU7K3hkmstWkWwylqVZt3UW47xiJKe6K2Z5p3CaJx0cnI18C+g7x/IcRGiA
+J/Uco+xeMa8yqYV96j+AJexpUDu7fYo6B4nRZ/K+MjWifboeSLKn8PHLKhqYn6k
sV3s0dUf8SJK5pTu+IkAgzDzsw/uJAI8Rylmig9ea12/nIt6EH3Kero31hi3lkoN
Y2rdi9MIqFIj2tX42047Y/q2UEFkMWGO3q8fLkXRvWPwnwStHDDFVj/kd19CWcTy
B1kNxNQQS/Q9u0uoW4rIHW6ipEU2sqyt/tvVQnmJlVP0HuO++uIO7h3mrBDxhSJS
zQF4qtusbMHC85BHrozmGFECN8Cex8DYSUTO/yCoBvMlMxJ7UlSt25TBN+SzmnOV
H1LhVRcFh1488lXzuM+u
=/eCQ
-----END PGP SIGNATURE-----
Merge tag 'acpi-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
"The majority of this is an update of the ACPICA kernel code to
upstream revision 20171215 with a cosmetic change and a maintainers
information update on top of it.
The rest is mostly some minor fixes and cleanups in the ACPI drivers
and cleanups to initialization on x86.
Specifics:
- Update the ACPICA kernel code to upstream revision 20171215 including:
* Support for ACPI 6.0A changes in the NFIT table (Bob Moore)
* Local 64-bit divide in string conversions (Bob Moore)
* Fix for a regression in acpi_evaluate_object_type() (Bob Moore)
* Fixes for memory leaks during package object resolution (Bob
Moore)
* Deployment of safe version of strncpy() (Bob Moore)
* Debug and messaging updates (Bob Moore)
* Support for PDTT, SDEV, TPM2 tables in iASL and tools (Bob
Moore)
* Null pointer dereference avoidance in Op and cleanups (Colin Ian
King)
* Fix for memory leak from building prefixed pathname (Erik
Schmauss)
* Coding style fixes, disassembler and compiler updates (Hanjun
Guo, Erik Schmauss)
* Additional PPTT flags from ACPI 6.2 (Jeremy Linton)
* Fix for an off-by-one error in acpi_get_timer_duration()
(Jung-uk Kim)
* Infinite loop detection timeout and utilities cleanups (Lv
Zheng)
* Windows 10 version 1607 and 1703 OSI strings (Mario
Limonciello)
- Update ACPICA information in MAINTAINERS to reflect the current
status of ACPICA maintenance and rename a local variable in one
function to match the corresponding upstream code (Rafael Wysocki)
- Clean up ACPI-related initialization on x86 (Andy Shevchenko)
- Add support for Intel Merrifield to the ACPI GPIO code (Andy
Shevchenko)
- Clean up ACPI PMIC drivers (Andy Shevchenko, Arvind Yadav)
- Fix the ACPI Generic Event Device (GED) driver to free IRQs on
shutdown and clean up the PCI IRQ Link driver (Sinan Kaya)
- Make the GHES code call into the AER driver on all errors and clean
up the ACPI APEI code (Colin Ian King, Tyler Baicar)
- Make the IA64 ACPI NUMA code parse all SRAT entries (Ganapatrao
Kulkarni)
- Add a lid switch blacklist to the ACPI button driver and make it
print extra debug messages on lid events (Hans de Goede)
- Add quirks for Asus GL502VSK and UX305LA to the ACPI battery driver
and clean it up somewhat (Bjørn Mork, Kai-Heng Feng)
- Add device link for CHT SD card dependency on I2C to the ACPI LPSS
(Intel SoCs) driver and make it avoid creating platform device
objects for devices without MMIO resources (Adrian Hunter, Hans de
Goede)
- Fix the ACPI GPE mask kernel command line parameter handling
(Prarit Bhargava)
- Fix the handling of (incorrectly exposed) backlight interfaces
without LCD (Hans de Goede)
- Fix the usage of debugfs_create_*() in the ACPI EC driver (Geert
Uytterhoeven)"
* tag 'acpi-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (62 commits)
ACPI/PCI: pci_link: reduce verbosity when IRQ is enabled
ACPI / LPSS: Do not instiate platform_dev for devs without MMIO resources
ACPI / PMIC: Convert to use builtin_platform_driver() macro
ACPI / x86: boot: Propagate error code in acpi_gsi_to_irq()
ACPICA: Update version to 20171215
ACPICA: trivial style fix, no functional change
ACPICA: Fix a couple memory leaks during package object resolution
ACPICA: Recognize the Windows 10 version 1607 and 1703 OSI strings
ACPICA: DT compiler: prevent error if optional field at the end of table is not present
ACPICA: Rename a global variable, no functional change
ACPICA: Create and deploy safe version of strncpy
ACPICA: Cleanup the global variables and update comments
ACPICA: Debugger: fix slight indentation issue
ACPICA: Fix a regression in the acpi_evaluate_object_type() interface
ACPICA: Update for a few debug output statements
ACPICA: Debug output, no functional change
ACPI: EC: Fix debugfs_create_*() usage
ACPI / video: Default lcd_only to true on Win8-ready and newer machines
ACPI / x86: boot: Don't setup SCI on HW-reduced platforms
ACPI / x86: boot: Use INVALID_ACPI_IRQ instead of 0 for acpi_sci_override_gsi
...
- Define a PM driver flag allowing drivers to request that their
devices be left in suspend after system-wide transitions to the
working state if possible and add support for it to the PCI bus
type and the ACPI PM domain (Rafael Wysocki).
- Make the PM core carry out optimizations for devices with driver
PM flags set in some cases and make a few drivers set those flags
(Rafael Wysocki).
- Fix and clean up wrapper routines allowing runtime PM device
callbacks to be re-used for system-wide PM, change the generic
power domains (genpd) framework to stop using those routines
incorrectly and fix up a driver depending on that behavior of
genpd (Rafael Wysocki, Ulf Hansson, Geert Uytterhoeven).
- Fix and clean up the PM core's device wakeup framework and
re-factor system-wide PM core code related to device wakeup
(Rafael Wysocki, Ulf Hansson, Brian Norris).
- Make more x86-based systems use the Low Power Sleep S0 _DSM
interface by default (to fix power button wakeup from
suspend-to-idle on Surface Pro3) and add a kernel command line
switch to tell it to ignore the system sleep blacklist in the
ACPI core (Rafael Wysocki).
- Fix a race condition related to cpufreq governor module removal
and clean up the governor management code in the cpufreq core
(Rafael Wysocki).
- Drop the unused generic code related to the handling of the static
power energy usage model in the CPU cooling thermal driver along
with the corresponding documentation (Viresh Kumar).
- Add mt2712 support to the Mediatek cpufreq driver (Andrew-sh Cheng).
- Add a new operating point to the imx6ul and imx6q cpufreq drivers
and switch the latter to using clk_bulk_get() (Anson Huang, Dong
Aisheng).
- Add support for multiple regulators to the TI cpufreq driver along
with a new DT binding related to that and clean up that driver
somewhat (Dave Gerlach).
- Fix a powernv cpufreq driver regression leading to incorrect CPU
frequency reporting, fix that driver to deal with non-continguous
P-states correctly and clean it up (Gautham Shenoy, Shilpasri Bhat).
- Add support for frequency scaling on Armada 37xx SoCs through the
generic DT cpufreq driver (Gregory CLEMENT).
- Fix error code paths in the mvebu cpufreq driver (Gregory CLEMENT).
- Fix a transition delay setting regression in the longhaul cpufreq
driver (Viresh Kumar).
- Add Skylake X (server) support to the intel_pstate cpufreq driver
and clean up that driver somewhat (Srinivas Pandruvada).
- Clean up the cpufreq statistics collection code (Viresh Kumar).
- Drop cluster terminology and dependency on physical_package_id
from the PSCI driver and drop dependency on arm_big_little from
the SCPI cpufreq driver (Sudeep Holla).
- Add support for system-wide suspend and resume to the RAPL power
capping driver and drop a redundant semicolon from it (Zhen Han,
Luis de Bethencourt).
- Make SPI domain validation (in the SCSI SPI transport driver) and
system-wide suspend mutually exclusive as they rely on the same
underlying mechanism and cannot be carried out at the same time
(Bart Van Assche).
- Fix the computation of the amount of memory to preallocate in the
hibernation core and clean up one function in there (Rainer Fiebig,
Kyungsik Lee).
- Prepare the Operating Performance Points (OPP) framework for being
used with power domains and clean up one function in it (Viresh
Kumar, Wei Yongjun).
- Clean up the generic sysfs interface for device PM (Andy Shevchenko).
- Fix several minor issues in power management frameworks and clean
them up a bit (Arvind Yadav, Bjorn Andersson, Geert Uytterhoeven,
Gustavo Silva, Julia Lawall, Luis de Bethencourt, Paul Gortmaker,
Sergey Senozhatsky, gaurav jindal).
- Make it easier to disable PM via Kconfig (Mark Brown).
- Clean up the cpupower and intel_pstate_tracer utilities (Doug
Smythies, Laura Abbott).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJaYw2iAAoJEILEb/54YlRxLHwP/iabmAcbXBeg30/wSCKcWB6f
Ar785YbkFedNP7b2dypR7bcKIkaV55EExNHHoVuvC6gKrW+zx3F39v9QzK3HBKfw
DgLWMjxR5Xdm9o8o2chsBEMl0itSRB9s864s+AAAElP+qjyT6kmbFyRFgVYLiNH0
v9jNhPF9EmirViwES/syELa/P1AJDMxCb/SbRY+Xp1sPhGKlx2J/2eQsVDs7G+wL
2BJeyBqwL9D78U/eY2bvpCoZLpmZmklx1eY5iK3Mzo6LZKYMaSypgkGuRfh//K+a
8vFLwOBsOlpZ8lsPBRatV5+SMu8qMQMTnstui1m3/9bOPFfjymat6u0lLw4BV2hv
zrNfqWOiwTAt/fczR1/naYuuSeRCLABvYDKjs/9iYdrCZYJ+n+ZzU/wi5geswDtD
cQKDMOdOBrnfkN0Vqpw6ZBqun0RDldNT/+6oy93tHWBlF0CA4mMq5jr8q3iH35CW
8TA1GCkurHZXTyYdYXR5SUHxPbOgZC87GAb7RlFEJJnvvkmy3jmBng675Hl5XAn7
D8eJp3d4h5n121pkMLGcBc7K036T2uFsjrHWx+QsjKFUBWUBnuRfInRrLA5WnGo2
U+KIEUPepdnbFFvYNv+kTgz2uE6FOqycEmnUKUKWUZYPN0GDAOw/V3813uxVRYtq
27omIOL7PJp1wWjQnfXK
=dnb7
-----END PGP SIGNATURE-----
Merge tag 'pm-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"This includes some infrastructure changes in the PM core, mostly
related to integration between runtime PM and system-wide suspend and
hibernation, plus some driver changes depending on them and fixes for
issues in that area which have become quite apparent recently.
Also included are changes making more x86-based systems use the Low
Power Sleep S0 _DSM interface by default, which turned out to be
necessary to handle power button wakeups from suspend-to-idle on
Surface Pro3.
On the cpufreq front we have fixes and cleanups in the core, some new
hardware support, driver updates and the removal of some unused code
from the CPU cooling thermal driver.
Apart from this, the Operating Performance Points (OPP) framework is
prepared to be used with power domains in the future and there is a
usual bunch of assorted fixes and cleanups.
Specifics:
- Define a PM driver flag allowing drivers to request that their
devices be left in suspend after system-wide transitions to the
working state if possible and add support for it to the PCI bus
type and the ACPI PM domain (Rafael Wysocki).
- Make the PM core carry out optimizations for devices with driver PM
flags set in some cases and make a few drivers set those flags
(Rafael Wysocki).
- Fix and clean up wrapper routines allowing runtime PM device
callbacks to be re-used for system-wide PM, change the generic
power domains (genpd) framework to stop using those routines
incorrectly and fix up a driver depending on that behavior of genpd
(Rafael Wysocki, Ulf Hansson, Geert Uytterhoeven).
- Fix and clean up the PM core's device wakeup framework and
re-factor system-wide PM core code related to device wakeup
(Rafael Wysocki, Ulf Hansson, Brian Norris).
- Make more x86-based systems use the Low Power Sleep S0 _DSM
interface by default (to fix power button wakeup from
suspend-to-idle on Surface Pro3) and add a kernel command line
switch to tell it to ignore the system sleep blacklist in the ACPI
core (Rafael Wysocki).
- Fix a race condition related to cpufreq governor module removal and
clean up the governor management code in the cpufreq core (Rafael
Wysocki).
- Drop the unused generic code related to the handling of the static
power energy usage model in the CPU cooling thermal driver along
with the corresponding documentation (Viresh Kumar).
- Add mt2712 support to the Mediatek cpufreq driver (Andrew-sh
Cheng).
- Add a new operating point to the imx6ul and imx6q cpufreq drivers
and switch the latter to using clk_bulk_get() (Anson Huang, Dong
Aisheng).
- Add support for multiple regulators to the TI cpufreq driver along
with a new DT binding related to that and clean up that driver
somewhat (Dave Gerlach).
- Fix a powernv cpufreq driver regression leading to incorrect CPU
frequency reporting, fix that driver to deal with non-continguous
P-states correctly and clean it up (Gautham Shenoy, Shilpasri
Bhat).
- Add support for frequency scaling on Armada 37xx SoCs through the
generic DT cpufreq driver (Gregory CLEMENT).
- Fix error code paths in the mvebu cpufreq driver (Gregory CLEMENT).
- Fix a transition delay setting regression in the longhaul cpufreq
driver (Viresh Kumar).
- Add Skylake X (server) support to the intel_pstate cpufreq driver
and clean up that driver somewhat (Srinivas Pandruvada).
- Clean up the cpufreq statistics collection code (Viresh Kumar).
- Drop cluster terminology and dependency on physical_package_id from
the PSCI driver and drop dependency on arm_big_little from the SCPI
cpufreq driver (Sudeep Holla).
- Add support for system-wide suspend and resume to the RAPL power
capping driver and drop a redundant semicolon from it (Zhen Han,
Luis de Bethencourt).
- Make SPI domain validation (in the SCSI SPI transport driver) and
system-wide suspend mutually exclusive as they rely on the same
underlying mechanism and cannot be carried out at the same time
(Bart Van Assche).
- Fix the computation of the amount of memory to preallocate in the
hibernation core and clean up one function in there (Rainer Fiebig,
Kyungsik Lee).
- Prepare the Operating Performance Points (OPP) framework for being
used with power domains and clean up one function in it (Viresh
Kumar, Wei Yongjun).
- Clean up the generic sysfs interface for device PM (Andy
Shevchenko).
- Fix several minor issues in power management frameworks and clean
them up a bit (Arvind Yadav, Bjorn Andersson, Geert Uytterhoeven,
Gustavo Silva, Julia Lawall, Luis de Bethencourt, Paul Gortmaker,
Sergey Senozhatsky, gaurav jindal).
- Make it easier to disable PM via Kconfig (Mark Brown).
- Clean up the cpupower and intel_pstate_tracer utilities (Doug
Smythies, Laura Abbott)"
* tag 'pm-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (89 commits)
PCI / PM: Remove spurious semicolon
cpufreq: scpi: remove arm_big_little dependency
drivers: psci: remove cluster terminology and dependency on physical_package_id
powercap: intel_rapl: Fix trailing semicolon
dmaengine: rcar-dmac: Make DMAC reinit during system resume explicit
PM / runtime: Allow no callbacks in pm_runtime_force_suspend|resume()
PM / hibernate: Drop unused parameter of enough_swap
PM / runtime: Check ignore_children in pm_runtime_need_not_resume()
PM / runtime: Rework pm_runtime_force_suspend/resume()
PM / genpd: Stop/start devices without pm_runtime_force_suspend/resume()
cpufreq: powernv: Dont assume distinct pstate values for nominal and pmin
cpufreq: intel_pstate: Add Skylake servers support
cpufreq: intel_pstate: Replace bxt_funcs with core_funcs
platform/x86: surfacepro3: Support for wakeup from suspend-to-idle
ACPI / PM: Use Low Power S0 Idle on more systems
PM / wakeup: Print warn if device gets enabled as wakeup source during sleep
PM / domains: Don't skip driver's ->suspend|resume_noirq() callbacks
PM / core: Propagate wakeup_path status flag in __device_suspend_late()
PM / core: Re-structure code for clearing the direct_complete flag
powercap: add suspend and resume mechanism for SOC power limit
...
Pull x86 fixes from Thomas Gleixner:
"A set of small fixes for 4.15:
- Fix vmapped stack synchronization on systems with 4-level paging
and a large amount of memory caused by a missing 5-level folding
which made the pgd synchronization logic to fail and causing double
faults.
- Add a missing sanity check in the vmalloc_fault() logic on 5-level
paging systems.
- Bring back protection against accessing a freed initrd in the
microcode loader which was lost by a wrong merge conflict
resolution.
- Extend the Broadwell micro code loading sanity check.
- Add a missing ENDPROC annotation in ftrace assembly code which
makes ORC unhappy.
- Prevent loading the AMD power module on !AMD platforms. The load
itself is uncritical, but an unload attempt results in a kernel
crash.
- Update Peter Anvins role in the MAINTAINERS file"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ftrace: Add one more ENDPROC annotation
x86: Mark hpa as a "Designated Reviewer" for the time being
x86/mm/64: Tighten up vmalloc_fault() sanity checks on 5-level kernels
x86/mm/64: Fix vmapped stack syncing on very-large-memory 4-level systems
x86/microcode: Fix again accessing initrd after having been freed
x86/microcode/intel: Extend BDW late-loading further with LLC size check
perf/x86/amd/power: Do not load AMD power module on !AMD platforms
When ORC support was added for the ftrace_64.S code, an ENDPROC
for function_hook() was missed. This results in the following warning:
arch/x86/kernel/ftrace_64.o: warning: objtool: .entry.text+0x0: unreachable instruction
Fixes: e2ac83d74a ("x86/ftrace: Fix ORC unwinding from ftrace handlers")
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/20180128022150.dqierscqmt3uwwsr@treble
We want to expose the hardware features simply in /proc/cpuinfo as "ibrs",
"ibpb" and "stibp". Since AMD has separate CPUID bits for those, use them
as the user-visible bits.
When the Intel SPEC_CTRL bit is set which indicates both IBRS and IBPB
capability, set those (AMD) bits accordingly. Likewise if the Intel STIBP
bit is set, set the AMD STIBP that's used for the generic hardware
capability.
Hide the rest from /proc/cpuinfo by putting "" in the comments. Including
RETPOLINE and RETPOLINE_AMD which shouldn't be visible there. There are
patches to make the sysfs vulnerabilities information non-readable by
non-root, and the same should apply to all information about which
mitigations are actually in use. Those *shouldn't* appear in /proc/cpuinfo.
The feature bit for whether IBPB is actually used, which is needed for
ALTERNATIVEs, is renamed to X86_FEATURE_USE_IBPB.
Originally-by: Borislav Petkov <bp@suse.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ak@linux.intel.com
Cc: dave.hansen@intel.com
Cc: karahmed@amazon.de
Cc: arjan@linux.intel.com
Cc: torvalds@linux-foundation.org
Cc: peterz@infradead.org
Cc: bp@alien8.de
Cc: pbonzini@redhat.com
Cc: tim.c.chen@linux.intel.com
Cc: gregkh@linux-foundation.org
Link: https://lkml.kernel.org/r/1517070274-12128-2-git-send-email-dwmw@amazon.co.uk
If sysfs is disabled and RETPOLINE not defined:
arch/x86/kernel/cpu/bugs.c:97:13: warning: ‘spectre_v2_bad_module’ defined but not used
[-Wunused-variable]
static bool spectre_v2_bad_module;
Hide it.
Fixes: caf7501a1b ("module/retpoline: Warn about missing retpoline in module")
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
There's a risk that a kernel which has full retpoline mitigations becomes
vulnerable when a module gets loaded that hasn't been compiled with the
right compiler or the right option.
To enable detection of that mismatch at module load time, add a module info
string "retpoline" at build time when the module was compiled with
retpoline support. This only covers compiled C source, but assembler source
or prebuilt object files are not checked.
If a retpoline enabled kernel detects a non retpoline protected module at
load time, print a warning and report it in the sysfs vulnerability file.
[ tglx: Massaged changelog ]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: gregkh@linuxfoundation.org
Cc: torvalds@linux-foundation.org
Cc: jeyu@kernel.org
Cc: arjan@linux.intel.com
Link: https://lkml.kernel.org/r/20180125235028.31211-1-andi@firstfloor.org
Centaur CPU has a constant frequency TSC and that TSC does not stop in
C-States. But because the corresponding TSC feature flags are not set for
that CPU, the TSC is treated as not constant frequency and assumed to stop
in C-States, which makes it an unreliable and unusable clock source.
Setting those flags tells the kernel that the TSC is usable, so it will
select it over HPET. The effect of this is that reading time stamps (from
kernel or user space) will be faster and more efficent.
Signed-off-by: davidwang <davidwang@zhaoxin.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: qiyuanwang@zhaoxin.com
Cc: linux-pm@vger.kernel.org
Cc: brucechang@via-alliance.com
Cc: cooperyan@zhaoxin.com
Cc: benjaminpan@viatech.com
Link: https://lkml.kernel.org/r/1516616057-5158-1-git-send-email-davidwang@zhaoxin.com
Commit 24c2503255 ("x86/microcode: Do not access the initrd after it has
been freed") fixed attempts to access initrd from the microcode loader
after it has been freed. However, a similar KASAN warning was reported
(stack trace edited):
smpboot: Booting Node 0 Processor 1 APIC 0x11
==================================================================
BUG: KASAN: use-after-free in find_cpio_data+0x9b5/0xa50
Read of size 1 at addr ffff880035ffd000 by task swapper/1/0
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.14.8-slack #7
Hardware name: System manufacturer System Product Name/A88X-PLUS, BIOS 3003 03/10/2016
Call Trace:
dump_stack
print_address_description
kasan_report
? find_cpio_data
__asan_report_load1_noabort
find_cpio_data
find_microcode_in_initrd
__load_ucode_amd
load_ucode_amd_ap
load_ucode_ap
After some investigation, it turned out that a merge was done using the
wrong side to resolve, leading to picking up the previous state, before
the 24c2503255 fix. Therefore the Fixes tag below contains a merge
commit.
Revert the mismerge by catching the save_microcode_in_initrd_amd()
retval and thus letting the function exit with the last return statement
so that initrd_gone can be set to true.
Fixes: f26483eaed ("Merge branch 'x86/urgent' into x86/microcode, to resolve conflicts")
Reported-by: <higuita@gmx.net>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=198295
Link: https://lkml.kernel.org/r/20180123104133.918-2-bp@alien8.de
Commit b94b737331 ("x86/microcode/intel: Extend BDW late-loading with a
revision check") reduced the impact of erratum BDF90 for Broadwell model
79.
The impact can be reduced further by checking the size of the last level
cache portion per core.
Tony: "The erratum says the problem only occurs on the large-cache SKUs.
So we only need to avoid the update if we are on a big cache SKU that is
also running old microcode."
For more details, see erratum BDF90 in document #334165 (Intel Xeon
Processor E7-8800/4800 v4 Product Family Specification Update) from
September 2017.
Fixes: b94b737331 ("x86/microcode/intel: Extend BDW late-loading with a revision check")
Signed-off-by: Jia Zhang <zhang.jia@linux.alibaba.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1516321542-31161-1-git-send-email-zhang.jia@linux.alibaba.com
The function tracer can create a dynamically allocated trampoline that is
called by the function mcount or fentry hook that is used to call the
function callback that is registered. The problem is that the orc undwinder
will bail if it encounters one of these trampolines. This breaks the stack
trace of function callbacks, which include the stack tracer and setting the
stack trace for individual functions.
Since these dynamic trampolines are basically copies of the static ftrace
trampolines defined in ftrace_*.S, we do not need to create new orc entries
for the dynamic trampolines. Finding the return address on the stack will be
identical as the functions that were copied to create the dynamic
trampolines. When encountering a ftrace dynamic trampoline, we can just use
the orc entry of the ftrace static function that was copied for that
trampoline.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Steven Rostedt discovered that the ftrace stack tracer is broken when
it's used with the ORC unwinder. The problem is that objtool is
instructed by the Makefile to ignore the ftrace_64.S code, so it doesn't
generate any ORC data for it.
Fix it by making the asm code objtool-friendly:
- Objtool doesn't like the fact that save_mcount_regs pushes RBP at the
beginning, but it's never restored (directly, at least). So just skip
the original RBP push, which is only needed for frame pointers anyway.
- Annotate some functions as normal callable functions with
ENTRY/ENDPROC.
- Add an empty unwind hint to return_to_handler(). The return address
isn't on the stack, so there's nothing ORC can do there. It will just
punt in the unlikely case it tries to unwind from that code.
With all that fixed, remove the OBJECT_FILES_NON_STANDARD Makefile
annotation so objtool can read the file.
Link: http://lkml.kernel.org/r/20180123040746.ih4ep3tk4pbjvg7c@treble
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Today 4 architectures set ARCH_SUPPORTS_MEMORY_FAILURE (arm64, parisc,
powerpc, and x86), while 4 other architectures set __ARCH_SI_TRAPNO
(alpha, metag, sparc, and tile). These two sets of architectures do
not interesect so remove the trapno paramater to remove confusion.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Pull x86 pti fixes from Thomas Gleixner:
"A small set of fixes for the meltdown/spectre mitigations:
- Make kprobes aware of retpolines to prevent probes in the retpoline
thunks.
- Make the machine check exception speculation protected. MCE used to
issue an indirect call directly from the ASM entry code. Convert
that to a direct call into a C-function and issue the indirect call
from there so the compiler can add the retpoline protection,
- Make the vmexit_fill_RSB() assembly less stupid
- Fix a typo in the PTI documentation"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/retpoline: Optimize inline assembler for vmexit_fill_RSB
x86/pti: Document fix wrong index
kprobes/x86: Disable optimizing on the function jumps to indirect thunk
kprobes/x86: Blacklist indirect thunk functions for kprobes
retpoline: Introduce start/end markers of indirect thunk
x86/mce: Make machine check speculation protected
Limiting the scan width to the known last bus via the command line can
accelerate the boot noteworthy.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jailhouse <jailhouse-dev@googlegroups.com>
Link: https://lkml.kernel.org/r/51f5fe62-ca8f-9286-5cdb-39df3fad78b4@siemens.com
Otherwise, Linux will not recognize precalibrated_tsc_khz and disable
the tsc as clocksource.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jailhouse <jailhouse-dev@googlegroups.com>
Link: https://lkml.kernel.org/r/975fbfc9-2a64-cc56-40d5-164992ec3916@siemens.com
Since indirect jump instructions will be replaced by jump
to __x86_indirect_thunk_*, those jmp instruction must be
treated as an indirect jump. Since optprobe prohibits to
optimize probes in the function which uses an indirect jump,
it also needs to find out the function which jump to
__x86_indirect_thunk_* and disable optimization.
Add a check that the jump target address is between the
__indirect_thunk_start/end when optimizing kprobe.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/151629212062.10241.6991266100233002273.stgit@devbox
Introduce start/end markers of __x86_indirect_thunk_* functions.
To make it easy, consolidate .text.__x86.indirect_thunk.* sections
to one .text.__x86.indirect_thunk section and put it in the
end of kernel text section and adds __indirect_thunk_start/end
so that other subsystem (e.g. kprobes) can identify it.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/151629206178.10241.6828804696410044771.stgit@devbox
The machine check idtentry uses an indirect branch directly from the low
level code. This evades the speculation protection.
Replace it by a direct call into C code and issue the indirect call there
so the compiler can apply the proper speculation protection.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by:Borislav Petkov <bp@alien8.de>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Niced-by: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801181626290.1847@nanos
Some issues have been reported with the for loop in stop_this_cpu() that
issues the 'wbinvd; hlt' sequence. Reverting this sequence to halt()
has been shown to resolve the issue.
However, the wbinvd is needed when running with SME. The reason for the
wbinvd is to prevent cache flush races between encrypted and non-encrypted
entries that have the same physical address. This can occur when
kexec'ing from memory encryption active to inactive or vice-versa. The
important thing is to not have outside of kernel text memory references
(such as stack usage), so the usage of the native_*() functions is needed
since these expand as inline asm sequences. So instead of reverting the
change, rework the sequence.
Move the wbinvd instruction outside of the for loop as native_wbinvd()
and make its execution conditional on X86_FEATURE_SME. In the for loop,
change the asm 'wbinvd; hlt' sequence back to a halt sequence but use
the native_halt() call.
Fixes: bba4ed011a ("x86/mm, kexec: Allow kexec to be used with SME")
Reported-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Dave Young <dyoung@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yu Chen <yu.c.chen@intel.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: kexec@lists.infradead.org
Cc: ebiederm@redhat.com
Cc: Borislav Petkov <bp@alien8.de>
Cc: Rui Zhang <rui.zhang@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180117234141.21184.44067.stgit@tlendack-t1.amdoffice.net
L2 CDP can be controlled by kernel parameter "rdt=".
If "rdt=l2cdp", L2 CDP is turned on.
If "rdt=!l2cdp", L2 CDP is turned off.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: Vikas" <vikas.shivappa@intel.com>
Cc: Sai Praneeth" <sai.praneeth.prakhya@intel.com>
Cc: Reinette" <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/1513810644-78015-7-git-send-email-fenghua.yu@intel.com
Bit 0 in MSR IA32_L2_QOS_CFG (0xc82) is L2 CDP enable bit. By default,
the bit is zero, i.e. L2 CAT is enabled, and L2 CDP is disabled. When
the resctrl mount parameter "cdpl2" is given, the bit is set to 1 and L2
CDP is enabled.
In L2 CDP mode, the L2 CAT mask MSRs are re-mapped into interleaved pairs
of mask MSRs for code (referenced by an odd CLOSID) and data (referenced by
an even CLOSID).
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: Vikas" <vikas.shivappa@intel.com>
Cc: Sai Praneeth" <sai.praneeth.prakhya@intel.com>
Cc: Reinette" <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/1513810644-78015-6-git-send-email-fenghua.yu@intel.com
L2 data and L2 code are added as new resources in rdt_resources_all[]
and data in the resources are configured.
When L2 CDP is enabled, the schemata will have the two resources in
this format:
L2DATA:l2id0=xxxx;l2id1=xxxx;....
L2CODE:l2id0=xxxx;l2id1=xxxx;....
xxxx represent CBM (Cache Bit Mask) values in the schemata, similar to all
others (L2 CAT/L3 CAT/L3 CDP).
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: Vikas" <vikas.shivappa@intel.com>
Cc: Sai Praneeth" <sai.praneeth.prakhya@intel.com>
Cc: Reinette" <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/1513810644-78015-5-git-send-email-fenghua.yu@intel.com
* acpi-pm:
platform/x86: surfacepro3: Support for wakeup from suspend-to-idle
ACPI / PM: Use Low Power S0 Idle on more systems
ACPI / PM: Make it possible to ignore the system sleep blacklist
* pm-sleep:
PM / hibernate: Drop unused parameter of enough_swap
block, scsi: Fix race between SPI domain validation and system suspend
PM / sleep: Make lock/unlock_system_sleep() available to kernel modules
PM: hibernate: Do not subtract NR_FILE_MAPPED in minimum_image_size()
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJaW+iVAAoJEHm+PkMAQRiGCDsIAJALNpX7odTx/8y+yCSWbpBH
E57iwr4rmnI6tXJY6gqBUWTYnjAcf4b8IsHGCO6q3WIE3l/kt+m3eA21a32mF2Db
/bfPGTOWu5LoOnFqzgH2kiFuC3Y474toxpld2YtkQWYxi5W7SUtIHi/jGgkUprth
g15yPfwYgotJd/gpmPfBDMPlYDYvLlnPYbTG6ZWdMbg39m2RF2m0BdQ6aBFLHvbJ
IN0tjCM6hrLFBP0+6Zn60pevUW9/AFYotZn2ankNTk5QVCQm14rgQIP+Pfoa5WpE
I25r0DbkG2jKJCq+tlgIJjxHKD37GEDMc4T8/5Y8CNNeT9Q8si9EWvznjaAPazw=
=o5gx
-----END PGP SIGNATURE-----
BackMerge tag 'v4.15-rc8' into drm-next
Linux 4.15-rc8
Daniel requested this for so the intel CI won't fall over on drm-next
so often.
Pull x86 fixes from Ingo Molnar:
"Misc fixes:
- A rather involved set of memory hardware encryption fixes to
support the early loading of microcode files via the initrd. These
are larger than what we normally take at such a late -rc stage, but
there are two mitigating factors: 1) much of the changes are
limited to the SME code itself 2) being able to early load
microcode has increased importance in the post-Meltdown/Spectre
era.
- An IRQ vector allocator fix
- An Intel RDT driver use-after-free fix
- An APIC driver bug fix/revert to make certain older systems boot
again
- A pkeys ABI fix
- TSC calibration fixes
- A kdump fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic/vector: Fix off by one in error path
x86/intel_rdt/cqm: Prevent use after free
x86/mm: Encrypt the initrd earlier for BSP microcode update
x86/mm: Prepare sme_encrypt_kernel() for PAGE aligned encryption
x86/mm: Centralize PMD flags in sme_encrypt_kernel()
x86/mm: Use a struct to reduce parameters for SME PGD mapping
x86/mm: Clean up register saving in the __enc_copy() assembly code
x86/idt: Mark IDT tables __initconst
Revert "x86/apic: Remove init_bsp_APIC()"
x86/mm/pkeys: Fix fill_sig_info_pkey
x86/tsc: Print tsc_khz, when it differs from cpu_khz
x86/tsc: Fix erroneous TSC rate on Skylake Xeon
x86/tsc: Future-proof native_calibrate_tsc()
kdump: Write the correct address of mem_section into vmcoreinfo
Pull x86 pti bits and fixes from Thomas Gleixner:
"This last update contains:
- An objtool fix to prevent a segfault with the gold linker by
changing the invocation order. That's not just for gold, it's a
general robustness improvement.
- An improved error message for objtool which spares tearing hairs.
- Make KASAN fail loudly if there is not enough memory instead of
oopsing at some random place later
- RSB fill on context switch to prevent RSB underflow and speculation
through other units.
- Make the retpoline/RSB functionality work reliably for both Intel
and AMD
- Add retpoline to the module version magic so mismatch can be
detected
- A small (non-fix) update for cpufeatures which prevents cpu feature
clashing for the upcoming extra mitigation bits to ease
backporting"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
module: Add retpoline tag to VERMAGIC
x86/cpufeature: Move processor tracing out of scattered features
objtool: Improve error message for bad file argument
objtool: Fix seg fault with gold linker
x86/retpoline: Add LFENCE to the retpoline/RSB filling RSB macros
x86/retpoline: Fill RSB on context switch for affected CPUs
x86/kasan: Panic if there is not enough memory to boot
Keith reported the following warning:
WARNING: CPU: 28 PID: 1420 at kernel/irq/matrix.c:222 irq_matrix_remove_managed+0x10f/0x120
x86_vector_free_irqs+0xa1/0x180
x86_vector_alloc_irqs+0x1e4/0x3a0
msi_domain_alloc+0x62/0x130
The reason for this is that if the vector allocation fails the error
handling code tries to free the failed vector as well, which causes the
above imbalance warning to trigger.
Adjust the error path to handle this correctly.
Fixes: b5dc8e6c21 ("x86/irq: Use hierarchical irqdomain to manage CPU interrupt vectors")
Reported-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Keith Busch <keith.busch@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801161217300.1823@nanos
intel_rdt_iffline_cpu() -> domain_remove_cpu() frees memory first and then
proceeds accessing it.
BUG: KASAN: use-after-free in find_first_bit+0x1f/0x80
Read of size 8 at addr ffff883ff7c1e780 by task cpuhp/31/195
find_first_bit+0x1f/0x80
has_busy_rmid+0x47/0x70
intel_rdt_offline_cpu+0x4b4/0x510
Freed by task 195:
kfree+0x94/0x1a0
intel_rdt_offline_cpu+0x17d/0x510
Do the teardown first and then free memory.
Fixes: 24247aeeab ("x86/intel_rdt/cqm: Improve limbo list processing")
Reported-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Peter Zilstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: "Roderick W. Smith" <rod.smith@canonical.com>
Cc: 1733662@bugs.launchpad.net
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801161957510.2366@nanos
Processor tracing is already enumerated in word 9 (CPUID[7,0].EBX),
so do not duplicate it in the scattered features word.
Besides being more tidy, this will be useful for KVM when it presents
processor tracing to the guests. KVM selects host features that are
supported by both the host kernel (depending on command line options,
CPU errata, or whatever) and KVM. Whenever a full feature word exists,
KVM's code is written in the expectation that the CPUID bit number
matches the X86_FEATURE_* bit number, but this is not the case for
X86_FEATURE_INTEL_PT.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luwei Kang <luwei.kang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Link: http://lkml.kernel.org/r/1516117345-34561-1-git-send-email-pbonzini@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This part of Secure Encrypted Virtualization (SEV) patch series focuses on KVM
changes required to create and manage SEV guests.
SEV is an extension to the AMD-V architecture which supports running encrypted
virtual machine (VMs) under the control of a hypervisor. Encrypted VMs have their
pages (code and data) secured such that only the guest itself has access to
unencrypted version. Each encrypted VM is associated with a unique encryption key;
if its data is accessed to a different entity using a different key the encrypted
guest's data will be incorrectly decrypted, leading to unintelligible data.
This security model ensures that hypervisor will no longer able to inspect or
alter any guest code or data.
The key management of this feature is handled by a separate processor known as
the AMD Secure Processor (AMD-SP) which is present on AMD SOCs. The SEV Key
Management Specification (see below) provides a set of commands which can be
used by hypervisor to load virtual machine keys through the AMD-SP driver.
The patch series adds a new ioctl in KVM driver (KVM_MEMORY_ENCRYPT_OP). The
ioctl will be used by qemu to issue SEV guest-specific commands defined in Key
Management Specification.
The following links provide additional details:
AMD Memory Encryption white paper:
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf
AMD64 Architecture Programmer's Manual:
http://support.amd.com/TechDocs/24593.pdf
SME is section 7.10
SEV is section 15.34
SEV Key Management:
http://support.amd.com/TechDocs/55766_SEV-KM API_Specification.pdf
KVM Forum Presentation:
http://www.linux-kvm.org/images/7/74/02x08A-Thomas_Lendacky-AMDs_Virtualizatoin_Memory_Encryption_Technology.pdf
SEV Guest BIOS support:
SEV support has been add to EDKII/OVMF BIOS
https://github.com/tianocore/edk2
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Remote TLB flush does a busy wait which is fine in bare-metal
scenario. But with-in the guest, the vcpus might have been pre-empted or
blocked. In this scenario, the initator vcpu would end up busy-waiting
for a long amount of time; it also consumes CPU unnecessarily to wake
up the target of the shootdown.
This patch set adds support for KVM's new paravirtualized TLB flush;
remote TLB flush does not wait for vcpus that are sleeping, instead
KVM will flush the TLB as soon as the vCPU starts running again.
The improvement is clearly visible when the host is overcommitted; in this
case, the PV TLB flush (in addition to avoiding the wait on the main CPU)
prevents preempted vCPUs from stealing precious execution time from the
running ones.
Testing on a Xeon Gold 6142 2.6GHz 2 sockets, 32 cores, 64 threads,
so 64 pCPUs, and each VM is 64 vCPUs.
ebizzy -M
vanilla optimized boost
1VM 46799 48670 4%
2VM 23962 42691 78%
3VM 16152 37539 132%
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
The next patch will add another bit to the preempted field in
kvm_steal_time. Define a constant for bit 0 (the only one that is
currently used).
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Along with the fixes in UV4A (rev2) MMRs, the code to access those
MMRs also was modified by the fixes. UV3, UV4, and UV4A no longer
have compatible setups for Global Address Memory (GAM).
Correct the new mistakes.
Signed-off-by: Mike Travis <mike.travis@hpe.com>
Acked-by: Andrew Banman <abanman@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dimitri Sivanich <sivanich@hpe.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <rja@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1515440405-20880-6-git-send-email-mike.travis@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Upcoming Intel CascadeLake and IceLake processors have some architecture
changes that required fixes in the UV4 HUB bringing that chip to
revision 2. The nomenclature for that new chip is "UV4A".
This patch fixes the references for the expanded MMR definitions in the
previous (automated) patch.
Signed-off-by: Mike Travis <mike.travis@hpe.com>
Acked-by: Andrew Banman <abanman@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dimitri Sivanich <sivanich@hpe.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <rja@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1515440405-20880-3-git-send-email-mike.travis@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Among the existing architecture specific versions of
copy_siginfo_to_user32 there are several different implementation
problems. Some architectures fail to handle all of the cases in in
the siginfo union. Some architectures perform a blind copy of the
siginfo union when the si_code is negative. A blind copy suggests the
data is expected to be in 32bit siginfo format, which means that
receiving such a signal via signalfd won't work, or that the data is
in 64bit siginfo and the code is copying nonsense to userspace.
Create a single instance of copy_siginfo_to_user32 that all of the
architectures can share, and teach it to handle all of the cases in
the siginfo union correctly, with the assumption that siginfo is
stored internally to the kernel is 64bit siginfo format.
A special case is made for x86 x32 format. This is needed as presence
of both x32 and ia32 on x86_64 results in two different 32bit signal
formats. By allowing this small special case there winds up being
exactly one code base that needs to be maintained between all of the
architectures. Vastly increasing the testing base and the chances of
finding bugs.
As the x86 copy of copy_siginfo_to_user32 the call of the x86
signal_compat_build_tests were moved into sigaction_compat_abi, so
that they will keep running.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Currently the BSP microcode update code examines the initrd very early
in the boot process. If SME is active, the initrd is treated as being
encrypted but it has not been encrypted (in place) yet. Update the
early boot code that encrypts the kernel to also encrypt the initrd so
that early BSP microcode updates work.
Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180110192634.6026.10452.stgit@tlendack-t1.amdoffice.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The function copy_siginfo_from_user32 is used for two things, in ptrace
since the dawn of siginfo for arbirarily modifying a signal that
user space sees, and in sigqueueinfo to send a signal with arbirary
siginfo data.
Create a single copy of copy_siginfo_from_user32 that all architectures
share, and teach it to handle all of the cases in the siginfo union.
In the generic version of copy_siginfo_from_user32 ensure that all
of the fields in siginfo are initialized so that the siginfo structure
can be safely copied to userspace if necessary.
When copying the embedded sigval union copy the si_int member. That
ensures the 32bit values passes through the kernel unchanged.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Having si_codes in many different files simply encourages duplicate
definitions that can cause problems later. To avoid that merge the
ia64 specific si_codes into uapi/asm-generic/siginfo.h
Update the sanity checks in arch/x86/kernel/signal_compat.c to expect
the now lager NSIGILL and NSIGFPE. As nothing excpe the larger count
is exposed on x86 no additional code needs to be updated.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
--EWB Added #ifdef CONFIG_X86_X32_ABI to arch/x86/kernel/signal_compat.c
Changed #ifdef CONFIG_X86_X32 to #ifdef CONFIG_X86_X32_ABI in
linux/compat.h
CONFIG_X86_X32 is set when the user requests X32 support.
CONFIG_X86_X32_ABI is set when the user requests X32 support
and the tool-chain has X32 allowing X32 support to be built.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
x2apic_phys is not available when CONFIG_X86_X2APIC=n and the code is not
optimized out resulting in a build fail:
jailhouse.c: In function ‘jailhouse_get_smp_config’:
jailhouse.c:73:3: error: ‘x2apic_phys’ undeclared (first use in this function)
Fixes: 11c8dc419b ("x86/jailhouse: Enable APIC and SMP support")
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: jailhouse-dev@googlegroups.com
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
We'll need that name for a generic implementation soon.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To implement the x86 forbid_dac and iommu_sac_force we want an arch hook
so that it can apply the global options across all dma_map_ops
implementations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Lift the code from x86 so that we behave consistently. In the future we
should probably warn if any of these is set.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
On context switch from a shallow call stack to a deeper one, as the CPU
does 'ret' up the deeper side it may encounter RSB entries (predictions for
where the 'ret' goes to) which were populated in userspace.
This is problematic if neither SMEP nor KPTI (the latter of which marks
userspace pages as NX for the kernel) are active, as malicious code in
userspace may then be executed speculatively.
Overwrite the CPU's return prediction stack with calls which are predicted
to return to an infinite loop, to "capture" speculation if this
happens. This is required both for retpoline, and also in conjunction with
IBRS for !SMEP && !KPTI.
On Skylake+ the problem is slightly different, and an *underflow* of the
RSB may cause errant branch predictions to occur. So there it's not so much
overwrite, as *filling* the RSB to attempt to prevent it getting
empty. This is only a partial solution for Skylake+ since there are many
other conditions which may result in the RSB becoming empty. The full
solution on Skylake+ is to use IBRS, which will prevent the problem even
when the RSB becomes empty. With IBRS, the RSB-stuffing will not be
required on context switch.
[ tglx: Added missing vendor check and slighty massaged comments and
changelog ]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: thomas.lendacky@amd.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/1515779365-9032-1-git-send-email-dwmw@amazon.co.uk
The typical I/O interrupts in non-root cells are MSI-based. However, the
platform UARTs do not support MSI. In order to run a non-root cell that
shall use one of them, the standard IOAPIC must be registered and 1:1
routing for IRQ 3 and 4 set up.
If an IOAPIC is not available, the boot loader clears standard_ioapic in
the setup data, so registration is skipped. If the guest is not allowed to
to use one of those pins, Jailhouse will simply ignore the access.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Link: https://lkml.kernel.org/r/90d942dda9d48a8046e00bb3c1bb6757c83227be.1511770314.git.jan.kiszka@siemens.com
Non-root cells do not have CMOS access, thus the warm reset cannot be
enabled. There is no RTC, thus also no wall clock. Furthermore, there
are no ISA IRQs and no PIC.
Also disable probing of i8042 devices that are typically blocked for
non-root cells. In theory, access could also be granted to a non-root
cell, provided the root cell is not using the devices. But there is no
concrete scenario in sight, and disabling probing over Jailhouse allows
to build generic kernels that keep CONFIG_SERIO enabled for use in
normal systems.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Link: https://lkml.kernel.org/r/39b68cc2c496501c9d95e6f40e5d76e3053c3908.1511770314.git.jan.kiszka@siemens.com
Register the APIC which Jailhouse always exposes at 0xfee00000 if in
xAPIC mode or via MSRs as x2APIC. The latter is only available if it was
already activated because there is no support for switching its mode
during runtime.
Jailhouse requires the APIC to be operated in phys-flat mode. Ensure
that this mode is selected by Linux.
The available CPUs are taken from the setup data structure that the
loader filled and registered with the kernel.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Link: https://lkml.kernel.org/r/8b2255da0a9856c530293a67aa9d6addfe102a2b.1511770314.git.jan.kiszka@siemens.com
The Jailhouse hypervisor is able to statically partition a multicore
system into multiple so-called cells. Linux is used as boot loader and
continues to run in the root cell after Jailhouse is enabled. Linux can
also run in non-root cells.
Jailhouse does not emulate usual x86 devices. It also provides no
complex ACPI but basic platform information that the boot loader
forwards via setup data. This adds the infrastructure to detect when
running in a non-root cell so that the platform can be configured as
required in succeeding steps.
Support is limited to x86-64 so far, primarily because no boot loader
stub exists for i386 and, thus, we wouldn't be able to test the 32-bit
path.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Link: https://lkml.kernel.org/r/7f823d077b38b1a70c526b40b403f85688c137d3.1511770314.git.jan.kiszka@siemens.com
As the comment already stated, there is no need for setting up LDR (and
DFR) in physflat mode as it remains unused (see SDM, 10.6.2.1).
flat_init_apic_ldr only served as a placeholder for a nop operation so
far, causing no harm.
That will change when running over the Jailhouse hypervisor. Here we
must not touch LDR in a way that destroys the mapping originally set up
by the Linux root cell. Jailhouse enforces this setting in order to
efficiently validate any IPI requests sent by a cell.
Avoid a needless clash caused by flat_init_apic_ldr by installing a true
nop handler.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Link: https://lkml.kernel.org/r/f9867d294cdae4d45ed89d3a2e6adb524f4f6794.1511770314.git.jan.kiszka@siemens.com
Without TSC_KNOWN_FREQ the TSC clocksource is registered so late that the
kernel first switches to the HPET. Using HPET on large CPU count machines is
undesirable.
Therefore register a tsc-early clocksource using the preliminary tsc_khz
from quick calibration. Then when the final TSC calibration is done, it
can switch to the tuned frequency.
The only notably problem is that the real tsc clocksource must be marked
with CLOCK_SOURCE_VALID_FOR_HRES, otherwise it will not be selected when
unregistering tsc-early. tsc-early cannot be left registered, because then
the clocksource code would fall back to it when we tsc clocksource is
marked unstable later.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: len.brown@intel.com
Cc: rui.zhang@intel.com
Cc: Len Brown <lenb@kernel.org>
Link: https://lkml.kernel.org/r/20171222092243.431585460@infradead.org
Zhang Rui reported that a Surface Pro 4 will fail to boot with
lapic=notscdeadline. Part of the problem is that that machine doesn't have
a PIT.
If, for some reason, the TSC init has to fall back to TSC calibration, it
relies on the PIT to be present.
Allow TSC calibration to reliably fall back to HPET.
The below results in an accurate TSC measurement when forced on a IVB:
tsc: Unable to calibrate against PIT
tsc: No reference (HPET/PMTIMER) available
tsc: Unable to calibrate against PIT
tsc: using HPET reference calibration
tsc: Detected 2792.451 MHz processor
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: len.brown@intel.com
Cc: rui.zhang@intel.com
Link: https://lkml.kernel.org/r/20171222092243.333145937@infradead.org
Pull x86 pti updates from Thomas Gleixner:
"This contains:
- a PTI bugfix to avoid setting reserved CR3 bits when PCID is
disabled. This seems to cause issues on a virtual machine at least
and is incorrect according to the AMD manual.
- a PTI bugfix which disables the perf BTS facility if PTI is
enabled. The BTS AUX buffer is not globally visible and causes the
CPU to fault when the mapping disappears on switching CR3 to user
space. A full fix which restores BTS on PTI is non trivial and will
be worked on.
- PTI bugfixes for EFI and trusted boot which make sure that the user
space visible page table entries have the NX bit cleared
- removal of dead code in the PTI pagetable setup functions
- add PTI documentation
- add a selftest for vsyscall to verify that the kernel actually
implements what it advertises.
- a sysfs interface to expose vulnerability and mitigation
information so there is a coherent way for users to retrieve the
status.
- the initial spectre_v2 mitigations, aka retpoline:
+ The necessary ASM thunk and compiler support
+ The ASM variants of retpoline and the conversion of affected ASM
code
+ Make LFENCE serializing on AMD so it can be used as speculation
trap
+ The RSB fill after vmexit
- initial objtool support for retpoline
As I said in the status mail this is the most of the set of patches
which should go into 4.15 except two straight forward patches still on
hold:
- the retpoline add on of LFENCE which waits for ACKs
- the RSB fill after context switch
Both should be ready to go early next week and with that we'll have
covered the major holes of spectre_v2 and go back to normality"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits)
x86,perf: Disable intel_bts when PTI
security/Kconfig: Correct the Documentation reference for PTI
x86/pti: Fix !PCID and sanitize defines
selftests/x86: Add test_vsyscall
x86/retpoline: Fill return stack buffer on vmexit
x86/retpoline/irq32: Convert assembler indirect jumps
x86/retpoline/checksum32: Convert assembler indirect jumps
x86/retpoline/xen: Convert Xen hypercall indirect jumps
x86/retpoline/hyperv: Convert assembler indirect jumps
x86/retpoline/ftrace: Convert ftrace assembler indirect jumps
x86/retpoline/entry: Convert entry assembler indirect jumps
x86/retpoline/crypto: Convert crypto assembler indirect jumps
x86/spectre: Add boot time option to select Spectre v2 mitigation
x86/retpoline: Add initial retpoline support
objtool: Allow alternatives to be ignored
objtool: Detect jumps to retpoline thunks
x86/pti: Make unpoison of pgd for trusted boot work for real
x86/alternatives: Fix optimize_nops() checking
sysfs/cpu: Fix typos in vulnerability documentation
x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSC
...
If CPU and TSC frequency are the same the printout of the CPU frequency is
valid for the TSC as well:
tsc: Detected 2900.000 MHz processor
If the TSC frequency is different there is no information in dmesg. Add a
conditional printout:
tsc: Detected 2904.000 MHz TSC
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: peterz@infradead.org
Link: https://lkml.kernel.org/r/537b342debcd8e8aebc8d631015dcdf9f9ba8a26.1513920414.git.len.brown@intel.com
The INTEL_FAM6_SKYLAKE_X hardcoded crystal_khz value of 25MHZ is
problematic:
- SKX workstations (with same model # as server variants) use a 24 MHz
crystal. This results in a -4.0% time drift rate on SKX workstations.
- SKX servers subject the crystal to an EMI reduction circuit that reduces its
actual frequency by (approximately) -0.25%. This results in -1 second per
10 minute time drift as compared to network time.
This issue can also trigger a timer and power problem, on configurations
that use the LAPIC timer (versus the TSC deadline timer). Clock ticks
scheduled with the LAPIC timer arrive a few usec before the time they are
expected (according to the slow TSC). This causes Linux to poll-idle, when
it should be in an idle power saving state. The idle and clock code do not
graciously recover from this error, sometimes resulting in significant
polling and measurable power impact.
Stop using native_calibrate_tsc() for INTEL_FAM6_SKYLAKE_X.
native_calibrate_tsc() will return 0, boot will run with tsc_khz = cpu_khz,
and the TSC refined calibration will update tsc_khz to correct for the
difference.
[ tglx: Sanitized change log ]
Fixes: 6baf3d6182 ("x86/tsc: Add additional Intel CPU models to the crystal quirk list")
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: peterz@infradead.org
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/ff6dcea166e8ff8f2f6a03c17beab2cb436aa779.1513920414.git.len.brown@intel.com
If the crystal frequency cannot be determined via CPUID(15).crystal_khz or
the built-in table then native_calibrate_tsc() will still set the
X86_FEATURE_TSC_KNOWN_FREQ flag which prevents the refined TSC calibration.
As a consequence such systems use cpu_khz for the TSC frequency which is
incorrect when cpu_khz != tsc_khz resulting in time drift.
Return early when the crystal frequency cannot be retrieved without setting
the X86_FEATURE_TSC_KNOWN_FREQ flag. This ensures that the refined TSC
calibration is invoked.
[ tglx: Steam-blastered changelog. Sigh ]
Fixes: 4ca4df0b7e ("x86/tsc: Mark TSC frequency determined by CPUID as known")
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: peterz@infradead.org
Cc: Bin Gao <bin.gao@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/0fe2503aa7d7fc69137141fc705541a78101d2b9.1513920414.git.len.brown@intel.com
Add a spectre_v2= option to select the mitigation used for the indirect
branch speculation vulnerability.
Currently, the only option available is retpoline, in its various forms.
This will be expanded to cover the new IBRS/IBPB microcode features.
The RETPOLINE_AMD feature relies on a serializing LFENCE for speculation
control. For AMD hardware, only set RETPOLINE_AMD if LFENCE is a
serializing instruction, which is indicated by the LFENCE_RDTSC feature.
[ tglx: Folded back the LFENCE/AMD fixes and reworked it so IBRS
integration becomes simple ]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: thomas.lendacky@amd.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/1515707194-20531-5-git-send-email-dwmw@amazon.co.uk
Enable the use of -mindirect-branch=thunk-extern in newer GCC, and provide
the corresponding thunks. Provide assembler macros for invoking the thunks
in the same way that GCC does, from native and inline assembler.
This adds X86_FEATURE_RETPOLINE and sets it by default on all CPUs. In
some circumstances, IBRS microcode features may be used instead, and the
retpoline can be disabled.
On AMD CPUs if lfence is serialising, the retpoline can be dramatically
simplified to a simple "lfence; jmp *\reg". A future patch, after it has
been verified that lfence really is serialising in all circumstances, can
enable this by setting the X86_FEATURE_RETPOLINE_AMD feature bit in addition
to X86_FEATURE_RETPOLINE.
Do not align the retpoline in the altinstr section, because there is no
guarantee that it stays aligned when it's copied over the oldinstr during
alternative patching.
[ Andi Kleen: Rename the macros, add CONFIG_RETPOLINE option, export thunks]
[ tglx: Put actual function CALL/JMP in front of the macros, convert to
symbolic labels ]
[ dwmw2: Convert back to numeric labels, merge objtool fixes ]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: thomas.lendacky@amd.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/1515707194-20531-4-git-send-email-dwmw@amazon.co.uk
On machines where the GART aperture is mapped over physical RAM
/proc/vmcore contains the remapped range and reading it may cause hangs or
reboots.
In the past, the GART region was added into the resource map, implemented
by commit 56dd669a13 ("[PATCH] Insert GART region into resource map")
However, inserting the iomem_resource from the early GART code caused
resource conflicts with some AGP drivers (bko#72201), which got avoided by
reverting the patch in commit 707d4eefbd ("Revert [PATCH] Insert GART
region into resource map"). This revert introduced the /proc/vmcore bug.
The vmcore ELF header is either prepared by the kernel (when using the
kexec_file_load syscall) or by the kexec userspace (when using the kexec_load
syscall). Since we no longer have the GART iomem resource, the userspace
kexec has no way of knowing which region to exclude from the ELF header.
Changes from v1 of this patch:
Instead of excluding the aperture from the ELF header, this patch
makes /proc/vmcore return zeroes in the second kernel when attempting to
read the aperture region. This is done by reusing the
gart_oldmem_pfn_is_ram infrastructure originally intended to exclude XEN
balooned memory. This works for both, the kexec_file_load and kexec_load
syscalls.
[Note that the GART region is the same in the first and second kernels:
regardless whether the first kernel fixed up the northbridge/bios setting
and mapped the aperture over physical memory, the second kernel finds the
northbridge properly configured by the first kernel and the aperture
never overlaps with e820 memory because the second kernel has a fake e820
map created from the crashkernel memory regions. Thus, the second kernel
keeps the aperture address/size as configured by the first kernel.]
register_oldmem_pfn_is_ram can only register one callback and returns an error
if the callback has been registered already. Since XEN used to be the only user
of this function, it never checks the return value. Now that we have more than
one user, I added a WARN_ON just in case agp, XEN, or any other future user of
register_oldmem_pfn_is_ram were to step on each other's toes.
Fixes: 707d4eefbd ("Revert [PATCH] Insert GART region into resource map")
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Baoquan He <bhe@redhat.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: David Airlie <airlied@linux.ie>
Cc: yinghai@kernel.org
Cc: joro@8bytes.org
Cc: kexec@lists.infradead.org
Cc: Borislav Petkov <bp@alien8.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Link: https://lkml.kernel.org/r/20180106010013.73suskgxm7lox7g6@dwarf.suse.cz
The alternatives code checks only the first byte whether it is a NOP, but
with NOPs in front of the payload and having actual instructions after it
breaks the "optimized' test.
Make sure to scan all bytes before deciding to optimize the NOPs in there.
Reported-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Andrew Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/20180110112815.mgciyf5acwacphkq@pd.tnic
phys_to_dma, dma_to_phys and dma_capable are helpers published by
architecture code for use of swiotlb and xen-swiotlb only. Drivers are
not supposed to use these directly, but use the DMA API instead.
Move these to a new asm/dma-direct.h helper, included by a
linux/dma-direct.h wrapper that provides the default linear mapping
unless the architecture wants to override it.
In the MIPS case the existing dma-coherent.h is reused for now as
untangling it will take a bit of work.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Robin Murphy <robin.murphy@arm.com>
With LFENCE now a serializing instruction, use LFENCE_RDTSC in preference
to MFENCE_RDTSC. However, since the kernel could be running under a
hypervisor that does not support writing that MSR, read the MSR back and
verify that the bit has been set successfully. If the MSR can be read
and the bit is set, then set the LFENCE_RDTSC feature, otherwise set the
MFENCE_RDTSC feature.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/20180108220932.12580.52458.stgit@tlendack-t1.amdoffice.net
To aid in speculation control, make LFENCE a serializing instruction
since it has less overhead than MFENCE. This is done by setting bit 1
of MSR 0xc0011029 (DE_CFG). Some families that support LFENCE do not
have this MSR. For these families, the LFENCE instruction is already
serializing.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/20180108220921.12580.71694.stgit@tlendack-t1.amdoffice.net
This is another case similar to what EFI does: create a new set of
page tables, map some code at a low address, and jump to it. PTI
mistakes this low address for userspace and mistakenly marks it
non-executable in an effort to make it unusable for userspace.
Undo the poison to allow execution.
Fixes: 385ce0ea4c ("x86/mm/pti: Add Kconfig")
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Jeff Law <law@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: David" <dwmw@amazon.co.uk>
Cc: Nick Clifton <nickc@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180108102805.GK25546@redhat.com
Implement the CPU vulnerabilty show functions for meltdown, spectre_v1 and
spectre_v2.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linuxfoundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lkml.kernel.org/r/20180107214913.177414879@linutronix.de
Add the bug bits for spectre v1/2 and force them unconditionally for all
cpus.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1515239374-23361-2-git-send-email-dwmw@amazon.co.uk
Instead of blacklisting all model 79 CPUs when attempting a late
microcode loading, limit that only to CPUs with microcode revisions <
0x0b000021 because only on those late loading may cause a system hang.
For such processors either:
a) a BIOS update which might contain a newer microcode revision
or
b) the early microcode loading method
should be considered.
Processors with revisions 0x0b000021 or higher will not experience such
hangs.
For more details, see erratum BDF90 in document #334165 (Intel Xeon
Processor E7-8800/4800 v4 Product Family Specification Update) from
September 2017.
[ bp: Heavily massage commit message and pr_* statements. ]
Fixes: 723f2828a9 ("x86/microcode/intel: Disable late loading on model 79")
Signed-off-by: Jia Zhang <qianyue.zj@alibaba-inc.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Cc: <stable@vger.kernel.org> # v4.14
Link: http://lkml.kernel.org/r/1514772287-92959-1-git-send-email-qianyue.zj@alibaba-inc.com
Pull more x86 pti fixes from Thomas Gleixner:
"Another small stash of fixes for fallout from the PTI work:
- Fix the modules vs. KASAN breakage which was caused by making
MODULES_END depend of the fixmap size. That was done when the cpu
entry area moved into the fixmap, but now that we have a separate
map space for that this is causing more issues than it solves.
- Use the proper cache flush methods for the debugstore buffers as
they are mapped/unmapped during runtime and not statically mapped
at boot time like the rest of the cpu entry area.
- Make the map layout of the cpu_entry_area consistent for 4 and 5
level paging and fix the KASLR vaddr_end wreckage.
- Use PER_CPU_EXPORT for per cpu variable and while at it unbreak
nvidia gfx drivers by dropping the GPL export. The subject line of
the commit tells it the other way around, but I noticed that too
late.
- Fix the ASM alternative macros so they can be used in the middle of
an inline asm block.
- Rename the BUG_CPU_INSECURE flag to BUG_CPU_MELTDOWN so the attack
vector is properly identified. The Spectre mitigations will come
with their own bug bits later"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/pti: Rename BUG_CPU_INSECURE to BUG_CPU_MELTDOWN
x86/alternatives: Add missing '\n' at end of ALTERNATIVE inline asm
x86/tlb: Drop the _GPL from the cpu_tlbstate export
x86/events/intel/ds: Use the proper cache flush method for mapping ds buffers
x86/kaslr: Fix the vaddr_end mess
x86/mm: Map cpu_entry_area at the same place on 4/5 level
x86/mm: Set MODULES_END to 0xffffffffff000000