Commit Graph

13985 Commits

Author SHA1 Message Date
Linus Torvalds
6304672b7f Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/pti updates from Thomas Gleixner:
 "Another set of melted spectrum related changes:

   - Code simplifications and cleanups for RSB and retpolines.

   - Make the indirect calls in KVM speculation safe.

   - Whitelist CPUs which are known not to speculate from Meltdown and
     prepare for the new CPUID flag which tells the kernel that a CPU is
     not affected.

   - A less rigorous variant of the module retpoline check which merily
     warns when a non-retpoline protected module is loaded and reflects
     that fact in the sysfs file.

   - Prepare for Indirect Branch Prediction Barrier support.

   - Prepare for exposure of the Speculation Control MSRs to guests, so
     guest OSes which depend on those "features" can use them. Includes
     a blacklist of the broken microcodes. The actual exposure of the
     MSRs through KVM is still being worked on"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/speculation: Simplify indirect_branch_prediction_barrier()
  x86/retpoline: Simplify vmexit_fill_RSB()
  x86/cpufeatures: Clean up Spectre v2 related CPUID flags
  x86/cpu/bugs: Make retpoline module warning conditional
  x86/bugs: Drop one "mitigation" from dmesg
  x86/nospec: Fix header guards names
  x86/alternative: Print unadorned pointers
  x86/speculation: Add basic IBPB (Indirect Branch Prediction Barrier) support
  x86/cpufeature: Blacklist SPEC_CTRL/PRED_CMD on early Spectre v2 microcodes
  x86/pti: Do not enable PTI on CPUs which are not vulnerable to Meltdown
  x86/msr: Add definitions for new speculation control MSRs
  x86/cpufeatures: Add AMD feature bits for Speculation Control
  x86/cpufeatures: Add Intel feature bits for Speculation Control
  x86/cpufeatures: Add CPUID_7_EDX CPUID leaf
  module/retpoline: Warn about missing retpoline in module
  KVM: VMX: Make indirect call speculation safe
  KVM: x86: Make indirect calls in emulator speculation safe
2018-01-29 19:08:02 -08:00
Linus Torvalds
942633523c Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm update from Thomas Gleixner:
 "A single patch which excludes the GART aperture from vmcore as
  accessing that area from a dump kernel can crash the kernel.

  Not necessarily the nicest way to fix this, but curing this from
  ground up requires a more thorough rewrite of the whole kexec/kdump
  magic"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/gart: Exclude GART aperture from vmcore
2018-01-29 18:58:16 -08:00
Linus Torvalds
36c289e72a Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 timer updates from Thomas Gleixner:
 "A small set of updates for x86 specific timers:

   - Mark TSC invariant on a subset of Centaur CPUs

   - Allow TSC calibration without PIT on mobile platforms which lack
     legacy devices"

* 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/centaur: Mark TSC invariant
  x86/tsc: Introduce early tsc clocksource
  x86/time: Unconditionally register legacy timer interrupt
  x86/tsc: Allow TSC calibration without PIT
2018-01-29 18:54:56 -08:00
Linus Torvalds
669c0f762e Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 platform updates from Thomas Gleixner:
 "The platform support for x86 contains the following updates:

   - A set of updates for the UV platform to support new CPUs and to fix
     some of the UV4A BAU MRRs

   - The initial platform support for the jailhouse hypervisor to allow
     native Linux guests (inmates) in non-root cells.

   - A fix for the PCI initialization on Intel MID platforms"

* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  x86/jailhouse: Respect pci=lastbus command line settings
  x86/jailhouse: Set X86_FEATURE_TSC_KNOWN_FREQ
  x86/platform/intel-mid: Move PCI initialization to arch_init()
  x86/platform/uv/BAU: Replace hard-coded values with MMR definitions
  x86/platform/UV: Fix UV4A BAU MMRs
  x86/platform/UV: Fix GAM MMR references in the UV x2apic code
  x86/platform/UV: Fix GAM MMR changes in UV4A
  x86/platform/UV: Add references to access fixed UV4A HUB MMRs
  x86/platform/UV: Fix UV4A support on new Intel Processors
  x86/platform/UV: Update uv_mmrs.h to prepare for UV4A fixes
  x86/jailhouse: Add PCI dependency
  x86/jailhouse: Hide x2apic code when CONFIG_X86_X2APIC=n
  x86/jailhouse: Initialize PCI support
  x86/jailhouse: Wire up IOAPIC for legacy UART ports
  x86/jailhouse: Halt instead of failing to restart
  x86/jailhouse: Silence ACPI warning
  x86/jailhouse: Avoid access of unsupported platform resources
  x86/jailhouse: Set up timekeeping
  x86/jailhouse: Enable PMTIMER
  x86/jailhouse: Enable APIC and SMP support
  ...
2018-01-29 18:17:39 -08:00
Linus Torvalds
f0b13428c9 Merge branch 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/cache updates from Thomas Gleixner:
 "A set of patches which add support for L2 cache partitioning to the
  Intel RDT facility"

* 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/intel_rdt: Add command line parameter to control L2_CDP
  x86/intel_rdt: Enable L2 CDP in MSR IA32_L2_QOS_CFG
  x86/intel_rdt: Add two new resources for L2 Code and Data Prioritization (CDP)
  x86/intel_rdt: Enumerate L2 Code and Data Prioritization (CDP) feature
  x86/intel_rdt: Add L2CDP support in documentation
  x86/intel_rdt: Update documentation
2018-01-29 17:48:22 -08:00
Linus Torvalds
1a9a126b50 ACPI updates for v4.16-rc1
- Update the ACPICA kernel code to upstream revision 20171215 including:
    * Support for ACPI 6.0A changes in the NFIT table (Bob Moore).
    * Local 64-bit divide in string conversions (Bob Moore).
    * Fix for a regression in acpi_evaluate_object_type() (Bob Moore).
    * Fixes for memory leaks during package object resolution (Bob Moore).
    * Deployment of safe version of strncpy() (Bob Moore).
    * Debug and messaging updates (Bob Moore).
    * Support for PDTT, SDEV, TPM2 tables in iASL and tools (Bob Moore).
    * Null pointer dereference avoidance in Op and cleanups (Colin Ian King).
    * Fix for memory leak from building prefixed pathname (Erik Schmauss).
    * Coding style fixes, disassembler and compiler updates (Hanjun Guo,
      Erik Schmauss).
    * Additional PPTT flags from ACPI 6.2 (Jeremy Linton).
    * Fix for an off-by-one error in acpi_get_timer_duration() (Jung-uk Kim).
    * Infinite loop detection timeout and utilities cleanups (Lv Zheng).
    * Windows 10 version 1607 and 1703 OSI strings (Mario Limonciello).
 
  - Update ACPICA information in MAINTAINERS to reflect the current
    status of ACPICA maintenance and rename a local variable in one
    function to match the corresponding upstream code (Rafael Wysocki).
 
  - Clean up ACPI-related initialization on x86 (Andy Shevchenko).
 
  - Add support for Intel Merrifield to the ACPI GPIO code (Andy
    Shevchenko).
 
  - Clean up ACPI PMIC drivers (Andy Shevchenko, Arvind Yadav).
 
  - Fix the ACPI Generic Event Device (GED) driver to free IRQs on
    shutdown and clean up the PCI IRQ Link driver (Sinan Kaya).
 
  - Make the GHES code call into the AER driver on all errors and
    clean up the ACPI APEI code (Colin Ian King, Tyler Baicar).
 
  - Make the IA64 ACPI NUMA code parse all SRAT entries (Ganapatrao
    Kulkarni).
 
  - Add a lid switch blacklist to the ACPI button driver and make it
    print extra debug messages on lid events (Hans de Goede).
 
  - Add quirks for Asus GL502VSK and UX305LA to the ACPI battery
    driver and clean it up somewhat (Bjørn Mork, Kai-Heng Feng).
 
  - Add device link for CHT SD card dependency on I2C to the ACPI
    LPSS (Intel SoCs) driver and make it avoid creating platform
    device objects for devices without MMIO resources (Adrian Hunter,
    Hans de Goede).
 
  - Fix the ACPI GPE mask kernel command line parameter handling
    (Prarit Bhargava).
 
  - Fix the handling of (incorrectly exposed) backlight interfaces
    without LCD (Hans de Goede).
 
  - Fix the usage of debugfs_create_*() in the ACPI EC driver (Geert
    Uytterhoeven).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJaY/BrAAoJEILEb/54YlRxR10P/1dVxfhLiGBrwKzA1urr71Vg
 LH6ZdlIlihyu9a1PZHjfO72IuZCMSkSnoJUPJPFK6FNA0hIDqsP+hC8gcknCxnAU
 i6r2ZzQesOzzjGblpASvdDg0GkYe9r6sHpUQ0xW/hnijamforflGveW1bagbnFuI
 gvT6m6+lMJwBd0NrWhQiTJmTuSTwgJBXDA+HhlDnGd6ziVfHPaCxon4L9GQfVhsb
 jbOI/kBjnEKoN1dBbEAcSpgzklVUXUj4x2NHUMCyvKOJyKG/F7Ycbghux9t3C+ej
 1T0XJAU7K3hkmstWkWwylqVZt3UW47xiJKe6K2Z5p3CaJx0cnI18C+g7x/IcRGiA
 +J/Uco+xeMa8yqYV96j+AJexpUDu7fYo6B4nRZ/K+MjWifboeSLKn8PHLKhqYn6k
 sV3s0dUf8SJK5pTu+IkAgzDzsw/uJAI8Rylmig9ea12/nIt6EH3Kero31hi3lkoN
 Y2rdi9MIqFIj2tX42047Y/q2UEFkMWGO3q8fLkXRvWPwnwStHDDFVj/kd19CWcTy
 B1kNxNQQS/Q9u0uoW4rIHW6ipEU2sqyt/tvVQnmJlVP0HuO++uIO7h3mrBDxhSJS
 zQF4qtusbMHC85BHrozmGFECN8Cex8DYSUTO/yCoBvMlMxJ7UlSt25TBN+SzmnOV
 H1LhVRcFh1488lXzuM+u
 =/eCQ
 -----END PGP SIGNATURE-----

Merge tag 'acpi-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI updates from Rafael Wysocki:
 "The majority of this is an update of the ACPICA kernel code to
  upstream revision 20171215 with a cosmetic change and a maintainers
  information update on top of it.

  The rest is mostly some minor fixes and cleanups in the ACPI drivers
  and cleanups to initialization on x86.

  Specifics:

   - Update the ACPICA kernel code to upstream revision 20171215 including:
      * Support for ACPI 6.0A changes in the NFIT table (Bob Moore)
      * Local 64-bit divide in string conversions (Bob Moore)
      * Fix for a regression in acpi_evaluate_object_type() (Bob Moore)
      * Fixes for memory leaks during package object resolution (Bob
        Moore)
      * Deployment of safe version of strncpy() (Bob Moore)
      * Debug and messaging updates (Bob Moore)
      * Support for PDTT, SDEV, TPM2 tables in iASL and tools (Bob
        Moore)
      * Null pointer dereference avoidance in Op and cleanups (Colin Ian
        King)
      * Fix for memory leak from building prefixed pathname (Erik
        Schmauss)
      * Coding style fixes, disassembler and compiler updates (Hanjun
        Guo, Erik Schmauss)
      * Additional PPTT flags from ACPI 6.2 (Jeremy Linton)
      * Fix for an off-by-one error in acpi_get_timer_duration()
        (Jung-uk Kim)
      * Infinite loop detection timeout and utilities cleanups (Lv
        Zheng)
      * Windows 10 version 1607 and 1703 OSI strings (Mario
        Limonciello)

   - Update ACPICA information in MAINTAINERS to reflect the current
     status of ACPICA maintenance and rename a local variable in one
     function to match the corresponding upstream code (Rafael Wysocki)

   - Clean up ACPI-related initialization on x86 (Andy Shevchenko)

   - Add support for Intel Merrifield to the ACPI GPIO code (Andy
     Shevchenko)

   - Clean up ACPI PMIC drivers (Andy Shevchenko, Arvind Yadav)

   - Fix the ACPI Generic Event Device (GED) driver to free IRQs on
     shutdown and clean up the PCI IRQ Link driver (Sinan Kaya)

   - Make the GHES code call into the AER driver on all errors and clean
     up the ACPI APEI code (Colin Ian King, Tyler Baicar)

   - Make the IA64 ACPI NUMA code parse all SRAT entries (Ganapatrao
     Kulkarni)

   - Add a lid switch blacklist to the ACPI button driver and make it
     print extra debug messages on lid events (Hans de Goede)

   - Add quirks for Asus GL502VSK and UX305LA to the ACPI battery driver
     and clean it up somewhat (Bjørn Mork, Kai-Heng Feng)

   - Add device link for CHT SD card dependency on I2C to the ACPI LPSS
     (Intel SoCs) driver and make it avoid creating platform device
     objects for devices without MMIO resources (Adrian Hunter, Hans de
     Goede)

   - Fix the ACPI GPE mask kernel command line parameter handling
     (Prarit Bhargava)

   - Fix the handling of (incorrectly exposed) backlight interfaces
     without LCD (Hans de Goede)

   - Fix the usage of debugfs_create_*() in the ACPI EC driver (Geert
     Uytterhoeven)"

* tag 'acpi-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (62 commits)
  ACPI/PCI: pci_link: reduce verbosity when IRQ is enabled
  ACPI / LPSS: Do not instiate platform_dev for devs without MMIO resources
  ACPI / PMIC: Convert to use builtin_platform_driver() macro
  ACPI / x86: boot: Propagate error code in acpi_gsi_to_irq()
  ACPICA: Update version to 20171215
  ACPICA: trivial style fix, no functional change
  ACPICA: Fix a couple memory leaks during package object resolution
  ACPICA: Recognize the Windows 10 version 1607 and 1703 OSI strings
  ACPICA: DT compiler: prevent error if optional field at the end of table is not present
  ACPICA: Rename a global variable, no functional change
  ACPICA: Create and deploy safe version of strncpy
  ACPICA: Cleanup the global variables and update comments
  ACPICA: Debugger: fix slight indentation issue
  ACPICA: Fix a regression in the acpi_evaluate_object_type() interface
  ACPICA: Update for a few debug output statements
  ACPICA: Debug output, no functional change
  ACPI: EC: Fix debugfs_create_*() usage
  ACPI / video: Default lcd_only to true on Win8-ready and newer machines
  ACPI / x86: boot: Don't setup SCI on HW-reduced platforms
  ACPI / x86: boot: Use INVALID_ACPI_IRQ instead of 0 for acpi_sci_override_gsi
  ...
2018-01-29 10:17:53 -08:00
Linus Torvalds
7f3fdd40a7 Power management updates for v4.16-rc1
- Define a PM driver flag allowing drivers to request that their
    devices be left in suspend after system-wide transitions to the
    working state if possible and add support for it to the PCI bus
    type and the ACPI PM domain (Rafael Wysocki).
 
  - Make the PM core carry out optimizations for devices with driver
    PM flags set in some cases and make a few drivers set those flags
    (Rafael Wysocki).
 
  - Fix and clean up wrapper routines allowing runtime PM device
    callbacks to be re-used for system-wide PM, change the generic
    power domains (genpd) framework to stop using those routines
    incorrectly and fix up a driver depending on that behavior of
    genpd (Rafael Wysocki, Ulf Hansson, Geert Uytterhoeven).
 
  - Fix and clean up the PM core's device wakeup framework and
    re-factor system-wide PM core code related to device wakeup
    (Rafael Wysocki, Ulf Hansson, Brian Norris).
 
  - Make more x86-based systems use the Low Power Sleep S0 _DSM
    interface by default (to fix power button wakeup from
    suspend-to-idle on Surface Pro3) and add a kernel command line
    switch to tell it to ignore the system sleep blacklist in the
    ACPI core (Rafael Wysocki).
 
  - Fix a race condition related to cpufreq governor module removal
    and clean up the governor management code in the cpufreq core
    (Rafael Wysocki).
 
  - Drop the unused generic code related to the handling of the static
    power energy usage model in the CPU cooling thermal driver along
    with the corresponding documentation (Viresh Kumar).
 
  - Add mt2712 support to the Mediatek cpufreq driver (Andrew-sh Cheng).
 
  - Add a new operating point to the imx6ul and imx6q cpufreq drivers
    and switch the latter to using clk_bulk_get() (Anson Huang, Dong
    Aisheng).
 
  - Add support for multiple regulators to the TI cpufreq driver along
    with a new DT binding related to that and clean up that driver
    somewhat (Dave Gerlach).
 
  - Fix a powernv cpufreq driver regression leading to incorrect CPU
    frequency reporting, fix that driver to deal with non-continguous
    P-states correctly and clean it up (Gautham Shenoy, Shilpasri Bhat).
 
  - Add support for frequency scaling on Armada 37xx SoCs through the
    generic DT cpufreq driver (Gregory CLEMENT).
 
  - Fix error code paths in the mvebu cpufreq driver (Gregory CLEMENT).
 
  - Fix a transition delay setting regression in the longhaul cpufreq
    driver (Viresh Kumar).
 
  - Add Skylake X (server) support to the intel_pstate cpufreq driver
    and clean up that driver somewhat (Srinivas Pandruvada).
 
  - Clean up the cpufreq statistics collection code (Viresh Kumar).
 
  - Drop cluster terminology and dependency on physical_package_id
    from the PSCI driver and drop dependency on arm_big_little from
    the SCPI cpufreq driver (Sudeep Holla).
 
  - Add support for system-wide suspend and resume to the RAPL power
    capping driver and drop a redundant semicolon from it (Zhen Han,
    Luis de Bethencourt).
 
  - Make SPI domain validation (in the SCSI SPI transport driver) and
    system-wide suspend mutually exclusive as they rely on the same
    underlying mechanism and cannot be carried out at the same time
    (Bart Van Assche).
 
  - Fix the computation of the amount of memory to preallocate in the
    hibernation core and clean up one function in there (Rainer Fiebig,
    Kyungsik Lee).
 
  - Prepare the Operating Performance Points (OPP) framework for being
    used with power domains and clean up one function in it (Viresh
    Kumar, Wei Yongjun).
 
  - Clean up the generic sysfs interface for device PM (Andy Shevchenko).
 
  - Fix several minor issues in power management frameworks and clean
    them up a bit (Arvind Yadav, Bjorn Andersson, Geert Uytterhoeven,
    Gustavo Silva, Julia Lawall, Luis de Bethencourt, Paul Gortmaker,
    Sergey Senozhatsky, gaurav jindal).
 
  - Make it easier to disable PM via Kconfig (Mark Brown).
 
  - Clean up the cpupower and intel_pstate_tracer utilities (Doug
    Smythies, Laura Abbott).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJaYw2iAAoJEILEb/54YlRxLHwP/iabmAcbXBeg30/wSCKcWB6f
 Ar785YbkFedNP7b2dypR7bcKIkaV55EExNHHoVuvC6gKrW+zx3F39v9QzK3HBKfw
 DgLWMjxR5Xdm9o8o2chsBEMl0itSRB9s864s+AAAElP+qjyT6kmbFyRFgVYLiNH0
 v9jNhPF9EmirViwES/syELa/P1AJDMxCb/SbRY+Xp1sPhGKlx2J/2eQsVDs7G+wL
 2BJeyBqwL9D78U/eY2bvpCoZLpmZmklx1eY5iK3Mzo6LZKYMaSypgkGuRfh//K+a
 8vFLwOBsOlpZ8lsPBRatV5+SMu8qMQMTnstui1m3/9bOPFfjymat6u0lLw4BV2hv
 zrNfqWOiwTAt/fczR1/naYuuSeRCLABvYDKjs/9iYdrCZYJ+n+ZzU/wi5geswDtD
 cQKDMOdOBrnfkN0Vqpw6ZBqun0RDldNT/+6oy93tHWBlF0CA4mMq5jr8q3iH35CW
 8TA1GCkurHZXTyYdYXR5SUHxPbOgZC87GAb7RlFEJJnvvkmy3jmBng675Hl5XAn7
 D8eJp3d4h5n121pkMLGcBc7K036T2uFsjrHWx+QsjKFUBWUBnuRfInRrLA5WnGo2
 U+KIEUPepdnbFFvYNv+kTgz2uE6FOqycEmnUKUKWUZYPN0GDAOw/V3813uxVRYtq
 27omIOL7PJp1wWjQnfXK
 =dnb7
 -----END PGP SIGNATURE-----

Merge tag 'pm-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "This includes some infrastructure changes in the PM core, mostly
  related to integration between runtime PM and system-wide suspend and
  hibernation, plus some driver changes depending on them and fixes for
  issues in that area which have become quite apparent recently.

  Also included are changes making more x86-based systems use the Low
  Power Sleep S0 _DSM interface by default, which turned out to be
  necessary to handle power button wakeups from suspend-to-idle on
  Surface Pro3.

  On the cpufreq front we have fixes and cleanups in the core, some new
  hardware support, driver updates and the removal of some unused code
  from the CPU cooling thermal driver.

  Apart from this, the Operating Performance Points (OPP) framework is
  prepared to be used with power domains in the future and there is a
  usual bunch of assorted fixes and cleanups.

  Specifics:

   - Define a PM driver flag allowing drivers to request that their
     devices be left in suspend after system-wide transitions to the
     working state if possible and add support for it to the PCI bus
     type and the ACPI PM domain (Rafael Wysocki).

   - Make the PM core carry out optimizations for devices with driver PM
     flags set in some cases and make a few drivers set those flags
     (Rafael Wysocki).

   - Fix and clean up wrapper routines allowing runtime PM device
     callbacks to be re-used for system-wide PM, change the generic
     power domains (genpd) framework to stop using those routines
     incorrectly and fix up a driver depending on that behavior of genpd
     (Rafael Wysocki, Ulf Hansson, Geert Uytterhoeven).

   - Fix and clean up the PM core's device wakeup framework and
     re-factor system-wide PM core code related to device wakeup
     (Rafael Wysocki, Ulf Hansson, Brian Norris).

   - Make more x86-based systems use the Low Power Sleep S0 _DSM
     interface by default (to fix power button wakeup from
     suspend-to-idle on Surface Pro3) and add a kernel command line
     switch to tell it to ignore the system sleep blacklist in the ACPI
     core (Rafael Wysocki).

   - Fix a race condition related to cpufreq governor module removal and
     clean up the governor management code in the cpufreq core (Rafael
     Wysocki).

   - Drop the unused generic code related to the handling of the static
     power energy usage model in the CPU cooling thermal driver along
     with the corresponding documentation (Viresh Kumar).

   - Add mt2712 support to the Mediatek cpufreq driver (Andrew-sh
     Cheng).

   - Add a new operating point to the imx6ul and imx6q cpufreq drivers
     and switch the latter to using clk_bulk_get() (Anson Huang, Dong
     Aisheng).

   - Add support for multiple regulators to the TI cpufreq driver along
     with a new DT binding related to that and clean up that driver
     somewhat (Dave Gerlach).

   - Fix a powernv cpufreq driver regression leading to incorrect CPU
     frequency reporting, fix that driver to deal with non-continguous
     P-states correctly and clean it up (Gautham Shenoy, Shilpasri
     Bhat).

   - Add support for frequency scaling on Armada 37xx SoCs through the
     generic DT cpufreq driver (Gregory CLEMENT).

   - Fix error code paths in the mvebu cpufreq driver (Gregory CLEMENT).

   - Fix a transition delay setting regression in the longhaul cpufreq
     driver (Viresh Kumar).

   - Add Skylake X (server) support to the intel_pstate cpufreq driver
     and clean up that driver somewhat (Srinivas Pandruvada).

   - Clean up the cpufreq statistics collection code (Viresh Kumar).

   - Drop cluster terminology and dependency on physical_package_id from
     the PSCI driver and drop dependency on arm_big_little from the SCPI
     cpufreq driver (Sudeep Holla).

   - Add support for system-wide suspend and resume to the RAPL power
     capping driver and drop a redundant semicolon from it (Zhen Han,
     Luis de Bethencourt).

   - Make SPI domain validation (in the SCSI SPI transport driver) and
     system-wide suspend mutually exclusive as they rely on the same
     underlying mechanism and cannot be carried out at the same time
     (Bart Van Assche).

   - Fix the computation of the amount of memory to preallocate in the
     hibernation core and clean up one function in there (Rainer Fiebig,
     Kyungsik Lee).

   - Prepare the Operating Performance Points (OPP) framework for being
     used with power domains and clean up one function in it (Viresh
     Kumar, Wei Yongjun).

   - Clean up the generic sysfs interface for device PM (Andy
     Shevchenko).

   - Fix several minor issues in power management frameworks and clean
     them up a bit (Arvind Yadav, Bjorn Andersson, Geert Uytterhoeven,
     Gustavo Silva, Julia Lawall, Luis de Bethencourt, Paul Gortmaker,
     Sergey Senozhatsky, gaurav jindal).

   - Make it easier to disable PM via Kconfig (Mark Brown).

   - Clean up the cpupower and intel_pstate_tracer utilities (Doug
     Smythies, Laura Abbott)"

* tag 'pm-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (89 commits)
  PCI / PM: Remove spurious semicolon
  cpufreq: scpi: remove arm_big_little dependency
  drivers: psci: remove cluster terminology and dependency on physical_package_id
  powercap: intel_rapl: Fix trailing semicolon
  dmaengine: rcar-dmac: Make DMAC reinit during system resume explicit
  PM / runtime: Allow no callbacks in pm_runtime_force_suspend|resume()
  PM / hibernate: Drop unused parameter of enough_swap
  PM / runtime: Check ignore_children in pm_runtime_need_not_resume()
  PM / runtime: Rework pm_runtime_force_suspend/resume()
  PM / genpd: Stop/start devices without pm_runtime_force_suspend/resume()
  cpufreq: powernv: Dont assume distinct pstate values for nominal and pmin
  cpufreq: intel_pstate: Add Skylake servers support
  cpufreq: intel_pstate: Replace bxt_funcs with core_funcs
  platform/x86: surfacepro3: Support for wakeup from suspend-to-idle
  ACPI / PM: Use Low Power S0 Idle on more systems
  PM / wakeup: Print warn if device gets enabled as wakeup source during sleep
  PM / domains: Don't skip driver's ->suspend|resume_noirq() callbacks
  PM / core: Propagate wakeup_path status flag in __device_suspend_late()
  PM / core: Re-structure code for clearing the direct_complete flag
  powercap: add suspend and resume mechanism for SOC power limit
  ...
2018-01-29 09:47:41 -08:00
Linus Torvalds
32c6cdf75c Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A set of small fixes for 4.15:

   - Fix vmapped stack synchronization on systems with 4-level paging
     and a large amount of memory caused by a missing 5-level folding
     which made the pgd synchronization logic to fail and causing double
     faults.

   - Add a missing sanity check in the vmalloc_fault() logic on 5-level
     paging systems.

   - Bring back protection against accessing a freed initrd in the
     microcode loader which was lost by a wrong merge conflict
     resolution.

   - Extend the Broadwell micro code loading sanity check.

   - Add a missing ENDPROC annotation in ftrace assembly code which
     makes ORC unhappy.

   - Prevent loading the AMD power module on !AMD platforms. The load
     itself is uncritical, but an unload attempt results in a kernel
     crash.

   - Update Peter Anvins role in the MAINTAINERS file"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ftrace: Add one more ENDPROC annotation
  x86: Mark hpa as a "Designated Reviewer" for the time being
  x86/mm/64: Tighten up vmalloc_fault() sanity checks on 5-level kernels
  x86/mm/64: Fix vmapped stack syncing on very-large-memory 4-level systems
  x86/microcode: Fix again accessing initrd after having been freed
  x86/microcode/intel: Extend BDW late-loading further with LLC size check
  perf/x86/amd/power: Do not load AMD power module on !AMD platforms
2018-01-28 12:19:23 -08:00
Josh Poimboeuf
dd085168a7 x86/ftrace: Add one more ENDPROC annotation
When ORC support was added for the ftrace_64.S code, an ENDPROC
for function_hook() was missed. This results in the following warning:

  arch/x86/kernel/ftrace_64.o: warning: objtool: .entry.text+0x0: unreachable instruction

Fixes: e2ac83d74a ("x86/ftrace: Fix ORC unwinding from ftrace handlers")
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/20180128022150.dqierscqmt3uwwsr@treble
2018-01-28 09:19:12 +01:00
Borislav Petkov
64e16720ea x86/speculation: Simplify indirect_branch_prediction_barrier()
Make it all a function which does the WRMSR instead of having a hairy
inline asm.

[dwmw2: export it, fix CONFIG_RETPOLINE issues]

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ak@linux.intel.com
Cc: dave.hansen@intel.com
Cc: karahmed@amazon.de
Cc: arjan@linux.intel.com
Cc: torvalds@linux-foundation.org
Cc: peterz@infradead.org
Cc: bp@alien8.de
Cc: pbonzini@redhat.com
Cc: tim.c.chen@linux.intel.com
Cc: gregkh@linux-foundation.org
Link: https://lkml.kernel.org/r/1517070274-12128-4-git-send-email-dwmw@amazon.co.uk
2018-01-27 19:10:45 +01:00
David Woodhouse
2961298efe x86/cpufeatures: Clean up Spectre v2 related CPUID flags
We want to expose the hardware features simply in /proc/cpuinfo as "ibrs",
"ibpb" and "stibp". Since AMD has separate CPUID bits for those, use them
as the user-visible bits.

When the Intel SPEC_CTRL bit is set which indicates both IBRS and IBPB
capability, set those (AMD) bits accordingly. Likewise if the Intel STIBP
bit is set, set the AMD STIBP that's used for the generic hardware
capability.

Hide the rest from /proc/cpuinfo by putting "" in the comments. Including
RETPOLINE and RETPOLINE_AMD which shouldn't be visible there. There are
patches to make the sysfs vulnerabilities information non-readable by
non-root, and the same should apply to all information about which
mitigations are actually in use. Those *shouldn't* appear in /proc/cpuinfo.

The feature bit for whether IBPB is actually used, which is needed for
ALTERNATIVEs, is renamed to X86_FEATURE_USE_IBPB.

Originally-by: Borislav Petkov <bp@suse.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ak@linux.intel.com
Cc: dave.hansen@intel.com
Cc: karahmed@amazon.de
Cc: arjan@linux.intel.com
Cc: torvalds@linux-foundation.org
Cc: peterz@infradead.org
Cc: bp@alien8.de
Cc: pbonzini@redhat.com
Cc: tim.c.chen@linux.intel.com
Cc: gregkh@linux-foundation.org
Link: https://lkml.kernel.org/r/1517070274-12128-2-git-send-email-dwmw@amazon.co.uk
2018-01-27 19:10:44 +01:00
Thomas Gleixner
e383095c7f x86/cpu/bugs: Make retpoline module warning conditional
If sysfs is disabled and RETPOLINE not defined:

arch/x86/kernel/cpu/bugs.c:97:13: warning: ‘spectre_v2_bad_module’ defined but not used
[-Wunused-variable]
 static bool spectre_v2_bad_module;

Hide it.

Fixes: caf7501a1b ("module/retpoline: Warn about missing retpoline in module")
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
2018-01-27 15:45:14 +01:00
Borislav Petkov
55fa19d3e5 x86/bugs: Drop one "mitigation" from dmesg
Make

[    0.031118] Spectre V2 mitigation: Mitigation: Full generic retpoline

into

[    0.031118] Spectre V2: Mitigation: Full generic retpoline

to reduce the mitigation mitigations strings.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: riel@redhat.com
Cc: ak@linux.intel.com
Cc: peterz@infradead.org
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: jikos@kernel.org
Cc: luto@amacapital.net
Cc: dave.hansen@intel.com
Cc: torvalds@linux-foundation.org
Cc: keescook@google.com
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: tim.c.chen@linux.intel.com
Cc: pjt@google.com
Link: https://lkml.kernel.org/r/20180126121139.31959-5-bp@alien8.de
2018-01-26 15:53:19 +01:00
Borislav Petkov
0e6c16c652 x86/alternative: Print unadorned pointers
After commit ad67b74d24 ("printk: hash addresses printed with %p")
pointers are being hashed when printed. However, this makes the alternative
debug output completely useless. Switch to %px in order to see the
unadorned kernel pointers.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: riel@redhat.com
Cc: ak@linux.intel.com
Cc: peterz@infradead.org
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: jikos@kernel.org
Cc: luto@amacapital.net
Cc: dave.hansen@intel.com
Cc: torvalds@linux-foundation.org
Cc: keescook@google.com
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: tim.c.chen@linux.intel.com
Cc: gregkh@linux-foundation.org
Cc: pjt@google.com
Link: https://lkml.kernel.org/r/20180126121139.31959-2-bp@alien8.de
2018-01-26 15:53:19 +01:00
David Woodhouse
20ffa1caec x86/speculation: Add basic IBPB (Indirect Branch Prediction Barrier) support
Expose indirect_branch_prediction_barrier() for use in subsequent patches.

[ tglx: Add IBPB status to spectre_v2 sysfs file ]

Co-developed-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: ak@linux.intel.com
Cc: ashok.raj@intel.com
Cc: dave.hansen@intel.com
Cc: arjan@linux.intel.com
Cc: torvalds@linux-foundation.org
Cc: peterz@infradead.org
Cc: bp@alien8.de
Cc: pbonzini@redhat.com
Cc: tim.c.chen@linux.intel.com
Cc: gregkh@linux-foundation.org
Link: https://lkml.kernel.org/r/1516896855-7642-8-git-send-email-dwmw@amazon.co.uk
2018-01-26 15:53:18 +01:00
David Woodhouse
a5b2966364 x86/cpufeature: Blacklist SPEC_CTRL/PRED_CMD on early Spectre v2 microcodes
This doesn't refuse to load the affected microcodes; it just refuses to
use the Spectre v2 mitigation features if they're detected, by clearing
the appropriate feature bits.

The AMD CPUID bits are handled here too, because hypervisors *may* have
been exposing those bits even on Intel chips, for fine-grained control
of what's available.

It is non-trivial to use x86_match_cpu() for this table because that
doesn't handle steppings. And the approach taken in commit bd9240a18
almost made me lose my lunch.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: ak@linux.intel.com
Cc: ashok.raj@intel.com
Cc: dave.hansen@intel.com
Cc: karahmed@amazon.de
Cc: arjan@linux.intel.com
Cc: torvalds@linux-foundation.org
Cc: peterz@infradead.org
Cc: bp@alien8.de
Cc: pbonzini@redhat.com
Cc: tim.c.chen@linux.intel.com
Cc: gregkh@linux-foundation.org
Link: https://lkml.kernel.org/r/1516896855-7642-7-git-send-email-dwmw@amazon.co.uk
2018-01-26 15:53:18 +01:00
David Woodhouse
fec9434a12 x86/pti: Do not enable PTI on CPUs which are not vulnerable to Meltdown
Also, for CPUs which don't speculate at all, don't report that they're
vulnerable to the Spectre variants either.

Leave the cpu_no_meltdown[] match table with just X86_VENDOR_AMD in it
for now, even though that could be done with a simple comparison, on the
assumption that we'll have more to add.

Based on suggestions from Dave Hansen and Alan Cox.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: ak@linux.intel.com
Cc: ashok.raj@intel.com
Cc: karahmed@amazon.de
Cc: arjan@linux.intel.com
Cc: torvalds@linux-foundation.org
Cc: peterz@infradead.org
Cc: bp@alien8.de
Cc: pbonzini@redhat.com
Cc: tim.c.chen@linux.intel.com
Cc: gregkh@linux-foundation.org
Link: https://lkml.kernel.org/r/1516896855-7642-6-git-send-email-dwmw@amazon.co.uk
2018-01-26 15:53:18 +01:00
David Woodhouse
95ca0ee863 x86/cpufeatures: Add CPUID_7_EDX CPUID leaf
This is a pure feature bits leaf. There are two AVX512 feature bits in it
already which were handled as scattered bits, and three more from this leaf
are going to be added for speculation control features.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: ak@linux.intel.com
Cc: ashok.raj@intel.com
Cc: dave.hansen@intel.com
Cc: karahmed@amazon.de
Cc: arjan@linux.intel.com
Cc: torvalds@linux-foundation.org
Cc: peterz@infradead.org
Cc: bp@alien8.de
Cc: pbonzini@redhat.com
Cc: tim.c.chen@linux.intel.com
Cc: gregkh@linux-foundation.org
Link: https://lkml.kernel.org/r/1516896855-7642-2-git-send-email-dwmw@amazon.co.uk
2018-01-26 15:53:16 +01:00
Andi Kleen
caf7501a1b module/retpoline: Warn about missing retpoline in module
There's a risk that a kernel which has full retpoline mitigations becomes
vulnerable when a module gets loaded that hasn't been compiled with the
right compiler or the right option.

To enable detection of that mismatch at module load time, add a module info
string "retpoline" at build time when the module was compiled with
retpoline support. This only covers compiled C source, but assembler source
or prebuilt object files are not checked.

If a retpoline enabled kernel detects a non retpoline protected module at
load time, print a warning and report it in the sysfs vulnerability file.

[ tglx: Massaged changelog ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: gregkh@linuxfoundation.org
Cc: torvalds@linux-foundation.org
Cc: jeyu@kernel.org
Cc: arjan@linux.intel.com
Link: https://lkml.kernel.org/r/20180125235028.31211-1-andi@firstfloor.org
2018-01-26 15:03:56 +01:00
davidwang
fe6daab1ee x86/centaur: Mark TSC invariant
Centaur CPU has a constant frequency TSC and that TSC does not stop in
C-States. But because the corresponding TSC feature flags are not set for
that CPU, the TSC is treated as not constant frequency and assumed to stop
in C-States, which makes it an unreliable and unusable clock source.

Setting those flags tells the kernel that the TSC is usable, so it will
select it over HPET.  The effect of this is that reading time stamps (from
kernel or user space) will be faster and more efficent.

Signed-off-by: davidwang <davidwang@zhaoxin.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: qiyuanwang@zhaoxin.com
Cc: linux-pm@vger.kernel.org
Cc: brucechang@via-alliance.com
Cc: cooperyan@zhaoxin.com
Cc: benjaminpan@viatech.com
Link: https://lkml.kernel.org/r/1516616057-5158-1-git-send-email-davidwang@zhaoxin.com
2018-01-24 13:38:10 +01:00
Borislav Petkov
1d080f096f x86/microcode: Fix again accessing initrd after having been freed
Commit 24c2503255 ("x86/microcode: Do not access the initrd after it has
been freed") fixed attempts to access initrd from the microcode loader
after it has been freed. However, a similar KASAN warning was reported
(stack trace edited):

  smpboot: Booting Node 0 Processor 1 APIC 0x11
  ==================================================================
  BUG: KASAN: use-after-free in find_cpio_data+0x9b5/0xa50
  Read of size 1 at addr ffff880035ffd000 by task swapper/1/0

  CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.14.8-slack #7
  Hardware name: System manufacturer System Product Name/A88X-PLUS, BIOS 3003 03/10/2016
  Call Trace:
   dump_stack
   print_address_description
   kasan_report
   ? find_cpio_data
   __asan_report_load1_noabort
   find_cpio_data
   find_microcode_in_initrd
   __load_ucode_amd
   load_ucode_amd_ap
      load_ucode_ap

After some investigation, it turned out that a merge was done using the
wrong side to resolve, leading to picking up the previous state, before
the 24c2503255 fix. Therefore the Fixes tag below contains a merge
commit.

Revert the mismerge by catching the save_microcode_in_initrd_amd()
retval and thus letting the function exit with the last return statement
so that initrd_gone can be set to true.

Fixes: f26483eaed ("Merge branch 'x86/urgent' into x86/microcode, to resolve conflicts")
Reported-by: <higuita@gmx.net>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=198295
Link: https://lkml.kernel.org/r/20180123104133.918-2-bp@alien8.de
2018-01-24 13:00:35 +01:00
Jia Zhang
7e702d17ed x86/microcode/intel: Extend BDW late-loading further with LLC size check
Commit b94b737331 ("x86/microcode/intel: Extend BDW late-loading with a
revision check") reduced the impact of erratum BDF90 for Broadwell model
79.

The impact can be reduced further by checking the size of the last level
cache portion per core.

Tony: "The erratum says the problem only occurs on the large-cache SKUs.
So we only need to avoid the update if we are on a big cache SKU that is
also running old microcode."

For more details, see erratum BDF90 in document #334165 (Intel Xeon
Processor E7-8800/4800 v4 Product Family Specification Update) from
September 2017.

Fixes: b94b737331 ("x86/microcode/intel: Extend BDW late-loading with a revision check")
Signed-off-by: Jia Zhang <zhang.jia@linux.alibaba.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1516321542-31161-1-git-send-email-zhang.jia@linux.alibaba.com
2018-01-24 13:00:35 +01:00
Steven Rostedt (VMware)
6be7fa3c74 ftrace, orc, x86: Handle ftrace dynamically allocated trampolines
The function tracer can create a dynamically allocated trampoline that is
called by the function mcount or fentry hook that is used to call the
function callback that is registered. The problem is that the orc undwinder
will bail if it encounters one of these trampolines. This breaks the stack
trace of function callbacks, which include the stack tracer and setting the
stack trace for individual functions.

Since these dynamic trampolines are basically copies of the static ftrace
trampolines defined in ftrace_*.S, we do not need to create new orc entries
for the dynamic trampolines. Finding the return address on the stack will be
identical as the functions that were copied to create the dynamic
trampolines. When encountering a ftrace dynamic trampoline, we can just use
the orc entry of the ftrace static function that was copied for that
trampoline.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-23 15:56:55 -05:00
Josh Poimboeuf
e2ac83d74a x86/ftrace: Fix ORC unwinding from ftrace handlers
Steven Rostedt discovered that the ftrace stack tracer is broken when
it's used with the ORC unwinder.  The problem is that objtool is
instructed by the Makefile to ignore the ftrace_64.S code, so it doesn't
generate any ORC data for it.

Fix it by making the asm code objtool-friendly:

- Objtool doesn't like the fact that save_mcount_regs pushes RBP at the
  beginning, but it's never restored (directly, at least).  So just skip
  the original RBP push, which is only needed for frame pointers anyway.

- Annotate some functions as normal callable functions with
  ENTRY/ENDPROC.

- Add an empty unwind hint to return_to_handler().  The return address
  isn't on the stack, so there's nothing ORC can do there.  It will just
  punt in the unlikely case it tries to unwind from that code.

With all that fixed, remove the OBJECT_FILES_NON_STANDARD Makefile
annotation so objtool can read the file.

Link: http://lkml.kernel.org/r/20180123040746.ih4ep3tk4pbjvg7c@treble

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-23 13:24:19 -05:00
Eric W. Biederman
83b57531c5 mm/memory_failure: Remove unused trapno from memory_failure
Today 4 architectures set ARCH_SUPPORTS_MEMORY_FAILURE (arm64, parisc,
powerpc, and x86), while 4 other architectures set __ARCH_SI_TRAPNO
(alpha, metag, sparc, and tile).  These two sets of architectures do
not interesect so remove the trapno paramater to remove confusion.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-01-23 12:17:42 -06:00
Linus Torvalds
5515114211 Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 pti fixes from Thomas Gleixner:
 "A small set of fixes for the meltdown/spectre mitigations:

   - Make kprobes aware of retpolines to prevent probes in the retpoline
     thunks.

   - Make the machine check exception speculation protected. MCE used to
     issue an indirect call directly from the ASM entry code. Convert
     that to a direct call into a C-function and issue the indirect call
     from there so the compiler can add the retpoline protection,

   - Make the vmexit_fill_RSB() assembly less stupid

   - Fix a typo in the PTI documentation"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/retpoline: Optimize inline assembler for vmexit_fill_RSB
  x86/pti: Document fix wrong index
  kprobes/x86: Disable optimizing on the function jumps to indirect thunk
  kprobes/x86: Blacklist indirect thunk functions for kprobes
  retpoline: Introduce start/end markers of indirect thunk
  x86/mce: Make machine check speculation protected
2018-01-21 10:48:35 -08:00
Jan Kiszka
3b42349d56 x86/jailhouse: Respect pci=lastbus command line settings
Limiting the scan width to the known last bus via the command line can
accelerate the boot noteworthy.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jailhouse <jailhouse-dev@googlegroups.com>
Link: https://lkml.kernel.org/r/51f5fe62-ca8f-9286-5cdb-39df3fad78b4@siemens.com
2018-01-20 08:15:44 +01:00
Jan Kiszka
11f19ec025 x86/jailhouse: Set X86_FEATURE_TSC_KNOWN_FREQ
Otherwise, Linux will not recognize precalibrated_tsc_khz and disable
the tsc as clocksource.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jailhouse <jailhouse-dev@googlegroups.com>
Link: https://lkml.kernel.org/r/975fbfc9-2a64-cc56-40d5-164992ec3916@siemens.com
2018-01-20 08:15:43 +01:00
Masami Hiramatsu
c86a32c09f kprobes/x86: Disable optimizing on the function jumps to indirect thunk
Since indirect jump instructions will be replaced by jump
to __x86_indirect_thunk_*, those jmp instruction must be
treated as an indirect jump. Since optprobe prohibits to
optimize probes in the function which uses an indirect jump,
it also needs to find out the function which jump to
__x86_indirect_thunk_* and disable optimization.

Add a check that the jump target address is between the
__indirect_thunk_start/end when optimizing kprobe.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/151629212062.10241.6991266100233002273.stgit@devbox
2018-01-19 16:31:29 +01:00
Masami Hiramatsu
736e80a421 retpoline: Introduce start/end markers of indirect thunk
Introduce start/end markers of __x86_indirect_thunk_* functions.
To make it easy, consolidate .text.__x86.indirect_thunk.* sections
to one .text.__x86.indirect_thunk section and put it in the
end of kernel text section and adds __indirect_thunk_start/end
so that other subsystem (e.g. kprobes) can identify it.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/151629206178.10241.6828804696410044771.stgit@devbox
2018-01-19 16:31:28 +01:00
Thomas Gleixner
6f41c34d69 x86/mce: Make machine check speculation protected
The machine check idtentry uses an indirect branch directly from the low
level code. This evades the speculation protection.

Replace it by a direct call into C code and issue the indirect call there
so the compiler can apply the proper speculation protection.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by:Borislav Petkov <bp@alien8.de>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Niced-by: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801181626290.1847@nanos
2018-01-19 16:31:28 +01:00
Tom Lendacky
f23d74f6c6 x86/mm: Rework wbinvd, hlt operation in stop_this_cpu()
Some issues have been reported with the for loop in stop_this_cpu() that
issues the 'wbinvd; hlt' sequence.  Reverting this sequence to halt()
has been shown to resolve the issue.

However, the wbinvd is needed when running with SME.  The reason for the
wbinvd is to prevent cache flush races between encrypted and non-encrypted
entries that have the same physical address.  This can occur when
kexec'ing from memory encryption active to inactive or vice-versa.  The
important thing is to not have outside of kernel text memory references
(such as stack usage), so the usage of the native_*() functions is needed
since these expand as inline asm sequences.  So instead of reverting the
change, rework the sequence.

Move the wbinvd instruction outside of the for loop as native_wbinvd()
and make its execution conditional on X86_FEATURE_SME.  In the for loop,
change the asm 'wbinvd; hlt' sequence back to a halt sequence but use
the native_halt() call.

Fixes: bba4ed011a ("x86/mm, kexec: Allow kexec to be used with SME")
Reported-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Dave Young <dyoung@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yu Chen <yu.c.chen@intel.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: kexec@lists.infradead.org
Cc: ebiederm@redhat.com
Cc: Borislav Petkov <bp@alien8.de>
Cc: Rui Zhang <rui.zhang@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180117234141.21184.44067.stgit@tlendack-t1.amdoffice.net
2018-01-18 11:48:59 +01:00
Fenghua Yu
31516de306 x86/intel_rdt: Add command line parameter to control L2_CDP
L2 CDP can be controlled by kernel parameter "rdt=".
If "rdt=l2cdp", L2 CDP is turned on.
If "rdt=!l2cdp", L2 CDP is turned off.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: Vikas" <vikas.shivappa@intel.com>
Cc: Sai Praneeth" <sai.praneeth.prakhya@intel.com>
Cc: Reinette" <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/1513810644-78015-7-git-send-email-fenghua.yu@intel.com
2018-01-18 09:33:32 +01:00
Fenghua Yu
99adde9b37 x86/intel_rdt: Enable L2 CDP in MSR IA32_L2_QOS_CFG
Bit 0 in MSR IA32_L2_QOS_CFG (0xc82) is L2 CDP enable bit. By default,
the bit is zero, i.e. L2 CAT is enabled, and L2 CDP is disabled. When
the resctrl mount parameter "cdpl2" is given, the bit is set to 1 and L2
CDP is enabled.

In L2 CDP mode, the L2 CAT mask MSRs are re-mapped into interleaved pairs
of mask MSRs for code (referenced by an odd CLOSID) and data (referenced by
an even CLOSID).

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: Vikas" <vikas.shivappa@intel.com>
Cc: Sai Praneeth" <sai.praneeth.prakhya@intel.com>
Cc: Reinette" <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/1513810644-78015-6-git-send-email-fenghua.yu@intel.com
2018-01-18 09:33:31 +01:00
Fenghua Yu
def1085393 x86/intel_rdt: Add two new resources for L2 Code and Data Prioritization (CDP)
L2 data and L2 code are added as new resources in rdt_resources_all[]
and data in the resources are configured.

When L2 CDP is enabled, the schemata will have the two resources in
this format:
L2DATA:l2id0=xxxx;l2id1=xxxx;....
L2CODE:l2id0=xxxx;l2id1=xxxx;....

xxxx represent CBM (Cache Bit Mask) values in the schemata, similar to all
others (L2 CAT/L3 CAT/L3 CDP).

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: Vikas" <vikas.shivappa@intel.com>
Cc: Sai Praneeth" <sai.praneeth.prakhya@intel.com>
Cc: Reinette" <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/1513810644-78015-5-git-send-email-fenghua.yu@intel.com
2018-01-18 09:33:31 +01:00
Fenghua Yu
a511e79353 x86/intel_rdt: Enumerate L2 Code and Data Prioritization (CDP) feature
L2 Code and Data Prioritization (CDP) is enumerated in
CPUID(EAX=0x10, ECX=0x2):ECX.bit2

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: Vikas" <vikas.shivappa@intel.com>
Cc: Sai Praneeth" <sai.praneeth.prakhya@intel.com>
Cc: Reinette" <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/1513810644-78015-4-git-send-email-fenghua.yu@intel.com
2018-01-18 09:33:30 +01:00
Rafael J. Wysocki
0c81e26e86 Merge branches 'acpi-x86', 'acpi-apei' and 'acpi-ec'
* acpi-x86:
  ACPI / x86: boot: Propagate error code in acpi_gsi_to_irq()
  ACPI / x86: boot: Don't setup SCI on HW-reduced platforms
  ACPI / x86: boot: Use INVALID_ACPI_IRQ instead of 0 for acpi_sci_override_gsi
  ACPI / x86: boot: Get rid of ACPI_INVALID_GSI
  ACPI / x86: boot: Swap variables in condition in acpi_register_gsi_ioapic()

* acpi-apei:
  ACPI / APEI: remove redundant variables len and node_len
  ACPI: APEI: call into AER handling regardless of severity
  ACPI: APEI: handle PCIe AER errors in separate function

* acpi-ec:
  ACPI: EC: Fix debugfs_create_*() usage
2018-01-18 03:01:55 +01:00
Rafael J. Wysocki
bcaea4678f Merge branches 'acpi-pm' and 'pm-sleep'
* acpi-pm:
  platform/x86: surfacepro3: Support for wakeup from suspend-to-idle
  ACPI / PM: Use Low Power S0 Idle on more systems
  ACPI / PM: Make it possible to ignore the system sleep blacklist

* pm-sleep:
  PM / hibernate: Drop unused parameter of enough_swap
  block, scsi: Fix race between SPI domain validation and system suspend
  PM / sleep: Make lock/unlock_system_sleep() available to kernel modules
  PM: hibernate: Do not subtract NR_FILE_MAPPED in minimum_image_size()
2018-01-18 02:55:28 +01:00
Dave Airlie
4a6cc7a44e Linux 4.15-rc8
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJaW+iVAAoJEHm+PkMAQRiGCDsIAJALNpX7odTx/8y+yCSWbpBH
 E57iwr4rmnI6tXJY6gqBUWTYnjAcf4b8IsHGCO6q3WIE3l/kt+m3eA21a32mF2Db
 /bfPGTOWu5LoOnFqzgH2kiFuC3Y474toxpld2YtkQWYxi5W7SUtIHi/jGgkUprth
 g15yPfwYgotJd/gpmPfBDMPlYDYvLlnPYbTG6ZWdMbg39m2RF2m0BdQ6aBFLHvbJ
 IN0tjCM6hrLFBP0+6Zn60pevUW9/AFYotZn2ankNTk5QVCQm14rgQIP+Pfoa5WpE
 I25r0DbkG2jKJCq+tlgIJjxHKD37GEDMc4T8/5Y8CNNeT9Q8si9EWvznjaAPazw=
 =o5gx
 -----END PGP SIGNATURE-----

BackMerge tag 'v4.15-rc8' into drm-next

Linux 4.15-rc8

Daniel requested this for so the intel CI won't fall over on drm-next
so often.
2018-01-18 09:32:15 +10:00
Linus Torvalds
1d966eb4d6 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes:

   - A rather involved set of memory hardware encryption fixes to
     support the early loading of microcode files via the initrd. These
     are larger than what we normally take at such a late -rc stage, but
     there are two mitigating factors: 1) much of the changes are
     limited to the SME code itself 2) being able to early load
     microcode has increased importance in the post-Meltdown/Spectre
     era.

   - An IRQ vector allocator fix

   - An Intel RDT driver use-after-free fix

   - An APIC driver bug fix/revert to make certain older systems boot
     again

   - A pkeys ABI fix

   - TSC calibration fixes

   - A kdump fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic/vector: Fix off by one in error path
  x86/intel_rdt/cqm: Prevent use after free
  x86/mm: Encrypt the initrd earlier for BSP microcode update
  x86/mm: Prepare sme_encrypt_kernel() for PAGE aligned encryption
  x86/mm: Centralize PMD flags in sme_encrypt_kernel()
  x86/mm: Use a struct to reduce parameters for SME PGD mapping
  x86/mm: Clean up register saving in the __enc_copy() assembly code
  x86/idt: Mark IDT tables __initconst
  Revert "x86/apic: Remove init_bsp_APIC()"
  x86/mm/pkeys: Fix fill_sig_info_pkey
  x86/tsc: Print tsc_khz, when it differs from cpu_khz
  x86/tsc: Fix erroneous TSC rate on Skylake Xeon
  x86/tsc: Future-proof native_calibrate_tsc()
  kdump: Write the correct address of mem_section into vmcoreinfo
2018-01-17 12:30:06 -08:00
Linus Torvalds
88dc7fca18 Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 pti bits and fixes from Thomas Gleixner:
 "This last update contains:

   - An objtool fix to prevent a segfault with the gold linker by
     changing the invocation order. That's not just for gold, it's a
     general robustness improvement.

   - An improved error message for objtool which spares tearing hairs.

   - Make KASAN fail loudly if there is not enough memory instead of
     oopsing at some random place later

   - RSB fill on context switch to prevent RSB underflow and speculation
     through other units.

   - Make the retpoline/RSB functionality work reliably for both Intel
     and AMD

   - Add retpoline to the module version magic so mismatch can be
     detected

   - A small (non-fix) update for cpufeatures which prevents cpu feature
     clashing for the upcoming extra mitigation bits to ease
     backporting"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  module: Add retpoline tag to VERMAGIC
  x86/cpufeature: Move processor tracing out of scattered features
  objtool: Improve error message for bad file argument
  objtool: Fix seg fault with gold linker
  x86/retpoline: Add LFENCE to the retpoline/RSB filling RSB macros
  x86/retpoline: Fill RSB on context switch for affected CPUs
  x86/kasan: Panic if there is not enough memory to boot
2018-01-17 11:54:56 -08:00
Thomas Gleixner
45d55e7bac x86/apic/vector: Fix off by one in error path
Keith reported the following warning:

WARNING: CPU: 28 PID: 1420 at kernel/irq/matrix.c:222 irq_matrix_remove_managed+0x10f/0x120
  x86_vector_free_irqs+0xa1/0x180
  x86_vector_alloc_irqs+0x1e4/0x3a0
  msi_domain_alloc+0x62/0x130

The reason for this is that if the vector allocation fails the error
handling code tries to free the failed vector as well, which causes the
above imbalance warning to trigger.

Adjust the error path to handle this correctly.

Fixes: b5dc8e6c21 ("x86/irq: Use hierarchical irqdomain to manage CPU interrupt vectors")
Reported-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Keith Busch <keith.busch@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801161217300.1823@nanos
2018-01-17 12:11:36 +01:00
Thomas Gleixner
d479244173 x86/intel_rdt/cqm: Prevent use after free
intel_rdt_iffline_cpu() -> domain_remove_cpu() frees memory first and then
proceeds accessing it.

 BUG: KASAN: use-after-free in find_first_bit+0x1f/0x80
 Read of size 8 at addr ffff883ff7c1e780 by task cpuhp/31/195
 find_first_bit+0x1f/0x80
 has_busy_rmid+0x47/0x70
 intel_rdt_offline_cpu+0x4b4/0x510

 Freed by task 195:
 kfree+0x94/0x1a0
 intel_rdt_offline_cpu+0x17d/0x510

Do the teardown first and then free memory.

Fixes: 24247aeeab ("x86/intel_rdt/cqm: Improve limbo list processing")
Reported-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Peter Zilstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: "Roderick W. Smith" <rod.smith@canonical.com>
Cc: 1733662@bugs.launchpad.net
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801161957510.2366@nanos
2018-01-17 11:56:47 +01:00
Paolo Bonzini
4fdec2034b x86/cpufeature: Move processor tracing out of scattered features
Processor tracing is already enumerated in word 9 (CPUID[7,0].EBX),
so do not duplicate it in the scattered features word.

Besides being more tidy, this will be useful for KVM when it presents
processor tracing to the guests.  KVM selects host features that are
supported by both the host kernel (depending on command line options,
CPU errata, or whatever) and KVM.  Whenever a full feature word exists,
KVM's code is written in the expectation that the CPUID bit number
matches the X86_FEATURE_* bit number, but this is not the case for
X86_FEATURE_INTEL_PT.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luwei Kang <luwei.kang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Link: http://lkml.kernel.org/r/1516117345-34561-1-git-send-email-pbonzini@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-17 07:38:39 +01:00
Paolo Bonzini
65e38583c3 Merge branch 'sev-v9-p2' of https://github.com/codomania/kvm
This part of Secure Encrypted Virtualization (SEV) patch series focuses on KVM
changes required to create and manage SEV guests.

SEV is an extension to the AMD-V architecture which supports running encrypted
virtual machine (VMs) under the control of a hypervisor. Encrypted VMs have their
pages (code and data) secured such that only the guest itself has access to
unencrypted version. Each encrypted VM is associated with a unique encryption key;
if its data is accessed to a different entity using a different key the encrypted
guest's data will be incorrectly decrypted, leading to unintelligible data.
This security model ensures that hypervisor will no longer able to inspect or
alter any guest code or data.

The key management of this feature is handled by a separate processor known as
the AMD Secure Processor (AMD-SP) which is present on AMD SOCs. The SEV Key
Management Specification (see below) provides a set of commands which can be
used by hypervisor to load virtual machine keys through the AMD-SP driver.

The patch series adds a new ioctl in KVM driver (KVM_MEMORY_ENCRYPT_OP). The
ioctl will be used by qemu to issue SEV guest-specific commands defined in Key
Management Specification.

The following links provide additional details:

AMD Memory Encryption white paper:
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf

AMD64 Architecture Programmer's Manual:
    http://support.amd.com/TechDocs/24593.pdf
    SME is section 7.10
    SEV is section 15.34

SEV Key Management:
http://support.amd.com/TechDocs/55766_SEV-KM API_Specification.pdf

KVM Forum Presentation:
http://www.linux-kvm.org/images/7/74/02x08A-Thomas_Lendacky-AMDs_Virtualizatoin_Memory_Encryption_Technology.pdf

SEV Guest BIOS support:
  SEV support has been add to EDKII/OVMF BIOS
  https://github.com/tianocore/edk2

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-01-16 16:35:32 +01:00
Wanpeng Li
858a43aae2 KVM: X86: use paravirtualized TLB Shootdown
Remote TLB flush does a busy wait which is fine in bare-metal
scenario. But with-in the guest, the vcpus might have been pre-empted or
blocked. In this scenario, the initator vcpu would end up busy-waiting
for a long amount of time; it also consumes CPU unnecessarily to wake
up the target of the shootdown.

This patch set adds support for KVM's new paravirtualized TLB flush;
remote TLB flush does not wait for vcpus that are sleeping, instead
KVM will flush the TLB as soon as the vCPU starts running again.

The improvement is clearly visible when the host is overcommitted; in this
case, the PV TLB flush (in addition to avoiding the wait on the main CPU)
prevents preempted vCPUs from stealing precious execution time from the
running ones.

Testing on a Xeon Gold 6142 2.6GHz 2 sockets, 32 cores, 64 threads,
so 64 pCPUs, and each VM is 64 vCPUs.

ebizzy -M
              vanilla    optimized     boost
1VM            46799       48670         4%
2VM            23962       42691        78%
3VM            16152       37539       132%

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-01-16 16:34:13 +01:00
Wanpeng Li
fa55eedd63 KVM: X86: Add KVM_VCPU_PREEMPTED
The next patch will add another bit to the preempted field in
kvm_steal_time.  Define a constant for bit 0 (the only one that is
currently used).

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-01-16 16:34:13 +01:00
Mike Travis
09c3ae12b2 x86/platform/UV: Fix GAM MMR references in the UV x2apic code
Along with the fixes in UV4A (rev2) MMRs, the code to access those
MMRs also was modified by the fixes.  UV3, UV4, and UV4A no longer
have compatible setups for Global Address Memory (GAM).

Correct the new mistakes.

Signed-off-by: Mike Travis <mike.travis@hpe.com>
Acked-by: Andrew Banman <abanman@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dimitri Sivanich <sivanich@hpe.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <rja@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1515440405-20880-6-git-send-email-mike.travis@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-16 03:58:37 +01:00
Mike Travis
8078d1951d x86/platform/UV: Add references to access fixed UV4A HUB MMRs
Add references to enable access to fixed UV4A (rev2) HUB MMRs.

Signed-off-by: Mike Travis <mike.travis@hpe.com>
Acked-by: Andrew Banman <abanman@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dimitri Sivanich <sivanich@hpe.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <rja@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1515440405-20880-4-git-send-email-mike.travis@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-16 03:58:37 +01:00
Mike Travis
62807106c3 x86/platform/UV: Fix UV4A support on new Intel Processors
Upcoming Intel CascadeLake and IceLake processors have some architecture
changes that required fixes in the UV4 HUB bringing that chip to
revision 2.  The nomenclature for that new chip is "UV4A".

This patch fixes the references for the expanded MMR definitions in the
previous (automated) patch.

Signed-off-by: Mike Travis <mike.travis@hpe.com>
Acked-by: Andrew Banman <abanman@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dimitri Sivanich <sivanich@hpe.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <rja@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1515440405-20880-3-git-send-email-mike.travis@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-16 03:58:37 +01:00
Eric W. Biederman
ea64d5acc8 signal: Unify and correct copy_siginfo_to_user32
Among the existing architecture specific versions of
copy_siginfo_to_user32 there are several different implementation
problems.  Some architectures fail to handle all of the cases in in
the siginfo union.  Some architectures perform a blind copy of the
siginfo union when the si_code is negative.  A blind copy suggests the
data is expected to be in 32bit siginfo format, which means that
receiving such a signal via signalfd won't work, or that the data is
in 64bit siginfo and the code is copying nonsense to userspace.

Create a single instance of copy_siginfo_to_user32 that all of the
architectures can share, and teach it to handle all of the cases in
the siginfo union correctly, with the assumption that siginfo is
stored internally to the kernel is 64bit siginfo format.

A special case is made for x86 x32 format.  This is needed as presence
of both x32 and ia32 on x86_64 results in two different 32bit signal
formats.  By allowing this small special case there winds up being
exactly one code base that needs to be maintained between all of the
architectures.  Vastly increasing the testing base and the chances of
finding bugs.

As the x86 copy of copy_siginfo_to_user32 the call of the x86
signal_compat_build_tests were moved into sigaction_compat_abi, so
that they will keep running.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-01-15 19:56:20 -06:00
Tom Lendacky
107cd25321 x86/mm: Encrypt the initrd earlier for BSP microcode update
Currently the BSP microcode update code examines the initrd very early
in the boot process.  If SME is active, the initrd is treated as being
encrypted but it has not been encrypted (in place) yet.  Update the
early boot code that encrypts the kernel to also encrypt the initrd so
that early BSP microcode updates work.

Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180110192634.6026.10452.stgit@tlendack-t1.amdoffice.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-16 01:50:59 +01:00
Eric W. Biederman
212a36a17e signal: Unify and correct copy_siginfo_from_user32
The function copy_siginfo_from_user32 is used for two things, in ptrace
since the dawn of siginfo for arbirarily modifying a signal that
user space sees, and in sigqueueinfo to send a signal with arbirary
siginfo data.

Create a single copy of copy_siginfo_from_user32 that all architectures
share, and teach it to handle all of the cases in the siginfo union.

In the generic version of copy_siginfo_from_user32 ensure that all
of the fields in siginfo are initialized so that the siginfo structure
can be safely copied to userspace if necessary.

When copying the embedded sigval union copy the si_int member.  That
ensures the 32bit values passes through the kernel unchanged.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-01-15 17:55:59 -06:00
Eric W. Biederman
ac54058d77 signal/ia64: Move the ia64 specific si_codes to asm-generic/siginfo.h
Having si_codes in many different files simply encourages duplicate
definitions that can cause problems later.  To avoid that merge the
ia64 specific si_codes into uapi/asm-generic/siginfo.h

Update the sanity checks in arch/x86/kernel/signal_compat.c to expect
the now lager NSIGILL and NSIGFPE.  As nothing excpe the larger count
is exposed on x86 no additional code needs to be updated.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-01-15 17:42:33 -06:00
Al Viro
b713da69e4 signal: unify compat_siginfo_t
--EWB Added #ifdef CONFIG_X86_X32_ABI to arch/x86/kernel/signal_compat.c
      Changed #ifdef CONFIG_X86_X32 to #ifdef CONFIG_X86_X32_ABI in
      linux/compat.h

      CONFIG_X86_X32 is set when the user requests X32 support.

      CONFIG_X86_X32_ABI is set when the user requests X32 support
      and the tool-chain has X32 allowing X32 support to be built.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2018-01-15 17:40:31 -06:00
Thomas Gleixner
be6d447e4f x86/jailhouse: Hide x2apic code when CONFIG_X86_X2APIC=n
x2apic_phys is not available when CONFIG_X86_X2APIC=n and the code is not
optimized out resulting in a build fail:

jailhouse.c: In function ‘jailhouse_get_smp_config’:
jailhouse.c:73:3: error: ‘x2apic_phys’ undeclared (first use in this function)

Fixes: 11c8dc419b ("x86/jailhouse: Enable APIC and SMP support")
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: jailhouse-dev@googlegroups.com
2018-01-15 10:26:18 +01:00
Christoph Hellwig
7f2c8bbd32 swiotlb: rename swiotlb_free to swiotlb_exit
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-01-15 09:35:39 +01:00
Christoph Hellwig
9ce9765a09 x86: rename swiotlb_dma_ops
We'll need that name for a generic implementation soon.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-01-15 09:35:33 +01:00
Christoph Hellwig
cea9d03c82 dma-mapping: add an arch_dma_supported hook
To implement the x86 forbid_dac and iommu_sac_force we want an arch hook
so that it can apply the global options across all dma_map_ops
implementations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-01-15 09:34:59 +01:00
Christoph Hellwig
57bf5a8963 dma-mapping: clear harmful GFP_* flags in common code
Lift the code from x86 so that we behave consistently.  In the future we
should probably warn if any of these is set.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
2018-01-15 09:34:55 +01:00
David Woodhouse
c995efd5a7 x86/retpoline: Fill RSB on context switch for affected CPUs
On context switch from a shallow call stack to a deeper one, as the CPU
does 'ret' up the deeper side it may encounter RSB entries (predictions for
where the 'ret' goes to) which were populated in userspace.

This is problematic if neither SMEP nor KPTI (the latter of which marks
userspace pages as NX for the kernel) are active, as malicious code in
userspace may then be executed speculatively.

Overwrite the CPU's return prediction stack with calls which are predicted
to return to an infinite loop, to "capture" speculation if this
happens. This is required both for retpoline, and also in conjunction with
IBRS for !SMEP && !KPTI.

On Skylake+ the problem is slightly different, and an *underflow* of the
RSB may cause errant branch predictions to occur. So there it's not so much
overwrite, as *filling* the RSB to attempt to prevent it getting
empty. This is only a partial solution for Skylake+ since there are many
other conditions which may result in the RSB becoming empty. The full
solution on Skylake+ is to use IBRS, which will prevent the problem even
when the RSB becomes empty. With IBRS, the RSB-stuffing will not be
required on context switch.

[ tglx: Added missing vendor check and slighty massaged comments and
  	changelog ]

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: thomas.lendacky@amd.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/1515779365-9032-1-git-send-email-dwmw@amazon.co.uk
2018-01-15 00:32:44 +01:00
Jan Kiszka
a0c01e4bb9 x86/jailhouse: Initialize PCI support
With this change, PCI devices can be detected and used inside a non-root
cell.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Link: https://lkml.kernel.org/r/e8d19494b96b68a749bcac514795d864ad9c28c3.1511770314.git.jan.kiszka@siemens.com
2018-01-14 21:11:57 +01:00
Jan Kiszka
cf878e169d x86/jailhouse: Wire up IOAPIC for legacy UART ports
The typical I/O interrupts in non-root cells are MSI-based. However, the
platform UARTs do not support MSI. In order to run a non-root cell that
shall use one of them, the standard IOAPIC must be registered and 1:1
routing for IRQ 3 and 4 set up.

If an IOAPIC is not available, the boot loader clears standard_ioapic in
the setup data, so registration is skipped. If the guest is not allowed to
to use one of those pins, Jailhouse will simply ignore the access.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Link: https://lkml.kernel.org/r/90d942dda9d48a8046e00bb3c1bb6757c83227be.1511770314.git.jan.kiszka@siemens.com
2018-01-14 21:11:57 +01:00
Jan Kiszka
fd49807682 x86/jailhouse: Halt instead of failing to restart
Jailhouse provides no guest-initiated restart. So, do not even try to.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Link: https://lkml.kernel.org/r/ef8a0ef95c2b17c21066e5f28ea56b58bf7eaa82.1511770314.git.jan.kiszka@siemens.com
2018-01-14 21:11:57 +01:00
Jan Kiszka
5ae4443010 x86/jailhouse: Silence ACPI warning
Jailhouse support does not depend on ACPI, and does not even use it. But
if it should be enabled, avoid warning about its absence in the
platform.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Link: https://lkml.kernel.org/r/939687007cbd7643b02fd330e8616e7e5944063f.1511770314.git.jan.kiszka@siemens.com
2018-01-14 21:11:56 +01:00
Jan Kiszka
0d7c1e2218 x86/jailhouse: Avoid access of unsupported platform resources
Non-root cells do not have CMOS access, thus the warm reset cannot be
enabled. There is no RTC, thus also no wall clock. Furthermore, there
are no ISA IRQs and no PIC.

Also disable probing of i8042 devices that are typically blocked for
non-root cells. In theory, access could also be granted to a non-root
cell, provided the root cell is not using the devices. But there is no
concrete scenario in sight, and disabling probing over Jailhouse allows
to build generic kernels that keep CONFIG_SERIO enabled for use in
normal systems.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Link: https://lkml.kernel.org/r/39b68cc2c496501c9d95e6f40e5d76e3053c3908.1511770314.git.jan.kiszka@siemens.com
2018-01-14 21:11:56 +01:00
Jan Kiszka
e85eb632f6 x86/jailhouse: Set up timekeeping
Get the precalibrated frequencies for the TSC and the APIC timer from
the Jailhouse platform info and set the kernel values accordingly.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Link: https://lkml.kernel.org/r/b2557426332fc337a74d3141cb920f7dce9ad601.1511770314.git.jan.kiszka@siemens.com
2018-01-14 21:11:56 +01:00
Jan Kiszka
87e65d05bb x86/jailhouse: Enable PMTIMER
Jailhouse exposes the PMTIMER as only reference clock to all cells. Pick
up its address from the setup data. Allow to enable the Linux support of
it by relaxing its strict dependency on ACPI.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Link: https://lkml.kernel.org/r/6d5c3fadd801eb3fba9510e2d3db14a9c404a1a0.1511770314.git.jan.kiszka@siemens.com
2018-01-14 21:11:55 +01:00
Jan Kiszka
11c8dc419b x86/jailhouse: Enable APIC and SMP support
Register the APIC which Jailhouse always exposes at 0xfee00000 if in
xAPIC mode or via MSRs as x2APIC. The latter is only available if it was
already activated because there is no support for switching its mode
during runtime.

Jailhouse requires the APIC to be operated in phys-flat mode. Ensure
that this mode is selected by Linux.

The available CPUs are taken from the setup data structure that the
loader filled and registered with the kernel.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Link: https://lkml.kernel.org/r/8b2255da0a9856c530293a67aa9d6addfe102a2b.1511770314.git.jan.kiszka@siemens.com
2018-01-14 21:11:55 +01:00
Jan Kiszka
4a362601ba x86/jailhouse: Add infrastructure for running in non-root cell
The Jailhouse hypervisor is able to statically partition a multicore
system into multiple so-called cells. Linux is used as boot loader and
continues to run in the root cell after Jailhouse is enabled. Linux can
also run in non-root cells.

Jailhouse does not emulate usual x86 devices. It also provides no
complex ACPI but basic platform information that the boot loader
forwards via setup data. This adds the infrastructure to detect when
running in a non-root cell so that the platform can be configured as
required in succeeding steps.

Support is limited to x86-64 so far, primarily because no boot loader
stub exists for i386 and, thus, we wouldn't be able to test the 32-bit
path.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Link: https://lkml.kernel.org/r/7f823d077b38b1a70c526b40b403f85688c137d3.1511770314.git.jan.kiszka@siemens.com
2018-01-14 21:11:54 +01:00
Jan Kiszka
a09c5ec00a x86: Introduce and use MP IRQ trigger and polarity defines
MP_IRQDIR_* constants pointed in the right direction but remained unused so
far: It's cleaner to use symbolic values for the IRQ flags in the MP config
table. That also saves some comments.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Link: https://lkml.kernel.org/r/60809926663a1d38e2a5db47d020d6e2e7a70019.1511770314.git.jan.kiszka@siemens.com
2018-01-14 21:11:54 +01:00
Jan Kiszka
e348caef8b x86/platform: Control warm reset setup via legacy feature flag
Allow to turn off the setup of BIOS-managed warm reset via a new flag in
x86_legacy_features. Besides the UV1, the upcoming jailhose guest support
needs this switched off.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Link: https://lkml.kernel.org/r/44376558129d70a2c1527959811371ef4b82e829.1511770314.git.jan.kiszka@siemens.com
2018-01-14 21:11:53 +01:00
Jan Kiszka
32c9c801a8 x86/apic: Install an empty physflat_init_apic_ldr
As the comment already stated, there is no need for setting up LDR (and
DFR) in physflat mode as it remains unused (see SDM, 10.6.2.1).
flat_init_apic_ldr only served as a placeholder for a nop operation so
far, causing no harm.

That will change when running over the Jailhouse hypervisor. Here we
must not touch LDR in a way that destroys the mapping originally set up
by the Linux root cell. Jailhouse enforces this setting in order to
efficiently validate any IPI requests sent by a cell.

Avoid a needless clash caused by flat_init_apic_ldr by installing a true
nop handler.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Link: https://lkml.kernel.org/r/f9867d294cdae4d45ed89d3a2e6adb524f4f6794.1511770314.git.jan.kiszka@siemens.com
2018-01-14 21:11:53 +01:00
Peter Zijlstra
aa83c45762 x86/tsc: Introduce early tsc clocksource
Without TSC_KNOWN_FREQ the TSC clocksource is registered so late that the
kernel first switches to the HPET. Using HPET on large CPU count machines is
undesirable.

Therefore register a tsc-early clocksource using the preliminary tsc_khz
from quick calibration. Then when the final TSC calibration is done, it
can switch to the tuned frequency.

The only notably problem is that the real tsc clocksource must be marked
with CLOCK_SOURCE_VALID_FOR_HRES, otherwise it will not be selected when
unregistering tsc-early. tsc-early cannot be left registered, because then
the clocksource code would fall back to it when we tsc clocksource is
marked unstable later.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: len.brown@intel.com
Cc: rui.zhang@intel.com
Cc: Len Brown <lenb@kernel.org>
Link: https://lkml.kernel.org/r/20171222092243.431585460@infradead.org
2018-01-14 20:18:23 +01:00
Peter Zijlstra
6d671e1b85 x86/time: Unconditionally register legacy timer interrupt
Even without a PIC/PIT the legacy timer interrupt is required for HPET in
legacy replacement mode.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: len.brown@intel.com
Cc: rui.zhang@intel.com
Link: https://lkml.kernel.org/r/20171222092243.382623763@infradead.org
2018-01-14 20:18:23 +01:00
Peter Zijlstra
30c7e5b123 x86/tsc: Allow TSC calibration without PIT
Zhang Rui reported that a Surface Pro 4 will fail to boot with
lapic=notscdeadline. Part of the problem is that that machine doesn't have
a PIT.

If, for some reason, the TSC init has to fall back to TSC calibration, it
relies on the PIT to be present.

Allow TSC calibration to reliably fall back to HPET.

The below results in an accurate TSC measurement when forced on a IVB:

  tsc: Unable to calibrate against PIT
  tsc: No reference (HPET/PMTIMER) available
  tsc: Unable to calibrate against PIT
  tsc: using HPET reference calibration
  tsc: Detected 2792.451 MHz processor

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: len.brown@intel.com
Cc: rui.zhang@intel.com
Link: https://lkml.kernel.org/r/20171222092243.333145937@infradead.org
2018-01-14 20:18:23 +01:00
Andi Kleen
327867faa4 x86/idt: Mark IDT tables __initconst
const variables must use __initconst, not __initdata.

Fix this up for the IDT tables, which got it consistently wrong.

Fixes: 16bc18d895 ("x86/idt: Move 32-bit idt_descr to C code")
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20171222001821.2157-7-andi@firstfloor.org
2018-01-14 20:09:45 +01:00
Linus Torvalds
40548c6b6c Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 pti updates from Thomas Gleixner:
 "This contains:

   - a PTI bugfix to avoid setting reserved CR3 bits when PCID is
     disabled. This seems to cause issues on a virtual machine at least
     and is incorrect according to the AMD manual.

   - a PTI bugfix which disables the perf BTS facility if PTI is
     enabled. The BTS AUX buffer is not globally visible and causes the
     CPU to fault when the mapping disappears on switching CR3 to user
     space. A full fix which restores BTS on PTI is non trivial and will
     be worked on.

   - PTI bugfixes for EFI and trusted boot which make sure that the user
     space visible page table entries have the NX bit cleared

   - removal of dead code in the PTI pagetable setup functions

   - add PTI documentation

   - add a selftest for vsyscall to verify that the kernel actually
     implements what it advertises.

   - a sysfs interface to expose vulnerability and mitigation
     information so there is a coherent way for users to retrieve the
     status.

   - the initial spectre_v2 mitigations, aka retpoline:

      + The necessary ASM thunk and compiler support

      + The ASM variants of retpoline and the conversion of affected ASM
        code

      + Make LFENCE serializing on AMD so it can be used as speculation
        trap

      + The RSB fill after vmexit

   - initial objtool support for retpoline

  As I said in the status mail this is the most of the set of patches
  which should go into 4.15 except two straight forward patches still on
  hold:

   - the retpoline add on of LFENCE which waits for ACKs

   - the RSB fill after context switch

  Both should be ready to go early next week and with that we'll have
  covered the major holes of spectre_v2 and go back to normality"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits)
  x86,perf: Disable intel_bts when PTI
  security/Kconfig: Correct the Documentation reference for PTI
  x86/pti: Fix !PCID and sanitize defines
  selftests/x86: Add test_vsyscall
  x86/retpoline: Fill return stack buffer on vmexit
  x86/retpoline/irq32: Convert assembler indirect jumps
  x86/retpoline/checksum32: Convert assembler indirect jumps
  x86/retpoline/xen: Convert Xen hypercall indirect jumps
  x86/retpoline/hyperv: Convert assembler indirect jumps
  x86/retpoline/ftrace: Convert ftrace assembler indirect jumps
  x86/retpoline/entry: Convert entry assembler indirect jumps
  x86/retpoline/crypto: Convert crypto assembler indirect jumps
  x86/spectre: Add boot time option to select Spectre v2 mitigation
  x86/retpoline: Add initial retpoline support
  objtool: Allow alternatives to be ignored
  objtool: Detect jumps to retpoline thunks
  x86/pti: Make unpoison of pgd for trusted boot work for real
  x86/alternatives: Fix optimize_nops() checking
  sysfs/cpu: Fix typos in vulnerability documentation
  x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSC
  ...
2018-01-14 09:51:25 -08:00
Ville Syrjälä
fc90ccfd28 Revert "x86/apic: Remove init_bsp_APIC()"
This reverts commit b371ae0d4a. It causes
boot hangs on old P3/P4 systems when the local APIC is enforced in UP mode.

Reported-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: yinghai@kernel.org
Cc: bhe@redhat.com
Link: https://lkml.kernel.org/r/20171128145350.21560-1-ville.syrjala@linux.intel.com
2018-01-14 12:14:51 +01:00
Len Brown
4b5b212723 x86/tsc: Print tsc_khz, when it differs from cpu_khz
If CPU and TSC frequency are the same the printout of the CPU frequency is
valid for the TSC as well:

      tsc: Detected 2900.000 MHz processor

If the TSC frequency is different there is no information in dmesg. Add a
conditional printout:

  tsc: Detected 2904.000 MHz TSC

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: peterz@infradead.org
Link: https://lkml.kernel.org/r/537b342debcd8e8aebc8d631015dcdf9f9ba8a26.1513920414.git.len.brown@intel.com
2018-01-14 12:14:50 +01:00
Len Brown
b511203093 x86/tsc: Fix erroneous TSC rate on Skylake Xeon
The INTEL_FAM6_SKYLAKE_X hardcoded crystal_khz value of 25MHZ is
problematic:

 - SKX workstations (with same model # as server variants) use a 24 MHz
   crystal.  This results in a -4.0% time drift rate on SKX workstations.

 - SKX servers subject the crystal to an EMI reduction circuit that reduces its
   actual frequency by (approximately) -0.25%.  This results in -1 second per
   10 minute time drift as compared to network time.

This issue can also trigger a timer and power problem, on configurations
that use the LAPIC timer (versus the TSC deadline timer).  Clock ticks
scheduled with the LAPIC timer arrive a few usec before the time they are
expected (according to the slow TSC).  This causes Linux to poll-idle, when
it should be in an idle power saving state.  The idle and clock code do not
graciously recover from this error, sometimes resulting in significant
polling and measurable power impact.

Stop using native_calibrate_tsc() for INTEL_FAM6_SKYLAKE_X.
native_calibrate_tsc() will return 0, boot will run with tsc_khz = cpu_khz,
and the TSC refined calibration will update tsc_khz to correct for the
difference.

[ tglx: Sanitized change log ]

Fixes: 6baf3d6182 ("x86/tsc: Add additional Intel CPU models to the crystal quirk list")
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: peterz@infradead.org
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/ff6dcea166e8ff8f2f6a03c17beab2cb436aa779.1513920414.git.len.brown@intel.com
2018-01-14 12:14:50 +01:00
Len Brown
da4ae6c4a0 x86/tsc: Future-proof native_calibrate_tsc()
If the crystal frequency cannot be determined via CPUID(15).crystal_khz or
the built-in table then native_calibrate_tsc() will still set the
X86_FEATURE_TSC_KNOWN_FREQ flag which prevents the refined TSC calibration.

As a consequence such systems use cpu_khz for the TSC frequency which is
incorrect when cpu_khz != tsc_khz resulting in time drift.

Return early when the crystal frequency cannot be retrieved without setting
the X86_FEATURE_TSC_KNOWN_FREQ flag. This ensures that the refined TSC
calibration is invoked.

[ tglx: Steam-blastered changelog. Sigh ]

Fixes: 4ca4df0b7e ("x86/tsc: Mark TSC frequency determined by CPUID as known")
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: peterz@infradead.org
Cc: Bin Gao <bin.gao@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/0fe2503aa7d7fc69137141fc705541a78101d2b9.1513920414.git.len.brown@intel.com
2018-01-14 12:14:50 +01:00
Eric W. Biederman
2f82a46f66 signal: Remove _sys_private and _overrun_incr from struct compat_siginfo
We have never passed either field to or from userspace so just remove them.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-01-12 14:34:46 -06:00
Andi Kleen
7614e913db x86/retpoline/irq32: Convert assembler indirect jumps
Convert all indirect jumps in 32bit irq inline asm code to use non
speculative sequences.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: thomas.lendacky@amd.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/1515707194-20531-12-git-send-email-dwmw@amazon.co.uk
2018-01-12 00:14:32 +01:00
David Woodhouse
9351803bd8 x86/retpoline/ftrace: Convert ftrace assembler indirect jumps
Convert all indirect jumps in ftrace assembler code to use non-speculative
sequences when CONFIG_RETPOLINE is enabled.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: thomas.lendacky@amd.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/1515707194-20531-8-git-send-email-dwmw@amazon.co.uk
2018-01-12 00:14:30 +01:00
David Woodhouse
da28512156 x86/spectre: Add boot time option to select Spectre v2 mitigation
Add a spectre_v2= option to select the mitigation used for the indirect
branch speculation vulnerability.

Currently, the only option available is retpoline, in its various forms.
This will be expanded to cover the new IBRS/IBPB microcode features.

The RETPOLINE_AMD feature relies on a serializing LFENCE for speculation
control. For AMD hardware, only set RETPOLINE_AMD if LFENCE is a
serializing instruction, which is indicated by the LFENCE_RDTSC feature.

[ tglx: Folded back the LFENCE/AMD fixes and reworked it so IBRS
  	integration becomes simple ]

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: thomas.lendacky@amd.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/1515707194-20531-5-git-send-email-dwmw@amazon.co.uk
2018-01-12 00:14:29 +01:00
David Woodhouse
76b043848f x86/retpoline: Add initial retpoline support
Enable the use of -mindirect-branch=thunk-extern in newer GCC, and provide
the corresponding thunks. Provide assembler macros for invoking the thunks
in the same way that GCC does, from native and inline assembler.

This adds X86_FEATURE_RETPOLINE and sets it by default on all CPUs. In
some circumstances, IBRS microcode features may be used instead, and the
retpoline can be disabled.

On AMD CPUs if lfence is serialising, the retpoline can be dramatically
simplified to a simple "lfence; jmp *\reg". A future patch, after it has
been verified that lfence really is serialising in all circumstances, can
enable this by setting the X86_FEATURE_RETPOLINE_AMD feature bit in addition
to X86_FEATURE_RETPOLINE.

Do not align the retpoline in the altinstr section, because there is no
guarantee that it stays aligned when it's copied over the oldinstr during
alternative patching.

[ Andi Kleen: Rename the macros, add CONFIG_RETPOLINE option, export thunks]
[ tglx: Put actual function CALL/JMP in front of the macros, convert to
  	symbolic labels ]
[ dwmw2: Convert back to numeric labels, merge objtool fixes ]

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: thomas.lendacky@amd.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/1515707194-20531-4-git-send-email-dwmw@amazon.co.uk
2018-01-12 00:14:28 +01:00
Dave Hansen
445b69e3b7 x86/pti: Make unpoison of pgd for trusted boot work for real
The inital fix for trusted boot and PTI potentially misses the pgd clearing
if pud_alloc() sets a PGD.  It probably works in *practice* because for two
adjacent calls to map_tboot_page() that share a PGD entry, the first will
clear NX, *then* allocate and set the PGD (without NX clear).  The second
call will *not* allocate but will clear the NX bit.

Defer the NX clearing to a point after it is known that all top-level
allocations have occurred.  Add a comment to clarify why.

[ tglx: Massaged changelog ]

Fixes: 262b6b3008 ("x86/tboot: Unbreak tboot with PTI enabled")
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: "Tim Chen" <tim.c.chen@linux.intel.com>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: peterz@infradead.org
Cc: ning.sun@intel.com
Cc: tboot-devel@lists.sourceforge.net
Cc: andi@firstfloor.org
Cc: luto@kernel.org
Cc: law@redhat.com
Cc: pbonzini@redhat.com
Cc: torvalds@linux-foundation.org
Cc: gregkh@linux-foundation.org
Cc: dwmw@amazon.co.uk
Cc: nickc@redhat.com
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180110224939.2695CD47@viggo.jf.intel.com
2018-01-11 23:36:59 +01:00
Jiri Bohac
2a3e83c6f9 x86/gart: Exclude GART aperture from vmcore
On machines where the GART aperture is mapped over physical RAM
/proc/vmcore contains the remapped range and reading it may cause hangs or
reboots.

In the past, the GART region was added into the resource map, implemented
by commit 56dd669a13 ("[PATCH] Insert GART region into resource map")

However, inserting the iomem_resource from the early GART code caused
resource conflicts with some AGP drivers (bko#72201), which got avoided by
reverting the patch in commit 707d4eefbd ("Revert [PATCH] Insert GART
region into resource map"). This revert introduced the /proc/vmcore bug.

The vmcore ELF header is either prepared by the kernel (when using the
kexec_file_load syscall) or by the kexec userspace (when using the kexec_load
syscall). Since we no longer have the GART iomem resource, the userspace
kexec has no way of knowing which region to exclude from the ELF header.

Changes from v1 of this patch:
Instead of excluding the aperture from the ELF header, this patch
makes /proc/vmcore return zeroes in the second kernel when attempting to
read the aperture region. This is done by reusing the
gart_oldmem_pfn_is_ram infrastructure originally intended to exclude XEN
balooned memory. This works for both, the kexec_file_load and kexec_load
syscalls.

[Note that the GART region is the same in the first and second kernels:
regardless whether the first kernel fixed up the northbridge/bios setting
and mapped the aperture over physical memory, the second kernel finds the
northbridge properly configured by the first kernel and the aperture
never overlaps with e820 memory because the second kernel has a fake e820
map created from the crashkernel memory regions. Thus, the second kernel
keeps the aperture address/size as configured by the first kernel.]

register_oldmem_pfn_is_ram can only register one callback and returns an error
if the callback has been registered already. Since XEN used to be the only user
of this function, it never checks the return value. Now that we have more than
one user, I added a WARN_ON just in case agp, XEN, or any other future user of
register_oldmem_pfn_is_ram were to step on each other's toes.

Fixes: 707d4eefbd ("Revert [PATCH] Insert GART region into resource map")
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Baoquan He <bhe@redhat.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: David Airlie <airlied@linux.ie>
Cc: yinghai@kernel.org
Cc: joro@8bytes.org
Cc: kexec@lists.infradead.org
Cc: Borislav Petkov <bp@alien8.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Link: https://lkml.kernel.org/r/20180106010013.73suskgxm7lox7g6@dwarf.suse.cz
2018-01-11 15:09:24 +01:00
Borislav Petkov
612e8e9350 x86/alternatives: Fix optimize_nops() checking
The alternatives code checks only the first byte whether it is a NOP, but
with NOPs in front of the payload and having actual instructions after it
breaks the "optimized' test.

Make sure to scan all bytes before deciding to optimize the NOPs in there.

Reported-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Andrew Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/20180110112815.mgciyf5acwacphkq@pd.tnic
2018-01-10 19:36:22 +01:00
Christoph Hellwig
ea8c64ace8 dma-mapping: move swiotlb arch helpers to a new header
phys_to_dma, dma_to_phys and dma_capable are helpers published by
architecture code for use of swiotlb and xen-swiotlb only.  Drivers are
not supposed to use these directly, but use the DMA API instead.

Move these to a new asm/dma-direct.h helper, included by a
linux/dma-direct.h wrapper that provides the default linear mapping
unless the architecture wants to override it.

In the MIPS case the existing dma-coherent.h is reused for now as
untangling it will take a bit of work.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Robin Murphy <robin.murphy@arm.com>
2018-01-10 16:40:54 +01:00
Joe Perches
6cbaefb4bf treewide: Use DEVICE_ATTR_WO
Convert DEVICE_ATTR uses to DEVICE_ATTR_WO where possible.

Done with perl script:

$ git grep -w --name-only DEVICE_ATTR | \
  xargs perl -i -e 'local $/; while (<>) { s/\bDEVICE_ATTR\s*\(\s*(\w+)\s*,\s*\(?(?:\s*S_IWUSR\s*|\s*0200\s*)\)?\s*,\s*NULL\s*,\s*\s_store\s*\)/DEVICE_ATTR_WO(\1)/g; print;}'

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-09 16:34:35 +01:00
Tom Lendacky
9c6a73c758 x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSC
With LFENCE now a serializing instruction, use LFENCE_RDTSC in preference
to MFENCE_RDTSC.  However, since the kernel could be running under a
hypervisor that does not support writing that MSR, read the MSR back and
verify that the bit has been set successfully.  If the MSR can be read
and the bit is set, then set the LFENCE_RDTSC feature, otherwise set the
MFENCE_RDTSC feature.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/20180108220932.12580.52458.stgit@tlendack-t1.amdoffice.net
2018-01-09 01:43:11 +01:00
Tom Lendacky
e4d0e84e49 x86/cpu/AMD: Make LFENCE a serializing instruction
To aid in speculation control, make LFENCE a serializing instruction
since it has less overhead than MFENCE.  This is done by setting bit 1
of MSR 0xc0011029 (DE_CFG).  Some families that support LFENCE do not
have this MSR.  For these families, the LFENCE instruction is already
serializing.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/20180108220921.12580.71694.stgit@tlendack-t1.amdoffice.net
2018-01-09 01:43:10 +01:00
Dave Hansen
262b6b3008 x86/tboot: Unbreak tboot with PTI enabled
This is another case similar to what EFI does: create a new set of
page tables, map some code at a low address, and jump to it.  PTI
mistakes this low address for userspace and mistakenly marks it
non-executable in an effort to make it unusable for userspace.

Undo the poison to allow execution.

Fixes: 385ce0ea4c ("x86/mm/pti: Add Kconfig")
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Jeff Law <law@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: David" <dwmw@amazon.co.uk>
Cc: Nick Clifton <nickc@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180108102805.GK25546@redhat.com
2018-01-08 17:29:18 +01:00
Thomas Gleixner
61dc0f555b x86/cpu: Implement CPU vulnerabilites sysfs functions
Implement the CPU vulnerabilty show functions for meltdown, spectre_v1 and
spectre_v2.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linuxfoundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lkml.kernel.org/r/20180107214913.177414879@linutronix.de
2018-01-08 11:10:40 +01:00
David Woodhouse
99c6fa2511 x86/cpufeatures: Add X86_BUG_SPECTRE_V[12]
Add the bug bits for spectre v1/2 and force them unconditionally for all
cpus.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1515239374-23361-2-git-send-email-dwmw@amazon.co.uk
2018-01-06 21:57:19 +01:00
Jia Zhang
b94b737331 x86/microcode/intel: Extend BDW late-loading with a revision check
Instead of blacklisting all model 79 CPUs when attempting a late
microcode loading, limit that only to CPUs with microcode revisions <
0x0b000021 because only on those late loading may cause a system hang.

For such processors either:

a) a BIOS update which might contain a newer microcode revision

or

b) the early microcode loading method

should be considered.

Processors with revisions 0x0b000021 or higher will not experience such
hangs.

For more details, see erratum BDF90 in document #334165 (Intel Xeon
Processor E7-8800/4800 v4 Product Family Specification Update) from
September 2017.

[ bp: Heavily massage commit message and pr_* statements. ]

Fixes: 723f2828a9 ("x86/microcode/intel: Disable late loading on model 79")
Signed-off-by: Jia Zhang <qianyue.zj@alibaba-inc.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Cc: <stable@vger.kernel.org> # v4.14
Link: http://lkml.kernel.org/r/1514772287-92959-1-git-send-email-qianyue.zj@alibaba-inc.com
2018-01-06 14:44:57 +01:00
Ingo Molnar
b6815f3545 Merge branch 'linus' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-06 12:07:10 +01:00
Linus Torvalds
abb7099dbc Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull  more x86 pti fixes from Thomas Gleixner:
 "Another small stash of fixes for fallout from the PTI work:

   - Fix the modules vs. KASAN breakage which was caused by making
     MODULES_END depend of the fixmap size. That was done when the cpu
     entry area moved into the fixmap, but now that we have a separate
     map space for that this is causing more issues than it solves.

   - Use the proper cache flush methods for the debugstore buffers as
     they are mapped/unmapped during runtime and not statically mapped
     at boot time like the rest of the cpu entry area.

   - Make the map layout of the cpu_entry_area consistent for 4 and 5
     level paging and fix the KASLR vaddr_end wreckage.

   - Use PER_CPU_EXPORT for per cpu variable and while at it unbreak
     nvidia gfx drivers by dropping the GPL export. The subject line of
     the commit tells it the other way around, but I noticed that too
     late.

   - Fix the ASM alternative macros so they can be used in the middle of
     an inline asm block.

   - Rename the BUG_CPU_INSECURE flag to BUG_CPU_MELTDOWN so the attack
     vector is properly identified. The Spectre mitigations will come
     with their own bug bits later"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/pti: Rename BUG_CPU_INSECURE to BUG_CPU_MELTDOWN
  x86/alternatives: Add missing '\n' at end of ALTERNATIVE inline asm
  x86/tlb: Drop the _GPL from the cpu_tlbstate export
  x86/events/intel/ds: Use the proper cache flush method for mapping ds buffers
  x86/kaslr: Fix the vaddr_end mess
  x86/mm: Map cpu_entry_area at the same place on 4/5 level
  x86/mm: Set MODULES_END to 0xffffffffff000000
2018-01-05 12:23:57 -08:00