Commit Graph

28502 Commits

Author SHA1 Message Date
Jan Kiszka
fd49807682 x86/jailhouse: Halt instead of failing to restart
Jailhouse provides no guest-initiated restart. So, do not even try to.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Link: https://lkml.kernel.org/r/ef8a0ef95c2b17c21066e5f28ea56b58bf7eaa82.1511770314.git.jan.kiszka@siemens.com
2018-01-14 21:11:57 +01:00
Jan Kiszka
5ae4443010 x86/jailhouse: Silence ACPI warning
Jailhouse support does not depend on ACPI, and does not even use it. But
if it should be enabled, avoid warning about its absence in the
platform.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Link: https://lkml.kernel.org/r/939687007cbd7643b02fd330e8616e7e5944063f.1511770314.git.jan.kiszka@siemens.com
2018-01-14 21:11:56 +01:00
Jan Kiszka
0d7c1e2218 x86/jailhouse: Avoid access of unsupported platform resources
Non-root cells do not have CMOS access, thus the warm reset cannot be
enabled. There is no RTC, thus also no wall clock. Furthermore, there
are no ISA IRQs and no PIC.

Also disable probing of i8042 devices that are typically blocked for
non-root cells. In theory, access could also be granted to a non-root
cell, provided the root cell is not using the devices. But there is no
concrete scenario in sight, and disabling probing over Jailhouse allows
to build generic kernels that keep CONFIG_SERIO enabled for use in
normal systems.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Link: https://lkml.kernel.org/r/39b68cc2c496501c9d95e6f40e5d76e3053c3908.1511770314.git.jan.kiszka@siemens.com
2018-01-14 21:11:56 +01:00
Jan Kiszka
e85eb632f6 x86/jailhouse: Set up timekeeping
Get the precalibrated frequencies for the TSC and the APIC timer from
the Jailhouse platform info and set the kernel values accordingly.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Link: https://lkml.kernel.org/r/b2557426332fc337a74d3141cb920f7dce9ad601.1511770314.git.jan.kiszka@siemens.com
2018-01-14 21:11:56 +01:00
Jan Kiszka
87e65d05bb x86/jailhouse: Enable PMTIMER
Jailhouse exposes the PMTIMER as only reference clock to all cells. Pick
up its address from the setup data. Allow to enable the Linux support of
it by relaxing its strict dependency on ACPI.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Link: https://lkml.kernel.org/r/6d5c3fadd801eb3fba9510e2d3db14a9c404a1a0.1511770314.git.jan.kiszka@siemens.com
2018-01-14 21:11:55 +01:00
Jan Kiszka
11c8dc419b x86/jailhouse: Enable APIC and SMP support
Register the APIC which Jailhouse always exposes at 0xfee00000 if in
xAPIC mode or via MSRs as x2APIC. The latter is only available if it was
already activated because there is no support for switching its mode
during runtime.

Jailhouse requires the APIC to be operated in phys-flat mode. Ensure
that this mode is selected by Linux.

The available CPUs are taken from the setup data structure that the
loader filled and registered with the kernel.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Link: https://lkml.kernel.org/r/8b2255da0a9856c530293a67aa9d6addfe102a2b.1511770314.git.jan.kiszka@siemens.com
2018-01-14 21:11:55 +01:00
Jan Kiszka
4a362601ba x86/jailhouse: Add infrastructure for running in non-root cell
The Jailhouse hypervisor is able to statically partition a multicore
system into multiple so-called cells. Linux is used as boot loader and
continues to run in the root cell after Jailhouse is enabled. Linux can
also run in non-root cells.

Jailhouse does not emulate usual x86 devices. It also provides no
complex ACPI but basic platform information that the boot loader
forwards via setup data. This adds the infrastructure to detect when
running in a non-root cell so that the platform can be configured as
required in succeeding steps.

Support is limited to x86-64 so far, primarily because no boot loader
stub exists for i386 and, thus, we wouldn't be able to test the 32-bit
path.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Link: https://lkml.kernel.org/r/7f823d077b38b1a70c526b40b403f85688c137d3.1511770314.git.jan.kiszka@siemens.com
2018-01-14 21:11:54 +01:00
Jan Kiszka
a09c5ec00a x86: Introduce and use MP IRQ trigger and polarity defines
MP_IRQDIR_* constants pointed in the right direction but remained unused so
far: It's cleaner to use symbolic values for the IRQ flags in the MP config
table. That also saves some comments.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Link: https://lkml.kernel.org/r/60809926663a1d38e2a5db47d020d6e2e7a70019.1511770314.git.jan.kiszka@siemens.com
2018-01-14 21:11:54 +01:00
Jan Kiszka
e348caef8b x86/platform: Control warm reset setup via legacy feature flag
Allow to turn off the setup of BIOS-managed warm reset via a new flag in
x86_legacy_features. Besides the UV1, the upcoming jailhose guest support
needs this switched off.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Link: https://lkml.kernel.org/r/44376558129d70a2c1527959811371ef4b82e829.1511770314.git.jan.kiszka@siemens.com
2018-01-14 21:11:53 +01:00
Jan Kiszka
32c9c801a8 x86/apic: Install an empty physflat_init_apic_ldr
As the comment already stated, there is no need for setting up LDR (and
DFR) in physflat mode as it remains unused (see SDM, 10.6.2.1).
flat_init_apic_ldr only served as a placeholder for a nop operation so
far, causing no harm.

That will change when running over the Jailhouse hypervisor. Here we
must not touch LDR in a way that destroys the mapping originally set up
by the Linux root cell. Jailhouse enforces this setting in order to
efficiently validate any IPI requests sent by a cell.

Avoid a needless clash caused by flat_init_apic_ldr by installing a true
nop handler.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Link: https://lkml.kernel.org/r/f9867d294cdae4d45ed89d3a2e6adb524f4f6794.1511770314.git.jan.kiszka@siemens.com
2018-01-14 21:11:53 +01:00
Linus Torvalds
c92a9a461d Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Two pending (non-PTI) x86 fixes:

   - an Intel-MID crash fix

   - and an Intel microcode loader blacklist quirk to avoid a
     problematic revision"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/platform/intel-mid: Revert "Make 'bt_sfi_data' const"
  x86/microcode/intel: Extend BDW late-loading with a revision check
2018-01-12 10:32:11 -08:00
Linus Torvalds
02776b9b53 Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
 "No functional effects intended: removes leftovers from recent lockdep
  and refcounts work"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/refcounts: Remove stale comment from the ARCH_HAS_REFCOUNT Kconfig entry
  locking/lockdep: Remove cross-release leftovers
  locking/Documentation: Remove stale crossrelease_fullstack parameter
2018-01-12 10:14:09 -08:00
Linus Torvalds
aefd140437 xen: fixes for 4.15-rc8
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABAgAGBQJaWMzkAAoJELDendYovxMvyRQH/34SP3oe7EtapIwDg0YqXBgJ
 0nX2ZgMvhylR1ANZ4mPwP5z/CQZvrFWWAqhPZdGTntrzkvsNf5uZ0s38f3fk/eu8
 3XxSDoZRrr/RFms42smS3NZRMFgagzLRrCVZrFWgFaDmsa/2/e6k46GAPRHI8Chg
 tEB6h8yHp1R9s4Z232I5pkilA7/ggABk8oRgKWQzrVqnfuD0G+byDm0h0kgan6br
 a63XVD1JTGCmEFI2KaRCthd+r5H34oDMEUS4anJXVdWwid23rSbsAsbO7Cfb9qKR
 XDA/ls371IrcqiHhnWepuXyUs811hF8qRoSTmG6gDtPXxazONcw0x0L8lK+S5Wo=
 =gLWS
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.15-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:
 "This contains two build fixes for clang and two fixes for rather
  unlikely situations in the Xen gntdev driver"

* tag 'for-linus-4.15-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/gntdev: Fix partial gntdev_mmap() cleanup
  xen/gntdev: Fix off-by-one error when unmapping with holes
  x86: xen: remove the use of VLAIS
  x86/xen/time: fix section mismatch for xen_init_time_ops()
2018-01-12 10:00:15 -08:00
Paolo Bonzini
2aad9b3e07 Merge branch 'kvm-insert-lfence' into kvm-master
Topic branch for CVE-2017-5753, avoiding conflicts in the next merge window.
2018-01-11 18:20:48 +01:00
Andrew Honig
75f139aaf8 KVM: x86: Add memory barrier on vmcs field lookup
This adds a memory barrier when performing a lookup into
the vmcs_field_to_offset_table.  This is related to
CVE-2017-5753.

Signed-off-by: Andrew Honig <ahonig@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-01-11 18:20:31 +01:00
Paolo Bonzini
bd89525a82 KVM: x86: emulate #UD while in guest mode
This reverts commits ae1f576707
and ac9b305caa.

If the hardware doesn't support MOVBE, but L0 sets CPUID.01H:ECX.MOVBE
in L1's emulated CPUID information, then L1 is likely to pass that
CPUID bit through to L2. L2 will expect MOVBE to work, but if L1
doesn't intercept #UD, then any MOVBE instruction executed in L2 will
raise #UD, and the exception will be delivered in L2.

Commit ac9b305caa is a better and more
complete version of ae1f576707 ("KVM: nVMX: Do not emulate #UD while
in guest mode"); however, neither considers the above case.

Suggested-by: Jim Mattson <jmattson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-01-11 16:55:24 +01:00
Arnd Bergmann
ab271bd4df x86: kvm: propagate register_shrinker return code
Patch "mm,vmscan: mark register_shrinker() as __must_check" is
queued for 4.16 in linux-mm and adds a warning about the unchecked
call to register_shrinker:

arch/x86/kvm/mmu.c:5485:2: warning: ignoring return value of 'register_shrinker', declared with attribute warn_unused_result [-Wunused-result]

This changes the kvm_mmu_module_init() function to fail itself
when the call to register_shrinker fails.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-01-11 16:53:13 +01:00
Haozhong Zhang
2a266f2355 KVM MMU: check pending exception before injecting APF
For example, when two APF's for page ready happen after one exit and
the first one becomes pending, the second one will result in #DF.
Instead, just handle the second page fault synchronously.

Reported-by: Ross Zwisler <zwisler@gmail.com>
Message-ID: <CAOxpaSUBf8QoOZQ1p4KfUp0jq76OKfGY4Uxs-Gg8ngReD99xww@mail.gmail.com>
Reported-by: Alec Blayne <ab@tevsa.net>
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-01-11 14:05:19 +01:00
Eric Biggers
f328299e54 locking/refcounts: Remove stale comment from the ARCH_HAS_REFCOUNT Kconfig entry
ARCH_HAS_REFCOUNT is no longer marked as broken ('if BROKEN'), so remove
the stale comment regarding it being broken.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20171229195303.17781-1-ebiggers3@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-08 20:05:04 +01:00
Andy Shevchenko
9d0513d82f x86/platform/intel-mid: Revert "Make 'bt_sfi_data' const"
So one of the constification patches unearthed a type casting fragility
of the underlying code:

  276c870547 ("x86/platform/intel-mid: Make 'bt_sfi_data' const")

converted the struct to be const while it is also used as a temporary
container for important data that is used to fill 'parent' and 'name'
fields in struct platform_device_info.

The compiler doesn't notice this due to an explicit type cast that loses
the const - which fragility will be fixed separately.

This type cast turned a seemingly trivial const propagation patch into a
hard to debug data corruptor and crasher bug.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Bhumika Goyal <bhumirks@gmail.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: julia.lawall@lip6.fr
Cc: platform-driver-x86@vger.kernel.org
Link: http://lkml.kernel.org/r/20171228122523.21802-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-08 20:01:44 +01:00
Nick Desaulniers
66a640e782 x86: xen: remove the use of VLAIS
Variable Length Arrays In Structs (VLAIS) is not supported by Clang, and
frowned upon by others.

https://lkml.org/lkml/2013/9/23/500

Here, the VLAIS was used because the size of the bitmap returned from
xen_mc_entry() depended on possibly (based on kernel configuration)
runtime sized data. Rather than declaring args as a VLAIS then calling
sizeof on *args, we calculate the appropriate sizeof args manually.
Further, we can get rid of the #ifdef's and rely on num_possible_cpus()
(thanks to a helpful checkpatch warning from an earlier version of this
patch).

Suggested-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Nick Desaulniers <nick.desaulniers@gmail.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-01-08 09:41:32 -05:00
Nick Desaulniers
0dd6d272d3 x86/xen/time: fix section mismatch for xen_init_time_ops()
The header declares this function as __init but is defined in __ref
section.

Signed-off-by: Nick Desaulniers <nick.desaulniers@gmail.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-01-08 09:40:57 -05:00
Linus Torvalds
5b6c02f383 KVM fixes for v4.15-rc7
s390:
 * Two fixes for potential bitmap overruns in the cmma migration code
 
 x86:
 * Clear guest provided GPRs to defeat the Project Zero PoC for CVE
   2017-5715
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABCAAGBQJaUTJ4AAoJEED/6hsPKofohk0IAJAFlMG66u5MxC0kSM61U4Zf
 1vkzRwAkBbcN82LpGQKbqabVyTq0F3aLipyOn6WO5SN0K5m+OI2OV/aAroPyX8bI
 F7nWIqTXLhJ9X6KXINFvyavHMprvWl8PA72tR/B/7GhhfShrZ2wGgqhl0vv/kCUK
 /8q+5e693yJqw8ceemin9a6kPJrLpmjeH+Oy24KIlGbvJWV4UrIE86pRHnAnBtg8
 L7Vbxn5+ezKmakvBh+zF8NKcD1zHDcmQZHoYFPsQT0vX5GPoYqT2bcO6gsh1Grmp
 8ti6KkrnP+j2A/OEna4LBWfwKI/1xHXneB22BYrAxvNjHt+R4JrjaPpx82SEB4Y=
 =URMR
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Radim Krčmář:
 "s390:
   - Two fixes for potential bitmap overruns in the cmma migration code

  x86:
   - Clear guest provided GPRs to defeat the Project Zero PoC for CVE
     2017-5715"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm: vmx: Scrub hardware GPRs at VM-exit
  KVM: s390: prevent buffer overrun on memory hotplug during migration
  KVM: s390: fix cmma migration for multiple memory slots
2018-01-06 17:05:05 -08:00
Jia Zhang
b94b737331 x86/microcode/intel: Extend BDW late-loading with a revision check
Instead of blacklisting all model 79 CPUs when attempting a late
microcode loading, limit that only to CPUs with microcode revisions <
0x0b000021 because only on those late loading may cause a system hang.

For such processors either:

a) a BIOS update which might contain a newer microcode revision

or

b) the early microcode loading method

should be considered.

Processors with revisions 0x0b000021 or higher will not experience such
hangs.

For more details, see erratum BDF90 in document #334165 (Intel Xeon
Processor E7-8800/4800 v4 Product Family Specification Update) from
September 2017.

[ bp: Heavily massage commit message and pr_* statements. ]

Fixes: 723f2828a9 ("x86/microcode/intel: Disable late loading on model 79")
Signed-off-by: Jia Zhang <qianyue.zj@alibaba-inc.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Cc: <stable@vger.kernel.org> # v4.14
Link: http://lkml.kernel.org/r/1514772287-92959-1-git-send-email-qianyue.zj@alibaba-inc.com
2018-01-06 14:44:57 +01:00
Linus Torvalds
abb7099dbc Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull  more x86 pti fixes from Thomas Gleixner:
 "Another small stash of fixes for fallout from the PTI work:

   - Fix the modules vs. KASAN breakage which was caused by making
     MODULES_END depend of the fixmap size. That was done when the cpu
     entry area moved into the fixmap, but now that we have a separate
     map space for that this is causing more issues than it solves.

   - Use the proper cache flush methods for the debugstore buffers as
     they are mapped/unmapped during runtime and not statically mapped
     at boot time like the rest of the cpu entry area.

   - Make the map layout of the cpu_entry_area consistent for 4 and 5
     level paging and fix the KASLR vaddr_end wreckage.

   - Use PER_CPU_EXPORT for per cpu variable and while at it unbreak
     nvidia gfx drivers by dropping the GPL export. The subject line of
     the commit tells it the other way around, but I noticed that too
     late.

   - Fix the ASM alternative macros so they can be used in the middle of
     an inline asm block.

   - Rename the BUG_CPU_INSECURE flag to BUG_CPU_MELTDOWN so the attack
     vector is properly identified. The Spectre mitigations will come
     with their own bug bits later"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/pti: Rename BUG_CPU_INSECURE to BUG_CPU_MELTDOWN
  x86/alternatives: Add missing '\n' at end of ALTERNATIVE inline asm
  x86/tlb: Drop the _GPL from the cpu_tlbstate export
  x86/events/intel/ds: Use the proper cache flush method for mapping ds buffers
  x86/kaslr: Fix the vaddr_end mess
  x86/mm: Map cpu_entry_area at the same place on 4/5 level
  x86/mm: Set MODULES_END to 0xffffffffff000000
2018-01-05 12:23:57 -08:00
Linus Torvalds
b03acc4cc2 Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI updates from Thomas Gleixner:

 - A fix for a add_efi_memmap parameter regression which ensures that
   the parameter is parsed before it is used.

 - Reinstate the virtual capsule mapping as the cached copy turned out
   to break Quark and other things

 - Remove Matt Fleming as EFI co-maintainer. He stepped back a few days
   ago. Thanks Matt for all your great work!

* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  MAINTAINERS: Remove Matt Fleming as EFI co-maintainer
  efi/capsule-loader: Reinstate virtual capsule mapping
  x86/efi: Fix kernel param add_efi_memmap regression
2018-01-05 12:20:35 -08:00
Jim Mattson
0cb5b30698 kvm: vmx: Scrub hardware GPRs at VM-exit
Guest GPR values are live in the hardware GPRs at VM-exit.  Do not
leave any guest values in hardware GPRs after the guest GPR values are
saved to the vcpu_vmx structure.

This is a partial mitigation for CVE 2017-5715 and CVE 2017-5753.
Specifically, it defeats the Project Zero PoC for CVE 2017-5715.

Suggested-by: Eric Northup <digitaleric@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Eric Northup <digitaleric@google.com>
Reviewed-by: Benjamin Serebrin <serebrin@google.com>
Reviewed-by: Andrew Honig <ahonig@google.com>
[Paolo: Add AMD bits, Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-01-05 16:48:40 +01:00
Thomas Gleixner
de791821c2 x86/pti: Rename BUG_CPU_INSECURE to BUG_CPU_MELTDOWN
Use the name associated with the particular attack which needs page table
isolation for mitigation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Cc: Jiri Koshina <jikos@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Lutomirski  <luto@amacapital.net>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Turner <pjt@google.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Greg KH <gregkh@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801051525300.1724@nanos
2018-01-05 15:34:43 +01:00
David Woodhouse
b9e705ef7c x86/alternatives: Add missing '\n' at end of ALTERNATIVE inline asm
Where an ALTERNATIVE is used in the middle of an inline asm block, this
would otherwise lead to the following instruction being appended directly
to the trailing ".popsection", and a failed compile.

Fixes: 9cebed423c ("x86, alternative: Use .pushsection/.popsection")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel@redhat.com>
Cc: ak@linux.intel.com
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Turner <pjt@google.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180104143710.8961-8-dwmw@amazon.co.uk
2018-01-05 14:01:15 +01:00
Thomas Gleixner
1e5476815f x86/tlb: Drop the _GPL from the cpu_tlbstate export
The recent changes for PTI touch cpu_tlbstate from various tlb_flush
inlines. cpu_tlbstate is exported as GPL symbol, so this causes a
regression when building out of tree drivers for certain graphics cards.

Aside of that the export was wrong since it was introduced as it should
have been EXPORT_PER_CPU_SYMBOL_GPL().

Use the correct PER_CPU export and drop the _GPL to restore the previous
state which allows users to utilize the cards they payed for.

As always I'm really thrilled to make this kind of change to support the
#friends (or however the hot hashtag of today is spelled) from that closet
sauce graphics corp.

Fixes: 1e02ce4ccc ("x86: Store a per-cpu shadow copy of CR4")
Fixes: 6fd166aae7 ("x86/mm: Use/Fix PCID to optimize user/kernel switches")
Reported-by: Kees Cook <keescook@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: stable@vger.kernel.org
2018-01-05 00:39:58 +01:00
Peter Zijlstra
42f3bdc5dd x86/events/intel/ds: Use the proper cache flush method for mapping ds buffers
Thomas reported the following warning:

 BUG: using smp_processor_id() in preemptible [00000000] code: ovsdb-server/4498
 caller is native_flush_tlb_single+0x57/0xc0
 native_flush_tlb_single+0x57/0xc0
 __set_pte_vaddr+0x2d/0x40
 set_pte_vaddr+0x2f/0x40
 cea_set_pte+0x30/0x40
 ds_update_cea.constprop.4+0x4d/0x70
 reserve_ds_buffers+0x159/0x410
 x86_reserve_hardware+0x150/0x160
 x86_pmu_event_init+0x3e/0x1f0
 perf_try_init_event+0x69/0x80
 perf_event_alloc+0x652/0x740
 SyS_perf_event_open+0x3f6/0xd60
 do_syscall_64+0x5c/0x190

set_pte_vaddr is used to map the ds buffers into the cpu entry area, but
there are two problems with that:

 1) The resulting flush is not supposed to be called in preemptible context

 2) The cpu entry area is supposed to be per CPU, but the debug store
    buffers are mapped for all CPUs so these mappings need to be flushed
    globally.

Add the necessary preemption protection across the mapping code and flush
TLBs globally.

Fixes: c1961a4631 ("x86/events/intel/ds: Map debug buffers in cpu_entry_area")
Reported-by: Thomas Zeitlhofer <thomas.zeitlhofer+lkml@ze-it.at>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Thomas Zeitlhofer <thomas.zeitlhofer+lkml@ze-it.at>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180104170712.GB3040@hirez.programming.kicks-ass.net
2018-01-05 00:39:58 +01:00
Thomas Gleixner
1dddd25125 x86/kaslr: Fix the vaddr_end mess
vaddr_end for KASLR is only documented in the KASLR code itself and is
adjusted depending on config options. So it's not surprising that a change
of the memory layout causes KASLR to have the wrong vaddr_end. This can map
arbitrary stuff into other areas causing hard to understand problems.

Remove the whole ifdef magic and define the start of the cpu_entry_area to
be the end of the KASLR vaddr range.

Add documentation to that effect.

Fixes: 92a0f81d89 ("x86/cpu_entry_area: Move it out of the fixmap")
Reported-by: Benjamin Gilbert <benjamin.gilbert@coreos.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Benjamin Gilbert <benjamin.gilbert@coreos.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: stable <stable@vger.kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Garnier <thgarnie@google.com>,
Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801041320360.1771@nanos
2018-01-05 00:39:57 +01:00
Thomas Gleixner
f207890481 x86/mm: Map cpu_entry_area at the same place on 4/5 level
There is no reason for 4 and 5 level pagetables to have a different
layout. It just makes determining vaddr_end for KASLR harder than
necessary.

Fixes: 92a0f81d89 ("x86/cpu_entry_area: Move it out of the fixmap")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Gilbert <benjamin.gilbert@coreos.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: stable <stable@vger.kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Garnier <thgarnie@google.com>,
Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801041320360.1771@nanos
2018-01-04 23:04:57 +01:00
Andrey Ryabinin
f5a40711fa x86/mm: Set MODULES_END to 0xffffffffff000000
Since f06bdd4001 ("x86/mm: Adapt MODULES_END based on fixmap section size")
kasan_mem_to_shadow(MODULES_END) could be not aligned to a page boundary.

So passing page unaligned address to kasan_populate_zero_shadow() have two
possible effects:

1) It may leave one page hole in supposed to be populated area. After commit
  21506525fb ("x86/kasan/64: Teach KASAN about the cpu_entry_area") that
  hole happens to be in the shadow covering fixmap area and leads to crash:

 BUG: unable to handle kernel paging request at fffffbffffe8ee04
 RIP: 0010:check_memory_region+0x5c/0x190

 Call Trace:
  <NMI>
  memcpy+0x1f/0x50
  ghes_copy_tofrom_phys+0xab/0x180
  ghes_read_estatus+0xfb/0x280
  ghes_notify_nmi+0x2b2/0x410
  nmi_handle+0x115/0x2c0
  default_do_nmi+0x57/0x110
  do_nmi+0xf8/0x150
  end_repeat_nmi+0x1a/0x1e

Note, the crash likely disappeared after commit 92a0f81d89, which
changed kasan_populate_zero_shadow() call the way it was before
commit 21506525fb.

2) Attempt to load module near MODULES_END will fail, because
   __vmalloc_node_range() called from kasan_module_alloc() will hit the
   WARN_ON(!pte_none(*pte)) in the vmap_pte_range() and bail out with error.

To fix this we need to make kasan_mem_to_shadow(MODULES_END) page aligned
which means that MODULES_END should be 8*PAGE_SIZE aligned.

The whole point of commit f06bdd4001 was to move MODULES_END down if
NR_CPUS is big, so the cpu_entry_area takes a lot of space.
But since 92a0f81d89 ("x86/cpu_entry_area: Move it out of the fixmap")
the cpu_entry_area is no longer in fixmap, so we could just set
MODULES_END to a fixed 8*PAGE_SIZE aligned address.

Fixes: f06bdd4001 ("x86/mm: Adapt MODULES_END based on fixmap section size")
Reported-by: Jakub Kicinski <kubakici@wp.pl>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Garnier <thgarnie@google.com>
Link: https://lkml.kernel.org/r/20171228160620.23818-1-aryabinin@virtuozzo.com
2018-01-04 23:04:57 +01:00
Linus Torvalds
00a5ae218d Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 page table isolation fixes from Thomas Gleixner:
 "A couple of urgent fixes for PTI:

   - Fix a PTE mismatch between user and kernel visible mapping of the
     cpu entry area (differs vs. the GLB bit) and causes a TLB mismatch
     MCE on older AMD K8 machines

   - Fix the misplaced CR3 switch in the SYSCALL compat entry code which
     causes access to unmapped kernel memory resulting in double faults.

   - Fix the section mismatch of the cpu_tss_rw percpu storage caused by
     using a different mechanism for declaration and definition.

   - Two fixes for dumpstack which help to decode entry stack issues
     better

   - Enable PTI by default in Kconfig. We should have done that earlier,
     but it slipped through the cracks.

   - Exclude AMD from the PTI enforcement. Not necessarily a fix, but if
     AMD is so confident that they are not affected, then we should not
     burden users with the overhead"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/process: Define cpu_tss_rw in same section as declaration
  x86/pti: Switch to kernel CR3 at early in entry_SYSCALL_compat()
  x86/dumpstack: Print registers for first stack frame
  x86/dumpstack: Fix partial register dumps
  x86/pti: Make sure the user/kernel PTEs match
  x86/cpu, x86/pti: Do not enable PTI on AMD processors
  x86/pti: Enable PTI by default
2018-01-03 16:41:07 -08:00
Nick Desaulniers
2fd9c41aea x86/process: Define cpu_tss_rw in same section as declaration
cpu_tss_rw is declared with DECLARE_PER_CPU_PAGE_ALIGNED
but then defined with DEFINE_PER_CPU_SHARED_ALIGNED
leading to section mismatch warnings.

Use DEFINE_PER_CPU_PAGE_ALIGNED consistently. This is necessary because
it's mapped to the cpu entry area and must be page aligned.

[ tglx: Massaged changelog a bit ]

Fixes: 1a935bc3d4 ("x86/entry: Move SYSENTER_stack to the beginning of struct tss_struct")
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: thomas.lendacky@amd.com
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: tklauser@distanz.ch
Cc: minipli@googlemail.com
Cc: me@kylehuey.com
Cc: namit@vmware.com
Cc: luto@kernel.org
Cc: jpoimboe@redhat.com
Cc: tj@kernel.org
Cc: cl@linux.com
Cc: bp@suse.de
Cc: thgarnie@google.com
Cc: kirill.shutemov@linux.intel.com
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180103203954.183360-1-ndesaulniers@google.com
2018-01-03 23:19:33 +01:00
Thomas Gleixner
d7732ba55c x86/pti: Switch to kernel CR3 at early in entry_SYSCALL_compat()
The preparation for PTI which added CR3 switching to the entry code
misplaced the CR3 switch in entry_SYSCALL_compat().

With PTI enabled the entry code tries to access a per cpu variable after
switching to kernel GS. This fails because that variable is not mapped to
user space. This results in a double fault and in the worst case a kernel
crash.

Move the switch ahead of the access and clobber RSP which has been saved
already.

Fixes: 8a09317b89 ("x86/mm/pti: Prepare the x86/entry assembly code for entry/exit CR3 switching")
Reported-by: Lars Wendler <wendler.lars@web.de>
Reported-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>, 
Cc: Dave Hansen <dave.hansen@linux.intel.com>, 
Cc: Peter Zijlstra <peterz@infradead.org>, 
Cc: Greg KH <gregkh@linuxfoundation.org>, , 
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>, 
Cc: Juergen Gross <jgross@suse.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801031949200.1957@nanos
2018-01-03 23:19:32 +01:00
Josh Poimboeuf
3ffdeb1a02 x86/dumpstack: Print registers for first stack frame
In the stack dump code, if the frame after the starting pt_regs is also
a regs frame, the registers don't get printed.  Fix that.

Reported-by: Andy Lutomirski <luto@amacapital.net>
Tested-by: Alexander Tsoy <alexander@tsoy.me>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toralf Förster <toralf.foerster@gmx.de>
Cc: stable@vger.kernel.org
Fixes: 3b3fa11bc7 ("x86/dumpstack: Print any pt_regs found on the stack")
Link: http://lkml.kernel.org/r/396f84491d2f0ef64eda4217a2165f5712f6a115.1514736742.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-03 16:14:46 +01:00
Josh Poimboeuf
a9cdbe72c4 x86/dumpstack: Fix partial register dumps
The show_regs_safe() logic is wrong.  When there's an iret stack frame,
it prints the entire pt_regs -- most of which is random stack data --
instead of just the five registers at the end.

show_regs_safe() is also poorly named: the on_stack() checks aren't for
safety.  Rename the function to show_regs_if_on_stack() and add a
comment to explain why the checks are needed.

These issues were introduced with the "partial register dump" feature of
the following commit:

  b02fcf9ba1 ("x86/unwinder: Handle stack overflows more gracefully")

That patch had gone through a few iterations of development, and the
above issues were artifacts from a previous iteration of the patch where
'regs' pointed directly to the iret frame rather than to the (partially
empty) pt_regs.

Tested-by: Alexander Tsoy <alexander@tsoy.me>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toralf Förster <toralf.foerster@gmx.de>
Cc: stable@vger.kernel.org
Fixes: b02fcf9ba1 ("x86/unwinder: Handle stack overflows more gracefully")
Link: http://lkml.kernel.org/r/5b05b8b344f59db2d3d50dbdeba92d60f2304c54.1514736742.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-03 16:14:46 +01:00
Thomas Gleixner
52994c256d x86/pti: Make sure the user/kernel PTEs match
Meelis reported that his K8 Athlon64 emits MCE warnings when PTI is
enabled:

[Hardware Error]: Error Addr: 0x0000ffff81e000e0
[Hardware Error]: MC1 Error: L1 TLB multimatch.
[Hardware Error]: cache level: L1, tx: INSN

The address is in the entry area, which is mapped into kernel _AND_ user
space. That's special because we switch CR3 while we are executing
there. 

User mapping:
0xffffffff81e00000-0xffffffff82000000           2M     ro         PSE     GLB x  pmd

Kernel mapping:
0xffffffff81000000-0xffffffff82000000          16M     ro         PSE         x  pmd

So the K8 is complaining that the TLB entries differ. They differ in the
GLB bit.

Drop the GLB bit when installing the user shared mapping.

Fixes: 6dc72c3cbc ("x86/mm/pti: Share entry text PMD")
Reported-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Meelis Roos <mroos@linux.ee>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801031407180.1957@nanos
2018-01-03 15:57:59 +01:00
Tom Lendacky
694d99d409 x86/cpu, x86/pti: Do not enable PTI on AMD processors
AMD processors are not subject to the types of attacks that the kernel
page table isolation feature protects against.  The AMD microarchitecture
does not allow memory references, including speculative references, that
access higher privileged data when running in a lesser privileged mode
when that access would result in a page fault.

Disable page table isolation by default on AMD processors by not setting
the X86_BUG_CPU_INSECURE feature, which controls whether X86_FEATURE_PTI
is set.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20171227054354.20369.94587.stgit@tlendack-t1.amdoffice.net
2018-01-03 15:57:59 +01:00
Ard Biesheuvel
f24c4d4780 efi/capsule-loader: Reinstate virtual capsule mapping
Commit:

  82c3768b8d ("efi/capsule-loader: Use a cached copy of the capsule header")

... refactored the capsule loading code that maps the capsule header,
to avoid having to map it several times.

However, as it turns out, the vmap() call we ended up removing did not
just map the header, but the entire capsule image, and dropping this
virtual mapping breaks capsules that are processed by the firmware
immediately (i.e., without a reboot).

Unfortunately, that change was part of a larger refactor that allowed
a quirk to be implemented for Quark, which has a non-standard memory
layout for capsules, and we have slightly painted ourselves into a
corner by allowing quirk code to mangle the capsule header and memory
layout.

So we need to fix this without breaking Quark. Fortunately, Quark does
not appear to care about the virtual mapping, and so we can simply
do a partial revert of commit:

  2a457fb31d ("efi/capsule-loader: Use page addresses rather than struct page pointers")

... and create a vmap() mapping of the entire capsule (including header)
based on the reinstated struct page array, unless running on Quark, in
which case we pass the capsule header copy as before.

Reported-by: Ge Song <ge.song@hxt-semitech.com>
Tested-by: Bryan O'Donoghue <pure.logic@nexus-software.ie>
Tested-by: Ge Song <ge.song@hxt-semitech.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: <stable@vger.kernel.org>
Cc: Dave Young <dyoung@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Fixes: 82c3768b8d ("efi/capsule-loader: Use a cached copy of the capsule header")
Link: http://lkml.kernel.org/r/20180102172110.17018-3-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-03 13:54:31 +01:00
Dave Young
835bcec5fd x86/efi: Fix kernel param add_efi_memmap regression
'add_efi_memmap' is an early param, but do_add_efi_memmap() has no
chance to run because the code path is before parse_early_param().
I believe it worked when the param was introduced but probably later
some other changes caused the wrong order and nobody noticed it.

Move efi_memblock_x86_reserve_range() after parse_early_param()
to fix it.

Signed-off-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Bryan O'Donoghue <pure.logic@nexus-software.ie>
Cc: Ge Song <ge.song@hxt-semitech.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180102172110.17018-2-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-03 13:54:31 +01:00
Linus Torvalds
f39d7d78b7 Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A couple of fixlets for x86:

   - Fix the ESPFIX double fault handling for 5-level pagetables

   - Fix the commandline parsing for 'apic=' on 32bit systems and update
     documentation

   - Make zombie stack traces reliable

   - Fix kexec with stack canary

   - Fix the delivery mode for APICs which was missed when the x86
     vector management was converted to single target delivery. Caused a
     regression due to the broken hardware which ignores affinity
     settings in lowest prio delivery mode.

   - Unbreak modules when AMD memory encryption is enabled

   - Remove an unused parameter of prepare_switch_to"

* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic: Switch all APICs to Fixed delivery mode
  x86/apic: Update the 'apic=' description of setting APIC driver
  x86/apic: Avoid wrong warning when parsing 'apic=' in X86-32 case
  x86-32: Fix kexec with stack canary (CONFIG_CC_STACKPROTECTOR)
  x86: Remove unused parameter of prepare_switch_to
  x86/stacktrace: Make zombie stack traces reliable
  x86/mm: Unbreak modules that use the DMA API
  x86/build: Make isoimage work on Debian
  x86/espfix/64: Fix espfix double-fault handling on 5-level systems
2017-12-31 13:13:56 -08:00
Linus Torvalds
52c90f2d32 Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 page table isolation fixes from Thomas Gleixner:
 "Four patches addressing the PTI fallout as discussed and debugged
  yesterday:

   - Remove stale and pointless TLB flush invocations from the hotplug
     code

   - Remove stale preempt_disable/enable from __native_flush_tlb()

   - Plug the memory leak in the write_ldt() error path"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ldt: Make LDT pgtable free conditional
  x86/ldt: Plug memory leak in error path
  x86/mm: Remove preempt_disable/enable() from __native_flush_tlb()
  x86/smpboot: Remove stale TLB flush invocations
2017-12-31 13:03:05 -08:00
Linus Torvalds
e7c632fc47 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:

 - plug a memory leak in the intel pmu init code

 - clang fixes

 - tooling fix to avoid including kernel headers

 - a fix for jvmti to generate correct debug information for inlined
   code

 - replace backtick with a regular shell function

 - fix the build in hardened environments

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Plug memory leak in intel_pmu_init()
  x86/asm: Allow again using asm.h when building for the 'bpf' clang target
  tools arch s390: Do not include header files from the kernel sources
  perf jvmti: Generate correct debug information for inlined code
  perf tools: Fix up build in hardened environments
  perf tools: Use shell function for perl cflags retrieval
2017-12-31 11:47:24 -08:00
Linus Torvalds
88fa025d30 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
 "A rather large update after the kaisered maintainer finally found time
  to handle regression reports.

   - The larger part addresses a regression caused by the x86 vector
     management rework.

     The reservation based model does not work reliably for MSI
     interrupts, if they cannot be masked (yes, yet another hw
     engineering trainwreck). The reason is that the reservation mode
     assigns a dummy vector when the interrupt is allocated and switches
     to a real vector when the interrupt is requested.

     If the MSI entry cannot be masked then the initialization might
     raise an interrupt before the interrupt is requested, which ends up
     as spurious interrupt and causes device malfunction and worse. The
     fix is to exclude MSI interrupts which do not support masking from
     reservation mode and assign a real vector right away.

   - Extend the extra lockdep class setup for nested interrupts with a
     class for the recently added irq_desc::request_mutex so lockdep can
     differeniate and does not emit false positive warnings.

   - A ratelimit guard for the bad irq printout so in case a bad irq
     comes back immediately the system does not drown in dmesg spam"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq/msi, x86/vector: Prevent reservation mode for non maskable MSI
  genirq/irqdomain: Rename early argument of irq_domain_activate_irq()
  x86/vector: Use IRQD_CAN_RESERVE flag
  genirq: Introduce IRQD_CAN_RESERVE flag
  genirq/msi: Handle reactivation only on success
  gpio: brcmstb: Make really use of the new lockdep class
  genirq: Guard handle_bad_irq log messages
  kernel/irq: Extend lockdep class for request mutex
2017-12-31 11:23:11 -08:00
Thomas Gleixner
7f414195b0 x86/ldt: Make LDT pgtable free conditional
Andy prefers to be paranoid about the pagetable free in the error path of
write_ldt(). Make it conditional and warn whenever the installment of a
secondary LDT fails.

Requested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-12-31 16:55:09 +01:00
Thomas Gleixner
a62d69857a x86/ldt: Plug memory leak in error path
The error path in write_ldt() tries to free 'old_ldt' instead of the newly
allocated 'new_ldt', resulting in a memory leak. It also misses to clean up a
half populated LDT pagetable, which is not a leak as it gets cleaned up
when the process exits.

Free both the potentially half populated LDT pagetable and the newly
allocated LDT struct. This can be done unconditionally because once an LDT
is mapped subsequent maps will succeed, because the PTE page is already
populated and the two LDTs fit into that single page.

Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linuxfoundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: f55f0501cb ("x86/pti: Put the LDT in its own PGD if PTI is on")
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1712311121340.1899@nanos
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-31 12:14:07 +01:00
Thomas Gleixner
decab0888e x86/mm: Remove preempt_disable/enable() from __native_flush_tlb()
The preempt_disable/enable() pair in __native_flush_tlb() was added in
commit:

  5cf0791da5 ("x86/mm: Disable preemption during CR3 read+write")

... to protect the UP variant of flush_tlb_mm_range().

That preempt_disable/enable() pair should have been added to the UP variant
of flush_tlb_mm_range() instead.

The UP variant was removed with commit:

  ce4a4e565f ("x86/mm: Remove the UP asm/tlbflush.h code, always use the (formerly) SMP code")

... but the preempt_disable/enable() pair stayed around.

The latest change to __native_flush_tlb() in commit:

  6fd166aae7 ("x86/mm: Use/Fix PCID to optimize user/kernel switches")

... added an access to a per CPU variable outside the preempt disabled
regions, which makes no sense at all. __native_flush_tlb() must always
be called with at least preemption disabled.

Remove the preempt_disable/enable() pair and add a WARN_ON_ONCE() to catch
bad callers independent of the smp_processor_id() debugging.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linuxfoundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20171230211829.679325424@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-31 12:12:51 +01:00