Commit Graph

7877 Commits

Author SHA1 Message Date
Linus Torvalds
86bbbebac1 Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS updates from Ingo Molnar:
 "The main changes in this cycle were:

   - AMD MCE support/decoding improvements (Yazen Ghannam)

   - general MCE header cleanups and reorganization (Borislav Petkov)"

* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "x86/mce/AMD: Collect error info even if valid bits are not set"
  x86/MCE: Cleanup and complete struct mce fields definitions
  x86/mce/AMD: Carve out SMCA get_block_address() code
  x86/mce/AMD: Get address from already initialized block
  x86/mce/AMD, EDAC/mce_amd: Enumerate Reserved SMCA bank type
  x86/mce/AMD: Pass the bank number to smca_get_bank_type()
  x86/mce/AMD: Collect error info even if valid bits are not set
  x86/mce: Issue the 'mcelog --ascii' message only on !AMD
  x86/mce: Convert 'struct mca_config' bools to a bitfield
  x86/mce: Put private structures and definitions into the internal header
2018-04-02 11:47:07 -07:00
Linus Torvalds
486adcea4a Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "The main kernel side changes were:

   - Modernize the kprobe and uprobe creation/destruction tooling ABIs:

     The existing text based APIs (kprobe_events and uprobe_events in
     tracefs), are naive, limited ABIs in that they require user-space
     to clean up after themselves, which is both difficult and fragile
     if the tool is buggy or exits unexpectedly. In other words they are
     not really suited for modern, robust tooling.

     So introduce a modern, file descriptor based ABI that does not have
     these limitations: introduce the 'perf_kprobe' and 'perf_uprobe'
     PMUs and extend the perf_event_open() syscall to create events with
     a kprobe/uprobe attached to them. These [k,u]probe are associated
     with this file descriptor, so they are not available in tracefs.

     (Song Liu)

   - Intel Cannon Lake CPU support (Harry Pan)

   - Intel PT cleanups (Alexander Shishkin)

   - Improve the performance of pinned/flexible event groups by using RB
     trees (Alexey Budankov)

   - Add PERF_EVENT_IOC_MODIFY_ATTRIBUTES which allows the modification
     of hardware breakpoints, which new ABI variant massively speeds up
     existing tooling that uses hardware breakpoints to instrument (and
     debug) memory usage.

     (Milind Chabbi, Jiri Olsa)

   - Various Intel PEBS handling fixes and improvements, and other Intel
     PMU improvements (Kan Liang)

   - Various perf core improvements and optimizations (Peter Zijlstra)

   - ... misc cleanups, fixes and updates.

  There's over 200 tooling commits, here's an (imperfect) list of
  highlights:

   - 'perf annotate' improvements:

      * Recognize and handle jumps to other functions as calls, which
        improves the navigation along jumps and back. (Arnaldo Carvalho
        de Melo)

      * Add the 'P' hotkey in TUI annotation to dump annotation output
        into a file, to ease e-mail reporting of annotation details.
        (Arnaldo Carvalho de Melo)

      * Add an IPC/cycles column to the TUI (Jin Yao)

      * Improve s390 assembly annotation (Thomas Richter)

      * Refactor the output formatting logic to better separate it into
        interactive and non-interactive features and add the --stdio2
        output variant to demonstrate this. (Arnaldo Carvalho de Melo)

   - 'perf script' improvements:

      * Add Python 3 support (Jaroslav Škarvada)

      * Add --show-round-event (Jiri Olsa)

   - 'perf c2c' improvements:

      * Add NUMA analysis support (Jiri Olsa)

   - 'perf trace' improvements:

      * Improve PowerPC support (Ravi Bangoria)

   - 'perf inject' improvements:

      * Integrate ARM CoreSight traces (Robert Walker)

   - 'perf stat' improvements:

      * Add the --interval-count option (yuzhoujian)

      * Add the --timeout option (yuzhoujian)

   - 'perf sched' improvements (Changbin Du)

   - Vendor events improvements :

      * Add IBM s390 vendor events (Thomas Richter)

      * Add and improve arm64 vendor events (John Garry, Ganapatrao
        Kulkarni)

      * Update POWER9 vendor events (Sukadev Bhattiprolu)

   - Intel PT tooling improvements (Adrian Hunter)

   - PMU handling improvements (Agustin Vega-Frias)

   - Record machine topology in perf.data (Jiri Olsa)

   - Various overwrite related cleanups (Kan Liang)

   - Add arm64 dwarf post unwind support (Kim Phillips, Jean Pihet)

   - ... and lots of other changes, cleanups and fixes, see the shortlog
     and Git history for details"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (262 commits)
  perf/x86/intel: Enable C-state residency events for Cannon Lake
  perf/x86/intel: Add Cannon Lake support for RAPL profiling
  perf/x86/pt, coresight: Clean up address filter structure
  perf vendor events s390: Add JSON files for IBM z14
  perf vendor events s390: Add JSON files for IBM z13
  perf vendor events s390: Add JSON files for IBM zEC12 zBC12
  perf vendor events s390: Add JSON files for IBM z196
  perf vendor events s390: Add JSON files for IBM z10EC z10BC
  perf mmap: Be consistent when checking for an unmaped ring buffer
  perf mmap: Fix accessing unmapped mmap in perf_mmap__read_done()
  perf build: Fix check-headers.sh opts assignment
  perf/x86: Update rdpmc_always_available static key to the modern API
  perf annotate: Use absolute addresses to calculate jump target offsets
  perf annotate: Defer searching for comma in raw line till it is needed
  perf annotate: Support jumping from one function to another
  perf annotate: Add "_local" to jump/offset validation routines
  perf python: Reference Py_None before returning it
  perf annotate: Mark jumps to outher functions with the call arrow
  perf annotate: Pass function descriptor to its instruction parsing routines
  perf annotate: No need to calculate notes->start twice
  ...
2018-04-02 11:06:34 -07:00
Linus Torvalds
701f3b3149 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "The main changes in the locking subsystem in this cycle were:

   - Add the Linux Kernel Memory Consistency Model (LKMM) subsystem,
     which is an an array of tools in tools/memory-model/ that formally
     describe the Linux memory coherency model (a.k.a.
     Documentation/memory-barriers.txt), and also produce 'litmus tests'
     in form of kernel code which can be directly executed and tested.

     Here's a high level background article about an earlier version of
     this work on LWN.net:

        https://lwn.net/Articles/718628/

     The design principles:

      "There is reason to believe that Documentation/memory-barriers.txt
       could use some help, and a major purpose of this patch is to
       provide that help in the form of a design-time tool that can
       produce all valid executions of a small fragment of concurrent
       Linux-kernel code, which is called a "litmus test". This tool's
       functionality is roughly similar to a full state-space search.
       Please note that this is a design-time tool, not useful for
       regression testing. However, we hope that the underlying
       Linux-kernel memory model will be incorporated into other tools
       capable of analyzing large bodies of code for regression-testing
       purposes."

     [...]

      "A second tool is klitmus7, which converts litmus tests to
       loadable kernel modules for direct testing. As with herd7, the
       klitmus7 code is freely available from

         http://diy.inria.fr/sources/index.html

       (and via "git" at https://github.com/herd/herdtools7)"

     [...]

     Credits go to:

      "This patch was the result of a most excellent collaboration
       founded by Jade Alglave and also including Alan Stern, Andrea
       Parri, and Luc Maranget."

     ... and to the gents listed in the MAINTAINERS entry:

        LINUX KERNEL MEMORY CONSISTENCY MODEL (LKMM)
        M:      Alan Stern <stern@rowland.harvard.edu>
        M:      Andrea Parri <parri.andrea@gmail.com>
        M:      Will Deacon <will.deacon@arm.com>
        M:      Peter Zijlstra <peterz@infradead.org>
        M:      Boqun Feng <boqun.feng@gmail.com>
        M:      Nicholas Piggin <npiggin@gmail.com>
        M:      David Howells <dhowells@redhat.com>
        M:      Jade Alglave <j.alglave@ucl.ac.uk>
        M:      Luc Maranget <luc.maranget@inria.fr>
        M:      "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

     The LKMM project already found several bugs in Linux locking
     primitives and improved the understanding and the documentation of
     the Linux memory model all around.

   - Add KASAN instrumentation to atomic APIs (Dmitry Vyukov)

   - Add RWSEM API debugging and reorganize the lock debugging Kconfig
     (Waiman Long)

   - ... misc cleanups and other smaller changes"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
  locking/Kconfig: Restructure the lock debugging menu
  locking/Kconfig: Add LOCK_DEBUGGING_SUPPORT to make it more readable
  locking/rwsem: Add DEBUG_RWSEMS to look for lock/unlock mismatches
  lockdep: Make the lock debug output more useful
  locking/rtmutex: Handle non enqueued waiters gracefully in remove_waiter()
  locking/atomic, asm-generic, x86: Add comments for atomic instrumentation
  locking/atomic, asm-generic: Add KASAN instrumentation to atomic operations
  locking/atomic/x86: Switch atomic.h to use atomic-instrumented.h
  locking/atomic, asm-generic: Add asm-generic/atomic-instrumented.h
  locking/xchg/alpha: Remove superfluous memory barriers from the _local() variants
  tools/memory-model: Finish the removal of rb-dep, smp_read_barrier_depends(), and lockless_dereference()
  tools/memory-model: Add documentation of new litmus test
  tools/memory-model: Remove mention of docker/gentoo image
  locking/memory-barriers: De-emphasize smp_read_barrier_depends() some more
  locking/lockdep: Show unadorned pointers
  mutex: Drop linkage.h from mutex.h
  tools/memory-model: Remove rb-dep, smp_read_barrier_depends, and lockless_dereference
  tools/memory-model: Convert underscores to hyphens
  tools/memory-model: Add a S lock-based external-view litmus test
  tools/memory-model: Add required herd7 version to README file
  ...
2018-04-02 10:27:16 -07:00
Linus Torvalds
ad0500ca87 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Two UV platform fixes, and a kbuild fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/platform/UV: Fix critical UV MMR address error
  x86/platform/uv/BAU: Add APIC idt entry
  x86/purgatory: Avoid creating stray .<pid>.d files, remove -MD from KBUILD_CFLAGS
2018-03-31 07:50:30 -10:00
Linus Torvalds
93e04d4ad7 Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 PTI fixes from Ingo Molnar:
 "Two fixes: a relatively simple objtool fix that makes Clang built
  kernels work with ORC debug info, plus an alternatives macro fix"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/alternatives: Fixup alternative_call_2
  objtool: Add Clang support
2018-03-31 07:26:48 -10:00
Ingo Molnar
169310f71f Merge branch 'linus' into locking/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-31 07:30:17 +02:00
Ingo Molnar
2d074918fb Merge branch 'perf/urgent' into perf/core
Conflicts:
	kernel/events/hw_breakpoint.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-29 16:03:48 +02:00
mike.travis@hpe.com
bd47a85acd x86/platform/UV: Fix critical UV MMR address error
A critical error was found testing the fixed UV4 HUB in that an MMR address
was found to be incorrect.  This causes the virtual address space for
accessing the MMIOH1 region to be allocated with the incorrect size.

Fixes: 673aa20c55 ("x86/platform/UV: Update uv_mmrs.h to prepare for UV4A fixes")
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Cc: Russ Anderson <russ.anderson@hpe.com>
Cc: Andrew Banman <andrew.banman@hpe.com>
Link: https://lkml.kernel.org/r/20180328174011.041801248@stormcage.americas.sgi.com
2018-03-28 20:19:45 +02:00
Andrew Banman
151ad17fbe x86/platform/uv/BAU: Add APIC idt entry
BAU uses the old alloc_initr_gate90 method to setup its interrupt. This
fails silently as the BAU vector is in the range of APIC vectors that are
registered to the spurious interrupt handler. As a consequence BAU
broadcasts are not handled, and the broadcast source CPU hangs.

Update BAU to use new idt structure.

Fixes: dc20b2d526 ("x86/idt: Move interrupt gate initialization to IDT code")
Signed-off-by: Andrew Banman <abanman@hpe.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Mike Travis <mike.travis@hpe.com>
Cc: Dimitri Sivanich <sivanich@hpe.com>
Cc: Russ Anderson <rja@hpe.com>
Cc: stable@vger.kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/1522188546-196177-1-git-send-email-abanman@hpe.com
2018-03-28 10:40:55 +02:00
Alexey Dobriyan
bd6271039e x86/alternatives: Fixup alternative_call_2
The following pattern fails to compile while the same pattern
with alternative_call() does:

	if (...)
		alternative_call_2(...);
	else
		alternative_call_2(...);

as it expands into

	if (...)
	{
	};	<===
	else
	{
	};

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20180114120504.GA11368@avx2
2018-03-27 09:47:53 +02:00
Davidlohr Bueso
631fe154ed perf/x86: Update rdpmc_always_available static key to the modern API
No changes in refcount semantics -- use DEFINE_STATIC_KEY_FALSE()
for initialization and replace:

  static_key_slow_inc|dec()   =>   static_branch_inc|dec()
  static_key_false()          =>   static_branch_unlikely()

Added a '_key' suffix to rdpmc_always_available, for better self-documentation.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Link: http://lkml.kernel.org/r/20180326210929.5244-5-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-27 07:53:00 +02:00
Linus Torvalds
d2862360bf Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 and PTI fixes from Ingo Molnar:
 "Misc fixes:

   - fix EFI pagetables freeing

   - fix vsyscall pagetable setting on Xen PV guests

   - remove ancient CONFIG_X86_PPRO_FENCE=y - x86 is TSO again

   - fix two binutils (ld) development version related incompatibilities

   - clean up breakpoint handling

   - fix an x86 self-test"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/entry/64: Don't use IST entry for #BP stack
  x86/efi: Free efi_pgd with free_pages()
  x86/vsyscall/64: Use proper accessor to update P4D entry
  x86/cpu: Remove the CONFIG_X86_PPRO_FENCE=y quirk
  x86/boot/64: Verify alignment of the LOAD segment
  x86/build/64: Force the linker to use 2MB page size
  selftests/x86/ptrace_syscall: Fix for yet more glibc interference
2018-03-25 07:36:02 -10:00
Linus Torvalds
32d43cd391 kvm/x86: fix icebp instruction handling
The undocumented 'icebp' instruction (aka 'int1') works pretty much like
'int3' in the absense of in-circuit probing equipment (except,
obviously, that it raises #DB instead of raising #BP), and is used by
some validation test-suites as such.

But Andy Lutomirski noticed that his test suite acted differently in kvm
than on bare hardware.

The reason is that kvm used an inexact test for the icebp instruction:
it just assumed that an all-zero VM exit qualification value meant that
the VM exit was due to icebp.

That is not unlike the guess that do_debug() does for the actual
exception handling case, but it's purely a heuristic, not an absolute
rule.  do_debug() does it because it wants to ascribe _some_ reasons to
the #DB that happened, and an empty %dr6 value means that 'icebp' is the
most likely casue and we have no better information.

But kvm can just do it right, because unlike the do_debug() case, kvm
actually sees the real reason for the #DB in the VM-exit interruption
information field.

So instead of relying on an inexact heuristic, just use the actual VM
exit information that says "it was 'icebp'".

Right now the 'icebp' instruction isn't technically documented by Intel,
but that will hopefully change.  The special "privileged software
exception" information _is_ actually mentioned in the Intel SDM, even
though the cause of it isn't enumerated.

Reported-by: Andy Lutomirski <luto@kernel.org>
Tested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-03-20 14:58:34 -07:00
Christoph Hellwig
5927145efd x86/cpu: Remove the CONFIG_X86_PPRO_FENCE=y quirk
There were only a few Pentium Pro multiprocessors systems where this
errata applied. They are more than 20 years old now, and we've slowly
dropped places which put the workarounds in and discouraged anyone
from enabling the workaround.

Get rid of it for good.

Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Jon Mason <jdmason@kudzu.us>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Muli Ben-Yehuda <mulix@mulix.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: iommu@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/20180319103826.12853-2-hch@lst.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20 10:01:05 +01:00
Linus Torvalds
9e1909b9da Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/pti updates from Thomas Gleixner:
 "Another set of melted spectrum updates:

   - Iron out the last late microcode loading issues by actually
     checking whether new microcode is present and preventing the CPU
     synchronization to run into a timeout induced hang.

   - Remove Skylake C2 from the microcode blacklist according to the
     latest Intel documentation

   - Fix the VM86 POPF emulation which traps if VIP is set, but VIF is
     not. Enhance the selftests to catch that kind of issue

   - Annotate indirect calls/jumps for objtool on 32bit. This is not a
     functional issue, but for consistency sake its the right thing to
     do.

   - Fix a jump label build warning observed on SPARC64 which uses 32bit
     storage for the code location which is casted to 64 bit pointer w/o
     extending it to 64bit first.

   - Add two new cpufeature bits. Not really an urgent issue, but
     provides them for both x86 and x86/kvm work. No impact on the
     current kernel"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode: Fix CPU synchronization routine
  x86/microcode: Attempt late loading only when new microcode is present
  x86/speculation: Remove Skylake C2 from Speculation Control microcode blacklist
  jump_label: Fix sparc64 warning
  x86/speculation, objtool: Annotate indirect calls/jumps for objtool on 32-bit kernels
  x86/vm86/32: Fix POPF emulation
  selftests/x86/entry_from_vm86: Add test cases for POPF
  selftests/x86/entry_from_vm86: Exit with 1 if we fail
  x86/cpufeatures: Add Intel PCONFIG cpufeature
  x86/cpufeatures: Add Intel Total Memory Encryption cpufeature
2018-03-18 12:03:15 -07:00
Borislav Petkov
2613f36ed9 x86/microcode: Attempt late loading only when new microcode is present
Return UCODE_NEW from the scanning functions to denote that new microcode
was found and only then attempt the expensive synchronization dance.

Reported-by: Emanuel Czirai <xftroxgpx@protonmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Emanuel Czirai <xftroxgpx@protonmail.com>
Tested-by: Ashok Raj <ashok.raj@intel.com>
Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lkml.kernel.org/r/20180314183615.17629-1-bp@alien8.de
2018-03-16 20:55:51 +01:00
Andy Whitcroft
a14bff1311 x86/speculation, objtool: Annotate indirect calls/jumps for objtool on 32-bit kernels
In the following commit:

  9e0e3c5130 ("x86/speculation, objtool: Annotate indirect calls/jumps for objtool")

... we added annotations for CALL_NOSPEC/JMP_NOSPEC on 64-bit x86 kernels,
but we did not annotate the 32-bit path.

Annotate it similarly.

Signed-off-by: Andy Whitcroft <apw@canonical.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180314112427.22351-1-apw@canonical.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-14 13:24:31 +01:00
Dmitry Vyukov
ac605bee0b locking/atomic, asm-generic, x86: Add comments for atomic instrumentation
The comments are factored out from the code changes to make them
easier to read. Add them separately to explain some non-obvious
aspects.

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: kasan-dev@googlegroups.com
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/cc595efc644bb905407012d82d3eb8bac3368e7a.1517246437.git.dvyukov@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-12 12:15:35 +01:00
Dmitry Vyukov
8bf705d130 locking/atomic/x86: Switch atomic.h to use atomic-instrumented.h
Add arch_ prefix to all atomic operations and include
<asm-generic/atomic-instrumented.h>. This will allow
to add KASAN instrumentation to all atomic ops.

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: kasan-dev@googlegroups.com
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/54f0eb64260b84199e538652e079a89b5423ad41.1517246437.git.dvyukov@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-12 12:15:35 +01:00
Kirill A. Shutemov
7958b2246f x86/cpufeatures: Add Intel PCONFIG cpufeature
CPUID.0x7.0x0:EDX[18] indicates whether Intel CPU support PCONFIG instruction.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kai Huang <kai.huang@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180305162610.37510-4-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-12 12:09:53 +01:00
Kirill A. Shutemov
1da961d72a x86/cpufeatures: Add Intel Total Memory Encryption cpufeature
CPUID.0x7.0x0:ECX[13] indicates whether CPU supports Intel Total Memory
Encryption.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kai Huang <kai.huang@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180305162610.37510-2-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-12 12:09:53 +01:00
Linus Torvalds
ed58d66f60 Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/pti updates from Thomas Gleixner:
 "Yet another pile of melted spectrum related updates:

   - Drop native vsyscall support finally as it causes more trouble than
     benefit.

   - Make microcode loading more robust. There were a few issues
     especially related to late loading which are now surfacing because
     late loading of the IB* microcodes addressing spectre issues has
     become more widely used.

   - Simplify and robustify the syscall handling in the entry code

   - Prevent kprobes on the entry trampoline code which lead to kernel
     crashes when the probe hits before CR3 is updated

   - Don't check microcode versions when running on hypervisors as they
     are considered as lying anyway.

   - Fix the 32bit objtool build and a coment typo"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/kprobes: Fix kernel crash when probing .entry_trampoline code
  x86/pti: Fix a comment typo
  x86/microcode: Synchronize late microcode loading
  x86/microcode: Request microcode on the BSP
  x86/microcode/intel: Look into the patch cache first
  x86/microcode: Do not upload microcode if CPUs are offline
  x86/microcode/intel: Writeback and invalidate caches before updating microcode
  x86/microcode/intel: Check microcode revision before updating sibling threads
  x86/microcode: Get rid of struct apply_microcode_ctx
  x86/spectre_v2: Don't check microcode versions when running under hypervisors
  x86/vsyscall/64: Drop "native" vsyscalls
  x86/entry/64/compat: Save one instruction in entry_INT80_compat()
  x86/entry: Do not special-case clone(2) in compat entry
  x86/syscalls: Use COMPAT_SYSCALL_DEFINEx() macros for x86-only compat syscalls
  x86/syscalls: Use proper syscall definition for sys_ioperm()
  x86/entry: Remove stale syscall prototype
  x86/syscalls/32: Simplify $entry == $compat entries
  objtool: Fix 32-bit build
2018-03-11 14:59:23 -07:00
Francis Deslauriers
c07a8f8b08 x86/kprobes: Fix kernel crash when probing .entry_trampoline code
Disable the kprobe probing of the entry trampoline:

.entry_trampoline is a code area that is used to ensure page table
isolation between userspace and kernelspace.

At the beginning of the execution of the trampoline, we load the
kernel's CR3 register. This has the effect of enabling the translation
of the kernel virtual addresses to physical addresses. Before this
happens most kernel addresses can not be translated because the running
process' CR3 is still used.

If a kprobe is placed on the trampoline code before that change of the
CR3 register happens the kernel crashes because int3 handling pages are
not accessible.

To fix this, add the .entry_trampoline section to the kprobe blacklist
to prohibit the probing of code before all the kernel pages are
accessible.

Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: mathieu.desnoyers@efficios.com
Cc: mhiramat@kernel.org
Link: http://lkml.kernel.org/r/1520565492-4637-2-git-send-email-francis.deslauriers@efficios.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-09 09:58:36 +01:00
Borislav Petkov
24193c5de4 x86/MCE: Cleanup and complete struct mce fields definitions
The struct is part of the uapi, document that fact and all fields properly
and fix formatting.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20180306142143.19990-3-bp@alien8.de
2018-03-08 15:52:59 +01:00
Thomas Gleixner
422caa5f7a Merge branch 'ras/urgent' into ras/core
Pick up urgent fixes to apply further development changes.
2018-03-08 15:52:08 +01:00
Tony Luck
fa94d0c6e0 x86/MCE: Save microcode revision in machine check records
Updating microcode used to be relatively rare. Now that it has become
more common we should save the microcode version in a machine check
record to make sure that those people looking at the error have this
important information bundled with the rest of the logged information.

[ Borislav: Simplify a bit. ]

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20180301233449.24311-1-tony.luck@intel.com
2018-03-08 15:34:49 +01:00
Andy Lutomirski
076ca272a1 x86/vsyscall/64: Drop "native" vsyscalls
Since Linux v3.2, vsyscalls have been deprecated and slow.  From v3.2
on, Linux had three vsyscall modes: "native", "emulate", and "none".

"emulate" is the default.  All known user programs work correctly in
emulate mode, but vsyscalls turn into page faults and are emulated.
This is very slow.  In "native" mode, the vsyscall page is easily
usable as an exploit gadget, but vsyscalls are a bit faster -- they
turn into normal syscalls.  (This is in contrast to vDSO functions,
which can be much faster than syscalls.)  In "none" mode, there are
no vsyscalls.

For all practical purposes, "native" was really just a chicken bit
in case something went wrong with the emulation.  It's been over six
years, and nothing has gone wrong.  Delete it.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Kernel Hardening <kernel-hardening@lists.openwall.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/519fee5268faea09ae550776ce969fa6e88668b0.1520449896.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-08 06:48:15 +01:00
Dominik Brodowski
af52201d99 x86/entry: Do not special-case clone(2) in compat entry
With the CPU renaming registers on its own, and all the overhead of the
syscall entry/exit, it is doubtful whether the compiled output of

	mov	%r8, %rax
	mov	%rcx, %r8
	mov	%rax, %rcx
	jmpq	sys_clone

is measurably slower than the hand-crafted version of

	xchg	%r8, %rcx

So get rid of this special case.

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: luto@amacapital.net
Cc: viro@zeniv.linux.org.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-07 07:57:31 +01:00
Dominik Brodowski
4ddb45db30 x86/syscalls: Use COMPAT_SYSCALL_DEFINEx() macros for x86-only compat syscalls
While at it, convert declarations of type "unsigned" to "unsigned int".

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: luto@amacapital.net
Cc: viro@zeniv.linux.org.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-07 07:57:30 +01:00
Dominik Brodowski
a41e2ab08e x86/entry: Remove stale syscall prototype
sys32_vm86_warning() is long gone.

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: luto@amacapital.net
Cc: viro@zeniv.linux.org.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-07 07:57:30 +01:00
Linus Torvalds
e64b9562ba Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A small set of fixes for x86:

   - Add missing instruction suffixes to assembly code so it can be
     compiled by newer GAS versions without warnings.

   - Switch refcount WARN exceptions to UD2 as we did in general

   - Make the reboot on Intel Edison platforms work

   - A small documentation update so text and sample command match"

* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Documentation, x86, resctrl: Make text and sample command match
  x86/platform/intel-mid: Handle Intel Edison reboot correctly
  x86/asm: Add instruction suffixes to bitops
  x86/entry/64: Add instruction suffix
  x86/refcounts: Switch to UD2 for exceptions
2018-03-04 12:12:48 -08:00
Linus Torvalds
7225a44278 Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/pti fixes from Thomas Gleixner:
 "Three fixes related to melted spectrum:

   - Sync the cpu_entry_area page table to initial_page_table on 32 bit.

     Otherwise suspend/resume fails because resume uses
     initial_page_table and triggers a triple fault when accessing the
     cpu entry area.

   - Zero the SPEC_CTL MRS on XEN before suspend to address a
     shortcoming in the hypervisor.

   - Fix another switch table detection issue in objtool"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu_entry_area: Sync cpu_entry_area to initial_page_table
  objtool: Fix another switch table detection issue
  x86/xen: Zero MSR_IA32_SPEC_CTRL before suspend
2018-03-04 11:40:16 -08:00
Linus Torvalds
03a6c2592f KVM fixes for v4.16-rc4
x86:
 - fix NULL dereference when using userspace lapic
 - optimize spectre v1 mitigations by allowing guests to use LFENCE
 - make microcode revision configurable to prevent guests from
   unnecessarily blacklisting spectre v2 mitigation features
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABCAAGBQJambvzAAoJEED/6hsPKofo9HwH/2il8xNSLIYf9pJtxZo/puyQ
 ZSwByGdeLKBZ1GP1dhdZ8kMk3eBoci0a/sQJmhDiEG6GDf1Mrgri/xj3p60sWwXT
 iReG+ZhBKg4QMj/IgOJQrh+53JT73QQP14wIhzc/DSi0Fo0ziqDA/lINxqMKc7oF
 b5qratjsb4xF1db4d1g8Ii1VRk64UoBEVpEoP37OOyAu1rgXgDr+9C832KkP0rb+
 pVYT8hLFiaYiwVN+WN52/NrIkqBlMvMp3ouRtMAajCQ9OznnraDJE6eTkPGDSDBM
 RizSuQbev5R7pcmzCAP9/XKfTbeSUZQei2ZFXQXAOvIXMrQd/ITjbPvnObsceSE=
 =wtQd
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Radim Krčmář:
 "x86:

   - fix NULL dereference when using userspace lapic

   - optimize spectre v1 mitigations by allowing guests to use LFENCE

   - make microcode revision configurable to prevent guests from
     unnecessarily blacklisting spectre v2 mitigation feature"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: fix vcpu initialization with userspace lapic
  KVM: X86: Allow userspace to define the microcode version
  KVM: X86: Introduce kvm_get_msr_feature()
  KVM: SVM: Add MSR-based feature support for serializing LFENCE
  KVM: x86: Add a framework for supporting MSR-based features
2018-03-02 19:40:43 -08:00
Wanpeng Li
518e7b9481 KVM: X86: Allow userspace to define the microcode version
Linux (among the others) has checks to make sure that certain features
aren't enabled on a certain family/model/stepping if the microcode version
isn't greater than or equal to a known good version.

By exposing the real microcode version, we're preventing buggy guests that
don't check that they are running virtualized (i.e., they should trust the
hypervisor) from disabling features that are effectively not buggy.

Suggested-by: Filippo Sironi <sironi@amazon.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-01 22:32:44 +01:00
Tom Lendacky
801e459a6f KVM: x86: Add a framework for supporting MSR-based features
Provide a new KVM capability that allows bits within MSRs to be recognized
as features.  Two new ioctls are added to the /dev/kvm ioctl routine to
retrieve the list of these MSRs and then retrieve their values. A kvm_x86_ops
callback is used to determine support for the listed MSR-based features.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[Tweaked documentation. - Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-01 19:00:28 +01:00
Thomas Gleixner
945fd17ab6 x86/cpu_entry_area: Sync cpu_entry_area to initial_page_table
The separation of the cpu_entry_area from the fixmap missed the fact that
on 32bit non-PAE kernels the cpu_entry_area mapping might not be covered in
initial_page_table by the previous synchronizations.

This results in suspend/resume failures because 32bit utilizes initial page
table for resume. The absence of the cpu_entry_area mapping results in a
triple fault, aka. insta reboot.

With PAE enabled this works by chance because the PGD entry which covers
the fixmap and other parts incindentally provides the cpu_entry_area
mapping as well.

Synchronize the initial page table after setting up the cpu entry
area. Instead of adding yet another copy of the same code, move it to a
function and invoke it from the various places.

It needs to be investigated if the existing calls in setup_arch() and
setup_per_cpu_areas() can be replaced by the later invocation from
setup_cpu_entry_areas(), but that's beyond the scope of this fix.

Fixes: 92a0f81d89 ("x86/cpu_entry_area: Move it out of the fixmap")
Reported-by: Woody Suwalski <terraluna977@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Woody Suwalski <terraluna977@gmail.com>
Cc: William Grant <william.grant@canonical.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1802282137290.1392@nanos.tec.linutronix.de
2018-03-01 09:48:27 +01:00
Jan Beulich
22636f8c95 x86/asm: Add instruction suffixes to bitops
Omitting suffixes from instructions in AT&T mode is bad practice when
operand size cannot be determined by the assembler from register
operands, and is likely going to be warned about by upstream gas in the
future (mine does already). Add the missing suffixes here. Note that for
64-bit this means some operations change from being 32-bit to 64-bit.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/5A93F98702000078001ABACC@prv-mh.provo.novell.com
2018-02-28 15:18:41 +01:00
Kees Cook
cb097be703 x86/refcounts: Switch to UD2 for exceptions
As done in commit 3b3a371cc9 ("x86/debug: Use UD2 for WARN()"), this
switches to UD2 from UD0 to keep disassembly readable.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20180225165056.GA11719@beast
2018-02-28 15:18:40 +01:00
Linus Torvalds
85a2d939c0 Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "Yet another pile of melted spectrum related changes:

   - sanitize the array_index_nospec protection mechanism: Remove the
     overengineered array_index_nospec_mask_check() magic and allow
     const-qualified types as index to avoid temporary storage in a
     non-const local variable.

   - make the microcode loader more robust by properly propagating error
     codes. Provide information about new feature bits after micro code
     was updated so administrators can act upon.

   - optimizations of the entry ASM code which reduce code footprint and
     make the code simpler and faster.

   - fix the {pmd,pud}_{set,clear}_flags() implementations to work
     properly on paravirt kernels by removing the address translation
     operations.

   - revert the harmful vmexit_fill_RSB() optimization

   - use IBRS around firmware calls

   - teach objtool about retpolines and add annotations for indirect
     jumps and calls.

   - explicitly disable jumplabel patching in __init code and handle
     patching failures properly instead of silently ignoring them.

   - remove indirect paravirt calls for writing the speculation control
     MSR as these calls are obviously proving the same attack vector
     which is tried to be mitigated.

   - a few small fixes which address build issues with recent compiler
     and assembler versions"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
  KVM/VMX: Optimize vmx_vcpu_run() and svm_vcpu_run() by marking the RDMSR path as unlikely()
  KVM/x86: Remove indirect MSR op calls from SPEC_CTRL
  objtool, retpolines: Integrate objtool with retpoline support more closely
  x86/entry/64: Simplify ENCODE_FRAME_POINTER
  extable: Make init_kernel_text() global
  jump_label: Warn on failed jump_label patching attempt
  jump_label: Explicitly disable jump labels in __init code
  x86/entry/64: Open-code switch_to_thread_stack()
  x86/entry/64: Move ASM_CLAC to interrupt_entry()
  x86/entry/64: Remove 'interrupt' macro
  x86/entry/64: Move the switch_to_thread_stack() call to interrupt_entry()
  x86/entry/64: Move ENTER_IRQ_STACK from interrupt macro to interrupt_entry
  x86/entry/64: Move PUSH_AND_CLEAR_REGS from interrupt macro to helper function
  x86/speculation: Move firmware_restrict_branch_speculation_*() from C to CPP
  objtool: Add module specific retpoline rules
  objtool: Add retpoline validation
  objtool: Use existing global variables for options
  x86/mm/sme, objtool: Annotate indirect call in sme_encrypt_execute()
  x86/boot, objtool: Annotate indirect jump in secondary_startup_64()
  x86/paravirt, objtool: Annotate indirect calls
  ...
2018-02-26 09:34:21 -08:00
Linus Torvalds
d4858aaf6b s390:
- optimization for the exitless interrupt support that was merged in 4.16-rc1
 - improve the branch prediction blocking for nested KVM
 - replace some jump tables with switch statements to improve expoline performance
 - fixes for multiple epoch facility
 
 ARM:
 - fix the interaction of userspace irqchip VMs with in-kernel irqchip VMs
 - make sure we can build 32-bit KVM/ARM with gcc-8.
 
 x86:
 - fixes for AMD SEV
 - fixes for Intel nested VMX, emulated UMIP and a dump_stack() on VM startup
 - fixes for async page fault migration
 - small optimization to PV TLB flush (new in 4.16-rc1)
 - syzkaller fixes
 
 Generic:
 - compiler warning fixes
 - syzkaller fixes
 - more improvements to the kvm_stat tool
 
 Two more small Spectre fixes are going to reach you via Ingo.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEbBAABAgAGBQJakL/fAAoJEL/70l94x66Dzp4H9j6qMzgOTAQ0bYmupQp81tad
 V8lNabVSNi0UBYwk2D44oNigtNjQckE18KGnjuJ4tZW+GZ+D7zrrHrKXWtATXgxP
 SIfHj+raSd/lgJoy6HLu/N0oT6wS+PdZMYFgSu600Vi618lGKGX1SIAwBhjoxdMX
 7QKKAuPcDZ1qgGddhWaLnof28nQQEWcCAVfFeVojmM0TyhvSbgSysh/Gq10ydybh
 NVUfgP3fzLtT9gVngX/ZtbogNkltPYmucpI+wT3nWfsgBic783klfWrfpnC/GM85
 OeXLVhHwVLG6tXUGhb4ULO+F9HwRGX31+er6iIxmwH9PvqnQMRcQ0Xxf2gbNXg==
 =YmH6
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "s390:
   - optimization for the exitless interrupt support that was merged in 4.16-rc1
   - improve the branch prediction blocking for nested KVM
   - replace some jump tables with switch statements to improve expoline performance
   - fixes for multiple epoch facility

  ARM:
   - fix the interaction of userspace irqchip VMs with in-kernel irqchip VMs
   - make sure we can build 32-bit KVM/ARM with gcc-8.

  x86:
   - fixes for AMD SEV
   - fixes for Intel nested VMX, emulated UMIP and a dump_stack() on VM startup
   - fixes for async page fault migration
   - small optimization to PV TLB flush (new in 4.16-rc1)
   - syzkaller fixes

  Generic:
   - compiler warning fixes
   - syzkaller fixes
   - more improvements to the kvm_stat tool

  Two more small Spectre fixes are going to reach you via Ingo"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (40 commits)
  KVM: SVM: Fix SEV LAUNCH_SECRET command
  KVM: SVM: install RSM intercept
  KVM: SVM: no need to call access_ok() in LAUNCH_MEASURE command
  include: psp-sev: Capitalize invalid length enum
  crypto: ccp: Fix sparse, use plain integer as NULL pointer
  KVM: X86: Avoid traversing all the cpus for pv tlb flush when steal time is disabled
  x86/kvm: Make parse_no_xxx __init for kvm
  KVM: x86: fix backward migration with async_PF
  kvm: fix warning for non-x86 builds
  kvm: fix warning for CONFIG_HAVE_KVM_EVENTFD builds
  tools/kvm_stat: print 'Total' line for multiple events only
  tools/kvm_stat: group child events indented after parent
  tools/kvm_stat: separate drilldown and fields filtering
  tools/kvm_stat: eliminate extra guest/pid selection dialog
  tools/kvm_stat: mark private methods as such
  tools/kvm_stat: fix debugfs handling
  tools/kvm_stat: print error on invalid regex
  tools/kvm_stat: fix crash when filtering out all non-child trace events
  tools/kvm_stat: avoid 'is' for equality checks
  tools/kvm_stat: use a more pythonic way to iterate over dictionaries
  ...
2018-02-26 09:28:35 -08:00
Linus Torvalds
c23a757591 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A small set of fixes:

   - UAPI data type correction for hyperv

   - correct the cpu cores field in /proc/cpuinfo on CPU hotplug

   - return proper error code in the resctrl file system failure path to
     avoid silent subsequent failures

   - correct a subtle accounting issue in the new vector allocation code
     which went unnoticed for a while and caused suspend/resume
     failures"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/topology: Update the 'cpu cores' field in /proc/cpuinfo correctly across CPU hotplug operations
  x86/topology: Fix function name in documentation
  x86/intel_rdt: Fix incorrect returned value when creating rdgroup sub-directory in resctrl file system
  x86/apic/vector: Handle vector release on CPU unplug correctly
  genirq/matrix: Handle CPU offlining proper
  x86/headers/UAPI: Use __u64 instead of u64 in <uapi/asm/hyperv.h>
2018-02-25 16:58:55 -08:00
Radim Krčmář
fe2a3027e7 KVM: x86: fix backward migration with async_PF
Guests on new hypersiors might set KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT
bit when enabling async_PF, but this bit is reserved on old hypervisors,
which results in a failure upon migration.

To avoid breaking different cases, we are checking for CPUID feature bit
before enabling the feature and nothing else.

Fixes: 52a5c155cf ("KVM: async_pf: Let guest support delivery of async_pf from guest mode")
Cc: <stable@vger.kernel.org>
Reviewed-by: Wanpeng Li <wanpengli@tencent.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-02-24 01:43:48 +01:00
Sebastian Ott
f75e4924f0 kvm: fix warning for non-x86 builds
Fix the following sparse warning by moving the prototype
of kvm_arch_mmu_notifier_invalidate_range() to linux/kvm_host.h .

  CHECK   arch/s390/kvm/../../../virt/kvm/kvm_main.c
arch/s390/kvm/../../../virt/kvm/kvm_main.c:138:13: warning: symbol 'kvm_arch_mmu_notifier_invalidate_range' was not declared. Should it be static?

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-02-24 01:43:47 +01:00
Daniel Borkmann
a493a87f38 bpf, x64: implement retpoline for tail call
Implement a retpoline [0] for the BPF tail call JIT'ing that converts
the indirect jump via jmp %rax that is used to make the long jump into
another JITed BPF image. Since this is subject to speculative execution,
we need to control the transient instruction sequence here as well
when CONFIG_RETPOLINE is set, and direct it into a pause + lfence loop.
The latter aligns also with what gcc / clang emits (e.g. [1]).

JIT dump after patch:

  # bpftool p d x i 1
   0: (18) r2 = map[id:1]
   2: (b7) r3 = 0
   3: (85) call bpf_tail_call#12
   4: (b7) r0 = 2
   5: (95) exit

With CONFIG_RETPOLINE:

  # bpftool p d j i 1
  [...]
  33:	cmp    %edx,0x24(%rsi)
  36:	jbe    0x0000000000000072  |*
  38:	mov    0x24(%rbp),%eax
  3e:	cmp    $0x20,%eax
  41:	ja     0x0000000000000072  |
  43:	add    $0x1,%eax
  46:	mov    %eax,0x24(%rbp)
  4c:	mov    0x90(%rsi,%rdx,8),%rax
  54:	test   %rax,%rax
  57:	je     0x0000000000000072  |
  59:	mov    0x28(%rax),%rax
  5d:	add    $0x25,%rax
  61:	callq  0x000000000000006d  |+
  66:	pause                      |
  68:	lfence                     |
  6b:	jmp    0x0000000000000066  |
  6d:	mov    %rax,(%rsp)         |
  71:	retq                       |
  72:	mov    $0x2,%eax
  [...]

  * relative fall-through jumps in error case
  + retpoline for indirect jump

Without CONFIG_RETPOLINE:

  # bpftool p d j i 1
  [...]
  33:	cmp    %edx,0x24(%rsi)
  36:	jbe    0x0000000000000063  |*
  38:	mov    0x24(%rbp),%eax
  3e:	cmp    $0x20,%eax
  41:	ja     0x0000000000000063  |
  43:	add    $0x1,%eax
  46:	mov    %eax,0x24(%rbp)
  4c:	mov    0x90(%rsi,%rdx,8),%rax
  54:	test   %rax,%rax
  57:	je     0x0000000000000063  |
  59:	mov    0x28(%rax),%rax
  5d:	add    $0x25,%rax
  61:	jmpq   *%rax               |-
  63:	mov    $0x2,%eax
  [...]

  * relative fall-through jumps in error case
  - plain indirect jump as before

  [0] https://support.google.com/faqs/answer/7625886
  [1] a31e654fa1

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-02-22 15:31:42 -08:00
Yazen Ghannam
68627a697c x86/mce/AMD, EDAC/mce_amd: Enumerate Reserved SMCA bank type
Currently, bank 4 is reserved on Fam17h, so we chose not to initialize
bank 4 in the smca_banks array. This means that when we check if a bank
is initialized, like during boot or resume, we will see that bank 4 is
not initialized and try to initialize it.

This will cause a call trace, when resuming from suspend, due to
rdmsr_*on_cpu() calls in the init path. The rdmsr_*on_cpu() calls issue
an IPI but we're running with interrupts disabled. This triggers:

  WARNING: CPU: 0 PID: 11523 at kernel/smp.c:291 smp_call_function_single+0xdc/0xe0
  ...

Reserved banks will be read-as-zero, so their MCA_IPID register will be
zero. So, like the smca_banks array, the threshold_banks array will not
have an entry for a reserved bank since all its MCA_MISC* registers will
be zero.

Enumerate a "Reserved" bank type that matches on a HWID_MCATYPE of 0,0.

Use the "Reserved" type when checking if a bank is reserved. It's
possible that other bank numbers may be reserved on future systems.

Don't try to find the block address on reserved banks.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # 4.14.x
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180221101900.10326-7-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-21 17:00:54 +01:00
Borislav Petkov
a189c03235 x86/mce: Put private structures and definitions into the internal header
... because they don't need to be exported outside of MCE.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180221101900.10326-2-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-21 17:00:53 +01:00
Ingo Molnar
d72f4e29e6 x86/speculation: Move firmware_restrict_branch_speculation_*() from C to CPP
firmware_restrict_branch_speculation_*() recently started using
preempt_enable()/disable(), but those are relatively high level
primitives and cause build failures on some 32-bit builds.

Since we want to keep <asm/nospec-branch.h> low level, convert
them to macros to avoid header hell...

Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: arjan.van.de.ven@intel.com
Cc: bp@alien8.de
Cc: dave.hansen@intel.com
Cc: jmattson@google.com
Cc: karahmed@amazon.de
Cc: kvm@vger.kernel.org
Cc: pbonzini@redhat.com
Cc: rkrcmar@redhat.com
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-21 16:54:03 +01:00
Peter Zijlstra
3010a0663f x86/paravirt, objtool: Annotate indirect calls
Paravirt emits indirect calls which get flagged by objtool retpoline
checks, annotate it away because all these indirect calls will be
patched out before we start userspace.

This patching happens through alternative_instructions() ->
apply_paravirt() -> pv_init_ops.patch() which will eventually end up
in paravirt_patch_default(). This function _will_ write direct
alternatives.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-21 09:05:03 +01:00
Peter Zijlstra
9e0e3c5130 x86/speculation, objtool: Annotate indirect calls/jumps for objtool
Annotate the indirect calls/jumps in the CALL_NOSPEC/JUMP_NOSPEC
alternatives.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-21 09:05:03 +01:00
David Woodhouse
dd84441a79 x86/speculation: Use IBRS if available before calling into firmware
Retpoline means the kernel is safe because it has no indirect branches.
But firmware isn't, so use IBRS for firmware calls if it's available.

Block preemption while IBRS is set, although in practice the call sites
already had to be doing that.

Ignore hpwdt.c for now. It's taking spinlocks and calling into firmware
code, from an NMI handler. I don't want to touch that with a bargepole.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: arjan.van.de.ven@intel.com
Cc: bp@alien8.de
Cc: dave.hansen@intel.com
Cc: jmattson@google.com
Cc: karahmed@amazon.de
Cc: kvm@vger.kernel.org
Cc: pbonzini@redhat.com
Cc: rkrcmar@redhat.com
Link: http://lkml.kernel.org/r/1519037457-7643-2-git-send-email-dwmw@amazon.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-20 09:38:33 +01:00