Commit Graph

3303 Commits

Author SHA1 Message Date
Paolo Bonzini
e951445f4d Merge tag 'kvmarm-fixes-5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm fixes for 5.6, take #1

- Fix compilation on 32bit
- Move  VHE guest entry/exit into the VHE-specific entry code
- Make sure all functions called by the non-VHE HYP code is tagged as __always_inline
2020-02-28 11:50:06 +01:00
Mark Rutland
ead5084cdf arm64: add credited/trusted RNG support
Currently arm64 doesn't initialize the primary CRNG in a (potentially)
trusted manner as we only detect the presence of the RNG once secondary
CPUs are up.

Now that the core RNG code distinguishes the early initialization of the
primary CRNG, we can implement arch_get_random_seed_long_early() to
support this.

This patch does so.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200210130015.17664-4-mark.rutland@arm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-02-27 23:21:52 -05:00
Mark Rutland
25b92693a1 arm64: mm: convert cpu_do_switch_mm() to C
There's no reason that cpu_do_switch_mm() needs to be written as an
assembly function, and having it as a C function would make it easier to
maintain.

This patch converts cpu_do_switch_mm() to C, removing code that this
change makes redundant (e.g. the mmid macro). Since the header comment
was stale and the prototype now implies all the necessary information,
this comment is removed. The 'pgd_phys' argument is made a phys_addr_t
to match the return type of virt_to_phys().

At the same time, post_ttbr_update_workaround() is updated to use
IS_ENABLED(), which allows the compiler to figure out it can elide calls
for !CONFIG_CAVIUM_ERRATUM_27456 builds.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
[catalin.marinas@arm.com: change comments from asm-style to C-style]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-02-27 14:30:50 +00:00
Ingo Molnar
e9765680a3 Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi into efi/core
Pull EFI updates for v5.7 from Ard Biesheuvel:

This time, the set of changes for the EFI subsystem is much larger than
usual. The main reasons are:

 - Get things cleaned up before EFI support for RISC-V arrives, which will
   increase the size of the validation matrix, and therefore the threshold to
   making drastic changes,

 - After years of defunct maintainership, the GRUB project has finally started
   to consider changes from the distros regarding UEFI boot, some of which are
   highly specific to the way x86 does UEFI secure boot and measured boot,
   based on knowledge of both shim internals and the layout of bootparams and
   the x86 setup header. Having this maintenance burden on other architectures
   (which don't need shim in the first place) is hard to justify, so instead,
   we are introducing a generic Linux/UEFI boot protocol.

Summary of changes:

 - Boot time GDT handling changes (Arvind)

 - Simplify handling of EFI properties table on arm64

 - Generic EFI stub cleanups, to improve command line handling, file I/O,
   memory allocation, etc.

 - Introduce a generic initrd loading method based on calling back into
   the firmware, instead of relying on the x86 EFI handover protocol or
   device tree.

 - Introduce a mixed mode boot method that does not rely on the x86 EFI
   handover protocol either, and could potentially be adopted by other
   architectures (if another one ever surfaces where one execution mode
   is a superset of another)

 - Clean up the contents of struct efi, and move out everything that
   doesn't need to be stored there.

 - Incorporate support for UEFI spec v2.8A changes that permit firmware
   implementations to return EFI_UNSUPPORTED from UEFI runtime services at
   OS runtime, and expose a mask of which ones are supported or unsupported
   via a configuration table.

 - Various documentation updates and minor code cleanups (Heinrich)

 - Partial fix for the lack of by-VA cache maintenance in the decompressor
   on 32-bit ARM. Note that these patches were deliberately put at the
   beginning so they can be used as a stable branch that will be shared with
   a PR containing the complete fix, which I will send to the ARM tree.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-02-26 15:21:22 +01:00
Ard Biesheuvel
184d7e0d7d efi/libstub/arm: Relax FDT alignment requirement
The arm64 kernel no longer requires the FDT blob to fit inside a
naturally aligned 2 MB memory block, so remove the code that aligns
the allocation to 2 MB.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-02-23 21:57:15 +01:00
Ard Biesheuvel
6f05106e20 efi/libstub: Use hidden visibility for all source files
Instead of setting the visibility pragma for a small set of symbol
declarations that could result in absolute references that we cannot
support in the stub, declare hidden visibility for all code in the
EFI stub, which is more robust and future proof.

To ensure that the #pragma is taken into account before any other
includes are processed, put it in a header file of its own and
include it via the compiler command line using the -include option.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-02-23 21:57:12 +01:00
James Morse
e43f1331e2 arm64: Ask the compiler to __always_inline functions used by KVM at HYP
KVM uses some of the static-inline helpers like icache_is_vipt() from
its HYP code. This assumes the function is inlined so that the code is
mapped to EL2. The compiler may decide not to inline these, and the
out-of-line version may not be in the __hyp_text section.

Add the additional __always_ hint to these static-inlines that are used
by KVM.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200220165839.256881-4-james.morse@arm.com
2020-02-22 11:01:47 +00:00
James Morse
8c2d146ee7 KVM: arm64: Define our own swab32() to avoid a uapi static inline
KVM uses swab32() when mediating GIC MMIO accesses if the GICV is badly
aligned, and the host and guest differ in endianness.

arm64 doesn't provide a __arch_swab32(), so __fswab32() is always backed
by the macro implementation that the compiler reduces to a single
instruction. But the static-inline causes problems for KVM if the compiler
chooses not to inline this function, it may not be located in the
__hyp_text where __vgic_v2_perform_cpuif_access() needs it.

Create our own __kvm_swab32() macro that calls ___constant_swab32()
directly. This way we know it will always be inlined.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200220165839.256881-3-james.morse@arm.com
2020-02-22 11:01:47 +00:00
James Morse
5c37f1ae1c KVM: arm64: Ask the compiler to __always_inline functions used at HYP
On non VHE CPUs, KVM's __hyp_text contains code run at EL2 while the rest
of the kernel runs at EL1. This code lives in its own section with start
and end markers so we can map it to EL2.

The compiler may decide not to inline static-inline functions from the
header file. It may also decide not to put these out-of-line functions
in the same section, meaning they aren't mapped when called at EL2.

Clang-9 does exactly this with __kern_hyp_va() and a few others when
x18 is reserved for the shadow call stack. Add the additional __always_
hint to all the static-inlines that are called from a hyp file.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200220165839.256881-2-james.morse@arm.com

----
kvm_get_hyp_vector() pulls in all the regular per-cpu accessors
and this_cpu_has_cap(), fortunately its only called for VHE.
2020-02-22 11:01:47 +00:00
Frederic Weisbecker
320a4fc2d1 arm64: Remove TIF_NOHZ
The syscall slow path is spuriously invoked when context tracking is
activated while the entry code calls context tracking from fast path.

Remove that overhead and the unused flag itself while at it.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
2020-02-20 16:07:19 +01:00
Will Deacon
d0022c0ef2 arm64: memory: Add missing brackets to untagged_addr() macro
Add brackets around the evaluation of the 'addr' parameter to the
untagged_addr() macro so that the cast to 'u64' applies to the result
of the expression.

Cc: <stable@vger.kernel.org>
Fixes: 597399d0cb ("arm64: tags: Preserve tags for addresses translated via TTBR1")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Will Deacon <will@kernel.org>
2020-02-19 13:41:33 +00:00
Vincenzo Frascino
dd1f6308b2 arm64: lse: Fix LSE atomics with LLVM
Commit e0d5896bd3 ("arm64: lse: fix LSE atomics with LLVM's integrated
assembler") broke the build when clang is used in connjunction with the
binutils assembler ("-no-integrated-as"). This happens because
__LSE_PREAMBLE is defined as ".arch armv8-a+lse", which overrides the
version of the CPU architecture passed via the "-march" paramter to gas:

$ aarch64-none-linux-gnu-as -EL -I ./arch/arm64/include
                                -I ./arch/arm64/include/generated
                                -I ./include -I ./include
                                -I ./arch/arm64/include/uapi
                                -I ./arch/arm64/include/generated/uapi
                                -I ./include/uapi -I ./include/generated/uapi
                                -I ./init -I ./init
                                -march=armv8.3-a -o init/do_mounts.o
                                /tmp/do_mounts-d7992a.s
/tmp/do_mounts-d7992a.s: Assembler messages:
/tmp/do_mounts-d7992a.s:1959: Error: selected processor does not support `autiasp'
/tmp/do_mounts-d7992a.s:2021: Error: selected processor does not support `paciasp'
/tmp/do_mounts-d7992a.s:2157: Error: selected processor does not support `autiasp'
/tmp/do_mounts-d7992a.s:2175: Error: selected processor does not support `paciasp'
/tmp/do_mounts-d7992a.s:2494: Error: selected processor does not support `autiasp'

Fix the issue by replacing ".arch armv8-a+lse" with ".arch_extension lse".
Sami confirms that the clang integrated assembler does now support the
'.arch_extension' directive, so this change will be fine even for LTO
builds in future.

Fixes: e0d5896bd3 ("arm64: lse: fix LSE atomics with LLVM's integrated assembler")
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Reported-by: Amit Kachhap <Amit.Kachhap@arm.com>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2020-02-18 18:10:49 +00:00
Thomas Gleixner
5e3c6a312a ARM/arm64: vdso: Use common vdso clock mode storage
Convert ARM/ARM64 to the generic VDSO clock mode storage. This needs to
happen in one go as they share the clocksource driver.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Link: https://lkml.kernel.org/r/20200207124403.363235229@linutronix.de
2020-02-17 20:12:16 +01:00
Mark Rutland
b3f15ec3d8 kvm: arm/arm64: Fold VHE entry/exit work into kvm_vcpu_run_vhe()
With VHE, running a vCPU always requires the sequence:

1. kvm_arm_vhe_guest_enter();
2. kvm_vcpu_run_vhe();
3. kvm_arm_vhe_guest_exit()

... and as we invoke this from the shared arm/arm64 KVM code, 32-bit arm
has to provide stubs for all three functions.

To simplify the common code, and make it easier to make further
modifications to the arm64-specific portions in the near future, let's
fold kvm_arm_vhe_guest_enter() and kvm_arm_vhe_guest_exit() into
kvm_vcpu_run_vhe().

The 32-bit stubs for kvm_arm_vhe_guest_enter() and
kvm_arm_vhe_guest_exit() are removed, as they are no longer used. The
32-bit stub for kvm_vcpu_run_vhe() is left as-is.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200210114757.2889-1-mark.rutland@arm.com
2020-02-17 14:38:37 +00:00
Qian Cai
345d52c184 arm64/spinlock: fix a -Wunused-function warning
The commit f5bfdc8e39 ("locking/osq: Use optimized spinning loop for
arm64") introduced a warning from Clang because vcpu_is_preempted() is
compiled away,

kernel/locking/osq_lock.c:25:19: warning: unused function 'node_cpu'
[-Wunused-function]
static inline int node_cpu(struct optimistic_spin_node *node)
                  ^
1 warning generated.

Fix it by converting vcpu_is_preempted() to a static inline function.

Fixes: f5bfdc8e39 ("locking/osq: Use optimized spinning loop for arm64")
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Will Deacon <will@kernel.org>
2020-02-10 11:29:24 +00:00
Anshuman Khandual
5cb7a1113f arm64: Drop do_el0_ia_bp_hardening() & do_sp_pc_abort() declarations
There is a redundant do_sp_pc_abort() declaration in exceptions.h which can
be removed. Also do_el0_ia_bp_hardening() as been already been dropped with
the commit bfe298745a ("arm64: entry-common: don't touch daif before
bp-hardening") and hence does not need a declaration any more. This should
not introduce any functional change.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2020-02-10 11:24:49 +00:00
Thomas Gleixner
2f86e45a7f Merge tag 'irqchip-fixes-5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent
Pull irqchip fixes for 5.6, take #1 from Marc Zyngier:

 - Guarantee allocation of L2 vPE table for GICv4.1
 - Fix GICv4.1 VPROPBASER programming
 - Numerous GICv4.1 tidy ups
 - Fix disabled GICv3 redistributor provisioning with ACPI
 - KConfig cleanup for C-SKY
2020-02-08 15:54:03 +01:00
Zenghui Yu
5186a6cc3e irqchip/gic-v3-its: Rename VPENDBASER/VPROPBASER accessors
V{PEND,PROP}BASER registers are actually located in VLPI_base frame
of the *redistributor*. Rename their accessors to reflect this fact.

No functional changes.

Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200206075711.1275-7-yuzenghui@huawei.com
2020-02-08 10:01:33 +00:00
Linus Torvalds
153b5c566d Merge tag 'microblaze-v5.6-rc1' of git://git.monstr.eu/linux-2.6-microblaze
Pull Microblaze update from Michal Simek:

 - enable CMA

 - add support for MB v11

 - defconfig updates

 - minor fixes

* tag 'microblaze-v5.6-rc1' of git://git.monstr.eu/linux-2.6-microblaze:
  microblaze: Add ID for Microblaze v11
  microblaze: Prevent the overflow of the start
  microblaze: Wire CMA allocator
  asm-generic: Make dma-contiguous.h a mandatory include/asm header
  microblaze: Sync defconfig with latest Kconfig layout
  microblaze: defconfig: Disable EXT2 driver and Enable EXT3 & EXT4 drivers
  microblaze: Align comments with register usage
2020-02-04 11:58:07 +00:00
Michal Simek
def3f7cefe asm-generic: Make dma-contiguous.h a mandatory include/asm header
dma-continuguous.h is generic for all architectures except arm32 which has
its own version.

Similar change was done for msi.h by commit a1b39bae16
("asm-generic: Make msi.h a mandatory include/asm header")

Suggested-by: Christoph Hellwig <hch@infradead.org>
Link: https://lore.kernel.org/linux-arm-kernel/20200117080446.GA8980@lst.de/T/#m92bb56b04161057635d4142e1b3b9b6b0a70122e
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Paul Walmsley <paul.walmsley@sifive.com> # for arch/riscv
2020-02-04 11:38:59 +01:00
Steven Price
102f45fdbe arm64: mm: convert mm/dump.c to use walk_page_range()
Now walk_page_range() can walk kernel page tables, we can switch the arm64
ptdump code over to using it, simplifying the code.

Link: http://lkml.kernel.org/r/20191218162402.45610-22-steven.price@arm.com
Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zong Li <zong.li@sifive.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-04 03:05:25 +00:00
Steven Price
8aa82df3c1 arm64: mm: add p?d_leaf() definitions
walk_page_range() is going to be allowed to walk page tables other than
those of user space.  For this it needs to know when it has reached a
'leaf' entry in the page tables.  This information will be provided by the
p?d_leaf() functions/macros.

For arm64, we already have p?d_sect() macros which we can reuse for
p?d_leaf().

pud_sect() is defined as a dummy function when CONFIG_PGTABLE_LEVELS < 3
or CONFIG_ARM64_64K_PAGES is defined.  However when the kernel is
configured this way then architecturally it isn't allowed to have a large
page at this level, and any code using these page walking macros is
implicitly relying on the page size/number of levels being the same as the
kernel.  So it is safe to reuse this for p?d_leaf() as it is an
architectural restriction.

Link: http://lkml.kernel.org/r/20191218162402.45610-5-steven.price@arm.com
Signed-off-by: Steven Price <steven.price@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Zong Li <zong.li@sifive.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-04 03:05:24 +00:00
Linus Torvalds
e813e65038 Merge tag 'kvm-5.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
 "This is the first batch of KVM changes.

  ARM:
   - cleanups and corner case fixes.

  PPC:
   - Bugfixes

  x86:
   - Support for mapping DAX areas with large nested page table entries.

   - Cleanups and bugfixes here too. A particularly important one is a
     fix for FPU load when the thread has TIF_NEED_FPU_LOAD. There is
     also a race condition which could be used in guest userspace to
     exploit the guest kernel, for which the embargo expired today.

   - Fast path for IPI delivery vmexits, shaving about 200 clock cycles
     from IPI latency.

   - Protect against "Spectre-v1/L1TF" (bring data in the cache via
     speculative out of bound accesses, use L1TF on the sibling
     hyperthread to read it), which unfortunately is an even bigger
     whack-a-mole game than SpectreV1.

  Sean continues his mission to rewrite KVM. In addition to a sizable
  number of x86 patches, this time he contributed a pretty large
  refactoring of vCPU creation that affects all architectures but should
  not have any visible effect.

  s390 will come next week together with some more x86 patches"

* tag 'kvm-5.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits)
  x86/KVM: Clean up host's steal time structure
  x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed
  x86/kvm: Cache gfn to pfn translation
  x86/kvm: Introduce kvm_(un)map_gfn()
  x86/kvm: Be careful not to clear KVM_VCPU_FLUSH_TLB bit
  KVM: PPC: Book3S PR: Fix -Werror=return-type build failure
  KVM: PPC: Book3S HV: Release lock on page-out failure path
  KVM: arm64: Treat emulated TVAL TimerValue as a signed 32-bit integer
  KVM: arm64: pmu: Only handle supported event counters
  KVM: arm64: pmu: Fix chained SW_INCR counters
  KVM: arm64: pmu: Don't mark a counter as chained if the odd one is disabled
  KVM: arm64: pmu: Don't increment SW_INCR if PMCR.E is unset
  KVM: x86: Use a typedef for fastop functions
  KVM: X86: Add 'else' to unify fastop and execute call path
  KVM: x86: inline memslot_valid_for_gpte
  KVM: x86/mmu: Use huge pages for DAX-backed files
  KVM: x86/mmu: Remove lpage_is_disallowed() check from set_spte()
  KVM: x86/mmu: Fold max_mapping_level() into kvm_mmu_hugepage_adjust()
  KVM: x86/mmu: Zap any compound page when collapsing sptes
  KVM: x86/mmu: Remove obsolete gfn restoration in FNAME(fetch)
  ...
2020-01-31 09:30:41 -08:00
Paolo Bonzini
4cbc418a44 Merge branch 'cve-2019-3016' into kvm-next-5.6
From Boris Ostrovsky:

The KVM hypervisor may provide a guest with ability to defer remote TLB
flush when the remote VCPU is not running. When this feature is used,
the TLB flush will happen only when the remote VPCU is scheduled to run
again. This will avoid unnecessary (and expensive) IPIs.

Under certain circumstances, when a guest initiates such deferred action,
the hypervisor may miss the request. It is also possible that the guest
may mistakenly assume that it has already marked remote VCPU as needing
a flush when in fact that request had already been processed by the
hypervisor. In both cases this will result in an invalid translation
being present in a vCPU, potentially allowing accesses to memory locations
in that guest's address space that should not be accessible.

Note that only intra-guest memory is vulnerable.

The five patches address both of these problems:
1. The first patch makes sure the hypervisor doesn't accidentally clear
a guest's remote flush request
2. The rest of the patches prevent the race between hypervisor
acknowledging a remote flush request and guest issuing a new one.

Conflicts:
	arch/x86/kvm/x86.c [move from kvm_arch_vcpu_free to kvm_arch_vcpu_destroy]
2020-01-30 18:47:59 +01:00
Paolo Bonzini
621ab20c06 Merge tag 'kvmarm-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm updates for Linux 5.6

- Fix MMIO sign extension
- Fix HYP VA tagging on tag space exhaustion
- Fix PSTATE/CPSR handling when generating exception
- Fix MMU notifier's advertizing of young pages
- Fix poisoned page handling
- Fix PMU SW event handling
- Fix TVAL register access
- Fix AArch32 external abort injection
- Fix ITS unmapped collection handling
- Various cleanups
2020-01-30 18:13:14 +01:00
Linus Torvalds
83fa805bcb Merge tag 'threads-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull thread management updates from Christian Brauner:
 "Sargun Dhillon over the last cycle has worked on the pidfd_getfd()
  syscall.

  This syscall allows for the retrieval of file descriptors of a process
  based on its pidfd. A task needs to have ptrace_may_access()
  permissions with PTRACE_MODE_ATTACH_REALCREDS (suggested by Oleg and
  Andy) on the target.

  One of the main use-cases is in combination with seccomp's user
  notification feature. As a reminder, seccomp's user notification
  feature was made available in v5.0. It allows a task to retrieve a
  file descriptor for its seccomp filter. The file descriptor is usually
  handed of to a more privileged supervising process. The supervisor can
  then listen for syscall events caught by the seccomp filter of the
  supervisee and perform actions in lieu of the supervisee, usually
  emulating syscalls. pidfd_getfd() is needed to expand its uses.

  There are currently two major users that wait on pidfd_getfd() and one
  future user:

   - Netflix, Sargun said, is working on a service mesh where users
     should be able to connect to a dns-based VIP. When a user connects
     to e.g. 1.2.3.4:80 that runs e.g. service "foo" they will be
     redirected to an envoy process. This service mesh uses seccomp user
     notifications and pidfd to intercept all connect calls and instead
     of connecting them to 1.2.3.4:80 connects them to e.g.
     127.0.0.1:8080.

   - LXD uses the seccomp notifier heavily to intercept and emulate
     mknod() and mount() syscalls for unprivileged containers/processes.
     With pidfd_getfd() more uses-cases e.g. bridging socket connections
     will be possible.

   - The patchset has also seen some interest from the browser corner.
     Right now, Firefox is using a SECCOMP_RET_TRAP sandbox managed by a
     broker process. In the future glibc will start blocking all signals
     during dlopen() rendering this type of sandbox impossible. Hence,
     in the future Firefox will switch to a seccomp-user-nofication
     based sandbox which also makes use of file descriptor retrieval.
     The thread for this can be found at
     https://sourceware.org/ml/libc-alpha/2019-12/msg00079.html

  With pidfd_getfd() it is e.g. possible to bridge socket connections
  for the supervisee (binding to a privileged port) and taking actions
  on file descriptors on behalf of the supervisee in general.

  Sargun's first version was using an ioctl on pidfds but various people
  pushed for it to be a proper syscall which he duely implemented as
  well over various review cycles. Selftests are of course included.
  I've also added instructions how to deal with merge conflicts below.

  There's also a small fix coming from the kernel mentee project to
  correctly annotate struct sighand_struct with __rcu to fix various
  sparse warnings. We've received a few more such fixes and even though
  they are mostly trivial I've decided to postpone them until after -rc1
  since they came in rather late and I don't want to risk introducing
  build warnings.

  Finally, there's a new prctl() command PR_{G,S}ET_IO_FLUSHER which is
  needed to avoid allocation recursions triggerable by storage drivers
  that have userspace parts that run in the IO path (e.g. dm-multipath,
  iscsi, etc). These allocation recursions deadlock the device.

  The new prctl() allows such privileged userspace components to avoid
  allocation recursions by setting the PF_MEMALLOC_NOIO and
  PF_LESS_THROTTLE flags. The patch carries the necessary acks from the
  relevant maintainers and is routed here as part of prctl()
  thread-management."

* tag 'threads-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaim
  sched.h: Annotate sighand_struct with __rcu
  test: Add test for pidfd getfd
  arch: wire up pidfd_getfd syscall
  pid: Implement pidfd_getfd syscall
  vfs, fdtable: Add fget_task helper
2020-01-29 19:38:34 -08:00
Linus Torvalds
33c84e89ab Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
 "This series is slightly unusual because it includes Arnd's compat
  ioctl tree here:

    1c46a2cf2d Merge tag 'block-ioctl-cleanup-5.6' into 5.6/scsi-queue

  Excluding Arnd's changes, this is mostly an update of the usual
  drivers: megaraid_sas, mpt3sas, qla2xxx, ufs, lpfc, hisi_sas.

  There are a couple of core and base updates around error propagation
  and atomicity in the attribute container base we use for the SCSI
  transport classes.

  The rest is minor changes and updates"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (149 commits)
  scsi: hisi_sas: Rename hisi_sas_cq.pci_irq_mask
  scsi: hisi_sas: Add prints for v3 hw interrupt converge and automatic affinity
  scsi: hisi_sas: Modify the file permissions of trigger_dump to write only
  scsi: hisi_sas: Replace magic number when handle channel interrupt
  scsi: hisi_sas: replace spin_lock_irqsave/spin_unlock_restore with spin_lock/spin_unlock
  scsi: hisi_sas: use threaded irq to process CQ interrupts
  scsi: ufs: Use UFS device indicated maximum LU number
  scsi: ufs: Add max_lu_supported in struct ufs_dev_info
  scsi: ufs: Delete is_init_prefetch from struct ufs_hba
  scsi: ufs: Inline two functions into their callers
  scsi: ufs: Move ufshcd_get_max_pwr_mode() to ufshcd_device_params_init()
  scsi: ufs: Split ufshcd_probe_hba() based on its called flow
  scsi: ufs: Delete struct ufs_dev_desc
  scsi: ufs: Fix ufshcd_probe_hba() reture value in case ufshcd_scsi_add_wlus() fails
  scsi: ufs-mediatek: enable low-power mode for hibern8 state
  scsi: ufs: export some functions for vendor usage
  scsi: ufs-mediatek: add dbg_register_dump implementation
  scsi: qla2xxx: Fix a NULL pointer dereference in an error path
  scsi: qla1280: Make checking for 64bit support consistent
  scsi: megaraid_sas: Update driver version to 07.713.01.00-rc1
  ...
2020-01-29 18:16:16 -08:00
Linus Torvalds
6aee4badd8 Merge branch 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull openat2 support from Al Viro:
 "This is the openat2() series from Aleksa Sarai.

  I'm afraid that the rest of namei stuff will have to wait - it got
  zero review the last time I'd posted #work.namei, and there had been a
  leak in the posted series I'd caught only last weekend. I was going to
  repost it on Monday, but the window opened and the odds of getting any
  review during that... Oh, well.

  Anyway, openat2 part should be ready; that _did_ get sane amount of
  review and public testing, so here it comes"

From Aleksa's description of the series:
 "For a very long time, extending openat(2) with new features has been
  incredibly frustrating. This stems from the fact that openat(2) is
  possibly the most famous counter-example to the mantra "don't silently
  accept garbage from userspace" -- it doesn't check whether unknown
  flags are present[1].

  This means that (generally) the addition of new flags to openat(2) has
  been fraught with backwards-compatibility issues (O_TMPFILE has to be
  defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old
  kernels gave errors, since it's insecure to silently ignore the
  flag[2]). All new security-related flags therefore have a tough road
  to being added to openat(2).

  Furthermore, the need for some sort of control over VFS's path
  resolution (to avoid malicious paths resulting in inadvertent
  breakouts) has been a very long-standing desire of many userspace
  applications.

  This patchset is a revival of Al Viro's old AT_NO_JUMPS[3] patchset
  (which was a variant of David Drysdale's O_BENEATH patchset[4] which
  was a spin-off of the Capsicum project[5]) with a few additions and
  changes made based on the previous discussion within [6] as well as
  others I felt were useful.

  In line with the conclusions of the original discussion of
  AT_NO_JUMPS, the flag has been split up into separate flags. However,
  instead of being an openat(2) flag it is provided through a new
  syscall openat2(2) which provides several other improvements to the
  openat(2) interface (see the patch description for more details). The
  following new LOOKUP_* flags are added:

  LOOKUP_NO_XDEV:

     Blocks all mountpoint crossings (upwards, downwards, or through
     absolute links). Absolute pathnames alone in openat(2) do not
     trigger this. Magic-link traversal which implies a vfsmount jump is
     also blocked (though magic-link jumps on the same vfsmount are
     permitted).

  LOOKUP_NO_MAGICLINKS:

     Blocks resolution through /proc/$pid/fd-style links. This is done
     by blocking the usage of nd_jump_link() during resolution in a
     filesystem. The term "magic-links" is used to match with the only
     reference to these links in Documentation/, but I'm happy to change
     the name.

     It should be noted that this is different to the scope of
     ~LOOKUP_FOLLOW in that it applies to all path components. However,
     you can do openat2(NO_FOLLOW|NO_MAGICLINKS) on a magic-link and it
     will *not* fail (assuming that no parent component was a
     magic-link), and you will have an fd for the magic-link.

     In order to correctly detect magic-links, the introduction of a new
     LOOKUP_MAGICLINK_JUMPED state flag was required.

  LOOKUP_BENEATH:

     Disallows escapes to outside the starting dirfd's
     tree, using techniques such as ".." or absolute links. Absolute
     paths in openat(2) are also disallowed.

     Conceptually this flag is to ensure you "stay below" a certain
     point in the filesystem tree -- but this requires some additional
     to protect against various races that would allow escape using
     "..".

     Currently LOOKUP_BENEATH implies LOOKUP_NO_MAGICLINKS, because it
     can trivially beam you around the filesystem (breaking the
     protection). In future, there might be similar safety checks done
     as in LOOKUP_IN_ROOT, but that requires more discussion.

  In addition, two new flags are added that expand on the above ideas:

  LOOKUP_NO_SYMLINKS:

     Does what it says on the tin. No symlink resolution is allowed at
     all, including magic-links. Just as with LOOKUP_NO_MAGICLINKS this
     can still be used with NOFOLLOW to open an fd for the symlink as
     long as no parent path had a symlink component.

  LOOKUP_IN_ROOT:

     This is an extension of LOOKUP_BENEATH that, rather than blocking
     attempts to move past the root, forces all such movements to be
     scoped to the starting point. This provides chroot(2)-like
     protection but without the cost of a chroot(2) for each filesystem
     operation, as well as being safe against race attacks that
     chroot(2) is not.

     If a race is detected (as with LOOKUP_BENEATH) then an error is
     generated, and similar to LOOKUP_BENEATH it is not permitted to
     cross magic-links with LOOKUP_IN_ROOT.

     The primary need for this is from container runtimes, which
     currently need to do symlink scoping in userspace[7] when opening
     paths in a potentially malicious container.

     There is a long list of CVEs that could have bene mitigated by
     having RESOLVE_THIS_ROOT (such as CVE-2017-1002101,
     CVE-2017-1002102, CVE-2018-15664, and CVE-2019-5736, just to name a
     few).

  In order to make all of the above more usable, I'm working on
  libpathrs[8] which is a C-friendly library for safe path resolution.
  It features a userspace-emulated backend if the kernel doesn't support
  openat2(2). Hopefully we can get userspace to switch to using it, and
  thus get openat2(2) support for free once it's ready.

  Future work would include implementing things like
  RESOLVE_NO_AUTOMOUNT and possibly a RESOLVE_NO_REMOTE (to allow
  programs to be sure they don't hit DoSes though stale NFS handles)"

* 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  Documentation: path-lookup: include new LOOKUP flags
  selftests: add openat2(2) selftests
  open: introduce openat2(2) syscall
  namei: LOOKUP_{IN_ROOT,BENEATH}: permit limited ".." resolution
  namei: LOOKUP_IN_ROOT: chroot-like scoped resolution
  namei: LOOKUP_BENEATH: O_BENEATH-like scoped resolution
  namei: LOOKUP_NO_XDEV: block mountpoint crossing
  namei: LOOKUP_NO_MAGICLINKS: block magic-link resolution
  namei: LOOKUP_NO_SYMLINKS: block symlink resolution
  namei: allow set_root() to produce errors
  namei: allow nd_jump_link() to produce errors
  nsfs: clean-up ns_get_path() signature to return int
  namei: only return -ECHILD from follow_dotdot_rcu()
2020-01-29 11:20:24 -08:00
Linus Torvalds
c677124e63 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "These were the main changes in this cycle:

   - More -rt motivated separation of CONFIG_PREEMPT and
     CONFIG_PREEMPTION.

   - Add more low level scheduling topology sanity checks and warnings
     to filter out nonsensical topologies that break scheduling.

   - Extend uclamp constraints to influence wakeup CPU placement

   - Make the RT scheduler more aware of asymmetric topologies and CPU
     capacities, via uclamp metrics, if CONFIG_UCLAMP_TASK=y

   - Make idle CPU selection more consistent

   - Various fixes, smaller cleanups, updates and enhancements - please
     see the git log for details"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits)
  sched/fair: Define sched_idle_cpu() only for SMP configurations
  sched/topology: Assert non-NUMA topology masks don't (partially) overlap
  idle: fix spelling mistake "iterrupts" -> "interrupts"
  sched/fair: Remove redundant call to cpufreq_update_util()
  sched/psi: create /proc/pressure and /proc/pressure/{io|memory|cpu} only when psi enabled
  sched/fair: Fix sgc->{min,max}_capacity calculation for SD_OVERLAP
  sched/fair: calculate delta runnable load only when it's needed
  sched/cputime: move rq parameter in irqtime_account_process_tick
  stop_machine: Make stop_cpus() static
  sched/debug: Reset watchdog on all CPUs while processing sysrq-t
  sched/core: Fix size of rq::uclamp initialization
  sched/uclamp: Fix a bug in propagating uclamp value in new cgroups
  sched/fair: Load balance aggressively for SCHED_IDLE CPUs
  sched/fair : Improve update_sd_pick_busiest for spare capacity case
  watchdog: Remove soft_lockup_hrtimer_cnt and related code
  sched/rt: Make RT capacity-aware
  sched/fair: Make EAS wakeup placement consider uclamp restrictions
  sched/fair: Make task_fits_capacity() consider uclamp restrictions
  sched/uclamp: Rename uclamp_util_with() into uclamp_rq_util_with()
  sched/uclamp: Make uclamp util helpers use and return UL values
  ...
2020-01-28 10:07:09 -08:00
Linus Torvalds
2180f214f4 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "Just a handful of changes in this cycle: an ARM64 performance
  optimization, a comment fix and a debug output fix"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/osq: Use optimized spinning loop for arm64
  locking/qspinlock: Fix inaccessible URL of MCS lock paper
  locking/lockdep: Fix lockdep_stats indentation problem
2020-01-28 09:33:25 -08:00
Linus Torvalds
634cd4b6af Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Cleanup of the GOP [graphics output] handling code in the EFI stub

   - Complete refactoring of the mixed mode handling in the x86 EFI stub

   - Overhaul of the x86 EFI boot/runtime code

   - Increase robustness for mixed mode code

   - Add the ability to disable DMA at the root port level in the EFI
     stub

   - Get rid of RWX mappings in the EFI memory map and page tables,
     where possible

   - Move the support code for the old EFI memory mapping style into its
     only user, the SGI UV1+ support code.

   - plus misc fixes, updates, smaller cleanups.

  ... and due to interactions with the RWX changes, another round of PAT
  cleanups make a guest appearance via the EFI tree - with no side
  effects intended"

* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits)
  efi/x86: Disable instrumentation in the EFI runtime handling code
  efi/libstub/x86: Fix EFI server boot failure
  efi/x86: Disallow efi=old_map in mixed mode
  x86/boot/compressed: Relax sed symbol type regex for LLVM ld.lld
  efi/x86: avoid KASAN false positives when accessing the 1: 1 mapping
  efi: Fix handling of multiple efi_fake_mem= entries
  efi: Fix efi_memmap_alloc() leaks
  efi: Add tracking for dynamically allocated memmaps
  efi: Add a flags parameter to efi_memory_map
  efi: Fix comment for efi_mem_type() wrt absent physical addresses
  efi/arm: Defer probe of PCIe backed efifb on DT systems
  efi/x86: Limit EFI old memory map to SGI UV machines
  efi/x86: Avoid RWX mappings for all of DRAM
  efi/x86: Don't map the entire kernel text RW for mixed mode
  x86/mm: Fix NX bit clearing issue in kernel_map_pages_in_pgd
  efi/libstub/x86: Fix unused-variable warning
  efi/libstub/x86: Use mandatory 16-byte stack alignment in mixed mode
  efi/libstub/x86: Use const attribute for efi_is_64bit()
  efi: Allow disabling PCI busmastering on bridges during boot
  efi/x86: Allow translating 64-bit arguments for mixed mode calls
  ...
2020-01-28 09:03:40 -08:00
Linus Torvalds
3d3b44a61a Merge tag 'irq-core-2020-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
 "The interrupt departement provides:

   - A mechanism to shield isolated tasks from managed interrupts:

     The affinity of managed interrupts is completely controlled by the
     kernel and user space has no influence on them. The reason is that
     the automatically assigned affinity correlates to the multi-queue
     CPU handling of block devices.

     If the generated affinity mask spaws both housekeeping and isolated
     CPUs the interrupt could be routed to an isolated CPU which would
     then be disturbed by I/O submitted by a housekeeping CPU.

     The new mechamism ensures that as long as one housekeeping CPU is
     online in the assigned affinity mask the interrupt is routed to a
     housekeeping CPU.

     If there is no online housekeeping CPU in the affinity mask, then
     the interrupt is routed to an isolated CPU to keep the device queue
     intact, but unless the isolated CPU submits I/O by itself these
     interrupts are not raised.

   - A small addon to the device tree irqdomain core code to avoid
     duplication in irq chip drivers

   - Conversion of the SiFive PLIC to hierarchical domains

   - The usual pile of new irq chip drivers: SiFive GPIO, Aspeed SCI,
     NXP INTMUX, Meson A1 GPIO

   - The first cut of support for the new ARM GICv4.1

   - The usual pile of fixes and improvements in core and driver code"

* tag 'irq-core-2020-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
  genirq, sched/isolation: Isolate from handling managed interrupts
  irqchip/gic-v4.1: Allow direct invalidation of VLPIs
  irqchip/gic-v4.1: Suppress per-VLPI doorbell
  irqchip/gic-v4.1: Add VPE INVALL callback
  irqchip/gic-v4.1: Add VPE eviction callback
  irqchip/gic-v4.1: Add VPE residency callback
  irqchip/gic-v4.1: Add mask/unmask doorbell callbacks
  irqchip/gic-v4.1: Plumb skeletal VPE irqchip
  irqchip/gic-v4.1: Implement the v4.1 flavour of VMOVP
  irqchip/gic-v4.1: Don't use the VPE proxy if RVPEID is set
  irqchip/gic-v4.1: Implement the v4.1 flavour of VMAPP
  irqchip/gic-v4.1: VPE table (aka GICR_VPROPBASER) allocation
  irqchip/gic-v3: Add GICv4.1 VPEID size discovery
  irqchip/gic-v3: Detect GICv4.1 supporting RVPEID
  irqchip/gic-v3-its: Fix get_vlpi_map() breakage with doorbells
  irqdomain: Fix a memory leak in irq_domain_push_irq()
  irqchip: Add NXP INTMUX interrupt multiplexer support
  dt-bindings: interrupt-controller: Add binding for NXP INTMUX interrupt multiplexer
  irqchip: Define EXYNOS_IRQ_COMBINER
  irqchip/meson-gpio: Add support for meson a1 SoCs
  ...
2020-01-27 17:22:21 -08:00
Linus Torvalds
e279160f49 Merge tag 'timers-core-2020-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
 "The timekeeping and timers departement provides:

   - Time namespace support:

     If a container migrates from one host to another then it expects
     that clocks based on MONOTONIC and BOOTTIME are not subject to
     disruption. Due to different boot time and non-suspended runtime
     these clocks can differ significantly on two hosts, in the worst
     case time goes backwards which is a violation of the POSIX
     requirements.

     The time namespace addresses this problem. It allows to set offsets
     for clock MONOTONIC and BOOTTIME once after creation and before
     tasks are associated with the namespace. These offsets are taken
     into account by timers and timekeeping including the VDSO.

     Offsets for wall clock based clocks (REALTIME/TAI) are not provided
     by this mechanism. While in theory possible, the overhead and code
     complexity would be immense and not justified by the esoteric
     potential use cases which were discussed at Plumbers '18.

     The overhead for tasks in the root namespace (ie where host time
     offsets = 0) is in the noise and great effort was made to ensure
     that especially in the VDSO. If time namespace is disabled in the
     kernel configuration the code is compiled out.

     Kudos to Andrei Vagin and Dmitry Sofanov who implemented this
     feature and kept on for more than a year addressing review
     comments, finding better solutions. A pleasant experience.

   - Overhaul of the alarmtimer device dependency handling to ensure
     that the init/suspend/resume ordering is correct.

   - A new clocksource/event driver for Microchip PIT64

   - Suspend/resume support for the Hyper-V clocksource

   - The usual pile of fixes, updates and improvements mostly in the
     driver code"

* tag 'timers-core-2020-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits)
  alarmtimer: Make alarmtimer_get_rtcdev() a stub when CONFIG_RTC_CLASS=n
  alarmtimer: Use wakeup source from alarmtimer platform device
  alarmtimer: Make alarmtimer platform device child of RTC device
  alarmtimer: Update alarmtimer_get_rtcdev() docs to reflect reality
  hrtimer: Add missing sparse annotation for __run_timer()
  lib/vdso: Only read hrtimer_res when needed in __cvdso_clock_getres()
  MIPS: vdso: Define BUILD_VDSO32 when building a 32bit kernel
  clocksource/drivers/hyper-v: Set TSC clocksource as default w/ InvariantTSC
  clocksource/drivers/hyper-v: Untangle stimers and timesync from clocksources
  clocksource/drivers/timer-microchip-pit64b: Fix sparse warning
  clocksource/drivers/exynos_mct: Rename Exynos to lowercase
  clocksource/drivers/timer-ti-dm: Fix uninitialized pointer access
  clocksource/drivers/timer-ti-dm: Switch to platform_get_irq
  clocksource/drivers/timer-ti-dm: Convert to devm_platform_ioremap_resource
  clocksource/drivers/em_sti: Fix variable declaration in em_sti_probe
  clocksource/drivers/em_sti: Convert to devm_platform_ioremap_resource
  clocksource/drivers/bcm2835_timer: Fix memory leak of timer
  clocksource/drivers/cadence-ttc: Use ttc driver as platform driver
  clocksource/drivers/timer-microchip-pit64b: Add Microchip PIT64B support
  clocksource/drivers/hyper-v: Reserve PAGE_SIZE space for tsc page
  ...
2020-01-27 16:47:05 -08:00
Paolo Bonzini
7495e22bb1 KVM: Move running VCPU from ARM to common code
For ring-based dirty log tracking, it will be more efficient to account
writes during schedule-out or schedule-in to the currently running VCPU.
We would like to do it even if the write doesn't use the current VCPU's
address space, as is the case for cached writes (see commit 4e335d9e7d,
"Revert "KVM: Support vCPU-based gfn->hva cache"", 2017-05-02).

Therefore, add a mechanism to track the currently-loaded kvm_vcpu struct.
There is already something similar in KVM/ARM; one important difference
is that kvm_arch_vcpu_{load,put} have two callers in virt/kvm/kvm_main.c:
we have to update both the architecture-independent vcpu_{load,put} and
the preempt notifiers.

Another change made in the process is to allow using kvm_get_running_vcpu()
in preemptible code.  This is allowed because preempt notifiers ensure
that the value does not change even after the VCPU thread is migrated.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27 19:59:54 +01:00
Sean Christopherson
ddd259c9aa KVM: Drop kvm_arch_vcpu_init() and kvm_arch_vcpu_uninit()
Remove kvm_arch_vcpu_init() and kvm_arch_vcpu_uninit() now that all
arch specific implementations are nops.

Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27 19:59:33 +01:00
Sean Christopherson
19bcc89eb8 KVM: arm64: Free sve_state via arm specific hook
Add an arm specific hook to free the arm64-only sve_state.  Doing so
eliminates the last functional code from kvm_arch_vcpu_uninit() across
all architectures and paves the way for removing kvm_arch_vcpu_init()
and kvm_arch_vcpu_uninit() entirely.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27 19:59:32 +01:00
Linus Torvalds
0238d3c753 Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
 "The changes are a real mixed bag this time around.

  The only scary looking one from the diffstat is the uapi change to
  asm-generic/mman-common.h, but this has been acked by Arnd and is
  actually just adding a pair of comments in an attempt to prevent
  allocation of some PROT values which tend to get used for
  arch-specific purposes. We'll be using them for Branch Target
  Identification (a CFI-like hardening feature), which is currently
  under review on the mailing list.

  New architecture features:

   - Support for Armv8.5 E0PD, which benefits KASLR in the same way as
     KPTI but without the overhead. This allows KPTI to be disabled on
     CPUs that are not affected by Meltdown, even is KASLR is enabled.

   - Initial support for the Armv8.5 RNG instructions, which claim to
     provide access to a high bandwidth, cryptographically secure
     hardware random number generator. As well as exposing these to
     userspace, we also use them as part of the KASLR seed and to seed
     the crng once all CPUs have come online.

   - Advertise a bunch of new instructions to userspace, including
     support for Data Gathering Hint, Matrix Multiply and 16-bit
     floating point.

  Kexec:

   - Cleanups in preparation for relocating with the MMU enabled

   - Support for loading crash dump kernels with kexec_file_load()

  Perf and PMU drivers:

   - Cleanups and non-critical fixes for a couple of system PMU drivers

  FPU-less (aka broken) CPU support:

   - Considerable fixes to support CPUs without the FP/SIMD extensions,
     including their presence in heterogeneous systems. Good luck
     finding a 64-bit userspace that handles this.

  Modern assembly function annotations:

   - Start migrating our use of ENTRY() and ENDPROC() over to the
     new-fangled SYM_{CODE,FUNC}_{START,END} macros, which are intended
     to aid debuggers

  Kbuild:

   - Cleanup detection of LSE support in the assembler by introducing
     'as-instr'

   - Remove compressed Image files when building clean targets

  IP checksumming:

   - Implement optimised IPv4 checksumming routine when hardware offload
     is not in use. An IPv6 version is in the works, pending testing.

  Hardware errata:

   - Work around Cortex-A55 erratum #1530923

  Shadow call stack:

   - Work around some issues with Clang's integrated assembler not
     liking our perfectly reasonable assembly code

   - Avoid allocating the X18 register, so that it can be used to hold
     the shadow call stack pointer in future

  ACPI:

   - Fix ID count checking in IORT code. This may regress broken
     firmware that happened to work with the old implementation, in
     which case we'll have to revert it and try something else

   - Fix DAIF corruption on return from GHES handler with pseudo-NMIs

  Miscellaneous:

   - Whitelist some CPUs that are unaffected by Spectre-v2

   - Reduce frequency of ASID rollover when KPTI is compiled in but
     inactive

   - Reserve a couple of arch-specific PROT flags that are already used
     by Sparc and PowerPC and are planned for later use with BTI on
     arm64

   - Preparatory cleanup of our entry assembly code in preparation for
     moving more of it into C later on

   - Refactoring and cleanup"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (73 commits)
  arm64: acpi: fix DAIF manipulation with pNMI
  arm64: kconfig: Fix alignment of E0PD help text
  arm64: Use v8.5-RNG entropy for KASLR seed
  arm64: Implement archrandom.h for ARMv8.5-RNG
  arm64: kbuild: remove compressed images on 'make ARCH=arm64 (dist)clean'
  arm64: entry: Avoid empty alternatives entries
  arm64: Kconfig: select HAVE_FUTEX_CMPXCHG
  arm64: csum: Fix pathological zero-length calls
  arm64: entry: cleanup sp_el0 manipulation
  arm64: entry: cleanup el0 svc handler naming
  arm64: entry: mark all entry code as notrace
  arm64: assembler: remove smp_dmb macro
  arm64: assembler: remove inherit_daif macro
  ACPI/IORT: Fix 'Number of IDs' handling in iort_id_map()
  mm: Reserve asm-generic prot flags 0x10 and 0x20 for arch use
  arm64: Use macros instead of hard-coded constants for MAIR_EL1
  arm64: Add KRYO{3,4}XX CPU cores to spectre-v2 safe list
  arm64: kernel: avoid x18 in __cpu_soft_restart
  arm64: kvm: stop treating register x18 as caller save
  arm64/lib: copy_page: avoid x18 register in assembler code
  ...
2020-01-27 08:58:19 -08:00
Andrew Jones
290a6bb06d arm64: KVM: Add UAPI notes for swapped registers
Two UAPI system register IDs do not derive their values from the
ARM system register encodings. This is because their values were
accidentally swapped. As the IDs are API, they cannot be changed.
Add WARNING notes to point them out.

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Andrew Jones <drjones@redhat.com>
[maz: turned XXX into WARNING]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200120130825.28838-1-drjones@redhat.com
2020-01-23 10:38:14 +00:00
Marc Zyngier
0e20f5e255 KVM: arm/arm64: Cleanup MMIO handling
Our MMIO handling is a bit odd, in the sense that it uses an
intermediate per-vcpu structure to store the various decoded
information that describe the access.

But the same information is readily available in the HSR/ESR_EL2
field, and we actually use this field to populate the structure.

Let's simplify the whole thing by getting rid of the superfluous
structure and save a (tiny) bit of space in the vcpu structure.

[32bit fix courtesy of Olof Johansson <olof@lixom.net>]
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-01-23 10:38:14 +00:00
Mark Rutland
e533dbe9dc arm64: acpi: fix DAIF manipulation with pNMI
Since commit:

  d44f1b8dd7 ("arm64: KVM/mm: Move SEA handling behind a single 'claim' interface")

... the top-level APEI SEA handler has the shape:

1. current_flags = arch_local_save_flags()
2. local_daif_restore(DAIF_ERRCTX)
3. <GHES handler>
4. local_daif_restore(current_flags)

However, since commit:

  4a503217ce ("arm64: irqflags: Use ICC_PMR_EL1 for interrupt masking")

... when pseudo-NMIs (pNMIs) are in use, arch_local_save_flags() will save
the PMR value rather than the DAIF flags.

The combination of these two commits means that the APEI SEA handler will
erroneously attempt to restore the PMR value into DAIF. Fix this by
factoring local_daif_save_flags() out of local_daif_save(), so that we
can consistently save DAIF in step #1, regardless of whether pNMIs are in
use.

Both commits were introduced concurrently in v5.0.

Cc: <stable@vger.kernel.org>
Fixes: 4a503217ce ("arm64: irqflags: Use ICC_PMR_EL1 for interrupt masking")
Fixes: d44f1b8dd7 ("arm64: KVM/mm: Move SEA handling behind a single 'claim' interface")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
2020-01-22 14:41:22 +00:00
Marc Zyngier
5e5168461c irqchip/gic-v4.1: VPE table (aka GICR_VPROPBASER) allocation
GICv4.1 defines a new VPE table that is potentially shared between
both the ITSs and the redistributors, following complicated affinity
rules.

To make things more confusing, the programming of this table at
the redistributor level is reusing the GICv4.0 GICR_VPROPBASER register
for something completely different.

The code flow is somewhat complexified by the need to respect the
affinities required by the HW, meaning that tables can either be
inherited from a previously discovered ITS or redistributor.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/r/20191224111055.11836-6-maz@kernel.org
2020-01-22 14:22:19 +00:00
Will Deacon
bc20606594 Merge branch 'for-next/rng' into for-next/core
* for-next/rng: (2 commits)
  arm64: Use v8.5-RNG entropy for KASLR seed
  ...
2020-01-22 11:38:53 +00:00
Will Deacon
ab3906c531 Merge branch 'for-next/errata' into for-next/core
* for-next/errata: (3 commits)
  arm64: Workaround for Cortex-A55 erratum 1530923
  ...
2020-01-22 11:35:05 +00:00
Will Deacon
aa246c056c Merge branch 'for-next/asm-annotations' into for-next/core
* for-next/asm-annotations: (6 commits)
  arm64: kernel: Correct annotation of end of el0_sync
  ...
2020-01-22 11:34:21 +00:00
Will Deacon
4f6cdf296c Merge branches 'for-next/acpi', 'for-next/cpufeatures', 'for-next/csum', 'for-next/e0pd', 'for-next/entry', 'for-next/kbuild', 'for-next/kexec/cleanup', 'for-next/kexec/file-kdump', 'for-next/misc', 'for-next/nofpsimd', 'for-next/perf' and 'for-next/scs' into for-next/core
* for-next/acpi:
  ACPI/IORT: Fix 'Number of IDs' handling in iort_id_map()

* for-next/cpufeatures: (2 commits)
  arm64: Introduce ID_ISAR6 CPU register
  ...

* for-next/csum: (2 commits)
  arm64: csum: Fix pathological zero-length calls
  ...

* for-next/e0pd: (7 commits)
  arm64: kconfig: Fix alignment of E0PD help text
  ...

* for-next/entry: (5 commits)
  arm64: entry: cleanup sp_el0 manipulation
  ...

* for-next/kbuild: (4 commits)
  arm64: kbuild: remove compressed images on 'make ARCH=arm64 (dist)clean'
  ...

* for-next/kexec/cleanup: (11 commits)
  Revert "arm64: kexec: make dtb_mem always enabled"
  ...

* for-next/kexec/file-kdump: (2 commits)
  arm64: kexec_file: add crash dump support
  ...

* for-next/misc: (12 commits)
  arm64: entry: Avoid empty alternatives entries
  ...

* for-next/nofpsimd: (7 commits)
  arm64: nofpsmid: Handle TIF_FOREIGN_FPSTATE flag cleanly
  ...

* for-next/perf: (2 commits)
  perf/imx_ddr: Fix cpu hotplug state cleanup
  ...

* for-next/scs: (6 commits)
  arm64: kernel: avoid x18 in __cpu_soft_restart
  ...
2020-01-22 11:32:31 +00:00
Mark Brown
2e8e1ea88c arm64: Use v8.5-RNG entropy for KASLR seed
When seeding KALSR on a system where we have architecture level random
number generation make use of that entropy, mixing it in with the seed
passed by the bootloader. Since this is run very early in init before
feature detection is complete we open code rather than use archrandom.h.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
2020-01-22 09:54:52 +00:00
Richard Henderson
1a50ec0b3b arm64: Implement archrandom.h for ARMv8.5-RNG
Expose the ID_AA64ISAR0.RNDR field to userspace, as the RNG system
registers are always available at EL0.

Implement arch_get_random_seed_long using RNDR.  Given that the
TRNG is likely to be a shared resource between cores, and VMs,
do not explicitly force re-seeding with RNDRRS.  In order to avoid
code complexity and potential issues with hetrogenous systems only
provide values after cpufeature has finalized the system capabilities.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
[Modified to only function after cpufeature has finalized the system
capabilities and move all the code into the header -- broonie]
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
[will: Advertise HWCAP via /proc/cpuinfo]
Signed-off-by: Will Deacon <will@kernel.org>
2020-01-22 09:54:18 +00:00
Ingo Molnar
a786810cc8 Merge tag 'v5.5-rc7' into efi/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-01-20 08:05:16 +01:00
Mark Rutland
1cfbb484de KVM: arm/arm64: Correct AArch32 SPSR on exception entry
Confusingly, there are three SPSR layouts that a kernel may need to deal
with:

(1) An AArch64 SPSR_ELx view of an AArch64 pstate
(2) An AArch64 SPSR_ELx view of an AArch32 pstate
(3) An AArch32 SPSR_* view of an AArch32 pstate

When the KVM AArch32 support code deals with SPSR_{EL2,HYP}, it's either
dealing with #2 or #3 consistently. On arm64 the PSR_AA32_* definitions
match the AArch64 SPSR_ELx view, and on arm the PSR_AA32_* definitions
match the AArch32 SPSR_* view.

However, when we inject an exception into an AArch32 guest, we have to
synthesize the AArch32 SPSR_* that the guest will see. Thus, an AArch64
host needs to synthesize layout #3 from layout #2.

This patch adds a new host_spsr_to_spsr32() helper for this, and makes
use of it in the KVM AArch32 support code. For arm64 we need to shuffle
the DIT bit around, and remove the SS bit, while for arm we can use the
value as-is.

I've open-coded the bit manipulation for now to avoid having to rework
the existing PSR_* definitions into PSR64_AA32_* and PSR32_AA32_*
definitions. I hope to perform a more thorough refactoring in future so
that we can handle pstate view manipulation more consistently across the
kernel tree.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200108134324.46500-4-mark.rutland@arm.com
2020-01-19 18:06:14 +00:00
Mark Rutland
3c2483f154 KVM: arm/arm64: Correct CPSR on exception entry
When KVM injects an exception into a guest, it generates the CPSR value
from scratch, configuring CPSR.{M,A,I,T,E}, and setting all other
bits to zero.

This isn't correct, as the architecture specifies that some CPSR bits
are (conditionally) cleared or set upon an exception, and others are
unchanged from the original context.

This patch adds logic to match the architectural behaviour. To make this
simple to follow/audit/extend, documentation references are provided,
and bits are configured in order of their layout in SPSR_EL2. This
layout can be seen in the diagram on ARM DDI 0487E.a page C5-426.

Note that this code is used by both arm and arm64, and is intended to
fuction with the SPSR_EL2 and SPSR_HYP layouts.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200108134324.46500-3-mark.rutland@arm.com
2020-01-19 18:06:14 +00:00