Commit Graph

34592 Commits

Author SHA1 Message Date
He Zhe
edec6e015a KVM: LAPIC: Mark hrtimer for period or oneshot mode to expire in hard interrupt context
apic->lapic_timer.timer was initialized with HRTIMER_MODE_ABS_HARD but
started later with HRTIMER_MODE_ABS, which may cause the following warning
in PREEMPT_RT kernel.

WARNING: CPU: 1 PID: 2957 at kernel/time/hrtimer.c:1129 hrtimer_start_range_ns+0x348/0x3f0
CPU: 1 PID: 2957 Comm: qemu-system-x86 Not tainted 5.4.23-rt11 #1
Hardware name: Supermicro SYS-E300-9A-8C/A2SDi-8C-HLN4F, BIOS 1.1a 09/18/2018
RIP: 0010:hrtimer_start_range_ns+0x348/0x3f0
Code: 4d b8 0f 94 c1 0f b6 c9 e8 35 f1 ff ff 4c 8b 45
      b0 e9 3b fd ff ff e8 d7 3f fa ff 48 98 4c 03 34
      c5 a0 26 bf 93 e9 a1 fd ff ff <0f> 0b e9 fd fc ff
      ff 65 8b 05 fa b7 90 6d 89 c0 48 0f a3 05 60 91
RSP: 0018:ffffbc60026ffaf8 EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffff9d81657d4110 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000006cc7987bcf RDI: ffff9d81657d4110
RBP: ffffbc60026ffb58 R08: 0000000000000001 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000000 R12: 0000006cc7987bcf
R13: 0000000000000000 R14: 0000006cc7987bcf R15: ffffbc60026d6a00
FS: 00007f401daed700(0000) GS:ffff9d81ffa40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000ffffffff CR3: 0000000fa7574000 CR4: 00000000003426e0
Call Trace:
? kvm_release_pfn_clean+0x22/0x60 [kvm]
start_sw_timer+0x85/0x230 [kvm]
? vmx_vmexit+0x1b/0x30 [kvm_intel]
kvm_lapic_switch_to_sw_timer+0x72/0x80 [kvm]
vmx_pre_block+0x1cb/0x260 [kvm_intel]
? vmx_vmexit+0xf/0x30 [kvm_intel]
? vmx_vmexit+0x1b/0x30 [kvm_intel]
? vmx_vmexit+0xf/0x30 [kvm_intel]
? vmx_vmexit+0x1b/0x30 [kvm_intel]
? vmx_vmexit+0xf/0x30 [kvm_intel]
? vmx_vmexit+0x1b/0x30 [kvm_intel]
? vmx_vmexit+0xf/0x30 [kvm_intel]
? vmx_vmexit+0xf/0x30 [kvm_intel]
? vmx_vmexit+0x1b/0x30 [kvm_intel]
? vmx_vmexit+0xf/0x30 [kvm_intel]
? vmx_vmexit+0x1b/0x30 [kvm_intel]
? vmx_vmexit+0xf/0x30 [kvm_intel]
? vmx_vmexit+0x1b/0x30 [kvm_intel]
? vmx_vmexit+0xf/0x30 [kvm_intel]
? vmx_vmexit+0x1b/0x30 [kvm_intel]
? vmx_vmexit+0xf/0x30 [kvm_intel]
? vmx_sync_pir_to_irr+0x9e/0x100 [kvm_intel]
? kvm_apic_has_interrupt+0x46/0x80 [kvm]
kvm_arch_vcpu_ioctl_run+0x85b/0x1fa0 [kvm]
? _raw_spin_unlock_irqrestore+0x18/0x50
? _copy_to_user+0x2c/0x30
kvm_vcpu_ioctl+0x235/0x660 [kvm]
? rt_spin_unlock+0x2c/0x50
do_vfs_ioctl+0x3e4/0x650
? __fget+0x7a/0xa0
ksys_ioctl+0x67/0x90
__x64_sys_ioctl+0x1a/0x20
do_syscall_64+0x4d/0x120
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f4027cc54a7
Code: 00 00 90 48 8b 05 e9 59 0c 00 64 c7 00 26 00 00
      00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00
      00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff
      73 01 c3 48 8b 0d b9 59 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007f401dae9858 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00005558bd029690 RCX: 00007f4027cc54a7
RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000000d
RBP: 00007f4028b72000 R08: 00005558bc829ad0 R09: 00000000ffffffff
R10: 00005558bcf90ca0 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 00005558bce1c840
--[ end trace 0000000000000002 ]--

Signed-off-by: He Zhe <zhe.he@windriver.com>
Message-Id: <1584687967-332859-1-git-send-email-zhe.he@windriver.com>
Reviewed-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-23 09:01:14 -04:00
Tom Lendacky
2e2409afe5 KVM: SVM: Issue WBINVD after deactivating an SEV guest
Currently, CLFLUSH is used to flush SEV guest memory before the guest is
terminated (or a memory hotplug region is removed). However, CLFLUSH is
not enough to ensure that SEV guest tagged data is flushed from the cache.

With 33af3a7ef9 ("KVM: SVM: Reduce WBINVD/DF_FLUSH invocations"), the
original WBINVD was removed. This then exposed crashes at random times
because of a cache flush race with a page that had both a hypervisor and
a guest tag in the cache.

Restore the WBINVD when destroying an SEV guest and add a WBINVD to the
svm_unregister_enc_region() function to ensure hotplug memory is flushed
when removed. The DF_FLUSH can still be avoided at this point.

Fixes: 33af3a7ef9 ("KVM: SVM: Reduce WBINVD/DF_FLUSH invocations")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <c8bf9087ca3711c5770bdeaafa3e45b717dc5ef4.1584720426.git.thomas.lendacky@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-23 09:01:04 -04:00
Wei Huang
077168e241 x86/mce/amd: Add PPIN support for AMD MCE
Newer AMD CPUs support a feature called protected processor
identification number (PPIN). This feature can be detected via
CPUID_Fn80000008_EBX[23].

However, CPUID alone is not enough to read the processor identification
number - MSR_AMD_PPIN_CTL also needs to be configured properly. If, for
any reason, MSR_AMD_PPIN_CTL[PPIN_EN] can not be turned on, such as
disabled in BIOS, the CPU capability bit X86_FEATURE_AMD_PPIN needs to
be cleared.

When the X86_FEATURE_AMD_PPIN capability is available, the
identification number is issued together with the MCE error info in
order to keep track of the source of MCE errors.

 [ bp: Massage. ]

Co-developed-by: Smita Koralahalli Channabasappa <smita.koralahallichannabasappa@amd.com>
Signed-off-by: Smita Koralahalli Channabasappa <smita.koralahallichannabasappa@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200321193800.3666964-1-wei.huang2@amd.com
2020-03-22 11:03:47 +01:00
Joerg Roedel
763802b53a x86/mm: split vmalloc_sync_all()
Commit 3f8fd02b1b ("mm/vmalloc: Sync unmappings in
__purge_vmap_area_lazy()") introduced a call to vmalloc_sync_all() in
the vunmap() code-path.  While this change was necessary to maintain
correctness on x86-32-pae kernels, it also adds additional cycles for
architectures that don't need it.

Specifically on x86-64 with CONFIG_VMAP_STACK=y some people reported
severe performance regressions in micro-benchmarks because it now also
calls the x86-64 implementation of vmalloc_sync_all() on vunmap().  But
the vmalloc_sync_all() implementation on x86-64 is only needed for newly
created mappings.

To avoid the unnecessary work on x86-64 and to gain the performance
back, split up vmalloc_sync_all() into two functions:

	* vmalloc_sync_mappings(), and
	* vmalloc_sync_unmappings()

Most call-sites to vmalloc_sync_all() only care about new mappings being
synchronized.  The only exception is the new call-site added in the
above mentioned commit.

Shile Zhang directed us to a report of an 80% regression in reaim
throughput.

Fixes: 3f8fd02b1b ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Reported-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Borislav Petkov <bp@suse.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	[GHES]
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20191009124418.8286-1-joro@8bytes.org
Link: https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/
Link: http://lkml.kernel.org/r/20191113095530.228959-1-shile.zhang@linux.alibaba.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-21 18:56:06 -07:00
Peter Zijlstra
46db36abc3 x86/entry: Rename ___preempt_schedule
Because moar '_' isn't always moar readable.

git grep -l "___preempt_schedule\(_notrace\)*" | while read file;
do
	sed -ie 's/___preempt_schedule\(_notrace\)*/preempt_schedule\1_thunk/g' $file;
done

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/r/20200320115858.995685950@infradead.org
2020-03-21 16:03:53 +01:00
Brian Gerst
ffd75b373f x86: Remove unneeded includes
Clean up includes of and in <asm/syscalls.h>

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200313195144.164260-19-brgerst@gmail.com
2020-03-21 16:03:25 +01:00
Brian Gerst
0f78ff1711 x86/entry: Drop asmlinkage from syscalls
asmlinkage is no longer required since the syscall ABI is now fully under
x86 architecture control.  This makes the 32-bit native syscalls a bit more
effecient by passing in regs via EAX instead of on the stack.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200313195144.164260-18-brgerst@gmail.com
2020-03-21 16:03:25 +01:00
Brian Gerst
25c619e59b x86/entry/32: Enable pt_regs based syscalls
Enable pt_regs based syscalls for 32-bit.  This makes the 32-bit native
kernel consistent with the 64-bit kernel, and improves the syscall
interface by not needing to push all 6 potential arguments onto the stack.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Link: https://lkml.kernel.org/r/20200313195144.164260-17-brgerst@gmail.com
2020-03-21 16:03:24 +01:00
Brian Gerst
121b32a58a x86/entry/32: Use IA32-specific wrappers for syscalls taking 64-bit arguments
For the 32-bit syscall interface, 64-bit arguments (loff_t) are passed via
a pair of 32-bit registers.  These register pairs end up in consecutive stack
slots, which matches the C ABI for 64-bit arguments.  But when accessing the
registers directly from pt_regs, the wrapper needs to manually reassemble the
64-bit value.  These wrappers already exist for 32-bit compat, so make them
available to 32-bit native in preparation for enabling pt_regs-based syscalls.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Link: https://lkml.kernel.org/r/20200313195144.164260-16-brgerst@gmail.com
2020-03-21 16:03:24 +01:00
Brian Gerst
866128a996 x86/entry/32: Rename 32-bit specific syscalls
Rename the syscalls that only exist for 32-bit from x86_* to ia32_* to make it
clear they are for 32-bit only.  Also rename the functions to match the syscall
name.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Link: https://lkml.kernel.org/r/20200313195144.164260-15-brgerst@gmail.com
2020-03-21 16:03:23 +01:00
Brian Gerst
a845a6cf1d x86/entry/32: Clean up syscall_32.tbl
After removal of the __ia32_ prefix, remove compat entries that are now
identical to the native entry.

Converted with this script and fixing up whitespace:

while read nr abi name entry compat; do
    if [ "${nr:0:1}" = "#" ]; then
        echo $nr $abi $name $entry $compat
        continue
    fi
    if [ "$entry" = "$compat" ]; then
        compat=""
    fi
    echo "$nr	$abi	$name		$entry		$compat"
done

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200313195144.164260-14-brgerst@gmail.com
2020-03-21 16:03:23 +01:00
Brian Gerst
cab56d3484 x86/entry: Remove ABI prefixes from functions in syscall tables
Move the ABI prefixes to the __SYSCALL_[abi]() macros.  This allows removal
of the need to strip the prefix for UML.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200313195144.164260-13-brgerst@gmail.com
2020-03-21 16:03:23 +01:00
Brian Gerst
8210efcb15 x86/entry/64: Add __SYSCALL_COMMON()
Add a __SYSCALL_COMMON() macro to the syscall table, which simplifies syscalltbl.sh.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200313195144.164260-12-brgerst@gmail.com
2020-03-21 16:03:22 +01:00
Brian Gerst
b5592e5c0d x86/entry: Remove syscall qualifier support
Syscall qualifier support is no longer needed.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Link: https://lkml.kernel.org/r/20200313195144.164260-11-brgerst@gmail.com
2020-03-21 16:03:22 +01:00
Brian Gerst
d3b1b776ee x86/entry/64: Remove ptregs qualifier from syscall table
Now that the fast syscall path is removed, the ptregs qualifier is unused.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Link: https://lkml.kernel.org/r/20200313195144.164260-10-brgerst@gmail.com
2020-03-21 16:03:21 +01:00
Brian Gerst
0872098804 x86/entry: Move max syscall number calculation to syscallhdr.sh
Instead of using an array in asm-offsets to calculate the max syscall
number, calculate it when writing out the syscall headers.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200313195144.164260-9-brgerst@gmail.com
2020-03-21 16:03:21 +01:00
Brian Gerst
2e487c3579 x86/entry/64: Split X32 syscall table into its own file
Since X32 has its own syscall table now, move it to a separate file.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Link: https://lkml.kernel.org/r/20200313195144.164260-8-brgerst@gmail.com
2020-03-21 16:03:21 +01:00
Brian Gerst
cc42c045af x86/entry/64: Move sys_ni_syscall stub to common.c
so it can be available to multiple syscall tables.  Also directly return
-ENOSYS instead of bouncing to the generic sys_ni_syscall().

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200313195144.164260-7-brgerst@gmail.com
2020-03-21 16:03:20 +01:00
Brian Gerst
27dd84fafc x86/entry/64: Use syscall wrappers for x32_rt_sigreturn
Add missing syscall wrapper for x32_rt_sigreturn().

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200313195144.164260-6-brgerst@gmail.com
2020-03-21 16:03:20 +01:00
Brian Gerst
a74d187c2d x86/entry: Refactor SYS_NI macros
Pull the common code out from the SYS_NI macros into a new __SYS_NI macro.
Also conditionalize the X64 version in preparation for enabling syscall
wrappers on 32-bit native kernels.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200313195144.164260-5-brgerst@gmail.com
2020-03-21 16:03:20 +01:00
Brian Gerst
6cc8d2b286 x86/entry: Refactor COND_SYSCALL macros
Pull the common code out from the COND_SYSCALL macros into a new
__COND_SYSCALL macro.  Also conditionalize the X64 version in preparation
for enabling syscall wrappers on 32-bit native kernels.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200313195144.164260-4-brgerst@gmail.com
2020-03-21 16:03:19 +01:00
Brian Gerst
d2b5de495e x86/entry: Refactor SYSCALL_DEFINE0 macros
Pull the common code out from the SYSCALL_DEFINE0 macros into a new
__SYS_STUB0 macro.  Also conditionalize the X64 version in preparation for
enabling syscall wrappers on 32-bit native kernels.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200313195144.164260-3-brgerst@gmail.com
2020-03-21 16:03:19 +01:00
Brian Gerst
4399e0cf49 x86/entry: Refactor SYSCALL_DEFINEx macros
Pull the common code out from the SYSCALL_DEFINEx macros into a new
__SYS_STUBx macro.  Also conditionalize the X64 version in preparation for
enabling syscall wrappers on 32-bit native kernels.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200313195144.164260-2-brgerst@gmail.com
2020-03-21 16:03:18 +01:00
Vincenzo Frascino
abc22418db x86/vdso: Enable x86 to use common headers
Enable x86 to use only the common headers in the implementation
of the vDSO library.

Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200320145351.32292-24-vincenzo.frascino@arm.com
2020-03-21 15:24:02 +01:00
Vincenzo Frascino
659a9faa3f x86: Introduce asm/vdso/clocksource.h
The vDSO library should only include the necessary headers required for
a userspace library (UAPI and a minimal set of kernel headers). To make
this possible it is necessary to isolate from the kernel headers the
common parts that are strictly necessary to build the library.

Introduce asm/vdso/clocksource.h to contain all the arm64 specific
functions that are suitable for vDSO inclusion.

This header will be required by a future patch that will generalize
vdso/clocksource.h.

Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200320145351.32292-5-vincenzo.frascino@arm.com
2020-03-21 15:23:54 +01:00
Paolo Bonzini
2da1ed62d5 KVM: SVM: document KVM_MEM_ENCRYPT_OP, let userspace detect if SEV is available
Userspace has no way to query if SEV has been disabled with the
sev module parameter of kvm-amd.ko.  Actually it has one, but it
is a hack: do ioctl(KVM_MEM_ENCRYPT_OP, NULL) and check if it
returns EFAULT.  Make it a little nicer by returning zero for
SEV enabled and NULL argument, and while at it document the
ioctl arguments.

Cc: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-20 13:47:52 -04:00
Paolo Bonzini
d332945418 KVM: x86: remove bogus user-triggerable WARN_ON
The WARN_ON is essentially comparing a user-provided value with 0.  It is
trivial to trigger it just by passing garbage to KVM_SET_CLOCK.  Guests
can break if you do so, but the same applies to every KVM_SET_* ioctl.
So, if it hurts when you do like this, just do not do it.

Reported-by: syzbot+00be5da1d75f1cc95f6b@syzkaller.appspotmail.com
Fixes: 9446e6fce0 ("KVM: x86: fix WARN_ON check of an unsigned less than zero")
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-20 13:43:21 -04:00
Greg Kroah-Hartman
4445eb6d94 Stable shared branch between EFI and driver tree
Stable shared branch to ease the integration of Hans's series to support
 device firmware loaded from EFI boot service memory regions.
 
 [PATCH v12 00/10] efi/firmware/platform-x86: Add EFI embedded fw support
 https://lore.kernel.org/linux-efi/20200115163554.101315-1-hdegoede@redhat.com/
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEnNKg2mrY9zMBdeK7wjcgfpV0+n0FAl5eJNoACgkQwjcgfpV0
 +n16Fwf/fXCS+xefhIeXZuUQsQexDsofHYrWlt9oS74KF6iqxVDdfSRZHZvAT/Hr
 r1pYpMFSKhRy/u8hhTz1RxwoJXiwQg+yPKwLAMvt+xx2BaNJzLFPvWX8euHYDubM
 mWfrjStgandAcNzBDBIYYdG/fSYjlzq/xWF+rlYnnhMNa6lcYhecwgxmt0iYtMnB
 S31473zE7DZE0PyV9vEEMyaEbQJYprKrIGoaVpbQ80Y2f2MDNaft+7/EGXx5Hxex
 pHZrBdkCL1v7ej7pg8bcxqid682fle5tnogzxf5jo0xMMSXnT5xVPg4OL3rY7kwD
 Ba4cLaJD4Q1fFZ1GwPfa59PrDnUIfA==
 =sj1e
 -----END PGP SIGNATURE-----

Merge tag 'stable-shared-branch-for-driver-tree' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi into driver-core-next

Ard writes:

Stable shared branch between EFI and driver tree

Stable shared branch to ease the integration of Hans's series to support
device firmware loaded from EFI boot service memory regions.

[PATCH v12 00/10] efi/firmware/platform-x86: Add EFI embedded fw support
https://lore.kernel.org/linux-efi/20200115163554.101315-1-hdegoede@redhat.com/

* tag 'stable-shared-branch-for-driver-tree' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efi: Add embedded peripheral firmware support
  efi: Export boot-services code and data as debugfs-blobs
2020-03-20 14:50:48 +01:00
Kan Liang
3442a9ecb8 perf/x86/intel/uncore: Factor out __snr_uncore_mmio_init_box
The IMC uncore unit in Ice Lake server can only be accessed by MMIO,
which is similar as Snow Ridge.
Factor out __snr_uncore_mmio_init_box which can be shared with Ice Lake
server in the following patch.

No functional changes.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1584470314-46657-2-git-send-email-kan.liang@linux.intel.com
2020-03-20 13:06:23 +01:00
Kan Liang
bc88a2fe21 perf/x86/intel/uncore: Add box_offsets for free-running counters
The offset between uncore boxes of free-running counters varies, e.g.
IIO free-running counters on Ice Lake server.

Add box_offsets, an array of offsets between adjacent uncore boxes.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1584470314-46657-1-git-send-email-kan.liang@linux.intel.com
2020-03-20 13:06:23 +01:00
Peter Zijlstra
d8a7386897 x86/optprobe: Fix OPTPROBE vs UACCESS
While looking at an objtool UACCESS warning, it suddenly occurred to me
that it is entirely possible to have an OPTPROBE right in the middle of
an UACCESS region.

In this case we must of course clear FLAGS.AC while running the KPROBE.
Luckily the trampoline already saves/restores [ER]FLAGS, so all we need
to do is inject a CLAC. Unfortunately we cannot use ALTERNATIVE() in the
trampoline text, so we have to frob that manually.

Fixes: ca0bbc70f147 ("sched/x86_64: Don't save flags on context switch")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lkml.kernel.org/r/20200305092130.GU2596@hirez.programming.kicks-ass.net
2020-03-20 13:06:22 +01:00
Ingo Molnar
409e1a3140 Merge branch 'perf/urgent' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-03-19 15:01:45 +01:00
Guenter Roeck
bac59d18c7 x86/setup: Fix static memory detection
When booting x86 images in qemu, the following warning is seen randomly
if DEBUG_LOCKDEP is enabled.

  WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:1119
	  lockdep_register_key+0xc0/0x100

static_obj() returns true if an address is between _stext and _end.

On x86, this includes the brk memory space. Problem is that this memory
block is not static on x86; its unused portions are released after init
and can be allocated. This results in the observed warning if a lockdep
object is allocated from this memory.

Solve the problem by implementing arch_is_kernel_initmem_freed() for
x86 and have it return true if an address is within the released memory
range.

The same problem was solved for s390 with commit

  7a5da02de8 ("locking/lockdep: check for freed initmem in static_obj()"),

which introduced arch_is_kernel_initmem_freed().

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200131021159.9178-1-linux@roeck-us.net
2020-03-19 11:58:13 +01:00
Borislav Petkov
870b4333a6 x86/ioremap: Fix CONFIG_EFI=n build
In order to use efi_mem_type(), one needs CONFIG_EFI enabled. Otherwise
that function is undefined. Use IS_ENABLED() to check and avoid the
ifdeffery as the compiler optimizes away the following unreachable code
then.

Fixes: 985e537a40 ("x86/ioremap: Map EFI runtime services data as encrypted for SEV")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/7561e981-0d9b-d62c-0ef2-ce6007aff1ab@infradead.org
2020-03-19 10:55:56 +01:00
Kim Phillips
e48667b865 perf/amd/uncore: Add support for Family 19h L3 PMU
Family 19h introduces change in slice, core and thread specification in
its L3 Performance Event Select (ChL3PmcCfg) h/w register. The change is
incompatible with Family 17h's version of the register.

Introduce a new path in l3_thread_slice_mask() to do things differently
for Family 19h vs. Family 17h, otherwise the new hardware doesn't get
programmed correctly.

Instead of a linear core--thread bitmask, Family 19h takes an encoded
core number, and a separate thread mask. There are new bits that are set
for all cores and all slices, of which only the latter is used, since
the driver counts events for all slices on behalf of the specified CPU.

Also update amd_uncore_init() to base its L2/NB vs. L3/Data Fabric mode
decision based on Family 17h or above, not just 17h and 18h: the Family
19h Data Fabric PMC is compatible with the Family 17h DF PMC.

 [ bp: Touchups. ]

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200313231024.17601-3-kim.phillips@amd.com
2020-03-17 13:01:03 +01:00
Kim Phillips
9689dbbeae perf/amd/uncore: Make L3 thread mask code more readable
Convert the l3_thread_slice_mask() function to use the more readable
topology_* helper functions, more intuitive variable names like shift
and thread_mask, and BIT_ULL().

No functional changes.

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200313231024.17601-2-kim.phillips@amd.com
2020-03-17 13:00:49 +01:00
Kim Phillips
4dcc3df825 perf/amd/uncore: Prepare L3 thread mask code for Family 19h
In order to better accommodate the upcoming Family 19h, given
the 80-char line limit, move the existing code into a new
l3_thread_slice_mask() function.

No functional changes.

 [ bp: Touchups. ]

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200313231024.17601-1-kim.phillips@amd.com
2020-03-17 13:00:29 +01:00
Borislav Petkov
19d33357ec x86/amd_nb, char/amd64-agp: Use amd_nb_num() accessor
... to find whether there are northbridges present on the
system. Convert the last forgotten user and therefore, unexport
amd_nb_misc_ids[] too.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Michal Kubecek <mkubecek@suse.cz>
Cc: Yazen Ghannam <yazen.ghannam@amd.com>
Link: https://lkml.kernel.org/r/20200316150725.925-1-bp@alien8.de
2020-03-17 10:25:58 +01:00
Linus Torvalds
ec181b7f30 Two fixes for x86:
- Map EFI runtime service data as encrypted when SEV is enabled otherwise
     e.g. SMBIOS data cannot be properly decoded by dmidecode.
 
   - Remove the warning in the vector management code which triggered when a
     managed interrupt affinity changed outside of a CPU hotplug
     operation. The warning was correct until the recent core code change
     that introduced a CPU isolation feature which needs to migrate managed
     interrupts away from online CPUs under certain conditions to achieve the
     isolation.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl5uRi8THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoSH9EACToDM3iADmLZnP4dookJpPWvxazCio
 UclqaIUE7k2Wg/EPmE0oNTQCxqh42rTX6Ifo5WaiCJbxIFZKGMhe02BwmQffilaS
 dOlxuEEeLQq3S4Ai10Mq7wcp5uVHCE/+IhaphwFrdPn/w99O0SZf/bpZMveh6xgR
 Qw3vMLav9FXpWqvnDTw0Vcrcd9sEnZ/iaLrXVDFAnwZggrUqq26Ia4DqUlOaiHGC
 DHESmYFlHcFqfzd6BOJXbsJqedL56Qav0n7zsIqz6B34cLyc8QOqnSn2HxzncP22
 BLPVLvdLi7yqrWIoVgSefcAJq1wcE+Vl9V6mvjxMK4GieYZ91WdLKIbvqUPRZvhU
 viDzZ7NCsg6TmQBD6ilvYrMNB9ds+GNl/1dZ9c854zuvnTcnKqRq9CE6djnlqaLw
 AfHQQJ+kPjrnVyyPnyYBqrWgfsVJ3ueE8BEPtTfruL2CDQLrwiScwCNZ3qQmZ6Bx
 r00wbx+QtATHiZ97pwR1FJr1gyuZE6q3tY3gnb5ORIY19DfkwzRprKpE+Z++3N1H
 Z5Vc7A67CcQe6uCwyViJZuamNgBaXvFmbDDjt3d8N4KKnLK647WyW0XutabQppWa
 Jueq9XJX2V752y81i2Gf2+/U7xGOK0C4QajRMbqiizRBHKiG1JXpi9yCrdqNldEP
 ocz5HASe634nng==
 =KeLM
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
 "Two fixes for x86:

   - Map EFI runtime service data as encrypted when SEV is enabled.

     Otherwise e.g. SMBIOS data cannot be properly decoded by dmidecode.

   - Remove the warning in the vector management code which triggered
     when a managed interrupt affinity changed outside of a CPU hotplug
     operation.

     The warning was correct until the recent core code change that
     introduced a CPU isolation feature which needs to migrate managed
     interrupts away from online CPUs under certain conditions to
     achieve the isolation"

* tag 'x86-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/vector: Remove warning on managed interrupt migration
  x86/ioremap: Map EFI runtime services data as encrypted for SEV
2020-03-15 12:52:56 -07:00
Linus Torvalds
e99bc917fe A pile of perf fixes:
- AMD uncore driver:
 
     Replace the open coded sanity check with the core variant, which
     provides the correct error code and also leaves a hint in dmesg
 
   - tools:
 
     - Fix the stdio input handling with glibc versions >= 2.28
 
     - Unbreak the futex-wake benchmark which was reduced to 0 test threads
       due to the conversion to cpumaps
 
     - Initialize sigaction structs before invoking sys_sigactio()
 
     - Plug the mapfile memory leak in perf jevents
 
     - Fix off by one relative directory includes
 
     - Fix an undefined string comparison in perf diff
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl5uQuETHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoVeLEAC3lJ8jRzGfETQJFyS4C+vj1r+Jglvq
 Hi7Zd8hLDAd+F/aO2/DMgHkKLqpq+sj9qjnPv0Mu/eAS2AbOC3Q4Nz1vm0mxfmyB
 D6+/t3O2t01hyCJ70g8z7HgJclYyLc+JU72F37UcMCBJNHKFUx6ZrgMOPFRwebc6
 aUgyObX5YJ7h35Bl0kYLB0z4q1Znvus3YlFxrEOF78Xldx7zjTJOBsXoDdBjcWVP
 axtvhOnI3aR8E08a+1nbOmE79qSkscneXY7pg0FVDs9/Zq+38BEOVlzDC5aRG3Rm
 4fmty+NO3zOe663kNAGTJ/UQu1fIXGn+6rZ+5lH2pdtgkdeZN6zoVNQFVZrCarhC
 9Skrgz2dZ7DQe6/VwM7Z20oChh5V9q/207Rr2w/6+hmtQ/mnriWpXODZxPevc8kN
 KYHj3Lmo63MrSWIp4Qm4U6wMC9LOGZDUojPs0zbd3prhPoRGVlivTbkQ497Rht00
 BW8TCFhKhIqQJyE72KPI1zlmb0piihCHmMUi1XtuRi+3LpGFPQGXHBAxVrT9HJuF
 1zGr9VeiY8XtHWBdYoD176aOD8wO36mABHkDo2DY7AmkyI8OefGj5EFwtnr+e1aF
 F1LRYw+IGn4kMn35NZVNiJUisGzVWGIrWGVCGlTdoKgm3hhVyoRuPKCCzV2GVXd+
 3hjvmSY9aFmrMw==
 =uJcr
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Thomas Gleixner:
 "A pile of perf fixes:

  Kernel side:

   - AMD uncore driver: Replace the open coded sanity check with the
     core variant, which provides the correct error code and also leaves
     a hint in dmesg

  Tooling:

   - Fix the stdio input handling with glibc versions >= 2.28

   - Unbreak the futex-wake benchmark which was reduced to 0 test
     threads due to the conversion to cpumaps

   - Initialize sigaction structs before invoking sys_sigactio()

   - Plug the mapfile memory leak in perf jevents

   - Fix off by one relative directory includes

   - Fix an undefined string comparison in perf diff"

* tag 'perf-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/amd/uncore: Replace manual sampling check with CAP_NO_INTERRUPT flag
  tools: Fix off-by 1 relative directory includes
  perf jevents: Fix leak of mapfile memory
  perf bench: Clear struct sigaction before sigaction() syscall
  perf bench futex-wake: Restore thread count default to online CPU count
  perf top: Fix stdio interface input handling with glibc 2.28+
  perf diff: Fix undefined string comparision spotted by clang's -Wstring-compare
  perf symbols: Don't try to find a vmlinux file when looking for kernel modules
  perf bench: Share some global variables to fix build with gcc 10
  perf parse-events: Use asprintf() instead of strncpy() to read tracepoint files
  perf env: Do not return pointers to local variables
  perf tests bp_account: Make global variable static
2020-03-15 12:50:15 -07:00
Linus Torvalds
52ac3777fc Two RAS related fixes:
- Shut down the per CPU thermal throttling poll work properly when a CPU
     goes offline. The missing shutdown caused the poll work to be migrated
     to a unbound worker which triggered warnings about the usage of
     smp_processor_id() in preemptible context
 
   - Fix the PPIN feature initialization which missed to enable the
     functionality when PPIN_CTL was enabled but the MSR locked against
     updates.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl5uRHUTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoVVGD/0WEjZoB8yhwez6u0YNFhUkjfP8JFC1
 mGdWMoevyH3Tb+DQNX3cW95t2O7IxP0N6OUNnYYQ9Tlqwt6r0ptJpNnXO7CV2+Jh
 5lxpw/Uv2kQv69BNDK9qPDhiIBPzZQCg/utDTVdIyG0y+XU0q/IZqXh+XedAJsVr
 P3U7KC//NwTYnlpPWjDsG26GHSguV4kj+Lwi88nfh1DJ7eawb8AF4k965pLmOoF9
 g13EFxv2FW1/uq+QJq5ophQIH/pPI/T67rhIyLWxFsCByBzVKjm4BBgXH4gb+QIn
 OofVQcaWCpZCOq2ZTNfHWdPvJK2ziig9w+twbArb7Cb9aOgp3Oe1zbp2VD4nKu4+
 0G5E2Vdv6qRrEIk5LUTqlyOIogd5xPSufaCGF/HC/qXqBxqwWD0tUvjtYyRwwy+Y
 u90bo90zlMjUoDirgtZrjYe0bXuy3xJ+FxZ5OxovGRxLn4qqBqEJGrXYvB0LIlpd
 3x+YeHB4T2pwC6Ya5Odi6RKhwMKpro24dDMJ9jIR1u/NwIgJ2elSO9bsw6SZ823e
 /Mwns7CC/7xtjOCJXPlyj4Uw0TzwTbp1W9Kb0OqJo6q+ntvxbAhoMf32FDxg0OKC
 h4trc3FZt+e2a0l8R8e3nNeAvnS0fM1P4vtg18EcX8SqlSoALJkS3XUO4WeCFLBh
 F9jOt/LSf+4okQ==
 =9W7B
 -----END PGP SIGNATURE-----

Merge tag 'ras-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RAS fixes from Thomas Gleixner:
 "Two RAS related fixes:

   - Shut down the per CPU thermal throttling poll work properly when a
     CPU goes offline.

     The missing shutdown caused the poll work to be migrated to a
     unbound worker which triggered warnings about the usage of
     smp_processor_id() in preemptible context

   - Fix the PPIN feature initialization which missed to enable the
     functionality when PPIN_CTL was enabled but the MSR locked against
     updates"

* tag 'ras-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Fix logic and comments around MSR_PPIN_CTL
  x86/mce/therm_throt: Undo thermal polling properly on CPU offline
2020-03-15 12:44:23 -07:00
Linus Torvalds
6693075e0f Bugfixes, x86+s390.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJebMXnAAoJEL/70l94x66D3fYIAJ1r+o2qgzadwEqoXTvlihjB
 ujX1jOs20EJJ56VhTtXF/wZQc+7VeKCjpIqNv4WaeSYPUhzFGyL9t5tw1YdRDCwY
 u6gklxruIzZodgp+vCoTkPyyUylVmY50sY/yBIJ4F8qOaMxhTEE1aXzGuaOrYqVO
 MmIlAltEKQzdXPO1SVPD7triGPgUTj+DRxrlyRrGt2ItiMUincCz9K6TDyXFib0r
 SSCVFNYtYmzu/bV/E4/Sphi2BxCQEem5DIFWLcngzN8Wy5oCoRVzPGugT4Q9eXWt
 ZtWIDh473JGiXBLYmDq4REJsRSca+7s/YiiLSiQwYfByhIPJpVEoy54fcdaZflo=
 =T4AD
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "Bugfixes for x86 and s390"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: nVMX: avoid NULL pointer dereference with incorrect EVMCS GPAs
  KVM: x86: Initializing all kvm_lapic_irq fields in ioapic_write_indirect
  KVM: VMX: Condition ENCLS-exiting enabling on CPU support for SGX1
  KVM: s390: Also reset registers in sync regs for initial cpu reset
  KVM: fix Kconfig menu text for -Werror
  KVM: x86: remove stale comment from struct x86_emulate_ctxt
  KVM: x86: clear stale x86_emulate_ctxt->intercept value
  KVM: SVM: Fix the svm vmexit code for WRMSR
  KVM: X86: Fix dereference null cpufreq policy
2020-03-14 15:45:26 -07:00
Paolo Bonzini
018cabb694 Merge branch 'kvm-null-pointer-fix' into kvm-master 2020-03-14 12:49:37 +01:00
Vitaly Kuznetsov
95fa10103d KVM: nVMX: avoid NULL pointer dereference with incorrect EVMCS GPAs
When an EVMCS enabled L1 guest on KVM will tries doing enlightened VMEnter
with EVMCS GPA = 0 the host crashes because the

evmcs_gpa != vmx->nested.hv_evmcs_vmptr

condition in nested_vmx_handle_enlightened_vmptrld() will evaluate to
false (as nested.hv_evmcs_vmptr is zeroed after init). The crash will
happen on vmx->nested.hv_evmcs pointer dereference.

Another problematic EVMCS ptr value is '-1' but it only causes host crash
after nested_release_evmcs() invocation. The problem is exactly the same as
with '0', we mistakenly think that the EVMCS pointer hasn't changed and
thus nested.hv_evmcs_vmptr is valid.

Resolve the issue by adding an additional !vmx->nested.hv_evmcs
check to nested_vmx_handle_enlightened_vmptrld(), this way we will
always be trying kvm_vcpu_map() when nested.hv_evmcs is NULL
and this is supposed to catch all invalid EVMCS GPAs.

Also, initialize hv_evmcs_vmptr to '0' in nested_release_evmcs()
to be consistent with initialization where we don't currently
set hv_evmcs_vmptr to '-1'.

Cc: stable@vger.kernel.org
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-14 12:49:27 +01:00
Nitesh Narayan Lal
0c22056f8c KVM: x86: Initializing all kvm_lapic_irq fields in ioapic_write_indirect
Previously all fields of structure kvm_lapic_irq were not initialized
before it was passed to kvm_bitmap_or_dest_vcpus(). Which will cause
an issue when any of those fields are used for processing a request.
For example not initializing the msi_redir_hint field before passing
to the kvm_bitmap_or_dest_vcpus(), may lead to a misbehavior of
kvm_apic_map_get_dest_lapic(). This will specifically happen when the
kvm_lowest_prio_delivery() returns TRUE due to a non-zero garbage
value of msi_redir_hint, which should not happen as the request belongs
to APIC fixed delivery mode and we do not want to deliver the
interrupt only to the lowest priority candidate.

This patch initializes all the fields of kvm_lapic_irq based on the
values of ioapic redirect_entry object before passing it on to
kvm_bitmap_or_dest_vcpus().

Fixes: 7ee30bc132 ("KVM: x86: deliver KVM IOAPIC scan request to target vCPUs")
Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
[Set level to false since the value doesn't really matter. Suggested
 by Vitaly Kuznetsov. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-14 10:46:01 +01:00
Jan Engelhardt
ecb9c79099 acpi/x86: ignore unspecified bit positions in the ACPI global lock field
The value in "new" is constructed from "old" such that all bits defined
as reserved by the ACPI spec[1] are left untouched. But if those bits
do not happen to be all zero, "new < 3" will not evaluate to true.

The firmware of the laptop(s) Medion MD63490 / Akoya P15648 comes with
garbage inside the "FACS" ACPI table. The starting value is
old=0x4944454d, therefore new=0x4944454e, which is >= 3. Mask off
the reserved bits.

[1] https://uefi.org/sites/default/files/resources/ACPI_6_2.pdf

Link: https://bugzilla.kernel.org/show_bug.cgi?id=206553
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-03-14 10:41:56 +01:00
Alex Hung
1ffb8d032d acpi/x86: add a kernel parameter to disable ACPI BGRT
BGRT is for displaying seamless OEM logo from booting to login screen;
however, this mechanism does not always work well on all configurations
and the OEM logo can be displayed multiple times. This looks worse than
without BGRT enabled.

This patch adds a kernel parameter to disable BGRT in boot time. This is
easier than re-compiling a kernel with CONFIG_ACPI_BGRT disabled.

Signed-off-by: Alex Hung <alex.hung@canonical.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-03-14 10:36:49 +01:00
Sean Christopherson
7a57c09bb1 KVM: VMX: Condition ENCLS-exiting enabling on CPU support for SGX1
Enable ENCLS-exiting (and thus set vmcs.ENCLS_EXITING_BITMAP) only if
the CPU supports SGX1.  Per Intel's SDM, all ENCLS leafs #UD if SGX1
is not supported[*], i.e. intercepting ENCLS to inject a #UD is
unnecessary.

Avoiding ENCLS-exiting even when it is reported as supported by the CPU
works around a reported issue where SGX is "hard" disabled after an S3
suspend/resume cycle, i.e. CPUID.0x7.SGX=0 and the VMCS field/control
are enumerated as unsupported.  While the root cause of the S3 issue is
unknown, it's definitely _not_ a KVM (or kernel) bug, i.e. this is a
workaround for what is most likely a hardware or firmware issue.  As a
bonus side effect, KVM saves a VMWRITE when first preparing vmcs01 and
vmcs02.

Note, SGX must be disabled in BIOS to take advantage of this workaround

[*] The additional ENCLS CPUID check on SGX1 exists so that SGX can be
    globally "soft" disabled post-reset, e.g. if #MC bits in MCi_CTL are
    cleared.  Soft disabled meaning disabling SGX without clearing the
    primary CPUID bit (in leaf 0x7) and without poking into non-SGX
    CPU paths, e.g. for the VMCS controls.

Fixes: 0b665d3040 ("KVM: vmx: Inject #UD for SGX ENCLS instruction in guest")
Reported-by: Toni Spets <toni.spets@iki.fi>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-14 10:34:51 +01:00
Alexey Dobriyan
fa0fca68e1 x86/acpi: make "asmlinkage" part first thing in the function definition
g++ insists that function declaration must start with extern "C"
(which asmlinkage expands to).

gcc doesn't care.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-03-14 10:29:07 +01:00
David S. Miller
242a6df688 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:

====================
pull-request: bpf 2020-03-12

The following pull-request contains BPF updates for your *net* tree.

We've added 12 non-merge commits during the last 8 day(s) which contain
a total of 12 files changed, 161 insertions(+), 15 deletions(-).

The main changes are:

1) Andrii fixed two bugs in cgroup-bpf.

2) John fixed sockmap.

3) Luke fixed x32 jit.

4) Martin fixed two issues in struct_ops.

5) Yonghong fixed bpf_send_signal.

6) Yoshiki fixed BTF enum.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-13 11:13:45 -07:00
Peter Xu
469ff207b4 x86/vector: Remove warning on managed interrupt migration
The vector management code assumes that managed interrupts cannot be
migrated away from an online CPU. free_moved_vector() has a WARN_ON_ONCE()
which triggers when a managed interrupt vector association on a online CPU
is cleared. The CPU offline code uses a different mechanism which cannot
trigger this.

This assumption is not longer correct because the new CPU isolation feature
which affects the placement of managed interrupts must be able to move a
managed interrupt away from an online CPU.

There are two reasons why this can happen:

  1) When the interrupt is activated the affinity mask which was
     established in irq_create_affinity_masks() is handed in to
     the vector allocation code. This mask contains all CPUs to which
     the interrupt can be made affine to, but this does not take the
     CPU isolation 'managed_irq' mask into account.

     When the interrupt is finally requested by the device driver then the
     affinity is checked again and the CPU isolation 'managed_irq' mask is
     taken into account, which moves the interrupt to a non-isolated CPU if
     possible.

  2) The interrupt can be affine to an isolated CPU because the
     non-isolated CPUs in the calculated affinity mask are not online.

     Once a non-isolated CPU which is in the mask comes online the
     interrupt is migrated to this non-isolated CPU

In both cases the regular online migration mechanism is used which triggers
the WARN_ON_ONCE() in free_moved_vector().

Case #1 could have been addressed by taking the isolation mask into
account, but that would require a massive code change in the activation
logic and the eventual migration event was accepted as a reasonable
tradeoff when the isolation feature was developed. But even if #1 would be
addressed, #2 would still trigger it.

Of course the warning in free_moved_vector() was overlooked at that time
and the above two cases which have been discussed during patch review have
obviously never been tested before the final submission.

So keep it simple and remove the warning.

[ tglx: Rewrote changelog and added a comment to free_moved_vector() ]

Fixes: 11ea68f553 ("genirq, sched/isolation: Isolate from handling managed interrupts")
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>                                                                                                                                                                       
Link: https://lkml.kernel.org/r/20200312205830.81796-1-peterx@redhat.com
2020-03-13 15:29:26 +01:00
Linus Torvalds
2644bc8569 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
 "Fix a build problem with x86/curve25519"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: x86/curve25519 - support assemblers with no adx support
2020-03-12 09:25:55 -07:00
Kim Phillips
f967140dfb perf/amd/uncore: Replace manual sampling check with CAP_NO_INTERRUPT flag
Enable the sampling check in kernel/events/core.c::perf_event_open(),
which returns the more appropriate -EOPNOTSUPP.

BEFORE:

  $ sudo perf record -a -e instructions,l3_request_g1.caching_l3_cache_accesses true
  Error:
  The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (l3_request_g1.caching_l3_cache_accesses).
  /bin/dmesg | grep -i perf may provide additional information.

With nothing relevant in dmesg.

AFTER:

  $ sudo perf record -a -e instructions,l3_request_g1.caching_l3_cache_accesses true
  Error:
  l3_request_g1.caching_l3_cache_accesses: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'

Fixes: c43ca5091a ("perf/x86/amd: Add support for AMD NB and L2I "uncore" counters")
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20200311191323.13124-1-kim.phillips@amd.com
2020-03-12 14:08:50 +01:00
Kim Phillips
753039ef8b x86/cpu/amd: Call init_amd_zn() om Family 19h processors too
Family 19h CPUs are Zen-based and still share most architectural
features with Family 17h CPUs, and therefore still need to call
init_amd_zn() e.g., to set the RECLAIM_DISTANCE override.

init_amd_zn() also sets X86_FEATURE_ZEN, which today is only used
in amd_set_core_ssb_state(), which isn't called on some late
model Family 17h CPUs, nor on any Family 19h CPUs:
X86_FEATURE_AMD_SSBD replaces X86_FEATURE_LS_CFG_SSBD on those
later model CPUs, where the SSBD mitigation is done via the
SPEC_CTRL MSR instead of the LS_CFG MSR.

Family 19h CPUs also don't have the erratum where the CPB feature
bit isn't set, but that code can stay unchanged and run safely
on Family 19h.

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200311191451.13221-1-kim.phillips@amd.com
2020-03-12 12:13:44 +01:00
Hans de Goede
fac01d1172 x86/tsc_msr: Make MSR derived TSC frequency more accurate
The "Intel 64 and IA-32 Architectures Software Developer’s Manual Volume 4:
Model-Specific Registers" has the following table for the values from
freq_desc_byt:

   000B: 083.3 MHz
   001B: 100.0 MHz
   010B: 133.3 MHz
   011B: 116.7 MHz
   100B: 080.0 MHz

Notice how for e.g the 83.3 MHz value there are 3 significant digits, which
translates to an accuracy of a 1000 ppm, where as a typical crystal
oscillator is 20 - 100 ppm, so the accuracy of the frequency format used in
the Software Developer’s Manual is not really helpful.

As far as we know Bay Trail SoCs use a 25 MHz crystal and Cherry Trail
uses a 19.2 MHz crystal, the crystal is the source clock for a root PLL
which outputs 1600 and 100 MHz. It is unclear if the root PLL outputs are
used directly by the CPU clock PLL or if there is another PLL in between.

This does not matter though, we can model the chain of PLLs as a single PLL
with a quotient equal to the quotients of all PLLs in the chain multiplied.

So we can create a simplified model of the CPU clock setup using a
reference clock of 100 MHz plus a quotient which gets us as close to the
frequency from the SDM as possible.

For the 83.3 MHz example from above this would give 100 MHz * 5 / 6 = 83
and 1/3 MHz, which matches exactly what has been measured on actual
hardware.

Use a simplified PLL model with a reference clock of 100 MHz for all Bay
and Cherry Trail models.

This has been tested on the following models:

              CPU freq before:        CPU freq after:
Intel N2840   2165.800 MHz            2166.667 MHz
Intel Z3736   1332.800 MHz            1333.333 MHz
Intel Z3775   1466.300 MHz            1466.667 MHz
Intel Z8350   1440.000 MHz            1440.000 MHz
Intel Z8750   1600.000 MHz            1600.000 MHz

This fixes the time drifting by about 1 second per hour (20 - 30 seconds
per day) on (some) devices which rely on the tsc_msr.c code to determine
the TSC frequency.

Reported-by: Vipul Kumar <vipulk0511@gmail.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20200223140610.59612-3-hdegoede@redhat.com
2020-03-11 22:57:40 +01:00
Hans de Goede
c8810e2ffc x86/tsc_msr: Fix MSR_FSB_FREQ mask for Cherry Trail devices
According to the "Intel 64 and IA-32 Architectures Software Developer's
Manual Volume 4: Model-Specific Registers" on Cherry Trail (Airmont)
devices the 4 lowest bits of the MSR_FSB_FREQ mask indicate the bus freq
unlike on e.g. Bay Trail where only the lowest 3 bits are used.

This is also the reason why MAX_NUM_FREQS is defined as 9, since Cherry
Trail SoCs have 9 possible frequencies, so the lo value from the MSR needs
to be masked with 0x0f, not with 0x07 otherwise the 9th frequency will get
interpreted as the 1st.

Bump MAX_NUM_FREQS to 16 to avoid any possibility of addressing the array
out of bounds and makes the mask part of the cpufreq struct so it can be
set it per model.

While at it also log an error when the index points to an uninitialized
part of the freqs lookup-table.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20200223140610.59612-2-hdegoede@redhat.com
2020-03-11 22:57:39 +01:00
Hans de Goede
812c2d7506 x86/tsc_msr: Use named struct initializers
Use named struct initializers for the freq_desc struct-s initialization
and change the "u8 msr_plat" to a "bool use_msr_plat" to make its meaning
more clear instead of relying on a comment to explain it.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20200223140610.59612-1-hdegoede@redhat.com
2020-03-11 22:57:39 +01:00
Hans de Goede
17e5888e4e x86: Select HARDIRQS_SW_RESEND on x86
Modern x86 laptops are starting to use GPIO pins as interrupts more
and more, e.g. touchpads and touchscreens have almost all moved away
from PS/2 and USB to using I2C with a GPIO pin as interrupt.
Modern x86 laptops also have almost all moved to using s2idle instead
of using the system S3 ACPI power state to suspend.

The Intel and AMD pinctrl drivers do not define irq_retrigger handlers
for the irqchips they register, this is causing edge triggered interrupts
which happen while suspended using s2idle to get lost.

One specific example of this is the lid switch on some devices, lid
switches used to be handled by the embedded-controller, but now the
lid open/closed sensor is sometimes directly connected to a GPIO pin.
On most devices the ACPI code for this looks like this:

Method (_E00, ...) {
	Notify (LID0, 0x80) // Status Change
}

Where _E00 is an ACPI event handler for changes on both edges of the GPIO
connected to the lid sensor, this event handler is then combined with an
_LID method which directly reads the pin. When the device is resumed by
opening the lid, the GPIO interrupt will wake the system, but because the
pinctrl irqchip doesn't have an irq_retrigger handler, the Notify will not
happen. This is not a problem in the case the _LID method directly reads
the GPIO, because the drivers/acpi/button.c code will call _LID on resume
anyways.

But some devices have an event handler for the GPIO connected to the
lid sensor which looks like this:

Method (_E00, ...) {
	if (LID_GPIO == One)
		LIDS = One
	else
		LIDS = Zero
	Notify (LID0, 0x80) // Status Change
}

And the _LID method returns the cached LIDS value, since on open we
do not re-run the edge-interrupt handler when we re-enable IRQS on resume
(because of the missing irq_retrigger handler), _LID now will keep
reporting closed, as LIDS was never changed to reflect the open status,
this causes userspace to re-resume the laptop again shortly after opening
the lid.

The Intel GPIO controllers do not allow implementing irq_retrigger without
emulating it in software, at which point we are better of just using the
generic HARDIRQS_SW_RESEND mechanism rather then re-implementing software
emulation for this separately in aprox. 14 different pinctrl drivers.

Select HARDIRQS_SW_RESEND to solve the problem of edge-triggered GPIO
interrupts not being re-triggered on resume when they were triggered during
suspend (s2idle) and/or when they were the cause of the wakeup.

This requires

 008f1d60fe ("x86/apic/vector: Force interupt handler invocation to irq context")
 c16816acd0 ("genirq: Add protection against unsafe usage of generic_handle_irq()")

to protect the APIC based interrupts from being wreckaged by a software
resend.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200123210242.53367-1-hdegoede@redhat.com
2020-03-11 22:39:39 +01:00
Tom Lendacky
985e537a40 x86/ioremap: Map EFI runtime services data as encrypted for SEV
The dmidecode program fails to properly decode the SMBIOS data supplied
by OVMF/UEFI when running in an SEV guest. The SMBIOS area, under SEV, is
encrypted and resides in reserved memory that is marked as EFI runtime
services data.

As a result, when memremap() is attempted for the SMBIOS data, it
can't be mapped as regular RAM (through try_ram_remap()) and, since
the address isn't part of the iomem resources list, it isn't mapped
encrypted through the fallback ioremap().

Add a new __ioremap_check_other() to deal with memory types like
EFI_RUNTIME_SERVICES_DATA which are not covered by the resource ranges.

This allows any runtime services data which has been created encrypted,
to be mapped encrypted too.

 [ bp: Move functionality to a separate function. ]

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Joerg Roedel <jroedel@suse.de>
Cc: <stable@vger.kernel.org> # 5.3
Link: https://lkml.kernel.org/r/2d9e16eb5b53dc82665c95c6764b7407719df7a0.1582645327.git.thomas.lendacky@amd.com
2020-03-11 15:54:54 +01:00
Thomas Gleixner
810f80a61b x86/entry/64: Trace irqflags unconditionally as ON when returning to user space
User space cannot disable interrupts any longer so trace return to user space
unconditionally as IRQS_ON.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Link: https://lkml.kernel.org/r/20200308222609.314596327@linutronix.de
2020-03-10 13:56:32 +01:00
Thomas Gleixner
74a4882d72 x86/entry/32: Remove unused label restore_nocheck
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Link: https://lkml.kernel.org/r/20200308222609.219366430@linutronix.de
2020-03-10 13:56:32 +01:00
Tony Luck
d8ecca4043 x86/mce/dev-mcelog: Dynamically allocate space for machine check records
We have had a hard coded limit of 32 machine check records since the
dawn of time.  But as numbers of cores increase, it is possible for
more than 32 errors to be reported before a user process reads from
/dev/mcelog. In this case the additional errors are lost.

Keep 32 as the minimum. But tune the maximum value up based on the
number of processors.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200218184408.GA23048@agluck-desk2.amr.corp.intel.com
2020-03-10 10:25:14 +01:00
Tony W Wang-oc
bdb04a1abb x86/Kconfig: Drop vendor dependency for X86_UMIP
Some Centaur family 7 CPUs and Zhaoxin family 7 CPUs support the UMIP
feature too. The text size growth which UMIP adds is ~1K and distro
kernels enable it anyway so remove the vendor dependency.

 [ bp: Rewrite commit message. ]

Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/1583733990-2587-1-git-send-email-TonyWWang-oc@zhaoxin.com
2020-03-10 10:10:53 +01:00
Thomas Gleixner
008f1d60fe x86/apic/vector: Force interupt handler invocation to irq context
Sathyanarayanan reported that the PCI-E AER error injection mechanism
can result in a NULL pointer dereference in apic_ack_edge():

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
 RIP: 0010:apic_ack_edge+0x1e/0x40
 Call Trace:
   handle_edge_irq+0x7d/0x1e0
   generic_handle_irq+0x27/0x30
   aer_inject_write+0x53a/0x720

It crashes in irq_complete_move() which dereferences get_irq_regs() which
is obviously NULL when this is called from non interrupt context.

Of course the pointer could be checked, but that just papers over the real
issue. Invoking the low level interrupt handling mechanism from random code
can wreckage the fragile interrupt affinity mechanism of x86 as interrupts
can only be moved in interrupt context or with special care when a CPU goes
offline and the move has to be enforced.

In the best case this triggers the warning in the MSI affinity setter, but
if the call happens on the correct CPU it just corrupts state and might
prevent further interrupt delivery for the affected device.

Mark the APIC interrupts as unsuitable for being invoked in random contexts.

This prevents the AER injection from proliferating the wreckage, but that's
less broken than the current state of affairs and more correct than just
papering over the problem by sprinkling random checks all over the place
and silently corrupting state.

Reported-by: sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200306130623.684591280@linutronix.de
2020-03-08 11:06:40 +01:00
Ard Biesheuvel
57648adb31 efi/x86: Preserve %ebx correctly in efi_set_virtual_address_map()
Commit:

  59f2a619a2 ("efi: Add 'runtime' pointer to struct efi")

modified the assembler routine called by efi_set_virtual_address_map(),
to grab the 'runtime' EFI service pointer while running with paging
disabled (which is tricky to do in C code)

After the change, register %ebx is not restored correctly, resulting
in all kinds of weird behavior, so fix that.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200304133515.15035-1-ardb@kernel.org
Link: https://lore.kernel.org/r/20200308080859.21568-22-ardb@kernel.org
2020-03-08 09:58:23 +01:00
Arvind Sankar
d5cdf4cfea efi/x86: Don't relocate the kernel unless necessary
Add alignment slack to the PE image size, so that we can realign the
decompression buffer within the space allocated for the image.

Only relocate the kernel if it has been loaded at an unsuitable address:

 - Below LOAD_PHYSICAL_ADDR, or
 - Above 64T for 64-bit and 512MiB for 32-bit

For 32-bit, the upper limit is conservative, but the exact limit can be
difficult to calculate.

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200303221205.4048668-6-nivedita@alum.mit.edu
Link: https://lore.kernel.org/r/20200308080859.21568-20-ardb@kernel.org
2020-03-08 09:58:22 +01:00
Arvind Sankar
964124a97b efi/x86: Remove extra headroom for setup block
The following commit:

  223e3ee56f ("efi/x86: add headroom to decompressor BSS to account for setup block")

added headroom to the PE image to account for the setup block, which
wasn't used for the decompression buffer.

Now that the decompression buffer is located at the start of the image,
and includes the setup block, this is no longer required.

Add a check to make sure that the head section of the compressed kernel
won't overwrite itself while relocating. This is only for
future-proofing as with current limits on the setup and the actual size
of the head section, this can never happen.

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200303221205.4048668-5-nivedita@alum.mit.edu
Link: https://lore.kernel.org/r/20200308080859.21568-19-ardb@kernel.org
2020-03-08 09:58:21 +01:00
Arvind Sankar
26725192c4 efi/x86: Add kernel preferred address to PE header
Store the kernel's link address as ImageBase in the PE header. Note that
the PE specification requires the ImageBase to be 64k aligned. The
preferred address should almost always satisfy that, except for 32-bit
kernel if the configuration has been customized.

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200303221205.4048668-4-nivedita@alum.mit.edu
Link: https://lore.kernel.org/r/20200308080859.21568-18-ardb@kernel.org
2020-03-08 09:58:20 +01:00
Arvind Sankar
1887c9b653 efi/x86: Decompress at start of PE image load address
When booted via PE loader, define image_offset to hold the offset of
startup_32() from the start of the PE image, and use it as the start of
the decompression buffer.

[ mingo: Fixed the grammar in the comments. ]

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200303221205.4048668-3-nivedita@alum.mit.edu
Link: https://lore.kernel.org/r/20200308080859.21568-17-ardb@kernel.org
2020-03-08 09:58:19 +01:00
Arvind Sankar
8ef44be393 x86/boot/compressed/32: Save the output address instead of recalculating it
In preparation for being able to decompress into a buffer starting at a
different address than startup_32, save the calculated output address
instead of recalculating it later.

We now keep track of three addresses:

	%edx: startup_32 as we were loaded by bootloader
	%ebx: new location of compressed kernel
	%ebp: start of decompression buffer

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200303221205.4048668-2-nivedita@alum.mit.edu
Link: https://lore.kernel.org/r/20200308080859.21568-16-ardb@kernel.org
2020-03-08 09:58:19 +01:00
Arvind Sankar
81a34892c2 x86/boot: Use unsigned comparison for addresses
The load address is compared with LOAD_PHYSICAL_ADDR using a signed
comparison currently (using jge instruction).

When loading a 64-bit kernel using the new efi32_pe_entry() point added by:

  97aa276579 ("efi/x86: Add true mixed mode entry point into .compat section")

using Qemu with -m 3072, the firmware actually loads us above 2Gb,
resulting in a very early crash.

Use the JAE instruction to perform a unsigned comparison instead, as physical
addresses should be considered unsigned.

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200301230436.2246909-6-nivedita@alum.mit.edu
Link: https://lore.kernel.org/r/20200308080859.21568-14-ardb@kernel.org
2020-03-08 09:58:17 +01:00
Arvind Sankar
8acf63efa1 efi/x86: Avoid using code32_start
code32_start is meant for 16-bit real-mode bootloaders to inform the
kernel where the 32-bit protected mode code starts. Nothing in the
protected mode kernel except the EFI stub uses it.

efi_main() currently returns boot_params, with code32_start set inside it
to tell efi_stub_entry() where startup_32 is located. Since it was invoked
by efi_stub_entry() in the first place, boot_params is already known.
Return the address of startup_32 instead.

This will allow a 64-bit kernel to live above 4Gb, for example, and it's
cleaner as well.

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200301230436.2246909-5-nivedita@alum.mit.edu
Link: https://lore.kernel.org/r/20200308080859.21568-13-ardb@kernel.org
2020-03-08 09:58:17 +01:00
Arvind Sankar
3fab43318f efi/x86: Make efi32_pe_entry() more readable
Set up a proper frame pointer in efi32_pe_entry() so that it's easier to
calculate offsets for arguments.

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200301230436.2246909-4-nivedita@alum.mit.edu
Link: https://lore.kernel.org/r/20200308080859.21568-12-ardb@kernel.org
2020-03-08 09:58:16 +01:00
Arvind Sankar
71ff44ac6c efi/x86: Respect 32-bit ABI in efi32_pe_entry()
verify_cpu() clobbers BX and DI. In case we have to return error, we need
to preserve them to respect the 32-bit calling convention.

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200301230436.2246909-3-nivedita@alum.mit.edu
Link: https://lore.kernel.org/r/20200308080859.21568-11-ardb@kernel.org
2020-03-08 09:58:15 +01:00
Arvind Sankar
3cdcd6899e efi/x86: Annotate the LOADED_IMAGE_PROTOCOL_GUID with SYM_DATA
Use SYM_DATA*() macros to annotate this constant, and explicitly align it
to 4-byte boundary. Use lower-case for hexadecimal data.

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200301230436.2246909-2-nivedita@alum.mit.edu
Link: https://lore.kernel.org/r/20200308080859.21568-10-ardb@kernel.org
2020-03-08 09:58:14 +01:00
Ingo Molnar
6120681bdf Merge branch 'efi/urgent' into efi/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-03-08 09:57:58 +01:00
Ingo Molnar
3be5f0d286 Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi into efi/core
More EFI updates for v5.7

 - Incorporate a stable branch with the EFI pieces of Hans's work on
   loading device firmware from EFI boot service memory regions

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-03-08 09:23:36 +01:00
Luke Nelson
80f1f85036 bpf, x32: Fix bug with JMP32 JSET BPF_X checking upper bits
The current x32 BPF JIT is incorrect for JMP32 JSET BPF_X when the upper
32 bits of operand registers are non-zero in certain situations.

The problem is in the following code:

  case BPF_JMP | BPF_JSET | BPF_X:
  case BPF_JMP32 | BPF_JSET | BPF_X:
  ...

  /* and dreg_lo,sreg_lo */
  EMIT2(0x23, add_2reg(0xC0, sreg_lo, dreg_lo));
  /* and dreg_hi,sreg_hi */
  EMIT2(0x23, add_2reg(0xC0, sreg_hi, dreg_hi));
  /* or dreg_lo,dreg_hi */
  EMIT2(0x09, add_2reg(0xC0, dreg_lo, dreg_hi));

This code checks the upper bits of the operand registers regardless if
the BPF instruction is BPF_JMP32 or BPF_JMP64. Registers dreg_hi and
dreg_lo are not loaded from the stack for BPF_JMP32, however, they can
still be polluted with values from previous instructions.

The following BPF program demonstrates the bug. The jset64 instruction
loads the temporary registers and performs the jump, since ((u64)r7 &
(u64)r8) is non-zero. The jset32 should _not_ be taken, as the lower
32 bits are all zero, however, the current JIT will take the branch due
the pollution of temporary registers from the earlier jset64.

  mov64    r0, 0
  ld64     r7, 0x8000000000000000
  ld64     r8, 0x8000000000000000
  jset64   r7, r8, 1
  exit
  jset32   r7, r8, 1
  mov64    r0, 2
  exit

The expected return value of this program is 2; under the buggy x32 JIT
it returns 0. The fix is to skip using the upper 32 bits for jset32 and
compare the upper 32 bits for jset64 only.

All tests in test_bpf.ko and selftests/bpf/test_verifier continue to
pass with this change.

We found this bug using our automated verification tool, Serval.

Fixes: 69f827eb6e ("x32: bpf: implement jitting of JMP32")
Co-developed-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200305234416.31597-1-luke.r.nels@gmail.com
2020-03-06 14:17:39 +01:00
Ingo Molnar
1b10d388d0 Merge branch 'linus' into sched/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-03-06 12:49:56 +01:00
Ingo Molnar
1941011a8b Merge branch 'perf/urgent' into perf/core, to pick up the latest fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-03-06 11:56:40 +01:00
Jason A. Donenfeld
a754acc3e4 KVM: fix Kconfig menu text for -Werror
This was evidently copy and pasted from the i915 driver, but the text
wasn't updated.

Fixes: 4f337faf1c ("KVM: allow disabling -Werror")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-05 15:27:43 +01:00
Jason A. Donenfeld
1579f1bc3b crypto: x86/curve25519 - support assemblers with no adx support
Some older version of GAS do not support the ADX instructions, similarly
to how they also don't support AVX and such. This commit adds the same
build-time detection mechanisms we use for AVX and others for ADX, and
then makes sure that the curve25519 library dispatcher calls the right
functions.

Reported-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-03-05 18:28:09 +11:00
Vitaly Kuznetsov
d718fdc3e7 KVM: x86: remove stale comment from struct x86_emulate_ctxt
Commit c44b4c6ab8 ("KVM: emulate: clean up initializations in
init_decode_cache") did some field shuffling and instead of
[opcode_len, _regs) started clearing [has_seg_override, modrm).
The comment about clearing fields altogether is not true anymore.

Fixes: c44b4c6ab8 ("KVM: emulate: clean up initializations in init_decode_cache")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-03 17:38:22 +01:00
Vitaly Kuznetsov
342993f96a KVM: x86: clear stale x86_emulate_ctxt->intercept value
After commit 07721feee4 ("KVM: nVMX: Don't emulate instructions in guest
mode") Hyper-V guests on KVM stopped booting with:

 kvm_nested_vmexit:    rip fffff802987d6169 reason EPT_VIOLATION info1 181
    info2 0 int_info 0 int_info_err 0
 kvm_page_fault:       address febd0000 error_code 181
 kvm_emulate_insn:     0:fffff802987d6169: f3 a5
 kvm_emulate_insn:     0:fffff802987d6169: f3 a5 FAIL
 kvm_inj_exception:    #UD (0x0)

"f3 a5" is a "rep movsw" instruction, which should not be intercepted
at all.  Commit c44b4c6ab8 ("KVM: emulate: clean up initializations in
init_decode_cache") reduced the number of fields cleared by
init_decode_cache() claiming that they are being cleared elsewhere,
'intercept', however, is left uncleared if the instruction does not have
any of the "slow path" flags (NotImpl, Stack, Op3264, Sse, Mmx, CheckPerm,
NearBranch, No16 and of course Intercept itself).

Fixes: c44b4c6ab8 ("KVM: emulate: clean up initializations in init_decode_cache")
Fixes: 07721feee4 ("KVM: nVMX: Don't emulate instructions in guest mode")
Cc: stable@vger.kernel.org
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-03 17:38:16 +01:00
Hans de Goede
f0df68d5ba efi: Add embedded peripheral firmware support
Just like with PCI options ROMs, which we save in the setup_efi_pci*
functions from arch/x86/boot/compressed/eboot.c, the EFI code / ROM itself
sometimes may contain data which is useful/necessary for peripheral drivers
to have access to.

Specifically the EFI code may contain an embedded copy of firmware which
needs to be (re)loaded into the peripheral. Normally such firmware would be
part of linux-firmware, but in some cases this is not feasible, for 2
reasons:

1) The firmware is customized for a specific use-case of the chipset / use
with a specific hardware model, so we cannot have a single firmware file
for the chipset. E.g. touchscreen controller firmwares are compiled
specifically for the hardware model they are used with, as they are
calibrated for a specific model digitizer.

2) Despite repeated attempts we have failed to get permission to
redistribute the firmware. This is especially a problem with customized
firmwares, these get created by the chip vendor for a specific ODM and the
copyright may partially belong with the ODM, so the chip vendor cannot
give a blanket permission to distribute these.

This commit adds support for finding peripheral firmware embedded in the
EFI code and makes the found firmware available through the new
efi_get_embedded_fw() function.

Support for loading these firmwares through the standard firmware loading
mechanism is added in a follow-up commit in this patch-series.

Note we check the EFI_BOOT_SERVICES_CODE for embedded firmware near the end
of start_kernel(), just before calling rest_init(), this is on purpose
because the typical EFI_BOOT_SERVICES_CODE memory-segment is too large for
early_memremap(), so the check must be done after mm_init(). This relies
on EFI_BOOT_SERVICES_CODE not being free-ed until efi_free_boot_services()
is called, which means that this will only work on x86 for now.

Reported-by: Dave Olsthoorn <dave@bewaar.me>
Suggested-by: Peter Jones <pjones@redhat.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20200115163554.101315-3-hdegoede@redhat.com
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-03-03 10:28:00 +01:00
Hans de Goede
0e72a6a3cf efi: Export boot-services code and data as debugfs-blobs
Sometimes it is useful to be able to dump the efi boot-services code and
data. This commit adds these as debugfs-blobs to /sys/kernel/debug/efi,
but only if efi=debug is passed on the kernel-commandline as this requires
not freeing those memory-regions, which costs 20+ MB of RAM.

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20200115163554.101315-2-hdegoede@redhat.com
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-03-03 10:27:30 +01:00
Haiwei Li
aaca21007b KVM: SVM: Fix the svm vmexit code for WRMSR
In svm, exit_code for MSR writes is not EXIT_REASON_MSR_WRITE which
belongs to vmx.

According to amd manual, SVM_EXIT_MSR(7ch) is the exit_code of VMEXIT_MSR
due to RDMSR or WRMSR access to protected MSR. Additionally, the processor
indicates in the VMCB's EXITINFO1 whether a RDMSR(EXITINFO1=0) or
WRMSR(EXITINFO1=1) was intercepted.

Signed-off-by: Haiwei Li <lihaiwei@tencent.com>
Fixes: 1e9e2622a1 ("KVM: VMX: FIXED+PHYSICAL mode single target IPI fastpath", 2019-11-21)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-02 17:06:52 +01:00
Wanpeng Li
9a11997e75 KVM: X86: Fix dereference null cpufreq policy
Naresh Kamboju reported:

   Linux version 5.6.0-rc4 (oe-user@oe-host) (gcc version
  (GCC)) #1 SMP Sun Mar 1 22:59:08 UTC 2020
   kvm: no hardware support
   BUG: kernel NULL pointer dereference, address: 000000000000028c
   #PF: supervisor read access in kernel mode
   #PF: error_code(0x0000) - not-present page
   PGD 0 P4D 0
   Oops: 0000 [#1] SMP NOPTI
   CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.6.0-rc4 #1
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
  04/01/2014
   RIP: 0010:kobject_put+0x12/0x1c0
   Call Trace:
    cpufreq_cpu_put+0x15/0x20
    kvm_arch_init+0x1f6/0x2b0
    kvm_init+0x31/0x290
    ? svm_check_processor_compat+0xd/0xd
    ? svm_check_processor_compat+0xd/0xd
    svm_init+0x21/0x23
    do_one_initcall+0x61/0x2f0
    ? rdinit_setup+0x30/0x30
    ? rcu_read_lock_sched_held+0x4f/0x80
    kernel_init_freeable+0x219/0x279
    ? rest_init+0x250/0x250
    kernel_init+0xe/0x110
    ret_from_fork+0x27/0x50
   Modules linked in:
   CR2: 000000000000028c
   ---[ end trace 239abf40c55c409b ]---
   RIP: 0010:kobject_put+0x12/0x1c0

cpufreq policy which is get by cpufreq_cpu_get() can be NULL if it is failure,
this patch takes care of it.

Fixes: aaec7c03de (KVM: x86: avoid useless copy of cpufreq policy)
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-02 17:06:52 +01:00
Linus Torvalds
2873dc2547 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes: a pkeys fix for a bug that triggers with weird BIOS
  settings, and two Xen PV fixes: a paravirt interface fix, and
  pagetable dumping fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Fix dump_pagetables with Xen PV
  x86/ioperm: Add new paravirt function update_io_bitmap()
  x86/pkeys: Manually set X86_FEATURE_OSPKE to preserve existing changes
2020-03-02 06:54:54 -06:00
Linus Torvalds
e130a920f6 Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Ingo Molnar:
 "Three fixes to EFI mixed boot mode, mostly related to x86-64 vmap
  stacks activated years ago, bug-fixed recently for EFI, which had
  knock-on effects of various 1:1 mapping assumptions in mixed mode.

  There's also a READ_ONCE() fix for reading an mmap-ed EFI firmware
  data field only once, out of caution"

* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi: READ_ONCE rng seed size before munmap
  efi/x86: Handle by-ref arguments covering multiple pages in mixed mode
  efi/x86: Remove support for EFI time and counter services in mixed mode
  efi/x86: Align GUIDs to their size in the mixed mode runtime wrapper
2020-03-02 06:41:41 -06:00
Linus Torvalds
f853ed90e2 More bugfixes, including a few remaining "make W=1" issues such
as too large frame sizes on some configurations.  On the
 ARM side, the compiler was messing up shadow stacks between
 EL1 and EL2 code, which is easily fixed with __always_inline.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJeXAT4AAoJEL/70l94x66DWywH/1kv4MmeGo6PI0Nxk/yvA7X8
 78iqIBchtxZX0v/9kqpTB7bYmHyTgmZHM+IkwtIUANDSaOvWqJwU+TLUfduOiuXF
 NxBHcZDyuMoftX5CSQ+bJ5PwxKijAdJsIkCZ13CnsTCkwcfamSGypFUCK8LacPeq
 WHvV5Ws5pFc51xrP3CH1DrRhLoulaBmt5xxqK9fxWtslrlsnm1uNza5vs8As8CzM
 apnmdRIf5p4v91Zic3PFH7/GXES0m1tjIBKdtZ4YHb8yrXV/kBsEVhhTjqE9mrUq
 qtRRl5waOFoP4yc9ey52PAbMm1x1Ho/pyunpM0xh40Yq8OPFwqXBPTnWfobSoiM=
 =LNQc
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "More bugfixes, including a few remaining "make W=1" issues such as too
  large frame sizes on some configurations.

  On the ARM side, the compiler was messing up shadow stacks between EL1
  and EL2 code, which is easily fixed with __always_inline"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: VMX: check descriptor table exits on instruction emulation
  kvm: x86: Limit the number of "kvm: disabled by bios" messages
  KVM: x86: avoid useless copy of cpufreq policy
  KVM: allow disabling -Werror
  KVM: x86: allow compiling as non-module with W=1
  KVM: Pre-allocate 1 cpumask variable per cpu for both pv tlb and pv ipis
  KVM: Introduce pv check helpers
  KVM: let declaration of kvm_get_running_vcpus match implementation
  KVM: SVM: allocate AVIC data structures based on kvm_amd module parameter
  arm64: Ask the compiler to __always_inline functions used by KVM at HYP
  KVM: arm64: Define our own swab32() to avoid a uapi static inline
  KVM: arm64: Ask the compiler to __always_inline functions used at HYP
  kvm: arm/arm64: Fold VHE entry/exit work into kvm_vcpu_run_vhe()
  KVM: arm/arm64: Fix up includes for trace.h
2020-03-01 15:16:35 -06:00
Oliver Upton
86f7e90ce8 KVM: VMX: check descriptor table exits on instruction emulation
KVM emulates UMIP on hardware that doesn't support it by setting the
'descriptor table exiting' VM-execution control and performing
instruction emulation. When running nested, this emulation is broken as
KVM refuses to emulate L2 instructions by default.

Correct this regression by allowing the emulation of descriptor table
instructions if L1 hasn't requested 'descriptor table exiting'.

Fixes: 07721feee4 ("KVM: nVMX: Don't emulate instructions in guest mode")
Reported-by: Jan Kiszka <jan.kiszka@web.de>
Cc: stable@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-01 19:26:31 +01:00
Thomas Gleixner
e441a2ae0e x86/entry/32: Remove the 0/-1 distinction from exception entries
Nothing cares about the -1 "mark as interrupt" in the errorcode of
exception entries. It's only used to fill the error code when a signal is
delivered, but this is already inconsistent vs. 64 bit as there all
exceptions which do not have an error code set it to 0. So if 32 bit
applications would care about this, then they would have noticed more than
a decade ago.

Just use 0 for all excpetions which do not have an errorcode consistently.

This does neither break /proc/$PID/syscall because this interface examines
the error code / syscall number which is on the stack and that is set to -1
(no syscall) in common_exception unconditionally for all exceptions. The
push in the entry stub is just there to fill the hardware error code slot
on the stack for consistency of the stack layout.

A transient observation of 0 is possible, but that's true for the other
exceptions which use 0 already as well and that interface is an unreliable
snapshot of dubious correctness anyway.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Link: https://lkml.kernel.org/r/87mu94m7ky.fsf@nanos.tec.linutronix.de
2020-02-29 12:45:54 +01:00
Juergen Gross
bba42affa7 x86/mm: Fix dump_pagetables with Xen PV
Commit 2ae27137b2 ("x86: mm: convert dump_pagetables to use
walk_page_range") broke Xen PV guests as the hypervisor reserved hole in
the memory map was not taken into account.

Fix that by starting the kernel range only at GUARD_HOLE_END_ADDR.

Fixes: 2ae27137b2 ("x86: mm: convert dump_pagetables to use walk_page_range")
Reported-by: Julien Grall <julien@xen.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Julien Grall <julien@xen.org>
Link: https://lkml.kernel.org/r/20200221103851.7855-1-jgross@suse.com
2020-02-29 12:43:10 +01:00
Juergen Gross
99bcd4a6e5 x86/ioperm: Add new paravirt function update_io_bitmap()
Commit 111e7b15cf ("x86/ioperm: Extend IOPL config to control ioperm()
as well") reworked the iopl syscall to use I/O bitmaps.

Unfortunately this broke Xen PV domains using that syscall as there is
currently no I/O bitmap support in PV domains.

Add I/O bitmap support via a new paravirt function update_io_bitmap which
Xen PV domains can use to update their I/O bitmaps via a hypercall.

Fixes: 111e7b15cf ("x86/ioperm: Extend IOPL config to control ioperm() as well")
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Cc: <stable@vger.kernel.org> # 5.5
Link: https://lkml.kernel.org/r/20200218154712.25490-1-jgross@suse.com
2020-02-29 12:43:09 +01:00
Arvind Sankar
c98a76eabb x86/boot/compressed: Fix reloading of GDTR post-relocation
The following commit:

  ef5a7b5eb1 ("efi/x86: Remove GDT setup from efi_main")

introduced GDT setup into the 32-bit kernel's startup_32, and reloads
the GDTR after relocating the kernel for paranoia's sake.

A followup commit:

   32d009137a ("x86/boot: Reload GDTR after copying to the end of the buffer")

introduced a similar GDTR reload in the 64-bit kernel as well.

The GDTR is adjusted by (init_size-_end), however this may not be the
correct offset to apply if the kernel was loaded at a misaligned address
or below LOAD_PHYSICAL_ADDR, as in that case the decompression buffer
has an additional offset from the original load address.

This should never happen for a conformant bootloader, but we're being
paranoid anyway, so just store the new GDT address in there instead of
adding any offsets, which is simpler as well.

Fixes: ef5a7b5eb1 ("efi/x86: Remove GDT setup from efi_main")
Fixes: 32d009137a ("x86/boot: Reload GDTR after copying to the end of the buffer")
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: linux-efi@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: https://lore.kernel.org/r/20200226230031.3011645-2-nivedita@alum.mit.edu
2020-02-29 10:19:35 +01:00
Tom Lendacky
badc61982a efi/x86: Add RNG seed EFI table to unencrypted mapping check
When booting with SME active, EFI tables must be mapped unencrypted since
they were built by UEFI in unencrypted memory. Update the list of tables
to be checked during early_memremap() processing to account for the EFI
RNG seed table.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: linux-efi@vger.kernel.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Link: https://lore.kernel.org/r/b64385fc13e5d7ad4b459216524f138e7879234f.1582662842.git.thomas.lendacky@amd.com
Link: https://lore.kernel.org/r/20200228121408.9075-3-ardb@kernel.org
2020-02-29 10:16:56 +01:00
Tom Lendacky
f10e80a19b efi/x86: Add TPM related EFI tables to unencrypted mapping checks
When booting with SME active, EFI tables must be mapped unencrypted since
they were built by UEFI in unencrypted memory. Update the list of tables
to be checked during early_memremap() processing to account for the EFI
TPM tables.

This fixes a bug where an EFI TPM log table has been created by UEFI, but
it lives in memory that has been marked as usable rather than reserved.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: linux-efi@vger.kernel.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: <stable@vger.kernel.org> # v5.4+
Link: https://lore.kernel.org/r/4144cd813f113c20cdfa511cf59500a64e6015be.1582662842.git.thomas.lendacky@amd.com
Link: https://lore.kernel.org/r/20200228121408.9075-2-ardb@kernel.org
2020-02-29 10:16:56 +01:00
Erwan Velu
ef935c25fd kvm: x86: Limit the number of "kvm: disabled by bios" messages
In older version of systemd(219), at boot time, udevadm is called with :
	/usr/bin/udevadm trigger --type=devices --action=add"

This program generates an echo "add" in /sys/devices/system/cpu/cpu<x>/uevent,
leading to the "kvm: disabled by bios" message in case of your Bios disabled
the virtualization extensions.

On a modern system running up to 256 CPU threads, this pollutes the Kernel logs.

This patch offers to ratelimit this message to avoid any userspace program triggering
this uevent printing this message too often.

This patch is only a workaround but greatly reduce the pollution without
breaking the current behavior of printing a message if some try to instantiate
KVM on a system that doesn't support it.

Note that recent versions of systemd (>239) do not have trigger this behavior.

This patch will be useful at least for some using older systemd with recent Kernels.

Signed-off-by: Erwan Velu <e.velu@criteo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-28 11:37:20 +01:00
Paolo Bonzini
aaec7c03de KVM: x86: avoid useless copy of cpufreq policy
struct cpufreq_policy is quite big and it is not a good idea
to allocate one on the stack.  Just use cpufreq_cpu_get and
cpufreq_cpu_put which is even simpler.

Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-28 10:54:50 +01:00