mirror of
https://github.com/torvalds/linux.git
synced 2024-12-11 13:41:55 +00:00
49d5759268
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:

   - Provide a virtual cache topology to the guest to avoid
     inconsistencies with migration on heterogeneous systems. Non-secure
     software has no practical need to traverse the caches by set/way
     in the first place

   - Add support for taking stage-2 access faults in parallel. This was
     an accidental omission in the original parallel faults
     implementation, but should provide a marginal improvement to
     machines w/o FEAT_HAFDBS (such as hardware from the fruit company)

   - A preamble to adding support for nested virtualization to KVM,
     including vEL2 register state, rudimentary nested exception
     handling and masking unsupported features for nested guests

   - Fixes to the PSCI relay that avoid an unexpected host SVE trap
     when resuming a CPU when running pKVM

   - VGIC maintenance interrupt support for the AIC

   - Improvements to the arch timer emulation, primarily aimed at
     reducing the trap overhead of running nested

   - Add CONFIG_USERFAULTFD to the KVM selftests config fragment in the
     interest of CI systems

   - Avoid VM-wide stop-the-world operations when a vCPU accesses its
     own redistributor

   - Serialize when toggling CPACR_EL1.SMEN to avoid unexpected
     exceptions in the host

   - Aesthetic and comment/kerneldoc fixes

   - Drop the vestiges of the old Columbia mailing list and add
     [Oliver] as co-maintainer

  RISC-V:

   - Fix wrong usage of PGDIR_SIZE instead of PUD_SIZE

   - Correctly place the guest in S-mode after redirecting a trap to
     the guest

   - Redirect illegal instruction traps to guest

   - SBI PMU support for guest

  s390:

   - Sort out confusion between virtual and physical addresses, which
     currently are the same on s390

   - A new ioctl that performs cmpxchg on guest memory

   - A few fixes

  x86:

   - Change tdp_mmu to a read-only parameter

   - Separate TDP and shadow MMU page fault paths

   - Enable Hyper-V invariant TSC control

   - Fix a variety of APICv and AVIC bugs, some of them real-world,
     some of them affecting behavior that is architecturally legal but
     unlikely to happen in practice

   - Mark APIC timer as expired if it's in one-shot mode and the count
     underflows while the vCPU task was being migrated

   - Advertise support for Intel's new fast REP string features

   - Fix a double-shootdown issue in the emergency reboot code

   - Ensure GIF=1 and disable SVM during an emergency reboot, i.e. give
     SVM similar treatment to VMX

   - Update Xen's TSC info CPUID sub-leaves as appropriate

   - Add support for Hyper-V's extended hypercalls, where "support" at
     this point is just forwarding the hypercalls to userspace

   - Clean up the kvm->lock vs. kvm->srcu sequences when updating the
     PMU and MSR filters

   - One-off fixes and cleanups

   - Fix and clean up the range-based TLB flushing code, used when KVM
     is running on Hyper-V

   - Add support for filtering PMU events using a mask. If userspace
     wants to restrict heavily what events the guest can use, it can
     now do so without needing an absurd number of filter entries

   - Clean up KVM's handling of "PMU MSRs to save", especially when
     vPMU support is disabled

   - Add PEBS support for Intel Sapphire Rapids

   - Fix a mostly benign overflow bug in SEV's
     send|receive_update_data()

   - Move several SVM-specific flags into vcpu_svm

  x86 Intel:

   - Handle NMI VM-Exits before leaving the noinstr region

   - A few trivial cleanups in the VM-Enter flows

   - Stop enabling VMFUNC for L1 purely to document that KVM doesn't
     support EPTP switching (or any other VM function) for L1

   - Fix a crash when using eVMCS's enlightened MSR bitmaps

  Generic:

   - Clean up the hardware enable and initialization flow, which was
     scattered around multiple arch-specific hooks. Instead, just let
     the arch code call into generic code. Both x86 and ARM should
     benefit from not having to fight common KVM code's notion of how
     to do initialization

   - Account allocations in generic kvm_arch_alloc_vm()

   - Fix a memory leak if coalesced MMIO unregistration fails

  selftests:

   - On x86, cache the CPU vendor (AMD vs. Intel) and use the info to
     emit the correct hypercall instruction instead of relying on KVM
     to patch in VMMCALL

   - Use TAP interface for kvm_binary_stats_test and tsc_msrs_test"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (325 commits)
  KVM: SVM: hyper-v: placate modpost section mismatch error
  KVM: x86/mmu: Make tdp_mmu_allowed static
  KVM: arm64: nv: Use reg_to_encoding() to get sysreg ID
  KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2 changes
  KVM: arm64: nv: Filter out unsupported features from ID regs
  KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2
  KVM: arm64: nv: Allow a sysreg to be hidden from userspace only
  KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor
  KVM: arm64: nv: Add accessors for SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2
  KVM: arm64: nv: Handle SMCs taken from virtual EL2
  KVM: arm64: nv: Handle trapped ERET from virtual EL2
  KVM: arm64: nv: Inject HVC exceptions to the virtual EL2
  KVM: arm64: nv: Support virtual EL2 exceptions
  KVM: arm64: nv: Handle HCR_EL2.NV system register traps
  KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state
  KVM: arm64: nv: Add EL2 system registers to vcpu context
  KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x
  KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set
  KVM: arm64: nv: Introduce nested virtualization VCPU feature
  KVM: arm64: Use the S2 MMU context to iterate over S2 table
  ...
577 lines
15 KiB
C
// SPDX-License-Identifier: GPL-2.0-only
/*
 * HyperV Detection code.
 *
 * Copyright (C) 2010, Novell, Inc.
 * Author : K. Y. Srinivasan <ksrinivasan@novell.com>
 */

#include <linux/types.h>
#include <linux/time.h>
#include <linux/clocksource.h>
#include <linux/init.h>
#include <linux/export.h>
#include <linux/hardirq.h>
#include <linux/efi.h>
#include <linux/interrupt.h>
#include <linux/irq.h>
#include <linux/kexec.h>
#include <linux/i8253.h>
#include <linux/random.h>
#include <linux/swiotlb.h>
#include <asm/processor.h>
#include <asm/hypervisor.h>
#include <asm/hyperv-tlfs.h>
#include <asm/mshyperv.h>
#include <asm/desc.h>
#include <asm/idtentry.h>
#include <asm/irq_regs.h>
#include <asm/i8259.h>
#include <asm/apic.h>
#include <asm/timer.h>
#include <asm/reboot.h>
#include <asm/nmi.h>
#include <clocksource/hyperv_timer.h>
#include <asm/numa.h>
#include <asm/coco.h>

/* Is Linux running as the root partition? */
bool hv_root_partition;
/* Is Linux running on a nested Microsoft Hypervisor? */
bool hv_nested;
struct ms_hyperv_info ms_hyperv;

#if IS_ENABLED(CONFIG_HYPERV)
static inline unsigned int hv_get_nested_reg(unsigned int reg)
{
	if (hv_is_sint_reg(reg))
		return reg - HV_REGISTER_SINT0 + HV_REGISTER_NESTED_SINT0;

	switch (reg) {
	case HV_REGISTER_SIMP:
		return HV_REGISTER_NESTED_SIMP;
	case HV_REGISTER_SIEFP:
		return HV_REGISTER_NESTED_SIEFP;
	case HV_REGISTER_SVERSION:
		return HV_REGISTER_NESTED_SVERSION;
	case HV_REGISTER_SCONTROL:
		return HV_REGISTER_NESTED_SCONTROL;
	case HV_REGISTER_EOM:
		return HV_REGISTER_NESTED_EOM;
	default:
		return reg;
	}
}

u64 hv_get_non_nested_register(unsigned int reg)
{
	u64 value;

	if (hv_is_synic_reg(reg) && hv_isolation_type_snp())
		hv_ghcb_msr_read(reg, &value);
	else
		rdmsrl(reg, value);
	return value;
}
EXPORT_SYMBOL_GPL(hv_get_non_nested_register);

void hv_set_non_nested_register(unsigned int reg, u64 value)
{
	if (hv_is_synic_reg(reg) && hv_isolation_type_snp()) {
		hv_ghcb_msr_write(reg, value);

		/* Write proxy bit via wrmsrl instruction */
		if (hv_is_sint_reg(reg))
			wrmsrl(reg, value | (1 << 20));
	} else {
		wrmsrl(reg, value);
	}
}
EXPORT_SYMBOL_GPL(hv_set_non_nested_register);

u64 hv_get_register(unsigned int reg)
{
	if (hv_nested)
		reg = hv_get_nested_reg(reg);

	return hv_get_non_nested_register(reg);
}
EXPORT_SYMBOL_GPL(hv_get_register);

void hv_set_register(unsigned int reg, u64 value)
{
	if (hv_nested)
		reg = hv_get_nested_reg(reg);

	hv_set_non_nested_register(reg, value);
}
EXPORT_SYMBOL_GPL(hv_set_register);

static void (*vmbus_handler)(void);
static void (*hv_stimer0_handler)(void);
static void (*hv_kexec_handler)(void);
static void (*hv_crash_handler)(struct pt_regs *regs);

DEFINE_IDTENTRY_SYSVEC(sysvec_hyperv_callback)
{
	struct pt_regs *old_regs = set_irq_regs(regs);

	inc_irq_stat(irq_hv_callback_count);
	if (vmbus_handler)
		vmbus_handler();

	if (ms_hyperv.hints & HV_DEPRECATING_AEOI_RECOMMENDED)
		ack_APIC_irq();

	set_irq_regs(old_regs);
}

void hv_setup_vmbus_handler(void (*handler)(void))
{
	vmbus_handler = handler;
}

void hv_remove_vmbus_handler(void)
{
	/* We have no way to deallocate the interrupt gate */
	vmbus_handler = NULL;
}

/*
 * Routines to do per-architecture handling of stimer0
 * interrupts when in Direct Mode
 */
DEFINE_IDTENTRY_SYSVEC(sysvec_hyperv_stimer0)
{
	struct pt_regs *old_regs = set_irq_regs(regs);

	inc_irq_stat(hyperv_stimer0_count);
	if (hv_stimer0_handler)
		hv_stimer0_handler();
	add_interrupt_randomness(HYPERV_STIMER0_VECTOR);
	ack_APIC_irq();

	set_irq_regs(old_regs);
}

/* For x86/x64, override weak placeholders in hyperv_timer.c */
void hv_setup_stimer0_handler(void (*handler)(void))
{
	hv_stimer0_handler = handler;
}

void hv_remove_stimer0_handler(void)
{
	/* We have no way to deallocate the interrupt gate */
	hv_stimer0_handler = NULL;
}

void hv_setup_kexec_handler(void (*handler)(void))
{
	hv_kexec_handler = handler;
}

void hv_remove_kexec_handler(void)
{
	hv_kexec_handler = NULL;
}

void hv_setup_crash_handler(void (*handler)(struct pt_regs *regs))
{
	hv_crash_handler = handler;
}

void hv_remove_crash_handler(void)
{
	hv_crash_handler = NULL;
}

#ifdef CONFIG_KEXEC_CORE
static void hv_machine_shutdown(void)
{
	if (kexec_in_progress && hv_kexec_handler)
		hv_kexec_handler();

	/*
	 * Call hv_cpu_die() on all the CPUs, otherwise later the hypervisor
	 * corrupts the old VP Assist Pages and can crash the kexec kernel.
	 */
	if (kexec_in_progress && hyperv_init_cpuhp > 0)
		cpuhp_remove_state(hyperv_init_cpuhp);

	/* The function calls stop_other_cpus(). */
	native_machine_shutdown();

	/* Disable the hypercall page when there is only 1 active CPU. */
	if (kexec_in_progress)
		hyperv_cleanup();
}

static void hv_machine_crash_shutdown(struct pt_regs *regs)
{
	if (hv_crash_handler)
		hv_crash_handler(regs);

	/* The function calls crash_smp_send_stop(). */
	native_machine_crash_shutdown(regs);

	/* Disable the hypercall page when there is only 1 active CPU. */
	hyperv_cleanup();
}
#endif /* CONFIG_KEXEC_CORE */
#endif /* CONFIG_HYPERV */

static uint32_t __init ms_hyperv_platform(void)
{
	u32 eax;
	u32 hyp_signature[3];

	if (!boot_cpu_has(X86_FEATURE_HYPERVISOR))
		return 0;

	cpuid(HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS,
	      &eax, &hyp_signature[0], &hyp_signature[1], &hyp_signature[2]);

	if (eax < HYPERV_CPUID_MIN || eax > HYPERV_CPUID_MAX ||
	    memcmp("Microsoft Hv", hyp_signature, 12))
		return 0;

	/* HYPERCALL and VP_INDEX MSRs are mandatory for all features. */
	eax = cpuid_eax(HYPERV_CPUID_FEATURES);
	if (!(eax & HV_MSR_HYPERCALL_AVAILABLE)) {
		pr_warn("x86/hyperv: HYPERCALL MSR not available.\n");
		return 0;
	}
	if (!(eax & HV_MSR_VP_INDEX_AVAILABLE)) {
		pr_warn("x86/hyperv: VP_INDEX MSR not available.\n");
		return 0;
	}

	return HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS;
}

static unsigned char hv_get_nmi_reason(void)
{
	return 0;
}

#ifdef CONFIG_X86_LOCAL_APIC
/*
 * Prior to WS2016 Debug-VM sends NMIs to all CPUs which makes
 * it difficult to process CHANNELMSG_UNLOAD in case of crash. Handle
 * unknown NMI on the first CPU which gets it.
 */
static int hv_nmi_unknown(unsigned int val, struct pt_regs *regs)
{
	static atomic_t nmi_cpu = ATOMIC_INIT(-1);

	if (!unknown_nmi_panic)
		return NMI_DONE;

	if (atomic_cmpxchg(&nmi_cpu, -1, raw_smp_processor_id()) != -1)
		return NMI_HANDLED;

	return NMI_DONE;
}
#endif

static unsigned long hv_get_tsc_khz(void)
{
	unsigned long freq;

	rdmsrl(HV_X64_MSR_TSC_FREQUENCY, freq);

	return freq / 1000;
}

#if defined(CONFIG_SMP) && IS_ENABLED(CONFIG_HYPERV)
static void __init hv_smp_prepare_boot_cpu(void)
{
	native_smp_prepare_boot_cpu();
#if defined(CONFIG_X86_64) && defined(CONFIG_PARAVIRT_SPINLOCKS)
	hv_init_spinlocks();
#endif
}

static void __init hv_smp_prepare_cpus(unsigned int max_cpus)
{
#ifdef CONFIG_X86_64
	int i;
	int ret;
#endif

	native_smp_prepare_cpus(max_cpus);

#ifdef CONFIG_X86_64
	for_each_present_cpu(i) {
		if (i == 0)
			continue;
		ret = hv_call_add_logical_proc(numa_cpu_node(i), i, cpu_physical_id(i));
		BUG_ON(ret);
	}

	for_each_present_cpu(i) {
		if (i == 0)
			continue;
		ret = hv_call_create_vp(numa_cpu_node(i), hv_current_partition_id, i, i);
		BUG_ON(ret);
	}
#endif
}
#endif

static void __init ms_hyperv_init_platform(void)
{
	int hv_max_functions_eax;
	int hv_host_info_eax;
	int hv_host_info_ebx;
	int hv_host_info_ecx;
	int hv_host_info_edx;

#ifdef CONFIG_PARAVIRT
	pv_info.name = "Hyper-V";
#endif

	/*
	 * Extract the features and hints
	 */
	ms_hyperv.features = cpuid_eax(HYPERV_CPUID_FEATURES);
	ms_hyperv.priv_high = cpuid_ebx(HYPERV_CPUID_FEATURES);
	ms_hyperv.misc_features = cpuid_edx(HYPERV_CPUID_FEATURES);
	ms_hyperv.hints = cpuid_eax(HYPERV_CPUID_ENLIGHTMENT_INFO);

	hv_max_functions_eax = cpuid_eax(HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS);

	pr_info("Hyper-V: privilege flags low 0x%x, high 0x%x, hints 0x%x, misc 0x%x\n",
		ms_hyperv.features, ms_hyperv.priv_high, ms_hyperv.hints,
		ms_hyperv.misc_features);

	ms_hyperv.max_vp_index = cpuid_eax(HYPERV_CPUID_IMPLEMENT_LIMITS);
	ms_hyperv.max_lp_index = cpuid_ebx(HYPERV_CPUID_IMPLEMENT_LIMITS);

	pr_debug("Hyper-V: max %u virtual processors, %u logical processors\n",
		 ms_hyperv.max_vp_index, ms_hyperv.max_lp_index);

	/*
	 * Check CPU management privilege.
	 *
	 * To mirror what Windows does we should extract CPU management
	 * features and use the ReservedIdentityBit to detect if Linux is the
	 * root partition. But that requires negotiating CPU management
	 * interface (a process to be finalized).
	 *
	 * For now, use the privilege flag as the indicator for running as
	 * root.
	 */
	if (cpuid_ebx(HYPERV_CPUID_FEATURES) & HV_CPU_MANAGEMENT) {
		hv_root_partition = true;
		pr_info("Hyper-V: running as root partition\n");
	}

	if (ms_hyperv.hints & HV_X64_HYPERV_NESTED) {
		hv_nested = true;
		pr_info("Hyper-V: running on a nested hypervisor\n");
	}

	/*
	 * Extract host information.
	 */
	if (hv_max_functions_eax >= HYPERV_CPUID_VERSION) {
		hv_host_info_eax = cpuid_eax(HYPERV_CPUID_VERSION);
		hv_host_info_ebx = cpuid_ebx(HYPERV_CPUID_VERSION);
		hv_host_info_ecx = cpuid_ecx(HYPERV_CPUID_VERSION);
		hv_host_info_edx = cpuid_edx(HYPERV_CPUID_VERSION);

		pr_info("Hyper-V: Host Build %d.%d.%d.%d-%d-%d\n",
			hv_host_info_ebx >> 16, hv_host_info_ebx & 0xFFFF,
			hv_host_info_eax, hv_host_info_edx & 0xFFFFFF,
			hv_host_info_ecx, hv_host_info_edx >> 24);
	}

	if (ms_hyperv.features & HV_ACCESS_FREQUENCY_MSRS &&
	    ms_hyperv.misc_features & HV_FEATURE_FREQUENCY_MSRS_AVAILABLE) {
		x86_platform.calibrate_tsc = hv_get_tsc_khz;
		x86_platform.calibrate_cpu = hv_get_tsc_khz;
	}

	if (ms_hyperv.priv_high & HV_ISOLATION) {
		ms_hyperv.isolation_config_a = cpuid_eax(HYPERV_CPUID_ISOLATION_CONFIG);
		ms_hyperv.isolation_config_b = cpuid_ebx(HYPERV_CPUID_ISOLATION_CONFIG);
		ms_hyperv.shared_gpa_boundary =
			BIT_ULL(ms_hyperv.shared_gpa_boundary_bits);

		pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
			ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);

		if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) {
			static_branch_enable(&isolation_type_snp);
#ifdef CONFIG_SWIOTLB
			swiotlb_unencrypted_base = ms_hyperv.shared_gpa_boundary;
#endif
		}
		/* Isolation VMs are unenlightened SEV-based VMs, thus this check: */
		if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) {
			if (hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE)
				cc_set_vendor(CC_VENDOR_HYPERV);
		}
	}

	if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) {
		ms_hyperv.nested_features =
			cpuid_eax(HYPERV_CPUID_NESTED_FEATURES);
		pr_info("Hyper-V: Nested features: 0x%x\n",
			ms_hyperv.nested_features);
	}

#ifdef CONFIG_X86_LOCAL_APIC
	if (ms_hyperv.features & HV_ACCESS_FREQUENCY_MSRS &&
	    ms_hyperv.misc_features & HV_FEATURE_FREQUENCY_MSRS_AVAILABLE) {
		/*
		 * Get the APIC frequency.
		 */
		u64	hv_lapic_frequency;

		rdmsrl(HV_X64_MSR_APIC_FREQUENCY, hv_lapic_frequency);
		hv_lapic_frequency = div_u64(hv_lapic_frequency, HZ);
		lapic_timer_period = hv_lapic_frequency;
		pr_info("Hyper-V: LAPIC Timer Frequency: %#x\n",
			lapic_timer_period);
	}

	register_nmi_handler(NMI_UNKNOWN, hv_nmi_unknown, NMI_FLAG_FIRST,
			     "hv_nmi_unknown");
#endif

#ifdef CONFIG_X86_IO_APIC
	no_timer_check = 1;
#endif

#if IS_ENABLED(CONFIG_HYPERV) && defined(CONFIG_KEXEC_CORE)
	machine_ops.shutdown = hv_machine_shutdown;
	machine_ops.crash_shutdown = hv_machine_crash_shutdown;
#endif
	if (ms_hyperv.features & HV_ACCESS_TSC_INVARIANT) {
		/*
		 * Writing to synthetic MSR 0x40000118 updates/changes the
		 * guest visible CPUIDs. Setting bit 0 of this MSR enables
		 * guests to report invariant TSC feature through CPUID
		 * instruction, CPUID 0x80000007/EDX, bit 8. See code in
		 * early_init_intel() where this bit is examined. The
		 * setting of this MSR bit should happen before init_intel()
		 * is called.
		 */
		wrmsrl(HV_X64_MSR_TSC_INVARIANT_CONTROL, HV_EXPOSE_INVARIANT_TSC);
		setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE);
	}

	/*
	 * Generation 2 instances don't support reading the NMI status from
	 * port 0x61.
	 */
	if (efi_enabled(EFI_BOOT))
		x86_platform.get_nmi_reason = hv_get_nmi_reason;

	/*
	 * Hyper-V VMs have a PIT emulation quirk such that zeroing the
	 * counter register during PIT shutdown restarts the PIT. So it
	 * continues to interrupt at 18.2 Hz. Setting
	 * i8253_clear_counter_on_shutdown to false tells pit_shutdown()
	 * not to zero the counter so that the PIT really is shut down.
	 * Generation 2 VMs don't have a PIT, and setting this value has
	 * no effect.
	 */
	i8253_clear_counter_on_shutdown = false;

#if IS_ENABLED(CONFIG_HYPERV)
	/*
	 * Setup the hook to get control post apic initialization.
	 */
	x86_platform.apic_post_init = hyperv_init;
	hyperv_setup_mmu_ops();
	/* Setup the IDT for hypervisor callback */
	alloc_intr_gate(HYPERVISOR_CALLBACK_VECTOR, asm_sysvec_hyperv_callback);

	/* Setup the IDT for reenlightenment notifications */
	if (ms_hyperv.features & HV_ACCESS_REENLIGHTENMENT) {
		alloc_intr_gate(HYPERV_REENLIGHTENMENT_VECTOR,
				asm_sysvec_hyperv_reenlightenment);
	}

	/* Setup the IDT for stimer0 */
	if (ms_hyperv.misc_features & HV_STIMER_DIRECT_MODE_AVAILABLE) {
		alloc_intr_gate(HYPERV_STIMER0_VECTOR,
				asm_sysvec_hyperv_stimer0);
	}

# ifdef CONFIG_SMP
	smp_ops.smp_prepare_boot_cpu = hv_smp_prepare_boot_cpu;
	if (hv_root_partition)
		smp_ops.smp_prepare_cpus = hv_smp_prepare_cpus;
# endif

	/*
	 * Hyper-V doesn't provide irq remapping for IO-APIC. To enable x2apic,
	 * set x2apic destination mode to physical mode when x2apic is available
	 * and Hyper-V IOMMU driver makes sure cpus assigned with IO-APIC irqs
	 * have 8-bit APIC id.
	 */
# ifdef CONFIG_X86_X2APIC
	if (x2apic_supported())
		x2apic_phys = 1;
# endif

	/* Register Hyper-V specific clocksource */
	hv_init_clocksource();
#endif
	/*
	 * TSC should be marked as unstable only after Hyper-V
	 * clocksource has been initialized. This ensures that the
	 * stability of the sched_clock is not altered.
	 */
	if (!(ms_hyperv.features & HV_ACCESS_TSC_INVARIANT))
		mark_tsc_unstable("running on Hyper-V");

	hardlockup_detector_disable();
}

static bool __init ms_hyperv_x2apic_available(void)
{
	return x2apic_supported();
}

/*
 * If ms_hyperv_msi_ext_dest_id() returns true, hyperv_prepare_irq_remapping()
 * returns -ENODEV and the Hyper-V IOMMU driver is not used; instead, the
 * generic support of the 15-bit APIC ID is used: see __irq_msi_compose_msg().
 *
 * Note: for a VM on Hyper-V, the I/O-APIC is the only device which
 * (logically) generates MSIs directly to the system APIC irq domain.
 * There is no HPET, and PCI MSI/MSI-X interrupts are remapped by the
 * pci-hyperv host bridge.
 *
 * Note: for a Hyper-V root partition, this will always return false.
 * The hypervisor doesn't expose these HYPERV_CPUID_VIRT_STACK_* cpuids by
 * default, they are implemented as intercepts by the Windows Hyper-V stack.
 * Even a nested root partition (L2 root) will not get them because the
 * nested (L1) hypervisor filters them out.
 */
static bool __init ms_hyperv_msi_ext_dest_id(void)
{
	u32 eax;

	eax = cpuid_eax(HYPERV_CPUID_VIRT_STACK_INTERFACE);
	if (eax != HYPERV_VS_INTERFACE_EAX_SIGNATURE)
		return false;

	eax = cpuid_eax(HYPERV_CPUID_VIRT_STACK_PROPERTIES);
	return eax & HYPERV_VS_PROPERTIES_EAX_EXTENDED_IOAPIC_RTE;
}

const __initconst struct hypervisor_x86 x86_hyper_ms_hyperv = {
	.name			= "Microsoft Hyper-V",
	.detect			= ms_hyperv_platform,
	.type			= X86_HYPER_MS_HYPERV,
	.init.x2apic_available	= ms_hyperv_x2apic_available,
	.init.msi_ext_dest_id	= ms_hyperv_msi_ext_dest_id,
	.init.init_platform	= ms_hyperv_init_platform,
};