linux/arch/x86/kvm
Like Xu bf328e22e4 KVM: x86: Don't sync user-written TSC against startup values
The legacy API for setting the TSC is fundamentally broken, and only
allows userspace to set a TSC "now", without any way to account for
time lost between the calculation of the value, and the kernel eventually
handling the ioctl.

To work around this, KVM has a hack which, if a TSC is set with a value
which is within a second's worth of the last TSC "written" to any vCPU in
the VM, assumes that userspace actually intended the two TSC values to be
in sync and adjusts the newly-written TSC value accordingly.

Thus, when a VMM restores a guest after suspend or migration using the
legacy API, the TSCs aren't necessarily *right*, but at least they're
in sync.

This trick falls down when restoring a guest which genuinely has been
running for less time than the 1 second of imprecision KVM allows for in
the legacy API.  On *creation*, the first vCPU starts its TSC counting
from zero, and the subsequent vCPUs synchronize to that.  But then when
the VMM tries to restore a vCPU's intended TSC, because the VM has been
alive for less than 1 second and KVM's default TSC value for new vCPUs is
'0', the intended TSC is within a second of the last "written" TSC and KVM
incorrectly adjusts the intended TSC in an attempt to synchronize.
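
To make the failure concrete, here is a minimal C sketch of the one-second
window check described above. It is an illustration, not KVM's code: the
helper names (within_one_second, expected_tsc), the 2 GHz guest frequency
and the 200 ms gap between VM creation and the restore are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: the TSC value expected "now", i.e. the last written
 * value plus the cycles that have elapsed since that write.
 */
static uint64_t expected_tsc(uint64_t last_write, uint64_t elapsed_ns,
			     uint64_t tsc_hz)
{
	/* Fine for the small values used here; no overflow protection. */
	return last_write + elapsed_ns * tsc_hz / 1000000000ull;
}

/*
 * Hypothetical helper: true if `data` lies within one second's worth of
 * cycles (tsc_hz) on either side of the expected value `tsc_exp`.
 */
static bool within_one_second(uint64_t data, uint64_t tsc_exp, uint64_t tsc_hz)
{
	assert(tsc_hz > 0); /* a zero frequency would make the window empty */
	return data < tsc_exp + tsc_hz && data + tsc_hz > tsc_exp;
}
```

With these assumed numbers, the kernel's default write of 0 followed 200 ms
later by a restore gives an expected TSC of 400,000,000. A guest restored
with a TSC of 1,500,000,000 (0.75 s of runtime) lands inside the window and
would be "corrected", whereas a guest that had run for 10 s (TSC
20,000,000,000) would not.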

But further hacks can be piled onto KVM's existing hackish ABI, and
declare that the *first* value written by *userspace* (on any vCPU)
should not be subject to this "correction", i.e. KVM can assume that the
first write from userspace is not an attempt to sync up with TSC values
that only come from the kernel's default vCPU creation.

To that end: Add a flag, kvm->arch.user_set_tsc, protected by
kvm->arch.tsc_write_lock, to record that a TSC for at least one vCPU in
the VM *has* been set by userspace, and make the 1-second slop hack only
trigger if user_set_tsc is already set.

Note that userspace can explicitly request a *synchronization* of the
TSC by writing zero. For the purpose of user_set_tsc, an explicit
synchronization counts as "setting" the TSC, i.e. if userspace then
subsequently writes an explicit non-zero value which happens to be within
1 second of the previous value, the new value will be "corrected".  This
behavior is deliberate, as treating explicit synchronization as "setting"
the TSC preserves KVM's existing behavior inasmuch as possible (KVM
always applied the 1-second "correction" regardless of whether the write
came from userspace vs. the kernel).
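
The rules above (the user_set_tsc gate and the treatment of a zero write)
can be sketched as a small self-contained model. This is a simplified
illustration under stated assumptions, not the kernel implementation:
struct vm_tsc_model and model_tsc_write() are hypothetical names, elapsed
time is passed in directly as TSC cycles, and locking
(kvm->arch.tsc_write_lock) is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-VM state involved (not a kernel struct). */
struct vm_tsc_model {
	bool user_set_tsc;       /* has userspace written any vCPU's TSC yet? */
	uint64_t last_tsc_write; /* last value "written", by kernel or userspace */
	uint64_t tsc_hz;         /* guest TSC frequency in Hz */
};

/*
 * Model a single TSC write.  `user` is false for the kernel's default write
 * at vCPU creation.  `elapsed_cycles` is the time since the previous write,
 * expressed in TSC cycles.  Returns true if the write is treated as an
 * attempt to synchronize with the previously written TSC.
 */
static bool model_tsc_write(struct vm_tsc_model *vm, uint64_t data, bool user,
			    uint64_t elapsed_cycles)
{
	bool synchronizing = false;

	assert(vm != NULL);

	if (data == 0) {
		/* vCPU creation, or an explicit sync request from userspace. */
		synchronizing = true;
	} else if (vm->user_set_tsc) {
		/* The 1-second slop only applies after a userspace write. */
		uint64_t tsc_exp = vm->last_tsc_write + elapsed_cycles;

		synchronizing = data < tsc_exp + vm->tsc_hz &&
				data + vm->tsc_hz > tsc_exp;
	}

	/* Any userspace write, including a zero write, sets the flag. */
	if (user)
		vm->user_set_tsc = true;

	vm->last_tsc_write = data;
	return synchronizing;
}
```

In this model, the first non-zero userspace write after VM creation is never
"corrected", a later write that lands within a second of the expected value
is, and a userspace write of zero both forces synchronization and arms the
slop check for subsequent writes.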

Reported-by: Yong He <alexyonghe@tencent.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217423
Suggested-by: Oliver Upton <oliver.upton@linux.dev>
Original-by: Oliver Upton <oliver.upton@linux.dev>
Original-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Like Xu <likexu@tencent.com>
Tested-by: Yong He <alexyonghe@tencent.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20231008025335.7419-1-likexu@tencent.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-10-09 17:29:52 -07:00
mmu KVM: x86/mmu: Stop zapping invalidated TDP MMU roots asynchronously 2023-09-23 05:35:48 -04:00
svm KVM: SVM: Do not use user return MSR support for virtualized TSC_AUX 2023-09-23 05:35:49 -04:00
vmx ARM: 2023-09-07 13:52:20 -07:00
.gitignore KVM: x86: use a separate asm-offsets.c file 2022-11-09 12:10:17 -05:00
cpuid.c KVM: x86: Add SBPB support 2023-10-04 15:19:32 -07:00
cpuid.h KVM: x86: Add SBPB support 2023-10-04 15:19:32 -07:00
debugfs.c KVM: x86: Unify pr_fmt to use module name for all KVM modules 2022-12-29 15:47:35 -05:00
emulate.c KVM: x86: Remove break statements that will never be executed 2023-08-17 11:28:00 -07:00
fpu.h
governed_features.h KVM: nSVM: Use KVM-governed feature framework to track "vNMI enabled" 2023-08-17 11:43:31 -07:00
hyperv.c KVM: x86: Remove break statements that will never be executed 2023-08-17 11:28:00 -07:00
hyperv.h KVM: x86: Hyper-V invariant TSC control 2022-12-29 15:33:29 -05:00
i8254.c KVM: x86: Unify pr_fmt to use module name for all KVM modules 2022-12-29 15:47:35 -05:00
i8254.h KVM: x86: PIT: Preserve state of speaker port data bit 2022-06-08 13:06:20 -04:00
i8259.c KVM: x86: Fix poll command 2023-06-01 13:44:13 -07:00
ioapic.c KVM: x86/ioapic: Resample the pending state of an IRQ when unmasking 2023-03-27 10:13:28 -04:00
ioapic.h
irq_comm.c KVM: x86: Unify pr_fmt to use module name for all KVM modules 2022-12-29 15:47:35 -05:00
irq.c KVM: x86: Unify pr_fmt to use module name for all KVM modules 2022-12-29 15:47:35 -05:00
irq.h
Kconfig KVM: x86: Add CONFIG_KVM_MAX_NR_VCPUS to allow up to 4096 vCPUs 2023-09-28 09:25:19 -07:00
kvm_cache_regs.h KVM: x86: Add helpers to query individual CR0/CR4 bits 2023-03-22 10:10:53 -07:00
kvm_emulate.h KVM: x86: Remove x86_emulate_ops::guest_has_long_mode 2023-08-02 15:47:27 -07:00
kvm_onhyperv.c KVM: x86/mmu: Move filling of Hyper-V's TLB range struct into Hyper-V code 2023-04-10 15:17:29 -07:00
kvm_onhyperv.h s390: 2023-05-01 12:06:20 -07:00
kvm-asm-offsets.c KVM: SVM: move MSR_IA32_SPEC_CTRL save/restore to assembly 2022-11-09 12:25:53 -05:00
lapic.c KVM x86 changes for 6.6: 2023-08-31 13:36:33 -04:00
lapic.h KVM: x86: Split out logic to generate "readable" APIC regs mask to helper 2023-01-24 10:04:35 -08:00
Makefile KVM: x86: Introduce .hv_inject_synthetic_vmexit_post_tlb_flush() nested hook 2022-11-18 12:59:13 -05:00
mmu.h KVM: x86/mmu: Don't bounce through page-track mechanism for guest PTEs 2023-08-31 13:49:00 -04:00
mtrr.c KVM: x86: Make kvm_mtrr_valid() static now that there are no external users 2023-06-01 13:41:06 -07:00
pmu.c KVM: x86/pmu: Move .hw_event_available() check out of PMC filter helper 2023-08-02 16:44:36 -07:00
pmu.h KVM: x86/pmu: Disable vPMU if the minimum num of counters isn't met 2023-06-06 17:31:44 -07:00
reverse_cpuid.h KVM: x86: Advertise AMX-COMPLEX CPUID to userspace 2023-08-03 15:40:17 -07:00
smm.c KVM: x86: Remove redundant vcpu->arch.cr0 assignments 2023-09-27 12:57:48 -07:00
smm.h KVM: x86: smm: preserve interrupt shadow in SMRAM 2022-11-09 12:31:26 -05:00
trace.h KVM: x86/xen: Add CPL to Xen hypercall tracepoint 2022-11-28 13:31:01 -05:00
tss.h
x86.c KVM: x86: Don't sync user-written TSC against startup values 2023-10-09 17:29:52 -07:00
x86.h KVM: x86: Refine calculation of guest wall clock to use a single TSC read 2023-10-05 19:36:16 -07:00
xen.c KVM: x86: Refine calculation of guest wall clock to use a single TSC read 2023-10-05 19:36:16 -07:00
xen.h KVM: x86/xen: update Xen CPUID Leaf 4 (tsc info) sub-leaves, if present 2023-01-24 10:05:20 -08:00