The newly added hypercall doesn't work on x86-32:
arch/x86/kvm/x86.c: In function 'kvm_pv_clock_pairing':
arch/x86/kvm/x86.c:6163:6: error: implicit declaration of function 'kvm_get_walltime_and_clockread';did you mean 'kvm_get_time_scale'? [-Werror=implicit-function-declaration]
This adds an #ifdef around it, matching the one around the related
functions that are also only implemented on 64-bit systems.
Fixes: 55dd00a73a ("KVM: x86: add KVM_HC_CLOCK_PAIRING hypercall")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a driver with gettime method returning hosts realtime clock.
This allows Chrony to synchronize host and guest clocks with
high precision (see results below).
chronyc> sources
MS Name/IP address Stratum Poll Reach LastRx Last sample
===============================================================================
To configure Chronyd to use PHC refclock, add the
following line to its configuration file:
refclock PHC /dev/ptpX poll 3 dpoll -2 offset 0
Where /dev/ptpX is the kvmclock PTP clock.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Emulate read and write operations to CNTP_TVAL, CNTP_CVAL and CNTP_CTL.
Now VMs are able to use the EL1 physical timer.
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
KVM traps on the EL1 phys timer accesses from VMs, but it doesn't handle
those traps. This results in terminating VMs. Instead, set a handler for
the EL1 phys timer access, and inject an undefined exception as an
intermediate step.
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Set a background timer for the EL1 physical timer emulation while VMs
are running, so that VMs get the physical timer interrupts in a timely
manner.
Schedule the background timer on entry to the VM and cancel it on exit.
This would not have any performance impact to the guest OSes that
currently use the virtual timer since the physical timer is always not
enabled.
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
When scheduling a background timer, consider both of the virtual and
physical timer and pick the earliest expiration time.
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Now that we maintain the EL1 physical timer register states of VMs,
update the physical timer interrupt level along with the virtual one.
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Initialize the emulated EL1 physical timer with the default irq number.
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Now that we have a separate structure for timer context, make functions
generic so that they can work with any timer context, not just the
virtual timer context. This does not change the virtual timer
functionality.
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Make cntvoff per each timer context. This is helpful to abstract kvm
timer functions to work with timer context without considering timer
types (e.g. physical timer or virtual timer).
This also would pave the way for ever doing adjustments of the cntvoff
on a per-CPU basis if that should ever make sense.
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Abstract virtual timer context into a separate structure and change all
callers referring to timer registers, irq state and so on. No change in
functionality.
This is about to become very handy when adding the EL1 physical timer.
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
The IRQFD framework calls the architecture dependent function
twice if the corresponding GSI type is edge triggered. For ARM,
the function kvm_set_msi() is getting called twice whenever the
IRQFD receives the event signal. The rest of the code path is
trying to inject the MSI without any validation checks. No need
to call the function vgic_its_inject_msi() second time to avoid
an unnecessary overhead in IRQ queue logic. It also avoids the
possibility of VM seeing the MSI twice.
Simple fix, return -1 if the argument 'level' value is zero.
Cc: stable@vger.kernel.org
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Fix rebase breakage from commit 55dd00a73a ("KVM: x86: add
KVM_HC_CLOCK_PAIRING hypercall", 2017-01-24), courtesy of the
"I could have sworn I had pushed the right branch" department.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- enable some simd extensions for guests
- enable nx for guests
- debug log for cpu model
- PER fixes
- remove bitwise annotation from ar_t
- detect guests in operation exception program check loops
- fix potential null-pointer dereference for ucontrol guests
- also contains merge for fix that went into 4.10 to avoid conflicts
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)
iQIcBAABAgAGBQJYmGFiAAoJEBF7vIC1phx8wzYP/2xrpknbKSLhG7Vnn6R7Mmkq
joUl4bfqKoYDVkJ2yYLkN3WfCyIx/MA7t406EYjt/6INQcGWOWR4dG8/LU4sw5xk
1EXckD2YIR1iGIasTCsyDjCpABqsldIUofaKPsNeWFtnnKfR7EK9FJowFOkdRAwk
BUaBG5drayOnLySo02E7BrN3EAqkIuDdZinpM8e25h6nU9dLjS5o8nxX5iIIItgZ
VyHDTfWAGxWMqC5s4MnKsxC01NFA8JJa1KQful199D1jZ2nsC66OobNPr3vpaLFS
Nbolls9AF6jKDLPSJQopkkWcr3BuFFYwZSYsNYDRmuUc2Ellvuf0Ug/X2yQEN4lx
VnCRo9mDNRIryWVg1h003EQSVT7rgi3pBH+T7U+N9JPwdN7RgkDOvOgbMjWO0I3m
glZhJ/l0MIfEfgtam6cu3/k5r/ZKPDoE+kGXZMvOJ4MLI534ErD9yVgEarPYzEM4
fWnOuznUHRUKARhf6zR3DCIyp39UwD+QAfoTPgyvvnUFjWaPaWsuktZ4P4e1KPTT
XDfPTqQQJScBQtwphHYVvDGyfPRv6/taQXFyQAu95O31b8OBeQvlunyivBo7E+dr
ocL0f7pqjKkaw/0qICyLbUL0rBHGNuf3E2ODyyDl3HEI3+NL8c42+M8KaMps4cas
QyhSoeENYV9Kw6UpOBn6
=0x3e
-----END PGP SIGNATURE-----
Merge tag 'kvm-s390-next-4.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: s390: Fixes and features for 4.11 (via kvm/next)
- enable some simd extensions for guests
- enable nx for guests
- debug log for cpu model
- PER fixes
- remove bitwise annotation from ar_t
- detect guests in operation exception program check loops
- fix potential null-pointer dereference for ucontrol guests
- also contains merge for fix that went into 4.10 to avoid conflicts
Numerous MIPS KVM fixes, improvements, and features for 4.11, many of
which continue to pave the way for VZ support, the most interesting of
which are:
- Add GVA->HPA page tables for T&E, to cache GVA mappings.
- Generate fast-path TLB refill exception handler which loads host TLB
entries from GVA page table, avoiding repeated guest memory
translation and guest TLB lookups.
- Use uaccess macros when T&E needs to access guest memory, which with
GVA page tables and the Linux TLB refill handler improves robustness
against TLB faults and fixes EVA hosts.
- Use BadInstr/BadInstrP registers when available to obtain instruction
encodings after a synchronous trap.
- Add GPA->HPA page tables to replace the inflexible linear array,
allowing for multiple sparsely arranged memory regions.
- Properly implement dirty page logging.
- Add KVM_CAP_SYNC_MMU support so that changes in GPA mappings become
effective in guests even if they are already running, allowing for
copy-on-write, KSM, idle page tracking, swapping, and guest memory
ballooning.
- Add KVM_CAP_READONLY_MEM support, so writes to specified memory
regions are treated as MMIO.
- Implement proper CP0_EBase support in T&E.
- Expose a few more missing CP0 registers to userland.
- Add KVM_CAP_NR_VCPUS and KVM_CAP_MAX_VCPUS support, and allow up to 8
VCPUs to be created in a VM.
- Various cleanups and dropping of dead and duplicated code.
-----BEGIN PGP SIGNATURE-----
iQIcBAABCAAGBQJYlKZeAAoJEGwLaZPeOHZ6ghsQAL4HRU32rA6XKE6gKgRiIPYC
n8iHv2EuUCSKnZyM1a9o1QdJU02bBBB5TPYw+NUsqNClaiRLHsaNZR0TSze4gmcF
NVdnEIOmKluSPnbSXNstqpihJ/p6vzJuh+2eh/EpRuyunzmQ01B6ffjUalKzPtBx
EfgFv1mBnteqLgYZwlJQwj8ogX+Y92TrfdzzazJAom6MFx/lMnigPnUeiaXEG8u6
VrOr/c6Q6lMz4Yfh0xyskJWN4B4zI6PW8/G3SKvKhl8YIRQdtFAv1OfFPaSbdTko
ZdEsFO9UOr0KQu13f10pHAdwRruF7OMQ+3nRDYttdYKzWUYC6pTm77yOG/3+MNdv
KALwaQqJBglaShjuzM8WBI09sDeKgaJ8LYZOttm9Mb+ltwfKsJZPDvba67Kv5266
jkzroKuZeQC6SvAHAlQ7qKgdQr1wrqF3WwjNMmeqNR4Fiw2C3ni/8N39MY/qi7RX
NXQv/fJ6XqM37RC0XMlyu5O+zVWPf0IZ0VdCcl3kkqbvCyXq8B8u6dC+L9+wGb5r
07ZMwnYC93CFeYfrjHu/GsqKqONfAL5Pz13Y/YUlgX0phaLB+yrkq70cFLB4sA8K
KgBwDuD0Qbmdvtcd97qFvp96GMuspIcOkMEiqbD/XrblYnYddehP81Aojdw/GDdp
C62jTny1c/n95ylfbpkC
=lMjW
-----END PGP SIGNATURE-----
Merge tag 'kvm_mips_4.11_1' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/kvm-mips into HEAD
KVM: MIPS: GVA/GPA page tables, dirty logging, SYNC_MMU etc
Numerous MIPS KVM fixes, improvements, and features for 4.11, many of
which continue to pave the way for VZ support, the most interesting of
which are:
- Add GVA->HPA page tables for T&E, to cache GVA mappings.
- Generate fast-path TLB refill exception handler which loads host TLB
entries from GVA page table, avoiding repeated guest memory
translation and guest TLB lookups.
- Use uaccess macros when T&E needs to access guest memory, which with
GVA page tables and the Linux TLB refill handler improves robustness
against TLB faults and fixes EVA hosts.
- Use BadInstr/BadInstrP registers when available to obtain instruction
encodings after a synchronous trap.
- Add GPA->HPA page tables to replace the inflexible linear array,
allowing for multiple sparsely arranged memory regions.
- Properly implement dirty page logging.
- Add KVM_CAP_SYNC_MMU support so that changes in GPA mappings become
effective in guests even if they are already running, allowing for
copy-on-write, KSM, idle page tracking, swapping, and guest memory
ballooning.
- Add KVM_CAP_READONLY_MEM support, so writes to specified memory
regions are treated as MMIO.
- Implement proper CP0_EBase support in T&E.
- Expose a few more missing CP0 registers to userland.
- Add KVM_CAP_NR_VCPUS and KVM_CAP_MAX_VCPUS support, and allow up to 8
VCPUs to be created in a VM.
- Various cleanups and dropping of dead and duplicated code.
The big feature this time is support for POWER9 using the radix-tree
MMU for host and guest. This required some changes to arch/powerpc
code, so I talked with Michael Ellerman and he created a topic branch
with this patchset, which I merged into kvm-ppc-next and which Michael
will pull into his tree. Michael also put in some patches from Nick
Piggin which fix bugs in the interrupt vector code in relocatable
kernels when coming from a KVM guest.
Other notable changes include:
* Add the ability to change the size of the hashed page table,
from David Gibson.
* XICS (interrupt controller) emulation fixes and improvements,
from Li Zhong.
* Bug fixes from myself and Thomas Huth.
These patches define some new KVM capabilities and ioctls, but there
should be no conflicts with anything else currently upstream, as far
as I am aware.
Add a hypercall to retrieve the host realtime clock and the TSC value
used to calculate that clock read.
Used to implement clock synchronization between host and guest.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
vmx_complete_nested_posted_interrupt() can't fail, let's turn it into
a void function.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
kmap() can't fail, therefore it will always return a valid pointer. Let's
just get rid of the unnecessary checks.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sometimes (e.g. early boot) a guest is broken in such ways that it loops
100% delivering operation exceptions (illegal operation) but the pgm new
PSW is not set properly. This will result in code being read from
address zero, which usually contains another illegal op. Let's detect
this case and return to userspace. Instead of only detecting
this for address zero apply a heuristic that will work for any program
check new psw.
We do not want guest problem state to be able to trigger a guest panic,
e.g. by faulting on an address that is the same as the program check
new PSW, so we check for the problem state bit being off.
With proper handling in userspace we
a: get rid of CPU consumption of such broken guests
b: keep the program old PSW. This allows to find out the original illegal
operation - making debugging such early boot issues much easier than
with single stepping
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
User controlled KVM guests do not support the dirty log, as they have
no single gmap that we can check for changes.
As they have no single gmap, kvm->arch.gmap is NULL and all further
referencing to it for dirty checking will result in a NULL
dereference.
Let's return -EINVAL if a caller tries to sync dirty logs for a
UCONTROL guest.
Fixes: 15f36eb ("KVM: s390: Add proper dirty bitmap support to S390 kvm.")
Cc: <stable@vger.kernel.org> # 3.16+
Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Reported-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Increase the maximum number of MIPS KVM VCPUs to 8, and implement the
KVM_CAP_NR_VCPUS and KVM_CAP_MAX_CPUS capabilities which expose the
recommended and maximum number of VCPUs to userland. The previous
maximum of 1 didn't allow for any form of SMP guests.
We calculate the values similarly to ARM, recommending as many VCPUs as
there are CPUs online in the system. This will allow userland to know
how many VCPUs it is possible to create.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Expose the CP0_IntCtl register through the KVM register access API,
which is a required register since MIPS32r2. It is currently read-only
since the VS field isn't implemented due to lack of Config3.VInt or
Config3.VEIC.
It is implemented in trap_emul.c so that a VZ implementation can allow
writes.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Expose the CP0_EntryLo0 and CP0_EntryLo1 registers through the KVM
register access API. This is fairly straightforward for trap & emulate
since we don't support the RI and XI bits. For the sake of future
proofing (particularly for VZ) it is explicitly specified that the API
always exposes the 64-bit version of these registers (i.e. with the RI
and XI bits in bit positions 63 and 62 respectively), and they are
implemented in trap_emul.c rather than mips.c to allow them to be
implemented differently for VZ.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Set the default VCPU state closer to the architectural reset state, with
PC pointing at the reset vector (uncached PA 0x1fc00000, which for KVM
T&E is VA 0x5fc00000), and with CP0_Status.BEV and CP0_Status.ERL to 1.
Although QEMU at least will overwrite this state, it makes sense to do
this now that CP0_EBase is properly implemented to check BEV, and now
that we support a sparse GPA layout potentially with a boot ROM at GPA
0x1fc00000.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
The CP0_EBase register is a standard feature of MIPS32r2, so we should
always have been implementing it properly. However the register value
was ignored and wasn't exposed to userland.
Fix the emulation of exceptions and interrupts to use the value stored
in guest CP0_EBase, and fix the masks so that the top 3 bits (rather
than the standard 2) are fixed, so that it is always in the guest KSeg0
segment.
Also add CP0_EBASE to the KVM one_reg interface so it can be accessed by
userland, also allowing the CPU number field to be written (which isn't
permitted by the guest).
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Access to various CP0 registers via the KVM register access API needs to
be implementation specific to allow restrictions to be made on changes,
for example when VZ guest registers aren't present, so move them all
into trap_emul.c in preparation for VZ.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Now that load/store faults due to read only memory regions are treated
as MMIO accesses it is safe to claim support for read only memory
regions (KVM_CAP_READONLY_MEM).
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Implement the SYNC_MMU capability for KVM MIPS, allowing changes in the
underlying user host virtual address (HVA) mappings to be promptly
reflected in the corresponding guest physical address (GPA) mappings.
This allows for several features to work with guest RAM which require
mappings to be altered or protected, such as copy-on-write, KSM (Kernel
Samepage Merging), idle page tracking, memory swapping, and guest memory
ballooning.
There are two main aspects of this change, described below.
The KVM MMU notifier architecture callbacks are implemented so we can be
notified of changes in the HVA mappings. These arrange for the guest
physical address (GPA) page tables to be modified and possibly for
derived mappings (GVA page tables and TLBs) to be flushed.
- kvm_unmap_hva[_range]() - These deal with HVA mappings being removed,
for example before a copy-on-write takes place, which requires the
corresponding GPA page table mappings to be removed too.
- kvm_set_spte_hva() - These update a GPA page table entry to match the
new HVA entry, but must be careful to respect KVM specific
configuration such as not dirtying a clean guest page which is dirty
to the host, and write protecting writable pages in read only
memslots (which will soon be supported).
- kvm[_test]_age_hva() - These update GPA page table entries to be old
(invalid) so that access can be tracked, making them young again.
The GPA page fault handling (kvm_mips_map_page) is updated to use
gfn_to_pfn_prot() (which may provide read-only pages), to handle
asynchronous page table invalidation from MMU notifier callbacks, and to
handle more cases in the fast path.
- mmu_notifier_seq is used to detect asynchronous page table
invalidations while we're holding a pfn from gfn_to_pfn_prot()
outside of kvm->mmu_lock, retrying if invalidations have taken place,
e.g. a COW or a KSM page merge.
- The fast path (_kvm_mips_map_page_fast) now handles marking old pages
as young / accessed, and disallowing dirtying of clean pages that
aren't actually writable (e.g. shared pages that should COW, and
read-only memory regions when they are enabled in a future patch).
- Due to the use of MMU notifications we no longer need to keep the
page references after we've updated the GPA page tables.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Propagate the GPA PTE protection bits on to the GVA PTEs on a mapped
fault (except _PAGE_WRITE, and filtered by the guest TLB entry), rather
than always overriding the protection. This allows dirty page tracking
to work in mapped guest segments as a clear dirty bit in the GPA PTE
will propagate to the GVA PTEs even when the guest TLB has the dirty bit
set.
Since the filtering of protection bits is now abstracted, if the buddy
GVA PTE is also valid, we obtain the corresponding GPA PTE using a
simple non-allocating walk and load that into the GVA PTE similarly
(which may itself be invalid).
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Propagate the GPA PTE protection bits on to the GVA PTEs on a KSeg0
fault (except _PAGE_WRITE), rather than always overriding the
protection. This allows dirty page tracking to work in KSeg0 as a clear
dirty bit in the GPA PTE will propagate to the GVA PTEs.
This makes it simpler to use a single kvm_mips_map_page() to obtain both
the main GPA PTE and its buddy (which may be invalid), which also allows
memory regions to be fully accessible when they don't start and end on a
2*PAGE_SIZE boundary.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Update kvm_mips_map_page() to handle logging of dirty guest physical
pages. Upcoming patches will propagate the dirty bit to the GVA page
tables.
A fast path is added for handling protection bits that can be resolved
without calling into KVM, currently just dirtying of clean pages being
written to.
The slow path marks the GPA page table entry writable only on writes,
and at the same time marks the page dirty in the dirty page logging
bitmask.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
When an existing memory region has dirty page logging enabled, make the
entire slot clean (read only) so that writes will immediately start
logging dirty pages (once the dirty bit is transferred from GPA to GVA
page tables in an upcoming patch).
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
MIPS hasn't up to this point properly supported dirty page logging, as
pages in slots with dirty logging enabled aren't made clean, and tlbmod
exceptions from writes to clean pages have been assumed to be due to
guest TLB protection and unconditionally passed to the guest.
Use the generic dirty logging helper kvm_get_dirty_log_protect() to
properly implement kvm_vm_ioctl_get_dirty_log(), similar to how ARM
does. This uses xchg to clear the dirty bits when reading them, rather
than wiping them out afterwards with a memset, which would potentially
wipe recently set bits that weren't caught by kvm_get_dirty_log(). It
also makes the pages clean again using the
kvm_arch_mmu_enable_log_dirty_pt_masked() architecture callback so that
further writes after the shadow memslot is flushed will trigger tlbmod
exceptions and dirty handling.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Add a helper function to make a range of guest physical address (GPA)
mappings in the GPA page table clean so that writes can be caught. This
will be used in a few places to manage dirty page logging.
Note that until the dirty bit is transferred from GPA page table entries
to GVA page table entries in an upcoming patch this won't trigger a TLB
modified exception on write.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Rewrite TLB modified exception handling to handle read only GPA memory
regions, instead of unconditionally passing the exception to the guest.
If the guest TLB is not the cause of the exception we call into the
normal TLB fault handling depending on the memory segment, which will
soon attempt to remap the physical page to be writable (handling dirty
page tracking or copy on write in the process).
Failing that we fall back to treating it as MMIO, due to a read only
memory region. Once the capability is enabled, this will allow read only
memory regions (such as the Malta boot flash as emulated by QEMU) to
have writes treated as MMIO, while still allowing reads to run
untrapped.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Treat unhandled accesses to guest KSeg0 as MMIO, rather than only host
KSeg0 addresses. This will allow read only memory regions (such as the
Malta boot flash as emulated by QEMU) to have writes (before reads)
treated as MMIO, and unallocated physical addresses to have all accesses
treated as MMIO.
The MMIO emulation uses the gva_to_gpa callback, so this is also updated
for trap & emulate to handle guest KSeg0 addresses.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Abstract the handling of bad guest loads and stores which may need to
trigger an MMIO, so that the same code can be used in a later patch for
guest KSeg0 addresses (TLB exception handling) as well as for host KSeg1
addresses (existing address error exception and TLB exception handling).
We now use kvm_mips_emulate_store() and kvm_mips_emulate_load() directly
rather than the more generic kvm_mips_emulate_inst(), as there is no
need to expose emulation of any other instructions.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
kvm_mips_map_page() will need to know whether the fault was due to a
read or a write in order to support dirty page tracking,
KVM_CAP_SYNC_MMU, and read only memory regions, so get that information
passed down to it via new bool write_fault arguments to various
functions.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Ignore userland writes to CP0_Config7 rather than reporting an error,
since we do allow reads of this register and it is claimed to exist in
the ioctl API.
This allows userland to blindly save and restore KVM registers without
having to special case certain registers as not being writable, for
example during live migration once dirty page logging is fixed.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Implement the kvm_arch_flush_shadow_all() and
kvm_arch_flush_shadow_memslot() KVM functions for MIPS to allow guest
physical mappings to be safely changed.
The general MIPS KVM code takes care of flushing of GPA page table
entries. kvm_arch_flush_shadow_all() flushes the whole GPA page table,
and is always called on the cleanup path so there is no need to acquire
the kvm->mmu_lock. kvm_arch_flush_shadow_memslot() flushes only the
range of mappings in the GPA page table corresponding to the slot being
flushed, and happens when memory regions are moved or deleted.
MIPS KVM implementation callbacks are added for handling the
implementation specific flushing of mappings derived from the GPA page
tables. These are implemented for trap_emul.c using
kvm_flush_remote_tlbs() which should now be functional, and will flush
the per-VCPU GVA page tables and ASIDS synchronously (before next
entering guest mode or directly accessing GVA space).
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Use the lockless GVA helpers to implement the reading of guest
instructions for emulation. This will allow it to handle asynchronous
TLB flushes when they are implemented.
This is a little more complicated than the other two cases (get_inst()
and dynamic translation) due to the need to emulate the appropriate
guest TLB exception when the address isn't present or isn't valid in the
guest TLB.
Since there are several protected cache ops that may need to be
performed safely, this is abstracted by kvm_mips_guest_cache_op() which
is passed a protected cache op function pointer and takes care of the
lockless operation and fault handling / retry if the op should fail,
taking advantage of the new errors which the protected cache ops can now
return. This allows the existing advance fault handling which relied on
host TLB lookups to be removed, along with the now unused
kvm_mips_host_tlb_lookup(),
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Use the lockless GVA helpers to implement the reading of guest
instructions for emulation. This will allow it to handle asynchronous
TLB flushes when they are implemented.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Use the lockless GVA helpers to implement the dynamic translation of
guest instructions. This will allow it to handle asynchronous TLB
flushes when they are implemented.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Add helpers to allow for lockless direct access to the GVA space, by
changing the VCPU mode to READING_SHADOW_PAGE_TABLES for the duration of
the access. This allows asynchronous TLB flush requests in future
patches to safely trigger either a TLB flush before the direct GVA space
access, or a delay until the in-progress lockless direct access is
complete.
The kvm_trap_emul_gva_lockless_begin() and
kvm_trap_emul_gva_lockless_end() helpers take care of guarding the
direct GVA accesses, and kvm_trap_emul_gva_fault() tries to handle a
uaccess fault resulting from a flush having taken place.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
The stale ASID checks taking place on VCPU load can be reduced:
- Now that we check for a stale ASID on guest re-entry, there is no need
to do so when loading the VCPU outside of guest context, since it will
happen before entering the guest. Note that a lot of KVM VCPU ioctls
will cause the VCPU to be loaded but guest context won't be entered.
- There is no need to check for a stale kernel_mm ASID when the guest is
in user mode and vice versa. In fact doing so can potentially be
problematic since the user_mm ASID regeneration may trigger a new ASID
cycle, which would cause the kern_mm ASID to become stale after it has
been checked for staleness.
Therefore only check the ASID for the mm corresponding to the current
guest mode, and only if we're already in guest context. We drop some of
the related kvm_debug() calls here too.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Add handling of TLB invalidation requests before entering guest mode.
This will allow asynchonous invalidation of the VCPU mappings when
physical memory regions are altered. Should the CPU running the VCPU
already be in guest mode an IPI will be sent to trigger a guest exit.
The reload_asid path will be used in a future patch for when GVA is
about to be directly accessed by KVM.
In the process, the stale user ASID check in the re-entry path (for lazy
user GVA flushing) is generalised to check the ASID for the current
guest mode, in case a TLB invalidation request was handled. This has the
side effect of making the ASID checks on vcpu_load too conservative,
which will be addressed in a later patch.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org