KVM/ARM Changes for v4.16
The changes for this version include icache invalidation optimizations (improving VM startup time), support for forwarded level-triggered interrupts (improved performance for timers and passthrough platform devices), a small fix for power-management notifiers, and some cosmetic changes. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJacYnLAAoJEEtpOizt6ddyhHUH/1f/AHC4t6sNJJ4LAbWAjuve 77scB7vsVVpZqHUeA1i8d0vrWJQeqg8CEQ+iP/OVLC+bWVX0yeBtrt/pMJA8sXrV Jbo5kQu3NyrRUAew83rcvoqsVVf67BB/NohL7C7sQDvNp2bg2cgzxhpgNJUuUXQC WcEOhqstWo6NYJ7xYz5f+utzYQRO0YfnIzoTsoaNgDHSw/V37Ny9O0tYqTQGNYUm zZ+cRo3nFRFywbmHhIHvXkxmS0lGdACQWTzyd+qDsgiPJ463vRT6Fc035SSuqX9x MmS87cBdt1IK9yi0Firqhuy6CGgHZmnagHizE0arMv72Pcv/ucrkCDRqLQDhSMY= =bZLm -----END PGP SIGNATURE----- Merge tag 'kvm-arm-for-v4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm KVM/ARM Changes for v4.16 The changes for this version include icache invalidation optimizations (improving VM startup time), support for forwarded level-triggered interrupts (improved performance for timers and passthrough platform devices), a small fix for power-management notifiers, and some cosmetic changes.
This commit is contained in:
commit
e53175395d
@ -1,187 +0,0 @@
|
||||
KVM/ARM VGIC Forwarded Physical Interrupts
|
||||
==========================================
|
||||
|
||||
The KVM/ARM code implements software support for the ARM Generic
|
||||
Interrupt Controller's (GIC's) hardware support for virtualization by
|
||||
allowing software to inject virtual interrupts to a VM, which the guest
|
||||
OS sees as regular interrupts. The code is famously known as the VGIC.
|
||||
|
||||
Some of these virtual interrupts, however, correspond to physical
|
||||
interrupts from real physical devices. One example could be the
|
||||
architected timer, which itself supports virtualization, and therefore
|
||||
lets a guest OS program the hardware device directly to raise an
|
||||
interrupt at some point in time. When such an interrupt is raised, the
|
||||
host OS initially handles the interrupt and must somehow signal this
|
||||
event as a virtual interrupt to the guest. Another example could be a
|
||||
passthrough device, where the physical interrupts are initially handled
|
||||
by the host, but the device driver for the device lives in the guest OS
|
||||
and KVM must therefore somehow inject a virtual interrupt on behalf of
|
||||
the physical one to the guest OS.
|
||||
|
||||
These virtual interrupts corresponding to a physical interrupt on the
|
||||
host are called forwarded physical interrupts, but are also sometimes
|
||||
referred to as 'virtualized physical interrupts' and 'mapped interrupts'.
|
||||
|
||||
Forwarded physical interrupts are handled slightly differently compared
|
||||
to virtual interrupts generated purely by a software emulated device.
|
||||
|
||||
|
||||
The HW bit
|
||||
----------
|
||||
Virtual interrupts are signalled to the guest by programming the List
|
||||
Registers (LRs) on the GIC before running a VCPU. The LR is programmed
|
||||
with the virtual IRQ number and the state of the interrupt (Pending,
|
||||
Active, or Pending+Active). When the guest ACKs and EOIs a virtual
|
||||
interrupt, the LR state moves from Pending to Active, and finally to
|
||||
inactive.
|
||||
|
||||
The LRs include an extra bit, called the HW bit. When this bit is set,
|
||||
KVM must also program an additional field in the LR, the physical IRQ
|
||||
number, to link the virtual with the physical IRQ.
|
||||
|
||||
When the HW bit is set, KVM must EITHER set the Pending OR the Active
|
||||
bit, never both at the same time.
|
||||
|
||||
Setting the HW bit causes the hardware to deactivate the physical
|
||||
interrupt on the physical distributor when the guest deactivates the
|
||||
corresponding virtual interrupt.
|
||||
|
||||
|
||||
Forwarded Physical Interrupts Life Cycle
|
||||
----------------------------------------
|
||||
|
||||
The state of forwarded physical interrupts is managed in the following way:
|
||||
|
||||
- The physical interrupt is acked by the host, and becomes active on
|
||||
the physical distributor (*).
|
||||
- KVM sets the LR.Pending bit, because this is the only way the GICV
|
||||
interface is going to present it to the guest.
|
||||
- LR.Pending will stay set as long as the guest has not acked the interrupt.
|
||||
- LR.Pending transitions to LR.Active on the guest read of the IAR, as
|
||||
expected.
|
||||
- On guest EOI, the *physical distributor* active bit gets cleared,
|
||||
but the LR.Active is left untouched (set).
|
||||
- KVM clears the LR on VM exits when the physical distributor
|
||||
active state has been cleared.
|
||||
|
||||
(*): The host handling is slightly more complicated. For some forwarded
|
||||
interrupts (shared), KVM directly sets the active state on the physical
|
||||
distributor before entering the guest, because the interrupt is never actually
|
||||
handled on the host (see details on the timer as an example below). For other
|
||||
forwarded interrupts (non-shared) the host does not deactivate the interrupt
|
||||
when the host ISR completes, but leaves the interrupt active until the guest
|
||||
deactivates it. Leaving the interrupt active is allowed, because Linux
|
||||
configures the physical GIC with EOIMode=1, which causes EOI operations to
|
||||
perform a priority drop allowing the GIC to receive other interrupts of the
|
||||
default priority.
|
||||
|
||||
|
||||
Forwarded Edge and Level Triggered PPIs and SPIs
|
||||
------------------------------------------------
|
||||
Forwarded physical interrupts injected should always be active on the
|
||||
physical distributor when injected to a guest.
|
||||
|
||||
Level-triggered interrupts will keep the interrupt line to the GIC
|
||||
asserted, typically until the guest programs the device to deassert the
|
||||
line. This means that the interrupt will remain pending on the physical
|
||||
distributor until the guest has reprogrammed the device. Since we
|
||||
always run the VM with interrupts enabled on the CPU, a pending
|
||||
interrupt will exit the guest as soon as we switch into the guest,
|
||||
preventing the guest from ever making progress as the process repeats
|
||||
over and over. Therefore, the active state on the physical distributor
|
||||
must be set when entering the guest, preventing the GIC from forwarding
|
||||
the pending interrupt to the CPU. As soon as the guest deactivates the
|
||||
interrupt, the physical line is sampled by the hardware again and the host
|
||||
takes a new interrupt if and only if the physical line is still asserted.
|
||||
|
||||
Edge-triggered interrupts do not exhibit the same problem with
|
||||
preventing guest execution that level-triggered interrupts do. One
|
||||
option is to not use HW bit at all, and inject edge-triggered interrupts
|
||||
from a physical device as pure virtual interrupts. But that would
|
||||
potentially slow down handling of the interrupt in the guest, because a
|
||||
physical interrupt occurring in the middle of the guest ISR would
|
||||
preempt the guest for the host to handle the interrupt. Additionally,
|
||||
if you configure the system to handle interrupts on a separate physical
|
||||
core from that running your VCPU, you still have to interrupt the VCPU
|
||||
to queue the pending state onto the LR, even though the guest won't use
|
||||
this information until the guest ISR completes. Therefore, the HW
|
||||
bit should always be set for forwarded edge-triggered interrupts. With
|
||||
the HW bit set, the virtual interrupt is injected and additional
|
||||
physical interrupts occurring before the guest deactivates the interrupt
|
||||
simply mark the state on the physical distributor as Pending+Active. As
|
||||
soon as the guest deactivates the interrupt, the host takes another
|
||||
interrupt if and only if there was a physical interrupt between injecting
|
||||
the forwarded interrupt to the guest and the guest deactivating the
|
||||
interrupt.
|
||||
|
||||
Consequently, whenever we schedule a VCPU with one or more LRs with the
|
||||
HW bit set, the interrupt must also be active on the physical
|
||||
distributor.
|
||||
|
||||
|
||||
Forwarded LPIs
|
||||
--------------
|
||||
LPIs, introduced in GICv3, are always edge-triggered and do not have an
|
||||
active state. They become pending when a device signal them, and as
|
||||
soon as they are acked by the CPU, they are inactive again.
|
||||
|
||||
It therefore doesn't make sense, and is not supported, to set the HW bit
|
||||
for physical LPIs that are forwarded to a VM as virtual interrupts,
|
||||
typically virtual SPIs.
|
||||
|
||||
For LPIs, there is no other choice than to preempt the VCPU thread if
|
||||
necessary, and queue the pending state onto the LR.
|
||||
|
||||
|
||||
Putting It Together: The Architected Timer
|
||||
------------------------------------------
|
||||
The architected timer is a device that signals interrupts with level
|
||||
triggered semantics. The timer hardware is directly accessed by VCPUs
|
||||
which program the timer to fire at some point in time. Each VCPU on a
|
||||
system programs the timer to fire at different times, and therefore the
|
||||
hardware is multiplexed between multiple VCPUs. This is implemented by
|
||||
context-switching the timer state along with each VCPU thread.
|
||||
|
||||
However, this means that a scenario like the following is entirely
|
||||
possible, and in fact, typical:
|
||||
|
||||
1. KVM runs the VCPU
|
||||
2. The guest programs the time to fire in T+100
|
||||
3. The guest is idle and calls WFI (wait-for-interrupts)
|
||||
4. The hardware traps to the host
|
||||
5. KVM stores the timer state to memory and disables the hardware timer
|
||||
6. KVM schedules a soft timer to fire in T+(100 - time since step 2)
|
||||
7. KVM puts the VCPU thread to sleep (on a waitqueue)
|
||||
8. The soft timer fires, waking up the VCPU thread
|
||||
9. KVM reprograms the timer hardware with the VCPU's values
|
||||
10. KVM marks the timer interrupt as active on the physical distributor
|
||||
11. KVM injects a forwarded physical interrupt to the guest
|
||||
12. KVM runs the VCPU
|
||||
|
||||
Notice that KVM injects a forwarded physical interrupt in step 11 without
|
||||
the corresponding interrupt having actually fired on the host. That is
|
||||
exactly why we mark the timer interrupt as active in step 10, because
|
||||
the active state on the physical distributor is part of the state
|
||||
belonging to the timer hardware, which is context-switched along with
|
||||
the VCPU thread.
|
||||
|
||||
If the guest does not idle because it is busy, the flow looks like this
|
||||
instead:
|
||||
|
||||
1. KVM runs the VCPU
|
||||
2. The guest programs the time to fire in T+100
|
||||
4. At T+100 the timer fires and a physical IRQ causes the VM to exit
|
||||
(note that this initially only traps to EL2 and does not run the host ISR
|
||||
until KVM has returned to the host).
|
||||
5. With interrupts still disabled on the CPU coming back from the guest, KVM
|
||||
stores the virtual timer state to memory and disables the virtual hw timer.
|
||||
6. KVM looks at the timer state (in memory) and injects a forwarded physical
|
||||
interrupt because it concludes the timer has expired.
|
||||
7. KVM marks the timer interrupt as active on the physical distributor
|
||||
7. KVM enables the timer, enables interrupts, and runs the VCPU
|
||||
|
||||
Notice that again the forwarded physical interrupt is injected to the
|
||||
guest without having actually been handled on the host. In this case it
|
||||
is because the physical interrupt is never actually seen by the host because the
|
||||
timer is disabled upon guest return, and the virtual forwarded interrupt is
|
||||
injected on the KVM guest entry path.
|
@ -131,7 +131,7 @@ static inline bool mode_has_spsr(struct kvm_vcpu *vcpu)
|
||||
static inline bool vcpu_mode_priv(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
unsigned long cpsr_mode = vcpu->arch.ctxt.gp_regs.usr_regs.ARM_cpsr & MODE_MASK;
|
||||
return cpsr_mode > USR_MODE;;
|
||||
return cpsr_mode > USR_MODE;
|
||||
}
|
||||
|
||||
static inline u32 kvm_vcpu_get_hsr(const struct kvm_vcpu *vcpu)
|
||||
|
@ -48,6 +48,8 @@
|
||||
KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
|
||||
#define KVM_REQ_IRQ_PENDING KVM_ARCH_REQ(1)
|
||||
|
||||
DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
|
||||
|
||||
u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
|
||||
int __attribute_const__ kvm_target_cpu(void);
|
||||
int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
|
||||
|
@ -21,7 +21,6 @@
|
||||
#include <linux/compiler.h>
|
||||
#include <linux/kvm_host.h>
|
||||
#include <asm/cp15.h>
|
||||
#include <asm/kvm_mmu.h>
|
||||
#include <asm/vfp.h>
|
||||
|
||||
#define __hyp_text __section(.hyp.text) notrace
|
||||
@ -69,6 +68,8 @@
|
||||
#define HIFAR __ACCESS_CP15(c6, 4, c0, 2)
|
||||
#define HPFAR __ACCESS_CP15(c6, 4, c0, 4)
|
||||
#define ICIALLUIS __ACCESS_CP15(c7, 0, c1, 0)
|
||||
#define BPIALLIS __ACCESS_CP15(c7, 0, c1, 6)
|
||||
#define ICIMVAU __ACCESS_CP15(c7, 0, c5, 1)
|
||||
#define ATS1CPR __ACCESS_CP15(c7, 0, c8, 0)
|
||||
#define TLBIALLIS __ACCESS_CP15(c8, 0, c3, 0)
|
||||
#define TLBIALL __ACCESS_CP15(c8, 0, c7, 0)
|
||||
|
@ -37,6 +37,8 @@
|
||||
|
||||
#include <linux/highmem.h>
|
||||
#include <asm/cacheflush.h>
|
||||
#include <asm/cputype.h>
|
||||
#include <asm/kvm_hyp.h>
|
||||
#include <asm/pgalloc.h>
|
||||
#include <asm/stage2_pgtable.h>
|
||||
|
||||
@ -83,6 +85,18 @@ static inline pmd_t kvm_s2pmd_mkwrite(pmd_t pmd)
|
||||
return pmd;
|
||||
}
|
||||
|
||||
static inline pte_t kvm_s2pte_mkexec(pte_t pte)
|
||||
{
|
||||
pte_val(pte) &= ~L_PTE_XN;
|
||||
return pte;
|
||||
}
|
||||
|
||||
static inline pmd_t kvm_s2pmd_mkexec(pmd_t pmd)
|
||||
{
|
||||
pmd_val(pmd) &= ~PMD_SECT_XN;
|
||||
return pmd;
|
||||
}
|
||||
|
||||
static inline void kvm_set_s2pte_readonly(pte_t *pte)
|
||||
{
|
||||
pte_val(*pte) = (pte_val(*pte) & ~L_PTE_S2_RDWR) | L_PTE_S2_RDONLY;
|
||||
@ -93,6 +107,11 @@ static inline bool kvm_s2pte_readonly(pte_t *pte)
|
||||
return (pte_val(*pte) & L_PTE_S2_RDWR) == L_PTE_S2_RDONLY;
|
||||
}
|
||||
|
||||
static inline bool kvm_s2pte_exec(pte_t *pte)
|
||||
{
|
||||
return !(pte_val(*pte) & L_PTE_XN);
|
||||
}
|
||||
|
||||
static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
|
||||
{
|
||||
pmd_val(*pmd) = (pmd_val(*pmd) & ~L_PMD_S2_RDWR) | L_PMD_S2_RDONLY;
|
||||
@ -103,6 +122,11 @@ static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
|
||||
return (pmd_val(*pmd) & L_PMD_S2_RDWR) == L_PMD_S2_RDONLY;
|
||||
}
|
||||
|
||||
static inline bool kvm_s2pmd_exec(pmd_t *pmd)
|
||||
{
|
||||
return !(pmd_val(*pmd) & PMD_SECT_XN);
|
||||
}
|
||||
|
||||
static inline bool kvm_page_empty(void *ptr)
|
||||
{
|
||||
struct page *ptr_page = virt_to_page(ptr);
|
||||
@ -126,21 +150,10 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
|
||||
return (vcpu_cp15(vcpu, c1_SCTLR) & 0b101) == 0b101;
|
||||
}
|
||||
|
||||
static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu,
|
||||
kvm_pfn_t pfn,
|
||||
unsigned long size)
|
||||
static inline void __clean_dcache_guest_page(kvm_pfn_t pfn, unsigned long size)
|
||||
{
|
||||
/*
|
||||
* If we are going to insert an instruction page and the icache is
|
||||
* either VIPT or PIPT, there is a potential problem where the host
|
||||
* (or another VM) may have used the same page as this guest, and we
|
||||
* read incorrect data from the icache. If we're using a PIPT cache,
|
||||
* we can invalidate just that page, but if we are using a VIPT cache
|
||||
* we need to invalidate the entire icache - damn shame - as written
|
||||
* in the ARM ARM (DDI 0406C.b - Page B3-1393).
|
||||
*
|
||||
* VIVT caches are tagged using both the ASID and the VMID and doesn't
|
||||
* need any kind of flushing (DDI 0406C.b - Page B3-1392).
|
||||
* Clean the dcache to the Point of Coherency.
|
||||
*
|
||||
* We need to do this through a kernel mapping (using the
|
||||
* user-space mapping has proved to be the wrong
|
||||
@ -155,9 +168,63 @@ static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu,
|
||||
|
||||
kvm_flush_dcache_to_poc(va, PAGE_SIZE);
|
||||
|
||||
if (icache_is_pipt())
|
||||
__cpuc_coherent_user_range((unsigned long)va,
|
||||
(unsigned long)va + PAGE_SIZE);
|
||||
size -= PAGE_SIZE;
|
||||
pfn++;
|
||||
|
||||
kunmap_atomic(va);
|
||||
}
|
||||
}
|
||||
|
||||
static inline void __invalidate_icache_guest_page(kvm_pfn_t pfn,
|
||||
unsigned long size)
|
||||
{
|
||||
u32 iclsz;
|
||||
|
||||
/*
|
||||
* If we are going to insert an instruction page and the icache is
|
||||
* either VIPT or PIPT, there is a potential problem where the host
|
||||
* (or another VM) may have used the same page as this guest, and we
|
||||
* read incorrect data from the icache. If we're using a PIPT cache,
|
||||
* we can invalidate just that page, but if we are using a VIPT cache
|
||||
* we need to invalidate the entire icache - damn shame - as written
|
||||
* in the ARM ARM (DDI 0406C.b - Page B3-1393).
|
||||
*
|
||||
* VIVT caches are tagged using both the ASID and the VMID and doesn't
|
||||
* need any kind of flushing (DDI 0406C.b - Page B3-1392).
|
||||
*/
|
||||
|
||||
VM_BUG_ON(size & ~PAGE_MASK);
|
||||
|
||||
if (icache_is_vivt_asid_tagged())
|
||||
return;
|
||||
|
||||
if (!icache_is_pipt()) {
|
||||
/* any kind of VIPT cache */
|
||||
__flush_icache_all();
|
||||
return;
|
||||
}
|
||||
|
||||
/*
|
||||
* CTR IminLine contains Log2 of the number of words in the
|
||||
* cache line, so we can get the number of words as
|
||||
* 2 << (IminLine - 1). To get the number of bytes, we
|
||||
* multiply by 4 (the number of bytes in a 32-bit word), and
|
||||
* get 4 << (IminLine).
|
||||
*/
|
||||
iclsz = 4 << (read_cpuid(CPUID_CACHETYPE) & 0xf);
|
||||
|
||||
while (size) {
|
||||
void *va = kmap_atomic_pfn(pfn);
|
||||
void *end = va + PAGE_SIZE;
|
||||
void *addr = va;
|
||||
|
||||
do {
|
||||
write_sysreg(addr, ICIMVAU);
|
||||
addr += iclsz;
|
||||
} while (addr < end);
|
||||
|
||||
dsb(ishst);
|
||||
isb();
|
||||
|
||||
size -= PAGE_SIZE;
|
||||
pfn++;
|
||||
@ -165,9 +232,11 @@ static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu,
|
||||
kunmap_atomic(va);
|
||||
}
|
||||
|
||||
if (!icache_is_pipt() && !icache_is_vivt_asid_tagged()) {
|
||||
/* any kind of VIPT cache */
|
||||
__flush_icache_all();
|
||||
/* Check if we need to invalidate the BTB */
|
||||
if ((read_cpuid_ext(CPUID_EXT_MMFR1) >> 28) != 4) {
|
||||
write_sysreg(0, BPIALLIS);
|
||||
dsb(ishst);
|
||||
isb();
|
||||
}
|
||||
}
|
||||
|
||||
|
@ -102,8 +102,8 @@ extern pgprot_t pgprot_s2_device;
|
||||
#define PAGE_HYP_EXEC _MOD_PROT(pgprot_kernel, L_PTE_HYP | L_PTE_RDONLY)
|
||||
#define PAGE_HYP_RO _MOD_PROT(pgprot_kernel, L_PTE_HYP | L_PTE_RDONLY | L_PTE_XN)
|
||||
#define PAGE_HYP_DEVICE _MOD_PROT(pgprot_hyp_device, L_PTE_HYP)
|
||||
#define PAGE_S2 _MOD_PROT(pgprot_s2, L_PTE_S2_RDONLY)
|
||||
#define PAGE_S2_DEVICE _MOD_PROT(pgprot_s2_device, L_PTE_S2_RDONLY)
|
||||
#define PAGE_S2 _MOD_PROT(pgprot_s2, L_PTE_S2_RDONLY | L_PTE_XN)
|
||||
#define PAGE_S2_DEVICE _MOD_PROT(pgprot_s2_device, L_PTE_S2_RDONLY | L_PTE_XN)
|
||||
|
||||
#define __PAGE_NONE __pgprot(_L_PTE_DEFAULT | L_PTE_RDONLY | L_PTE_XN | L_PTE_NONE)
|
||||
#define __PAGE_SHARED __pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_XN)
|
||||
|
@ -18,6 +18,7 @@
|
||||
|
||||
#include <asm/kvm_asm.h>
|
||||
#include <asm/kvm_hyp.h>
|
||||
#include <asm/kvm_mmu.h>
|
||||
|
||||
__asm__(".arch_extension virt");
|
||||
|
||||
|
@ -19,6 +19,7 @@
|
||||
*/
|
||||
|
||||
#include <asm/kvm_hyp.h>
|
||||
#include <asm/kvm_mmu.h>
|
||||
|
||||
/**
|
||||
* Flush per-VMID TLBs
|
||||
|
@ -25,13 +25,13 @@
|
||||
isb
|
||||
.endm
|
||||
|
||||
.macro uaccess_ttbr0_disable, tmp1
|
||||
.macro uaccess_ttbr0_disable, tmp1, tmp2
|
||||
alternative_if_not ARM64_HAS_PAN
|
||||
__uaccess_ttbr0_disable \tmp1
|
||||
alternative_else_nop_endif
|
||||
.endm
|
||||
|
||||
.macro uaccess_ttbr0_enable, tmp1, tmp2
|
||||
.macro uaccess_ttbr0_enable, tmp1, tmp2, tmp3
|
||||
alternative_if_not ARM64_HAS_PAN
|
||||
save_and_disable_irq \tmp2 // avoid preemption
|
||||
__uaccess_ttbr0_enable \tmp1
|
||||
@ -39,18 +39,18 @@ alternative_if_not ARM64_HAS_PAN
|
||||
alternative_else_nop_endif
|
||||
.endm
|
||||
#else
|
||||
.macro uaccess_ttbr0_disable, tmp1
|
||||
.macro uaccess_ttbr0_disable, tmp1, tmp2
|
||||
.endm
|
||||
|
||||
.macro uaccess_ttbr0_enable, tmp1, tmp2
|
||||
.macro uaccess_ttbr0_enable, tmp1, tmp2, tmp3
|
||||
.endm
|
||||
#endif
|
||||
|
||||
/*
|
||||
* These macros are no-ops when UAO is present.
|
||||
*/
|
||||
.macro uaccess_disable_not_uao, tmp1
|
||||
uaccess_ttbr0_disable \tmp1
|
||||
.macro uaccess_disable_not_uao, tmp1, tmp2
|
||||
uaccess_ttbr0_disable \tmp1, \tmp2
|
||||
alternative_if ARM64_ALT_PAN_NOT_UAO
|
||||
SET_PSTATE_PAN(1)
|
||||
alternative_else_nop_endif
|
||||
|
@ -387,6 +387,27 @@ alternative_endif
|
||||
dsb \domain
|
||||
.endm
|
||||
|
||||
/*
|
||||
* Macro to perform an instruction cache maintenance for the interval
|
||||
* [start, end)
|
||||
*
|
||||
* start, end: virtual addresses describing the region
|
||||
* label: A label to branch to on user fault.
|
||||
* Corrupts: tmp1, tmp2
|
||||
*/
|
||||
.macro invalidate_icache_by_line start, end, tmp1, tmp2, label
|
||||
icache_line_size \tmp1, \tmp2
|
||||
sub \tmp2, \tmp1, #1
|
||||
bic \tmp2, \start, \tmp2
|
||||
9997:
|
||||
USER(\label, ic ivau, \tmp2) // invalidate I line PoU
|
||||
add \tmp2, \tmp2, \tmp1
|
||||
cmp \tmp2, \end
|
||||
b.lo 9997b
|
||||
dsb ish
|
||||
isb
|
||||
.endm
|
||||
|
||||
/*
|
||||
* reset_pmuserenr_el0 - reset PMUSERENR_EL0 if PMUv3 present
|
||||
*/
|
||||
|
@ -52,6 +52,12 @@
|
||||
* - start - virtual start address
|
||||
* - end - virtual end address
|
||||
*
|
||||
* invalidate_icache_range(start, end)
|
||||
*
|
||||
* Invalidate the I-cache in the region described by start, end.
|
||||
* - start - virtual start address
|
||||
* - end - virtual end address
|
||||
*
|
||||
* __flush_cache_user_range(start, end)
|
||||
*
|
||||
* Ensure coherency between the I-cache and the D-cache in the
|
||||
@ -66,6 +72,7 @@
|
||||
* - size - region size
|
||||
*/
|
||||
extern void flush_icache_range(unsigned long start, unsigned long end);
|
||||
extern int invalidate_icache_range(unsigned long start, unsigned long end);
|
||||
extern void __flush_dcache_area(void *addr, size_t len);
|
||||
extern void __inval_dcache_area(void *addr, size_t len);
|
||||
extern void __clean_dcache_area_poc(void *addr, size_t len);
|
||||
|
@ -47,6 +47,8 @@
|
||||
KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
|
||||
#define KVM_REQ_IRQ_PENDING KVM_ARCH_REQ(1)
|
||||
|
||||
DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
|
||||
|
||||
int __attribute_const__ kvm_target_cpu(void);
|
||||
int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
|
||||
int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext);
|
||||
|
@ -20,7 +20,6 @@
|
||||
|
||||
#include <linux/compiler.h>
|
||||
#include <linux/kvm_host.h>
|
||||
#include <asm/kvm_mmu.h>
|
||||
#include <asm/sysreg.h>
|
||||
|
||||
#define __hyp_text __section(.hyp.text) notrace
|
||||
|
@ -173,6 +173,18 @@ static inline pmd_t kvm_s2pmd_mkwrite(pmd_t pmd)
|
||||
return pmd;
|
||||
}
|
||||
|
||||
static inline pte_t kvm_s2pte_mkexec(pte_t pte)
|
||||
{
|
||||
pte_val(pte) &= ~PTE_S2_XN;
|
||||
return pte;
|
||||
}
|
||||
|
||||
static inline pmd_t kvm_s2pmd_mkexec(pmd_t pmd)
|
||||
{
|
||||
pmd_val(pmd) &= ~PMD_S2_XN;
|
||||
return pmd;
|
||||
}
|
||||
|
||||
static inline void kvm_set_s2pte_readonly(pte_t *pte)
|
||||
{
|
||||
pteval_t old_pteval, pteval;
|
||||
@ -191,6 +203,11 @@ static inline bool kvm_s2pte_readonly(pte_t *pte)
|
||||
return (pte_val(*pte) & PTE_S2_RDWR) == PTE_S2_RDONLY;
|
||||
}
|
||||
|
||||
static inline bool kvm_s2pte_exec(pte_t *pte)
|
||||
{
|
||||
return !(pte_val(*pte) & PTE_S2_XN);
|
||||
}
|
||||
|
||||
static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
|
||||
{
|
||||
kvm_set_s2pte_readonly((pte_t *)pmd);
|
||||
@ -201,6 +218,11 @@ static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
|
||||
return kvm_s2pte_readonly((pte_t *)pmd);
|
||||
}
|
||||
|
||||
static inline bool kvm_s2pmd_exec(pmd_t *pmd)
|
||||
{
|
||||
return !(pmd_val(*pmd) & PMD_S2_XN);
|
||||
}
|
||||
|
||||
static inline bool kvm_page_empty(void *ptr)
|
||||
{
|
||||
struct page *ptr_page = virt_to_page(ptr);
|
||||
@ -230,21 +252,25 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
|
||||
return (vcpu_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
|
||||
}
|
||||
|
||||
static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu,
|
||||
kvm_pfn_t pfn,
|
||||
unsigned long size)
|
||||
static inline void __clean_dcache_guest_page(kvm_pfn_t pfn, unsigned long size)
|
||||
{
|
||||
void *va = page_address(pfn_to_page(pfn));
|
||||
|
||||
kvm_flush_dcache_to_poc(va, size);
|
||||
}
|
||||
|
||||
static inline void __invalidate_icache_guest_page(kvm_pfn_t pfn,
|
||||
unsigned long size)
|
||||
{
|
||||
if (icache_is_aliasing()) {
|
||||
/* any kind of VIPT cache */
|
||||
__flush_icache_all();
|
||||
} else if (is_kernel_in_hyp_mode() || !icache_is_vpipt()) {
|
||||
/* PIPT or VPIPT at EL2 (see comment in __kvm_tlb_flush_vmid_ipa) */
|
||||
flush_icache_range((unsigned long)va,
|
||||
(unsigned long)va + size);
|
||||
void *va = page_address(pfn_to_page(pfn));
|
||||
|
||||
invalidate_icache_range((unsigned long)va,
|
||||
(unsigned long)va + size);
|
||||
}
|
||||
}
|
||||
|
||||
|
@ -177,9 +177,11 @@
|
||||
*/
|
||||
#define PTE_S2_RDONLY (_AT(pteval_t, 1) << 6) /* HAP[2:1] */
|
||||
#define PTE_S2_RDWR (_AT(pteval_t, 3) << 6) /* HAP[2:1] */
|
||||
#define PTE_S2_XN (_AT(pteval_t, 2) << 53) /* XN[1:0] */
|
||||
|
||||
#define PMD_S2_RDONLY (_AT(pmdval_t, 1) << 6) /* HAP[2:1] */
|
||||
#define PMD_S2_RDWR (_AT(pmdval_t, 3) << 6) /* HAP[2:1] */
|
||||
#define PMD_S2_XN (_AT(pmdval_t, 2) << 53) /* XN[1:0] */
|
||||
|
||||
/*
|
||||
* Memory Attribute override for Stage-2 (MemAttr[3:0])
|
||||
|
@ -60,8 +60,8 @@
|
||||
#define PAGE_HYP_RO __pgprot(_PAGE_DEFAULT | PTE_HYP | PTE_RDONLY | PTE_HYP_XN)
|
||||
#define PAGE_HYP_DEVICE __pgprot(PROT_DEVICE_nGnRE | PTE_HYP)
|
||||
|
||||
#define PAGE_S2 __pgprot(PROT_DEFAULT | PTE_S2_MEMATTR(MT_S2_NORMAL) | PTE_S2_RDONLY)
|
||||
#define PAGE_S2_DEVICE __pgprot(PROT_DEFAULT | PTE_S2_MEMATTR(MT_S2_DEVICE_nGnRE) | PTE_S2_RDONLY | PTE_UXN)
|
||||
#define PAGE_S2 __pgprot(PROT_DEFAULT | PTE_S2_MEMATTR(MT_S2_NORMAL) | PTE_S2_RDONLY | PTE_S2_XN)
|
||||
#define PAGE_S2_DEVICE __pgprot(PROT_DEFAULT | PTE_S2_MEMATTR(MT_S2_DEVICE_nGnRE) | PTE_S2_RDONLY | PTE_S2_XN)
|
||||
|
||||
#define PAGE_NONE __pgprot(((_PAGE_DEFAULT) & ~PTE_VALID) | PTE_PROT_NONE | PTE_RDONLY | PTE_PXN | PTE_UXN)
|
||||
#define PAGE_SHARED __pgprot(_PAGE_DEFAULT | PTE_USER | PTE_NG | PTE_PXN | PTE_UXN | PTE_WRITE)
|
||||
|
@ -21,6 +21,7 @@
|
||||
#include <asm/debug-monitors.h>
|
||||
#include <asm/kvm_asm.h>
|
||||
#include <asm/kvm_hyp.h>
|
||||
#include <asm/kvm_mmu.h>
|
||||
|
||||
#define read_debug(r,n) read_sysreg(r##n##_el1)
|
||||
#define write_debug(v,r,n) write_sysreg(v, r##n##_el1)
|
||||
|
@ -21,6 +21,7 @@
|
||||
#include <asm/kvm_asm.h>
|
||||
#include <asm/kvm_emulate.h>
|
||||
#include <asm/kvm_hyp.h>
|
||||
#include <asm/kvm_mmu.h>
|
||||
#include <asm/fpsimd.h>
|
||||
#include <asm/debug-monitors.h>
|
||||
|
||||
|
@ -16,6 +16,7 @@
|
||||
*/
|
||||
|
||||
#include <asm/kvm_hyp.h>
|
||||
#include <asm/kvm_mmu.h>
|
||||
#include <asm/tlbflush.h>
|
||||
|
||||
static void __hyp_text __tlb_switch_to_guest_vhe(struct kvm *kvm)
|
||||
|
@ -50,7 +50,7 @@ uao_user_alternative 9f, strh, sttrh, wzr, x0, 2
|
||||
b.mi 5f
|
||||
uao_user_alternative 9f, strb, sttrb, wzr, x0, 0
|
||||
5: mov x0, #0
|
||||
uaccess_disable_not_uao x2
|
||||
uaccess_disable_not_uao x2, x3
|
||||
ret
|
||||
ENDPROC(__clear_user)
|
||||
|
||||
|
@ -67,7 +67,7 @@ ENTRY(__arch_copy_from_user)
|
||||
uaccess_enable_not_uao x3, x4
|
||||
add end, x0, x2
|
||||
#include "copy_template.S"
|
||||
uaccess_disable_not_uao x3
|
||||
uaccess_disable_not_uao x3, x4
|
||||
mov x0, #0 // Nothing to copy
|
||||
ret
|
||||
ENDPROC(__arch_copy_from_user)
|
||||
|
@ -68,7 +68,7 @@ ENTRY(raw_copy_in_user)
|
||||
uaccess_enable_not_uao x3, x4
|
||||
add end, x0, x2
|
||||
#include "copy_template.S"
|
||||
uaccess_disable_not_uao x3
|
||||
uaccess_disable_not_uao x3, x4
|
||||
mov x0, #0
|
||||
ret
|
||||
ENDPROC(raw_copy_in_user)
|
||||
|
@ -66,7 +66,7 @@ ENTRY(__arch_copy_to_user)
|
||||
uaccess_enable_not_uao x3, x4
|
||||
add end, x0, x2
|
||||
#include "copy_template.S"
|
||||
uaccess_disable_not_uao x3
|
||||
uaccess_disable_not_uao x3, x4
|
||||
mov x0, #0
|
||||
ret
|
||||
ENDPROC(__arch_copy_to_user)
|
||||
|
@ -49,7 +49,7 @@ ENTRY(flush_icache_range)
|
||||
* - end - virtual end address of region
|
||||
*/
|
||||
ENTRY(__flush_cache_user_range)
|
||||
uaccess_ttbr0_enable x2, x3
|
||||
uaccess_ttbr0_enable x2, x3, x4
|
||||
dcache_line_size x2, x3
|
||||
sub x3, x2, #1
|
||||
bic x4, x0, x3
|
||||
@ -60,19 +60,10 @@ user_alt 9f, "dc cvau, x4", "dc civac, x4", ARM64_WORKAROUND_CLEAN_CACHE
|
||||
b.lo 1b
|
||||
dsb ish
|
||||
|
||||
icache_line_size x2, x3
|
||||
sub x3, x2, #1
|
||||
bic x4, x0, x3
|
||||
1:
|
||||
USER(9f, ic ivau, x4 ) // invalidate I line PoU
|
||||
add x4, x4, x2
|
||||
cmp x4, x1
|
||||
b.lo 1b
|
||||
dsb ish
|
||||
isb
|
||||
invalidate_icache_by_line x0, x1, x2, x3, 9f
|
||||
mov x0, #0
|
||||
1:
|
||||
uaccess_ttbr0_disable x1
|
||||
uaccess_ttbr0_disable x1, x2
|
||||
ret
|
||||
9:
|
||||
mov x0, #-EFAULT
|
||||
@ -80,6 +71,27 @@ USER(9f, ic ivau, x4 ) // invalidate I line PoU
|
||||
ENDPROC(flush_icache_range)
|
||||
ENDPROC(__flush_cache_user_range)
|
||||
|
||||
/*
|
||||
* invalidate_icache_range(start,end)
|
||||
*
|
||||
* Ensure that the I cache is invalid within specified region.
|
||||
*
|
||||
* - start - virtual start address of region
|
||||
* - end - virtual end address of region
|
||||
*/
|
||||
ENTRY(invalidate_icache_range)
|
||||
uaccess_ttbr0_enable x2, x3, x4
|
||||
|
||||
invalidate_icache_by_line x0, x1, x2, x3, 2f
|
||||
mov x0, xzr
|
||||
1:
|
||||
uaccess_ttbr0_disable x1, x2
|
||||
ret
|
||||
2:
|
||||
mov x0, #-EFAULT
|
||||
b 1b
|
||||
ENDPROC(invalidate_icache_range)
|
||||
|
||||
/*
|
||||
* __flush_dcache_area(kaddr, size)
|
||||
*
|
||||
|
@ -101,12 +101,12 @@ ENTRY(privcmd_call)
|
||||
* need the explicit uaccess_enable/disable if the TTBR0 PAN emulation
|
||||
* is enabled (it implies that hardware UAO and PAN disabled).
|
||||
*/
|
||||
uaccess_ttbr0_enable x6, x7
|
||||
uaccess_ttbr0_enable x6, x7, x8
|
||||
hvc XEN_IMM
|
||||
|
||||
/*
|
||||
* Disable userspace access from kernel once the hyp call completed.
|
||||
*/
|
||||
uaccess_ttbr0_disable x6
|
||||
uaccess_ttbr0_disable x6, x7
|
||||
ret
|
||||
ENDPROC(privcmd_call);
|
||||
|
@ -90,6 +90,8 @@ void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu);
|
||||
|
||||
void kvm_timer_init_vhe(void);
|
||||
|
||||
bool kvm_arch_timer_get_input_level(int vintid);
|
||||
|
||||
#define vcpu_vtimer(v) (&(v)->arch.timer_cpu.vtimer)
|
||||
#define vcpu_ptimer(v) (&(v)->arch.timer_cpu.ptimer)
|
||||
|
||||
|
@ -130,6 +130,17 @@ struct vgic_irq {
|
||||
u8 priority;
|
||||
enum vgic_irq_config config; /* Level or edge */
|
||||
|
||||
/*
|
||||
* Callback function pointer to in-kernel devices that can tell us the
|
||||
* state of the input level of mapped level-triggered IRQ faster than
|
||||
* peaking into the physical GIC.
|
||||
*
|
||||
* Always called in non-preemptible section and the functions can use
|
||||
* kvm_arm_get_running_vcpu() to get the vcpu pointer for private
|
||||
* IRQs.
|
||||
*/
|
||||
bool (*get_input_level)(int vintid);
|
||||
|
||||
void *owner; /* Opaque pointer to reserve an interrupt
|
||||
for in-kernel devices. */
|
||||
};
|
||||
@ -331,7 +342,7 @@ void kvm_vgic_init_cpu_hardware(void);
|
||||
int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
|
||||
bool level, void *owner);
|
||||
int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, unsigned int host_irq,
|
||||
u32 vintid);
|
||||
u32 vintid, bool (*get_input_level)(int vindid));
|
||||
int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int vintid);
|
||||
bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int vintid);
|
||||
|
||||
|
@ -97,15 +97,13 @@ static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
|
||||
pr_warn_once("Spurious arch timer IRQ on non-VCPU thread\n");
|
||||
return IRQ_NONE;
|
||||
}
|
||||
|
||||
vtimer = vcpu_vtimer(vcpu);
|
||||
if (kvm_timer_should_fire(vtimer))
|
||||
kvm_timer_update_irq(vcpu, true, vtimer);
|
||||
|
||||
if (!vtimer->irq.level) {
|
||||
vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
|
||||
if (kvm_timer_irq_can_fire(vtimer))
|
||||
kvm_timer_update_irq(vcpu, true, vtimer);
|
||||
}
|
||||
|
||||
if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
|
||||
if (static_branch_unlikely(&userspace_irqchip_in_use) &&
|
||||
unlikely(!irqchip_in_kernel(vcpu->kvm)))
|
||||
kvm_vtimer_update_mask_user(vcpu);
|
||||
|
||||
return IRQ_HANDLED;
|
||||
@ -231,6 +229,16 @@ static bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx)
|
||||
{
|
||||
u64 cval, now;
|
||||
|
||||
if (timer_ctx->loaded) {
|
||||
u32 cnt_ctl;
|
||||
|
||||
/* Only the virtual timer can be loaded so far */
|
||||
cnt_ctl = read_sysreg_el0(cntv_ctl);
|
||||
return (cnt_ctl & ARCH_TIMER_CTRL_ENABLE) &&
|
||||
(cnt_ctl & ARCH_TIMER_CTRL_IT_STAT) &&
|
||||
!(cnt_ctl & ARCH_TIMER_CTRL_IT_MASK);
|
||||
}
|
||||
|
||||
if (!kvm_timer_irq_can_fire(timer_ctx))
|
||||
return false;
|
||||
|
||||
@ -245,15 +253,7 @@ bool kvm_timer_is_pending(struct kvm_vcpu *vcpu)
|
||||
struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
|
||||
struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
|
||||
|
||||
if (vtimer->irq.level || ptimer->irq.level)
|
||||
return true;
|
||||
|
||||
/*
|
||||
* When this is called from withing the wait loop of kvm_vcpu_block(),
|
||||
* the software view of the timer state is up to date (timer->loaded
|
||||
* is false), and so we can simply check if the timer should fire now.
|
||||
*/
|
||||
if (!vtimer->loaded && kvm_timer_should_fire(vtimer))
|
||||
if (kvm_timer_should_fire(vtimer))
|
||||
return true;
|
||||
|
||||
return kvm_timer_should_fire(ptimer);
|
||||
@ -271,9 +271,9 @@ void kvm_timer_update_run(struct kvm_vcpu *vcpu)
|
||||
/* Populate the device bitmap with the timer states */
|
||||
regs->device_irq_level &= ~(KVM_ARM_DEV_EL1_VTIMER |
|
||||
KVM_ARM_DEV_EL1_PTIMER);
|
||||
if (vtimer->irq.level)
|
||||
if (kvm_timer_should_fire(vtimer))
|
||||
regs->device_irq_level |= KVM_ARM_DEV_EL1_VTIMER;
|
||||
if (ptimer->irq.level)
|
||||
if (kvm_timer_should_fire(ptimer))
|
||||
regs->device_irq_level |= KVM_ARM_DEV_EL1_PTIMER;
|
||||
}
|
||||
|
||||
@ -286,7 +286,8 @@ static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
|
||||
trace_kvm_timer_update_irq(vcpu->vcpu_id, timer_ctx->irq.irq,
|
||||
timer_ctx->irq.level);
|
||||
|
||||
if (likely(irqchip_in_kernel(vcpu->kvm))) {
|
||||
if (!static_branch_unlikely(&userspace_irqchip_in_use) ||
|
||||
likely(irqchip_in_kernel(vcpu->kvm))) {
|
||||
ret = kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
|
||||
timer_ctx->irq.irq,
|
||||
timer_ctx->irq.level,
|
||||
@ -324,12 +325,20 @@ static void kvm_timer_update_state(struct kvm_vcpu *vcpu)
|
||||
struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
|
||||
struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
|
||||
struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
|
||||
bool level;
|
||||
|
||||
if (unlikely(!timer->enabled))
|
||||
return;
|
||||
|
||||
if (kvm_timer_should_fire(vtimer) != vtimer->irq.level)
|
||||
kvm_timer_update_irq(vcpu, !vtimer->irq.level, vtimer);
|
||||
/*
|
||||
* The vtimer virtual interrupt is a 'mapped' interrupt, meaning part
|
||||
* of its lifecycle is offloaded to the hardware, and we therefore may
|
||||
* not have lowered the irq.level value before having to signal a new
|
||||
* interrupt, but have to signal an interrupt every time the level is
|
||||
* asserted.
|
||||
*/
|
||||
level = kvm_timer_should_fire(vtimer);
|
||||
kvm_timer_update_irq(vcpu, level, vtimer);
|
||||
|
||||
if (kvm_timer_should_fire(ptimer) != ptimer->irq.level)
|
||||
kvm_timer_update_irq(vcpu, !ptimer->irq.level, ptimer);
|
||||
@ -337,6 +346,12 @@ static void kvm_timer_update_state(struct kvm_vcpu *vcpu)
|
||||
phys_timer_emulate(vcpu);
|
||||
}
|
||||
|
||||
static void __timer_snapshot_state(struct arch_timer_context *timer)
|
||||
{
|
||||
timer->cnt_ctl = read_sysreg_el0(cntv_ctl);
|
||||
timer->cnt_cval = read_sysreg_el0(cntv_cval);
|
||||
}
|
||||
|
||||
static void vtimer_save_state(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
|
||||
@ -348,10 +363,8 @@ static void vtimer_save_state(struct kvm_vcpu *vcpu)
|
||||
if (!vtimer->loaded)
|
||||
goto out;
|
||||
|
||||
if (timer->enabled) {
|
||||
vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
|
||||
vtimer->cnt_cval = read_sysreg_el0(cntv_cval);
|
||||
}
|
||||
if (timer->enabled)
|
||||
__timer_snapshot_state(vtimer);
|
||||
|
||||
/* Disable the virtual timer */
|
||||
write_sysreg_el0(0, cntv_ctl);
|
||||
@ -448,8 +461,7 @@ static void kvm_timer_vcpu_load_vgic(struct kvm_vcpu *vcpu)
|
||||
bool phys_active;
|
||||
int ret;
|
||||
|
||||
phys_active = vtimer->irq.level ||
|
||||
kvm_vgic_map_is_active(vcpu, vtimer->irq.irq);
|
||||
phys_active = kvm_vgic_map_is_active(vcpu, vtimer->irq.irq);
|
||||
|
||||
ret = irq_set_irqchip_state(host_vtimer_irq,
|
||||
IRQCHIP_STATE_ACTIVE,
|
||||
@ -496,8 +508,8 @@ bool kvm_timer_should_notify_user(struct kvm_vcpu *vcpu)
|
||||
vlevel = sregs->device_irq_level & KVM_ARM_DEV_EL1_VTIMER;
|
||||
plevel = sregs->device_irq_level & KVM_ARM_DEV_EL1_PTIMER;
|
||||
|
||||
return vtimer->irq.level != vlevel ||
|
||||
ptimer->irq.level != plevel;
|
||||
return kvm_timer_should_fire(vtimer) != vlevel ||
|
||||
kvm_timer_should_fire(ptimer) != plevel;
|
||||
}
|
||||
|
||||
void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
|
||||
@ -529,54 +541,27 @@ void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
|
||||
set_cntvoff(0);
|
||||
}
|
||||
|
||||
static void unmask_vtimer_irq(struct kvm_vcpu *vcpu)
|
||||
/*
|
||||
* With a userspace irqchip we have to check if the guest de-asserted the
|
||||
* timer and if so, unmask the timer irq signal on the host interrupt
|
||||
* controller to ensure that we see future timer signals.
|
||||
*/
|
||||
static void unmask_vtimer_irq_user(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
|
||||
|
||||
if (unlikely(!irqchip_in_kernel(vcpu->kvm))) {
|
||||
kvm_vtimer_update_mask_user(vcpu);
|
||||
return;
|
||||
}
|
||||
|
||||
/*
|
||||
* If the guest disabled the timer without acking the interrupt, then
|
||||
* we must make sure the physical and virtual active states are in
|
||||
* sync by deactivating the physical interrupt, because otherwise we
|
||||
* wouldn't see the next timer interrupt in the host.
|
||||
*/
|
||||
if (!kvm_vgic_map_is_active(vcpu, vtimer->irq.irq)) {
|
||||
int ret;
|
||||
ret = irq_set_irqchip_state(host_vtimer_irq,
|
||||
IRQCHIP_STATE_ACTIVE,
|
||||
false);
|
||||
WARN_ON(ret);
|
||||
__timer_snapshot_state(vtimer);
|
||||
if (!kvm_timer_should_fire(vtimer)) {
|
||||
kvm_timer_update_irq(vcpu, false, vtimer);
|
||||
kvm_vtimer_update_mask_user(vcpu);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* kvm_timer_sync_hwstate - sync timer state from cpu
|
||||
* @vcpu: The vcpu pointer
|
||||
*
|
||||
* Check if any of the timers have expired while we were running in the guest,
|
||||
* and inject an interrupt if that was the case.
|
||||
*/
|
||||
void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
|
||||
|
||||
/*
|
||||
* If we entered the guest with the vtimer output asserted we have to
|
||||
* check if the guest has modified the timer so that we should lower
|
||||
* the line at this point.
|
||||
*/
|
||||
if (vtimer->irq.level) {
|
||||
vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
|
||||
vtimer->cnt_cval = read_sysreg_el0(cntv_cval);
|
||||
if (!kvm_timer_should_fire(vtimer)) {
|
||||
kvm_timer_update_irq(vcpu, false, vtimer);
|
||||
unmask_vtimer_irq(vcpu);
|
||||
}
|
||||
}
|
||||
unmask_vtimer_irq_user(vcpu);
|
||||
}
|
||||
|
||||
int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu)
|
||||
@ -807,6 +792,19 @@ static bool timer_irqs_are_valid(struct kvm_vcpu *vcpu)
|
||||
return true;
|
||||
}
|
||||
|
||||
bool kvm_arch_timer_get_input_level(int vintid)
|
||||
{
|
||||
struct kvm_vcpu *vcpu = kvm_arm_get_running_vcpu();
|
||||
struct arch_timer_context *timer;
|
||||
|
||||
if (vintid == vcpu_vtimer(vcpu)->irq.irq)
|
||||
timer = vcpu_vtimer(vcpu);
|
||||
else
|
||||
BUG(); /* We only map the vtimer so far */
|
||||
|
||||
return kvm_timer_should_fire(timer);
|
||||
}
|
||||
|
||||
int kvm_timer_enable(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
|
||||
@ -828,7 +826,8 @@ int kvm_timer_enable(struct kvm_vcpu *vcpu)
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
ret = kvm_vgic_map_phys_irq(vcpu, host_vtimer_irq, vtimer->irq.irq);
|
||||
ret = kvm_vgic_map_phys_irq(vcpu, host_vtimer_irq, vtimer->irq.irq,
|
||||
kvm_arch_timer_get_input_level);
|
||||
if (ret)
|
||||
return ret;
|
||||
|
||||
|
@ -71,17 +71,17 @@ static DEFINE_PER_CPU(unsigned char, kvm_arm_hardware_enabled);
|
||||
|
||||
static void kvm_arm_set_running_vcpu(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
BUG_ON(preemptible());
|
||||
__this_cpu_write(kvm_arm_running_vcpu, vcpu);
|
||||
}
|
||||
|
||||
DEFINE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
|
||||
|
||||
/**
|
||||
* kvm_arm_get_running_vcpu - get the vcpu running on the current CPU.
|
||||
* Must be called from non-preemptible context
|
||||
*/
|
||||
struct kvm_vcpu *kvm_arm_get_running_vcpu(void)
|
||||
{
|
||||
BUG_ON(preemptible());
|
||||
return __this_cpu_read(kvm_arm_running_vcpu);
|
||||
}
|
||||
|
||||
@ -295,6 +295,9 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
|
||||
|
||||
void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
if (vcpu->arch.has_run_once && unlikely(!irqchip_in_kernel(vcpu->kvm)))
|
||||
static_branch_dec(&userspace_irqchip_in_use);
|
||||
|
||||
kvm_mmu_free_memory_caches(vcpu);
|
||||
kvm_timer_vcpu_terminate(vcpu);
|
||||
kvm_pmu_vcpu_destroy(vcpu);
|
||||
@ -532,14 +535,22 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
|
||||
|
||||
vcpu->arch.has_run_once = true;
|
||||
|
||||
/*
|
||||
* Map the VGIC hardware resources before running a vcpu the first
|
||||
* time on this VM.
|
||||
*/
|
||||
if (unlikely(irqchip_in_kernel(kvm) && !vgic_ready(kvm))) {
|
||||
ret = kvm_vgic_map_resources(kvm);
|
||||
if (ret)
|
||||
return ret;
|
||||
if (likely(irqchip_in_kernel(kvm))) {
|
||||
/*
|
||||
* Map the VGIC hardware resources before running a vcpu the
|
||||
* first time on this VM.
|
||||
*/
|
||||
if (unlikely(!vgic_ready(kvm))) {
|
||||
ret = kvm_vgic_map_resources(kvm);
|
||||
if (ret)
|
||||
return ret;
|
||||
}
|
||||
} else {
|
||||
/*
|
||||
* Tell the rest of the code that there are userspace irqchip
|
||||
* VMs in the wild.
|
||||
*/
|
||||
static_branch_inc(&userspace_irqchip_in_use);
|
||||
}
|
||||
|
||||
ret = kvm_timer_enable(vcpu);
|
||||
@ -680,18 +691,29 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
|
||||
kvm_vgic_flush_hwstate(vcpu);
|
||||
|
||||
/*
|
||||
* If we have a singal pending, or need to notify a userspace
|
||||
* irqchip about timer or PMU level changes, then we exit (and
|
||||
* update the timer level state in kvm_timer_update_run
|
||||
* below).
|
||||
* Exit if we have a signal pending so that we can deliver the
|
||||
* signal to user space.
|
||||
*/
|
||||
if (signal_pending(current) ||
|
||||
kvm_timer_should_notify_user(vcpu) ||
|
||||
kvm_pmu_should_notify_user(vcpu)) {
|
||||
if (signal_pending(current)) {
|
||||
ret = -EINTR;
|
||||
run->exit_reason = KVM_EXIT_INTR;
|
||||
}
|
||||
|
||||
/*
|
||||
* If we're using a userspace irqchip, then check if we need
|
||||
* to tell a userspace irqchip about timer or PMU level
|
||||
* changes and if so, exit to userspace (the actual level
|
||||
* state gets updated in kvm_timer_update_run and
|
||||
* kvm_pmu_update_run below).
|
||||
*/
|
||||
if (static_branch_unlikely(&userspace_irqchip_in_use)) {
|
||||
if (kvm_timer_should_notify_user(vcpu) ||
|
||||
kvm_pmu_should_notify_user(vcpu)) {
|
||||
ret = -EINTR;
|
||||
run->exit_reason = KVM_EXIT_INTR;
|
||||
}
|
||||
}
|
||||
|
||||
/*
|
||||
* Ensure we set mode to IN_GUEST_MODE after we disable
|
||||
* interrupts and before the final VCPU requests check.
|
||||
@ -704,7 +726,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
|
||||
kvm_request_pending(vcpu)) {
|
||||
vcpu->mode = OUTSIDE_GUEST_MODE;
|
||||
kvm_pmu_sync_hwstate(vcpu);
|
||||
kvm_timer_sync_hwstate(vcpu);
|
||||
if (static_branch_unlikely(&userspace_irqchip_in_use))
|
||||
kvm_timer_sync_hwstate(vcpu);
|
||||
kvm_vgic_sync_hwstate(vcpu);
|
||||
local_irq_enable();
|
||||
preempt_enable();
|
||||
@ -748,7 +771,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
|
||||
* we don't want vtimer interrupts to race with syncing the
|
||||
* timer virtual interrupt state.
|
||||
*/
|
||||
kvm_timer_sync_hwstate(vcpu);
|
||||
if (static_branch_unlikely(&userspace_irqchip_in_use))
|
||||
kvm_timer_sync_hwstate(vcpu);
|
||||
|
||||
/*
|
||||
* We may have taken a host interrupt in HYP mode (ie
|
||||
@ -1277,6 +1301,7 @@ static int hyp_init_cpu_pm_notifier(struct notifier_block *self,
|
||||
cpu_hyp_reset();
|
||||
|
||||
return NOTIFY_OK;
|
||||
case CPU_PM_ENTER_FAILED:
|
||||
case CPU_PM_EXIT:
|
||||
if (__this_cpu_read(kvm_arm_hardware_enabled))
|
||||
/* The hardware was enabled before suspend. */
|
||||
|
@ -21,6 +21,7 @@
|
||||
|
||||
#include <asm/kvm_emulate.h>
|
||||
#include <asm/kvm_hyp.h>
|
||||
#include <asm/kvm_mmu.h>
|
||||
|
||||
static void __hyp_text save_elrsr(struct kvm_vcpu *vcpu, void __iomem *base)
|
||||
{
|
||||
|
@ -926,6 +926,25 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
|
||||
return 0;
|
||||
}
|
||||
|
||||
static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
|
||||
{
|
||||
pmd_t *pmdp;
|
||||
pte_t *ptep;
|
||||
|
||||
pmdp = stage2_get_pmd(kvm, NULL, addr);
|
||||
if (!pmdp || pmd_none(*pmdp) || !pmd_present(*pmdp))
|
||||
return false;
|
||||
|
||||
if (pmd_thp_or_huge(*pmdp))
|
||||
return kvm_s2pmd_exec(pmdp);
|
||||
|
||||
ptep = pte_offset_kernel(pmdp, addr);
|
||||
if (!ptep || pte_none(*ptep) || !pte_present(*ptep))
|
||||
return false;
|
||||
|
||||
return kvm_s2pte_exec(ptep);
|
||||
}
|
||||
|
||||
static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
|
||||
phys_addr_t addr, const pte_t *new_pte,
|
||||
unsigned long flags)
|
||||
@ -1257,10 +1276,14 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
|
||||
kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
|
||||
}
|
||||
|
||||
static void coherent_cache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
|
||||
unsigned long size)
|
||||
static void clean_dcache_guest_page(kvm_pfn_t pfn, unsigned long size)
|
||||
{
|
||||
__coherent_cache_guest_page(vcpu, pfn, size);
|
||||
__clean_dcache_guest_page(pfn, size);
|
||||
}
|
||||
|
||||
static void invalidate_icache_guest_page(kvm_pfn_t pfn, unsigned long size)
|
||||
{
|
||||
__invalidate_icache_guest_page(pfn, size);
|
||||
}
|
||||
|
||||
static void kvm_send_hwpoison_signal(unsigned long address,
|
||||
@ -1286,7 +1309,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
|
||||
unsigned long fault_status)
|
||||
{
|
||||
int ret;
|
||||
bool write_fault, writable, hugetlb = false, force_pte = false;
|
||||
bool write_fault, exec_fault, writable, hugetlb = false, force_pte = false;
|
||||
unsigned long mmu_seq;
|
||||
gfn_t gfn = fault_ipa >> PAGE_SHIFT;
|
||||
struct kvm *kvm = vcpu->kvm;
|
||||
@ -1298,7 +1321,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
|
||||
unsigned long flags = 0;
|
||||
|
||||
write_fault = kvm_is_write_fault(vcpu);
|
||||
if (fault_status == FSC_PERM && !write_fault) {
|
||||
exec_fault = kvm_vcpu_trap_is_iabt(vcpu);
|
||||
VM_BUG_ON(write_fault && exec_fault);
|
||||
|
||||
if (fault_status == FSC_PERM && !write_fault && !exec_fault) {
|
||||
kvm_err("Unexpected L2 read permission error\n");
|
||||
return -EFAULT;
|
||||
}
|
||||
@ -1391,7 +1417,19 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
|
||||
new_pmd = kvm_s2pmd_mkwrite(new_pmd);
|
||||
kvm_set_pfn_dirty(pfn);
|
||||
}
|
||||
coherent_cache_guest_page(vcpu, pfn, PMD_SIZE);
|
||||
|
||||
if (fault_status != FSC_PERM)
|
||||
clean_dcache_guest_page(pfn, PMD_SIZE);
|
||||
|
||||
if (exec_fault) {
|
||||
new_pmd = kvm_s2pmd_mkexec(new_pmd);
|
||||
invalidate_icache_guest_page(pfn, PMD_SIZE);
|
||||
} else if (fault_status == FSC_PERM) {
|
||||
/* Preserve execute if XN was already cleared */
|
||||
if (stage2_is_exec(kvm, fault_ipa))
|
||||
new_pmd = kvm_s2pmd_mkexec(new_pmd);
|
||||
}
|
||||
|
||||
ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);
|
||||
} else {
|
||||
pte_t new_pte = pfn_pte(pfn, mem_type);
|
||||
@ -1401,7 +1439,19 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
|
||||
kvm_set_pfn_dirty(pfn);
|
||||
mark_page_dirty(kvm, gfn);
|
||||
}
|
||||
coherent_cache_guest_page(vcpu, pfn, PAGE_SIZE);
|
||||
|
||||
if (fault_status != FSC_PERM)
|
||||
clean_dcache_guest_page(pfn, PAGE_SIZE);
|
||||
|
||||
if (exec_fault) {
|
||||
new_pte = kvm_s2pte_mkexec(new_pte);
|
||||
invalidate_icache_guest_page(pfn, PAGE_SIZE);
|
||||
} else if (fault_status == FSC_PERM) {
|
||||
/* Preserve execute if XN was already cleared */
|
||||
if (stage2_is_exec(kvm, fault_ipa))
|
||||
new_pte = kvm_s2pte_mkexec(new_pte);
|
||||
}
|
||||
|
||||
ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, flags);
|
||||
}
|
||||
|
||||
|
@ -1034,10 +1034,8 @@ static int vgic_its_cmd_handle_mapd(struct kvm *kvm, struct vgic_its *its,
|
||||
|
||||
device = vgic_its_alloc_device(its, device_id, itt_addr,
|
||||
num_eventid_bits);
|
||||
if (IS_ERR(device))
|
||||
return PTR_ERR(device);
|
||||
|
||||
return 0;
|
||||
return PTR_ERR_OR_ZERO(device);
|
||||
}
|
||||
|
||||
/*
|
||||
|
@ -16,6 +16,7 @@
|
||||
#include <linux/kvm.h>
|
||||
#include <linux/kvm_host.h>
|
||||
#include <kvm/iodev.h>
|
||||
#include <kvm/arm_arch_timer.h>
|
||||
#include <kvm/arm_vgic.h>
|
||||
|
||||
#include "vgic.h"
|
||||
@ -122,10 +123,43 @@ unsigned long vgic_mmio_read_pending(struct kvm_vcpu *vcpu,
|
||||
return value;
|
||||
}
|
||||
|
||||
/*
|
||||
* This function will return the VCPU that performed the MMIO access and
|
||||
* trapped from within the VM, and will return NULL if this is a userspace
|
||||
* access.
|
||||
*
|
||||
* We can disable preemption locally around accessing the per-CPU variable,
|
||||
* and use the resolved vcpu pointer after enabling preemption again, because
|
||||
* even if the current thread is migrated to another CPU, reading the per-CPU
|
||||
* value later will give us the same value as we update the per-CPU variable
|
||||
* in the preempt notifier handlers.
|
||||
*/
|
||||
static struct kvm_vcpu *vgic_get_mmio_requester_vcpu(void)
|
||||
{
|
||||
struct kvm_vcpu *vcpu;
|
||||
|
||||
preempt_disable();
|
||||
vcpu = kvm_arm_get_running_vcpu();
|
||||
preempt_enable();
|
||||
return vcpu;
|
||||
}
|
||||
|
||||
/* Must be called with irq->irq_lock held */
|
||||
static void vgic_hw_irq_spending(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
|
||||
bool is_uaccess)
|
||||
{
|
||||
if (is_uaccess)
|
||||
return;
|
||||
|
||||
irq->pending_latch = true;
|
||||
vgic_irq_set_phys_active(irq, true);
|
||||
}
|
||||
|
||||
void vgic_mmio_write_spending(struct kvm_vcpu *vcpu,
|
||||
gpa_t addr, unsigned int len,
|
||||
unsigned long val)
|
||||
{
|
||||
bool is_uaccess = !vgic_get_mmio_requester_vcpu();
|
||||
u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
|
||||
int i;
|
||||
unsigned long flags;
|
||||
@ -134,17 +168,45 @@ void vgic_mmio_write_spending(struct kvm_vcpu *vcpu,
|
||||
struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
|
||||
|
||||
spin_lock_irqsave(&irq->irq_lock, flags);
|
||||
irq->pending_latch = true;
|
||||
|
||||
if (irq->hw)
|
||||
vgic_hw_irq_spending(vcpu, irq, is_uaccess);
|
||||
else
|
||||
irq->pending_latch = true;
|
||||
vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
|
||||
vgic_put_irq(vcpu->kvm, irq);
|
||||
}
|
||||
}
|
||||
|
||||
/* Must be called with irq->irq_lock held */
|
||||
static void vgic_hw_irq_cpending(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
|
||||
bool is_uaccess)
|
||||
{
|
||||
if (is_uaccess)
|
||||
return;
|
||||
|
||||
irq->pending_latch = false;
|
||||
|
||||
/*
|
||||
* We don't want the guest to effectively mask the physical
|
||||
* interrupt by doing a write to SPENDR followed by a write to
|
||||
* CPENDR for HW interrupts, so we clear the active state on
|
||||
* the physical side if the virtual interrupt is not active.
|
||||
* This may lead to taking an additional interrupt on the
|
||||
* host, but that should not be a problem as the worst that
|
||||
* can happen is an additional vgic injection. We also clear
|
||||
* the pending state to maintain proper semantics for edge HW
|
||||
* interrupts.
|
||||
*/
|
||||
vgic_irq_set_phys_pending(irq, false);
|
||||
if (!irq->active)
|
||||
vgic_irq_set_phys_active(irq, false);
|
||||
}
|
||||
|
||||
void vgic_mmio_write_cpending(struct kvm_vcpu *vcpu,
|
||||
gpa_t addr, unsigned int len,
|
||||
unsigned long val)
|
||||
{
|
||||
bool is_uaccess = !vgic_get_mmio_requester_vcpu();
|
||||
u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
|
||||
int i;
|
||||
unsigned long flags;
|
||||
@ -154,7 +216,10 @@ void vgic_mmio_write_cpending(struct kvm_vcpu *vcpu,
|
||||
|
||||
spin_lock_irqsave(&irq->irq_lock, flags);
|
||||
|
||||
irq->pending_latch = false;
|
||||
if (irq->hw)
|
||||
vgic_hw_irq_cpending(vcpu, irq, is_uaccess);
|
||||
else
|
||||
irq->pending_latch = false;
|
||||
|
||||
spin_unlock_irqrestore(&irq->irq_lock, flags);
|
||||
vgic_put_irq(vcpu->kvm, irq);
|
||||
@ -181,27 +246,24 @@ unsigned long vgic_mmio_read_active(struct kvm_vcpu *vcpu,
|
||||
return value;
|
||||
}
|
||||
|
||||
static void vgic_mmio_change_active(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
|
||||
bool new_active_state)
|
||||
/* Must be called with irq->irq_lock held */
|
||||
static void vgic_hw_irq_change_active(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
|
||||
bool active, bool is_uaccess)
|
||||
{
|
||||
struct kvm_vcpu *requester_vcpu;
|
||||
unsigned long flags;
|
||||
spin_lock_irqsave(&irq->irq_lock, flags);
|
||||
if (is_uaccess)
|
||||
return;
|
||||
|
||||
/*
|
||||
* The vcpu parameter here can mean multiple things depending on how
|
||||
* this function is called; when handling a trap from the kernel it
|
||||
* depends on the GIC version, and these functions are also called as
|
||||
* part of save/restore from userspace.
|
||||
*
|
||||
* Therefore, we have to figure out the requester in a reliable way.
|
||||
*
|
||||
* When accessing VGIC state from user space, the requester_vcpu is
|
||||
* NULL, which is fine, because we guarantee that no VCPUs are running
|
||||
* when accessing VGIC state from user space so irq->vcpu->cpu is
|
||||
* always -1.
|
||||
*/
|
||||
requester_vcpu = kvm_arm_get_running_vcpu();
|
||||
irq->active = active;
|
||||
vgic_irq_set_phys_active(irq, active);
|
||||
}
|
||||
|
||||
static void vgic_mmio_change_active(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
|
||||
bool active)
|
||||
{
|
||||
unsigned long flags;
|
||||
struct kvm_vcpu *requester_vcpu = vgic_get_mmio_requester_vcpu();
|
||||
|
||||
spin_lock_irqsave(&irq->irq_lock, flags);
|
||||
|
||||
/*
|
||||
* If this virtual IRQ was written into a list register, we
|
||||
@ -213,14 +275,23 @@ static void vgic_mmio_change_active(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
|
||||
* vgic_change_active_prepare) and still has to sync back this IRQ,
|
||||
* so we release and re-acquire the spin_lock to let the other thread
|
||||
* sync back the IRQ.
|
||||
*
|
||||
* When accessing VGIC state from user space, requester_vcpu is
|
||||
* NULL, which is fine, because we guarantee that no VCPUs are running
|
||||
* when accessing VGIC state from user space so irq->vcpu->cpu is
|
||||
* always -1.
|
||||
*/
|
||||
while (irq->vcpu && /* IRQ may have state in an LR somewhere */
|
||||
irq->vcpu != requester_vcpu && /* Current thread is not the VCPU thread */
|
||||
irq->vcpu->cpu != -1) /* VCPU thread is running */
|
||||
cond_resched_lock(&irq->irq_lock);
|
||||
|
||||
irq->active = new_active_state;
|
||||
if (new_active_state)
|
||||
if (irq->hw)
|
||||
vgic_hw_irq_change_active(vcpu, irq, active, !requester_vcpu);
|
||||
else
|
||||
irq->active = active;
|
||||
|
||||
if (irq->active)
|
||||
vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
|
||||
else
|
||||
spin_unlock_irqrestore(&irq->irq_lock, flags);
|
||||
|
@ -105,6 +105,26 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
|
||||
irq->pending_latch = false;
|
||||
}
|
||||
|
||||
/*
|
||||
* Level-triggered mapped IRQs are special because we only
|
||||
* observe rising edges as input to the VGIC.
|
||||
*
|
||||
* If the guest never acked the interrupt we have to sample
|
||||
* the physical line and set the line level, because the
|
||||
* device state could have changed or we simply need to
|
||||
* process the still pending interrupt later.
|
||||
*
|
||||
* If this causes us to lower the level, we have to also clear
|
||||
* the physical active state, since we will otherwise never be
|
||||
* told when the interrupt becomes asserted again.
|
||||
*/
|
||||
if (vgic_irq_is_mapped_level(irq) && (val & GICH_LR_PENDING_BIT)) {
|
||||
irq->line_level = vgic_get_phys_line_level(irq);
|
||||
|
||||
if (!irq->line_level)
|
||||
vgic_irq_set_phys_active(irq, false);
|
||||
}
|
||||
|
||||
spin_unlock_irqrestore(&irq->irq_lock, flags);
|
||||
vgic_put_irq(vcpu->kvm, irq);
|
||||
}
|
||||
@ -162,6 +182,15 @@ void vgic_v2_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr)
|
||||
val |= GICH_LR_EOI;
|
||||
}
|
||||
|
||||
/*
|
||||
* Level-triggered mapped IRQs are special because we only observe
|
||||
* rising edges as input to the VGIC. We therefore lower the line
|
||||
* level here, so that we can take new virtual IRQs. See
|
||||
* vgic_v2_fold_lr_state for more info.
|
||||
*/
|
||||
if (vgic_irq_is_mapped_level(irq) && (val & GICH_LR_PENDING_BIT))
|
||||
irq->line_level = false;
|
||||
|
||||
/* The GICv2 LR only holds five bits of priority. */
|
||||
val |= (irq->priority >> 3) << GICH_LR_PRIORITY_SHIFT;
|
||||
|
||||
|
@ -96,6 +96,26 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
|
||||
irq->pending_latch = false;
|
||||
}
|
||||
|
||||
/*
|
||||
* Level-triggered mapped IRQs are special because we only
|
||||
* observe rising edges as input to the VGIC.
|
||||
*
|
||||
* If the guest never acked the interrupt we have to sample
|
||||
* the physical line and set the line level, because the
|
||||
* device state could have changed or we simply need to
|
||||
* process the still pending interrupt later.
|
||||
*
|
||||
* If this causes us to lower the level, we have to also clear
|
||||
* the physical active state, since we will otherwise never be
|
||||
* told when the interrupt becomes asserted again.
|
||||
*/
|
||||
if (vgic_irq_is_mapped_level(irq) && (val & ICH_LR_PENDING_BIT)) {
|
||||
irq->line_level = vgic_get_phys_line_level(irq);
|
||||
|
||||
if (!irq->line_level)
|
||||
vgic_irq_set_phys_active(irq, false);
|
||||
}
|
||||
|
||||
spin_unlock_irqrestore(&irq->irq_lock, flags);
|
||||
vgic_put_irq(vcpu->kvm, irq);
|
||||
}
|
||||
@ -145,6 +165,15 @@ void vgic_v3_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr)
|
||||
val |= ICH_LR_EOI;
|
||||
}
|
||||
|
||||
/*
|
||||
* Level-triggered mapped IRQs are special because we only observe
|
||||
* rising edges as input to the VGIC. We therefore lower the line
|
||||
* level here, so that we can take new virtual IRQs. See
|
||||
* vgic_v3_fold_lr_state for more info.
|
||||
*/
|
||||
if (vgic_irq_is_mapped_level(irq) && (val & ICH_LR_PENDING_BIT))
|
||||
irq->line_level = false;
|
||||
|
||||
/*
|
||||
* We currently only support Group1 interrupts, which is a
|
||||
* known defect. This needs to be addressed at some point.
|
||||
|
@ -144,6 +144,38 @@ void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
|
||||
kfree(irq);
|
||||
}
|
||||
|
||||
void vgic_irq_set_phys_pending(struct vgic_irq *irq, bool pending)
|
||||
{
|
||||
WARN_ON(irq_set_irqchip_state(irq->host_irq,
|
||||
IRQCHIP_STATE_PENDING,
|
||||
pending));
|
||||
}
|
||||
|
||||
bool vgic_get_phys_line_level(struct vgic_irq *irq)
|
||||
{
|
||||
bool line_level;
|
||||
|
||||
BUG_ON(!irq->hw);
|
||||
|
||||
if (irq->get_input_level)
|
||||
return irq->get_input_level(irq->intid);
|
||||
|
||||
WARN_ON(irq_get_irqchip_state(irq->host_irq,
|
||||
IRQCHIP_STATE_PENDING,
|
||||
&line_level));
|
||||
return line_level;
|
||||
}
|
||||
|
||||
/* Set/Clear the physical active state */
|
||||
void vgic_irq_set_phys_active(struct vgic_irq *irq, bool active)
|
||||
{
|
||||
|
||||
BUG_ON(!irq->hw);
|
||||
WARN_ON(irq_set_irqchip_state(irq->host_irq,
|
||||
IRQCHIP_STATE_ACTIVE,
|
||||
active));
|
||||
}
|
||||
|
||||
/**
|
||||
* kvm_vgic_target_oracle - compute the target vcpu for an irq
|
||||
*
|
||||
@ -413,7 +445,8 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
|
||||
|
||||
/* @irq->irq_lock must be held */
|
||||
static int kvm_vgic_map_irq(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
|
||||
unsigned int host_irq)
|
||||
unsigned int host_irq,
|
||||
bool (*get_input_level)(int vindid))
|
||||
{
|
||||
struct irq_desc *desc;
|
||||
struct irq_data *data;
|
||||
@ -433,6 +466,7 @@ static int kvm_vgic_map_irq(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
|
||||
irq->hw = true;
|
||||
irq->host_irq = host_irq;
|
||||
irq->hwintid = data->hwirq;
|
||||
irq->get_input_level = get_input_level;
|
||||
return 0;
|
||||
}
|
||||
|
||||
@ -441,10 +475,11 @@ static inline void kvm_vgic_unmap_irq(struct vgic_irq *irq)
|
||||
{
|
||||
irq->hw = false;
|
||||
irq->hwintid = 0;
|
||||
irq->get_input_level = NULL;
|
||||
}
|
||||
|
||||
int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, unsigned int host_irq,
|
||||
u32 vintid)
|
||||
u32 vintid, bool (*get_input_level)(int vindid))
|
||||
{
|
||||
struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, vintid);
|
||||
unsigned long flags;
|
||||
@ -453,7 +488,7 @@ int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, unsigned int host_irq,
|
||||
BUG_ON(!irq);
|
||||
|
||||
spin_lock_irqsave(&irq->irq_lock, flags);
|
||||
ret = kvm_vgic_map_irq(vcpu, irq, host_irq);
|
||||
ret = kvm_vgic_map_irq(vcpu, irq, host_irq, get_input_level);
|
||||
spin_unlock_irqrestore(&irq->irq_lock, flags);
|
||||
vgic_put_irq(vcpu->kvm, irq);
|
||||
|
||||
|
@ -104,6 +104,11 @@ static inline bool irq_is_pending(struct vgic_irq *irq)
|
||||
return irq->pending_latch || irq->line_level;
|
||||
}
|
||||
|
||||
static inline bool vgic_irq_is_mapped_level(struct vgic_irq *irq)
|
||||
{
|
||||
return irq->config == VGIC_CONFIG_LEVEL && irq->hw;
|
||||
}
|
||||
|
||||
/*
|
||||
* This struct provides an intermediate representation of the fields contained
|
||||
* in the GICH_VMCR and ICH_VMCR registers, such that code exporting the GIC
|
||||
@ -140,6 +145,9 @@ vgic_get_mmio_region(struct kvm_vcpu *vcpu, struct vgic_io_device *iodev,
|
||||
struct vgic_irq *vgic_get_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
|
||||
u32 intid);
|
||||
void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq);
|
||||
bool vgic_get_phys_line_level(struct vgic_irq *irq);
|
||||
void vgic_irq_set_phys_pending(struct vgic_irq *irq, bool pending);
|
||||
void vgic_irq_set_phys_active(struct vgic_irq *irq, bool active);
|
||||
bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq,
|
||||
unsigned long flags);
|
||||
void vgic_kick_vcpus(struct kvm *kvm);
|
||||
|
Loading…
Reference in New Issue
Block a user